Reinforcement Learning: State-of-the-Art (Adaptation, Learning, and Optimization) 〈Vol. 12〉

  • Not in stock. We will order this title from the publisher or another supplier through an overseas book distributor.
    Shipping is normally expected within 6 to 9 weeks, though some titles may take even longer.
    Important notes
    1. Delivery may be delayed, or the title may become unobtainable.
    2. Orders for multiple copies may be shipped in separate deliveries.
    3. We cannot accept requests for copies in mint condition.

  • Our overseas book-distribution partner has this title in stock. It normally ships within 2 weeks.
    Important notes
    1. In some cases delivery may be delayed, or the title may become unobtainable.
    2. Orders for multiple copies may be shipped in separate deliveries.
    3. We cannot accept requests for copies in mint condition.
  • [Important: delays in arrival]
    Because of the spread of COVID-19 in many countries, the supply of new and second-hand foreign books has become unstable.
    Delivery may therefore take longer than the standard lead times shown on our site.
    We apologize for the inconvenience and ask for your understanding.

  • Binding: Hardcover / 620 p.
  • Language: ENG
  • Product code: 9783642276446

Full Description


Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade.

The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total, seventeen different sub-fields are presented by mostly young experts in those areas, and together they truly represent the state of the art of current reinforcement learning research.

Marco Wiering works at the artificial intelligence department of the University of Groningen in the Netherlands. He has published extensively on various reinforcement learning topics. Martijn van Otterlo works in the cognitive artificial intelligence group at the Radboud University Nijmegen in the Netherlands. He has mainly focused on expressive knowledge representation in reinforcement learning settings.

Table of Contents

Part I Introductory Part
1 Reinforcement Learning and Markov Decision Processes 3 (42)
Martijn van Otterlo
Marco Wiering
1.1 Introduction 3 (2)
1.2 Learning Sequential Decision Making 5 (5)
1.3 A Formal Framework 10 (5)
1.3.1 Markov Decision Processes 10 (3)
1.3.2 Policies 13 (1)
1.3.3 Optimality Criteria and Discounting 13 (2)
1.4 Value Functions and Bellman Equations 15 (2)
1.5 Solving Markov Decision Processes 17 (2)
1.6 Dynamic Programming: Model-Based Solution Techniques 19 (8)
1.6.1 Fundamental DP Algorithms 20 (4)
1.6.2 Efficient DP Algorithms 24 (3)
1.7 Reinforcement Learning: Model-Free Solution Techniques 27 (12)
1.7.1 Temporal Difference Learning 29 (4)
1.7.2 Monte Carlo Methods 33 (1)
1.7.3 Efficient Exploration and Value Updating 34 (5)
1.8 Conclusions 39 (6)
References 39 (6)
Part II Efficient Solution Frameworks
2 Batch Reinforcement Learning 45 (30)
Sascha Lange
Thomas Gabel
Martin Riedmiller
2.1 Introduction 45 (1)
2.2 The Batch Reinforcement Learning Problem 46 (3)
2.2.1 The Batch Learning Problem 46 (2)
2.2.2 The Growing Batch Learning Problem 48 (1)
2.3 Foundations of Batch RL Algorithms 49 (3)
2.4 Batch RL Algorithms 52 (8)
2.4.1 Kernel-Based Approximate Dynamic Programming 53 (2)
2.4.2 Fitted Q Iteration 55 (2)
2.4.3 Least-Squares Policy Iteration 57 (1)
2.4.4 Identifying Batch Algorithms 58 (2)
2.5 Theory of Batch RL 60 (1)
2.6 Batch RL in Practice 61 (9)
2.6.1 Neural Fitted Q Iteration (NFQ) 61 (2)
2.6.2 NFQ in Control Applications 63 (2)
2.6.3 Batch RL for Learning in Multi-agent Systems 65 (2)
2.6.4 Deep Fitted Q Iteration 67 (2)
2.6.5 Applications/Further References 69 (1)
2.7 Summary 70 (5)
References 71 (4)
3 Least-Squares Methods for Policy Iteration 75 (36)
Lucian Busoniu
Alessandro Lazaric
Mohammad Ghavamzadeh
Remi Munos
Robert Babuska
Bart De Schutter
3.1 Introduction 76 (1)
3.2 Preliminaries: Classical Policy Iteration 77 (2)
3.3 Least-Squares Methods for Approximate Policy Evaluation 79 (10)
3.3.1 Main Principles and Taxonomy 79 (2)
3.3.2 The Linear Case and Matrix Form of the Equations 81 (4)
3.3.3 Model-Free Implementations 85 (4)
3.3.4 Bibliographical Notes 89 (1)
3.4 Online Least-Squares Policy Iteration 89 (2)
3.5 Example: Car on the Hill 91 (3)
3.6 Performance Guarantees 94 (10)
3.6.1 Asymptotic Convergence and Guarantees 95 (3)
3.6.2 Finite-Sample Guarantees 98 (6)
3.7 Further Reading 104 (7)
References 106 (5)
4 Learning and Using Models 111 (32)
Todd Hester
Peter Stone
4.1 Introduction 112 (1)
4.2 What Is a Model? 113 (2)
4.3 Planning 115 (3)
4.3.1 Monte Carlo Methods 115 (3)
4.4 Combining Models and Planning 118 (2)
4.5 Sample Complexity 120 (2)
4.6 Factored Domains 122 (4)
4.7 Exploration 126 (4)
4.8 Continuous Domains 130 (3)
4.9 Empirical Comparisons 133 (2)
4.10 Scaling Up 135 (2)
4.11 Conclusion 137 (6)
References 138 (5)
5 Transfer in Reinforcement Learning: A Framework and a Survey 143 (32)
Alessandro Lazaric
5.1 Introduction 143 (2)
5.2 A Framework and a Taxonomy for Transfer in Reinforcement Learning 145 (10)
5.2.1 Transfer Framework 145 (3)
5.2.2 Taxonomy 148 (7)
5.3 Methods for Transfer from Source to Target with a Fixed State-Action Space 155 (4)
5.3.1 Problem Formulation 155 (1)
5.3.2 Representation Transfer 156 (2)
5.3.3 Parameter Transfer 158 (1)
5.4 Methods for Transfer across Tasks with a Fixed State-Action Space 159 (5)
5.4.1 Problem Formulation 159 (1)
5.4.2 Instance Transfer 160 (1)
5.4.3 Representation Transfer 161 (1)
5.4.4 Parameter Transfer 162 (2)
5.5 Methods for Transfer from Source to Target Tasks with Different State-Action Spaces 164 (4)
5.5.1 Problem Formulation 164 (2)
5.5.2 Instance Transfer 166 (1)
5.5.3 Representation Transfer 166 (1)
5.5.4 Parameter Transfer 167 (1)
5.6 Conclusions and Open Questions 168 (7)
References 169 (6)
6 Sample Complexity Bounds of Exploration 175 (32)
Lihong Li
6.1 Introduction 175 (1)
6.2 Preliminaries 176 (2)
6.3 Formalizing Exploration Efficiency 178 (6)
6.3.1 Sample Complexity of Exploration and PAC-MDP 178 (2)
6.3.2 Regret Minimization 180 (2)
6.3.3 Average Loss 182 (1)
6.3.4 Bayesian Framework 183 (1)
6.4 A Generic PAC-MDP Theorem 184 (2)
6.5 Model-Based Approaches 186 (10)
6.5.1 Rmax 186 (2)
6.5.2 A Generalization of Rmax 188 (8)
6.6 Model-Free Approaches 196 (3)
6.7 Concluding Remarks 199 (8)
References 200 (7)
Part III Constructive-Representational Directions
7 Reinforcement Learning in Continuous State and Action Spaces 207 (46)
Hado van Hasselt
7.1 Introduction 207 (5)
7.1.1 Markov Decision Processes in Continuous Spaces 208 (3)
7.1.2 Methodologies to Solve a Continuous MDP 211 (1)
7.2 Function Approximation 212 (11)
7.2.1 Linear Function Approximation 213 (4)
7.2.2 Non-linear Function Approximation 217 (1)
7.2.3 Updating Parameters 218 (5)
7.3 Approximate Reinforcement Learning 223 (15)
7.3.1 Value Approximation 223 (6)
7.3.2 Policy Approximation 229 (9)
7.4 An Experiment on a Double-Pole Cart Pole 238 (4)
7.5 Conclusion 242 (11)
References 243 (10)
8 Solving Relational and First-Order Logical Markov Decision Processes: A Survey 253 (40)
Martijn van Otterlo
8.1 Introduction to Sequential Decisions in Relational Worlds 253 (4)
8.1.1 MDPs: Representation and Generalization 254 (2)
8.1.2 Short History and Connections to Other Fields 256 (1)
8.2 Extending MDPs with Objects and Relations 257 (4)
8.2.1 Relational Representations and Logical Generalization 257 (1)
8.2.2 Relational Markov Decision Processes 258 (1)
8.2.3 Abstract Problems and Solutions 259 (2)
8.3 Model-Based Solution Techniques 261 (7)
8.3.1 The Structure of Bellman Backups 262 (1)
8.3.2 Exact Model-Based Algorithms 263 (3)
8.3.3 Approximate Model-Based Algorithms 266 (2)
8.4 Model-Free Solutions 268 (8)
8.4.1 Value-Function Learning with Fixed Generalization 269 (1)
8.4.2 Value Functions with Adaptive Generalization 270 (4)
8.4.3 Policy-Based Solution Techniques 274 (2)
8.5 Models, Hierarchies, and Bias 276 (4)
8.6 Current Developments 280 (3)
8.7 Conclusions and Outlook 283 (10)
References 283 (10)
9 Hierarchical Approaches 293 (32)
Bernhard Hengst
9.1 Introduction 293 (3)
9.2 Background 296 (9)
9.2.1 Abstract Actions 297 (1)
9.2.2 Semi-Markov Decision Problems 297 (3)
9.2.3 Structure 300 (1)
9.2.4 State Abstraction 301 (2)
9.2.5 Value-Function Decomposition 303 (1)
9.2.6 Optimality 303 (2)
9.3 Approaches to Hierarchical Reinforcement Learning (HRL) 305 (8)
9.3.1 Options 306 (1)
9.3.2 HAMQ-Learning 307 (2)
9.3.3 MAXQ 309 (4)
9.4 Learning Structure 313 (4)
9.4.1 HEXQ 315 (2)
9.5 Related Work and Ongoing Research 317 (2)
9.6 Summary 319 (6)
References 319 (6)
10 Evolutionary Computation for Reinforcement Learning 325 (34)
Shimon Whiteson
10.1 Introduction 325 (3)
10.2 Neuroevolution 328 (2)
10.3 TWEANNs 330 (4)
10.3.1 Challenges 332 (1)
10.3.2 NEAT 333 (1)
10.4 Hybrids 334 (5)
10.4.1 Evolutionary Function Approximation 335 (1)
10.4.2 XCS 336 (3)
10.5 Coevolution 339 (4)
10.5.1 Cooperative Coevolution 339 (3)
10.5.2 Competitive Coevolution 342 (1)
10.6 Generative and Developmental Systems 343 (2)
10.7 On-Line Methods 345 (2)
10.7.1 Model-Based Methods 345 (1)
10.7.2 On-Line Evolutionary Computation 346 (1)
10.8 Conclusion 347 (12)
References 348 (11)
Part IV Probabilistic Models of Self and Others
11 Bayesian Reinforcement Learning 359 (28)
Nikos Vlassis
Mohammad Ghavamzadeh
Shie Mannor
Pascal Poupart
11.1 Introduction 359 (2)
11.2 Model-Free Bayesian Reinforcement Learning 361 (11)
11.2.1 Value-Function Based Algorithms 361 (4)
11.2.2 Policy Gradient Algorithms 365 (4)
11.2.3 Actor-Critic Algorithms 369 (3)
11.3 Model-Based Bayesian Reinforcement Learning 372 (8)
11.3.1 POMDP Formulation of Bayesian RL 372 (1)
11.3.2 Bayesian RL via Dynamic Programming 373 (3)
11.3.3 Approximate Online Algorithms 376 (1)
11.3.4 Bayesian Multi-Task Reinforcement Learning 377 (2)
11.3.5 Incorporating Prior Knowledge 379 (1)
11.4 Finite Sample Analysis and Complexity Issues 380 (2)
11.5 Summary and Discussion 382 (5)
References 382 (5)
12 Partially Observable Markov Decision Processes 387 (28)
Matthijs T.J. Spaan
12.1 Introduction 387 (2)
12.2 Decision Making in Partially Observable Environments 389 (6)
12.2.1 POMDP Model 389 (2)
12.2.2 Continuous and Structured Representations 391 (1)
12.2.3 Memory for Optimal Decision Making 391 (3)
12.2.4 Policies and Value Functions 394 (1)
12.3 Model-Based Techniques 395 (9)
12.3.1 Heuristics Based on MDP Solutions 396 (1)
12.3.2 Value Iteration for POMDPs 397 (3)
12.3.3 Exact Value Iteration 400 (1)
12.3.4 Point-Based Value Iteration Methods 401 (2)
12.3.5 Other Approximate Methods 403 (1)
12.4 Decision Making Without a-Priori Models 404 (4)
12.4.1 Memoryless Techniques 405 (1)
12.4.2 Learning Internal Memory 405 (3)
12.5 Recent Trends 408 (7)
References 409 (6)
13 Predictively Defined Representations of State 415 (26)
David Wingate
13.1 Introduction 416 (4)
13.1.1 What Is "State"? 416 (2)
13.1.2 Which Representation of State? 418 (1)
13.1.3 Why Predictions about the Future? 419 (1)
13.2 PSRs 420 (8)
13.2.1 Histories and Tests 421 (1)
13.2.2 Prediction of a Test 422 (1)
13.2.3 The System Dynamics Vector 422 (1)
13.2.4 The System Dynamics Matrix 423 (1)
13.2.5 Sufficient Statistics 424 (1)
13.2.6 State 424 (1)
13.2.7 State Update 425 (1)
13.2.8 Linear PSRs 425 (1)
13.2.9 Relating Linear PSRs to POMDPs 426 (1)
13.2.10 Theoretical Results on Linear PSRs 427 (1)
13.3 Learning a PSR Model 428 (1)
13.3.1 The Discovery Problem 428 (1)
13.3.2 The Learning Problem 429 (1)
13.3.3 Estimating the System Dynamics Matrix 429 (1)
13.4 Planning with PSRs 429 (2)
13.5 Extensions of PSRs 431 (1)
13.6 Other Models with Predictively Defined State 432 (4)
13.6.1 Observable Operator Models 433 (1)
13.6.2 The Predictive Linear-Gaussian Model 433 (1)
13.6.3 Temporal-Difference Networks 434 (1)
13.6.4 Diversity Automaton 435 (1)
13.6.5 The Exponential Family PSR 435 (1)
13.6.6 Transformed PSRs 436 (1)
13.7 Conclusion 436 (5)
References 437 (4)
14 Game Theory and Multi-agent Reinforcement Learning 441 (30)
Ann Nowe
Peter Vrancx
Yann-Michael De Hauwere
14.1 Introduction 441 (4)
14.2 Repeated Games 445 (9)
14.2.1 Game Theory 445 (4)
14.2.2 Reinforcement Learning in Repeated Games 449 (5)
14.3 Sequential Games 454 (7)
14.3.1 Markov Games 455 (1)
14.3.2 Reinforcement Learning in Markov Games 456 (5)
14.4 Sparse Interactions in Multi-agent Systems 461 (6)
14.4.1 Learning on Multiple Levels 461 (1)
14.4.2 Learning to Coordinate with Sparse Interactions 462 (5)
14.5 Further Reading 467 (4)
References 467 (4)
15 Decentralized POMDPs 471 (36)
Frans A. Oliehoek
15.1 Introduction 471 (2)
15.2 The Decentralized POMDP Framework 473 (2)
15.3 Histories and Policies 475 (5)
15.3.1 Histories 475 (1)
15.3.2 Policies 476 (1)
15.3.3 Structure in Policies 477 (2)
15.3.4 The Quality of Joint Policies 479 (1)
15.4 Solution of Finite-Horizon Dec-POMDPs 480 (13)
15.4.1 Brute Force Search and Dec-POMDP Complexity 480 (1)
15.4.2 Alternating Maximization 481 (1)
15.4.3 Optimal Value Functions for Dec-POMDPs 481 (4)
15.4.4 Forward Approach: Heuristic Search 485 (4)
15.4.5 Backwards Approach: Dynamic Programming 489 (4)
15.4.6 Other Finite-Horizon Methods 493 (1)
15.5 Further Topics 493 (14)
15.5.1 Generalization and Special Cases 493 (2)
15.5.2 Infinite-Horizon Dec-POMDPs 495 (1)
15.5.3 Reinforcement Learning 496 (1)
15.5.4 Communication 497 (1)
References 498 (9)
Part V Domains and Background
16 Psychological and Neuroscientific Connections with Reinforcement Learning 507 (32)
Ashvin Shah
16.1 Introduction 507 (1)
16.2 Classical (or Pavlovian) Conditioning 508 (5)
16.2.1 Behavior 509 (2)
16.2.2 Theory 511 (1)
16.2.3 Summary and Additional Considerations 512 (1)
16.3 Operant (or Instrumental) Conditioning 513 (5)
16.3.1 Behavior 513 (1)
16.3.2 Theory 514 (2)
16.3.3 Model-Based Versus Model-Free Control 516 (1)
16.3.4 Summary and Additional Considerations 517 (1)
16.4 Dopamine 518 (3)
16.4.1 Dopamine as a Reward Prediction Error 518 (2)
16.4.2 Dopamine as a General Reinforcement Signal 520 (1)
16.4.3 Summary and Additional Considerations 521 (1)
16.5 The Basal Ganglia 521 (6)
16.5.1 Overview of the Basal Ganglia 522 (1)
16.5.2 Neural Activity in the Striatum 523 (1)
16.5.3 Cortico-basal Ganglia-thalamic Loops 524 (2)
16.5.4 Summary and Additional Considerations 526 (1)
16.6 Chapter Summary 527 (12)
References 528 (11)
17 Reinforcement Learning in Games 539 (40)
Istvan Szita
17.1 Introduction 539 (2)
17.1.1 Aims and Structure 540 (1)
17.1.2 Scope 541 (1)
17.2 A Showcase of Games 541 (20)
17.2.1 Backgammon 542 (3)
17.2.2 Chess 545 (5)
17.2.3 Go 550 (5)
17.2.4 Tetris 555 (3)
17.2.5 Real-Time Strategy Games 558 (3)
17.3 Challenges of Applying Reinforcement Learning to Games 561 (7)
17.3.1 Representation Design 561 (3)
17.3.2 Exploration 564 (1)
17.3.3 Source of Training Data 565 (1)
17.3.4 Dealing with Missing Information 566 (1)
17.3.5 Opponent Modelling 567 (1)
17.4 Using RL in Games 568 (3)
17.4.1 Opponents That Maximize Fun 568 (2)
17.4.2 Development-Time Learning 570 (1)
17.5 Closing Remarks 571 (8)
References 572 (7)
18 Reinforcement Learning in Robotics: A Survey 579 (34)
Jens Kober
Jan Peters
18.1 Introduction 579 (2)
18.2 Challenges in Robot Reinforcement Learning 581 (4)
18.2.1 Curse of Dimensionality 582 (1)
18.2.2 Curse of Real-World Samples 583 (1)
18.2.3 Curse of Real-World Interactions 584 (1)
18.2.4 Curse of Model Errors 584 (1)
18.2.5 Curse of Goal Specification 585 (1)
18.3 Foundations of Robot Reinforcement Learning 585 (4)
18.3.1 Value Function Approaches 586 (2)
18.3.2 Policy Search 588 (1)
18.4 Tractability through Representation 589 (5)
18.4.1 Smart State-Action Discretization 590 (2)
18.4.2 Function Approximation 592 (1)
18.4.3 Pre-structured Policies 592 (2)
18.5 Tractability through Prior Knowledge 594 (2)
18.5.1 Prior Knowledge through Demonstrations 594 (2)
18.5.2 Prior Knowledge through Task Structuring 596 (1)
18.5.3 Directing Exploration with Prior Knowledge 596 (1)
18.6 Tractability through Simulation 596 (3)
18.6.1 Role of Models 597 (1)
18.6.2 Mental Rehearsal 598 (1)
18.6.3 Direct Transfer from Simulated to Real Robots 599 (1)
18.7 A Case Study: Ball-in-a-Cup 599 (4)
18.7.1 Experimental Setting: Task and Reward 599 (2)
18.7.2 Appropriate Policy Representation 601 (1)
18.7.3 Generating a Teacher's Demonstration 601 (1)
18.7.4 Reinforcement Learning by Policy Search 601 (2)
18.7.5 Use of Simulations in Robot Reinforcement Learning 603 (1)
18.7.6 Alternative Approach with Value Function Methods 603 (1)
18.8 Conclusion 603 (10)
References 604 (9)
Part VI Closing
19 Conclusions, Future Directions and Outlook 613 (18)
Marco Wiering
Martijn van Otterlo
19.1 Looking Back 613 (7)
19.1.1 What Has Been Accomplished? 613 (1)
19.1.2 Which Topics Were Not Included? 614 (6)
19.2 Looking into the Future 620 (11)
19.2.1 Things That Are Not Yet Known 620 (2)
19.2.2 Seemingly Impossible Applications for RL 622 (1)
19.2.3 Interesting Directions 623 (1)
19.2.4 Experts on Future Developments 624 (2)
References 626 (5)
Index 631