A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning (Foundations and Trends® in Machine Learning)

Geramifard, Alborz/ Walsh, Thomas J./ Tellex, Stefanie

Web store price ¥18,068 (¥16,426 before tax)
now publishers Inc (released 2013/12)
List price in foreign currency UK£ 58.50
Points 164pt

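Out of stock. We will order this item from the publisher or another source through an overseas book distributor.
It usually ships in about 6 to 9 weeks, though some items may take longer.
[Important notes]
1. Delivery may be delayed, or the item may turn out to be unavailable.
2. If you order multiple copies, they will be shipped together once the full quantity has arrived.
3. We cannot accept requests for copies in pristine condition.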

Binding Paperback / Pages 92 p.
Language ENG
Product code 9781601987600
DDC classification 006.31

Full Description

A Markov Decision Process (MDP) is a natural framework for formulating sequential decision-making problems under uncertainty. In recent years, researchers have greatly advanced algorithms for learning and acting in MDPs.

This book reviews such algorithms, beginning with well-known dynamic programming methods for solving MDPs such as policy iteration and value iteration, then describes approximate dynamic programming methods such as trajectory-based value iteration, and finally moves to reinforcement learning methods such as Q-Learning, SARSA, and least-squares policy iteration. It describes algorithms in a unified framework, giving pseudocode together with memory and iteration complexity analysis for each. Empirical evaluations of these techniques, with four representations across four domains, provide insight into how these algorithms perform with various feature sets in terms of running time and performance.

This tutorial provides practical guidance for researchers seeking to extend DP and RL techniques to larger domains through linear value function approximation. The practical algorithms and empirical successes outlined also form a guide for practitioners trying to weigh computational costs, accuracy requirements, and representational concerns. Decision making in large domains will always be challenging, but with the tools presented here this challenge is not insurmountable.
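For readers unfamiliar with the terminology, the following is a rough sketch of what Q-Learning with a linear value function approximator can look like. It is not taken from the book. The five-state chain domain, the one-hot features, the parameter values, and all function names are assumptions made up for illustration. The action value is modelled as Q(s, a) = theta · phi(s, a), and theta is moved along phi(s, a) in proportion to the temporal-difference error. Because Q-Learning is off-policy, the sketch simply uses a uniformly random behaviour policy.

```python
import random

# Hypothetical toy domain (not from the book): a chain of states 0..4,
# where reaching state 4 ends the episode with reward 1.
N_STATES = 5
ACTIONS = (-1, +1)   # move left or move right
GAMMA = 0.9          # discount factor (assumed value)


def features(state, action):
    """One-hot features over (state, action) pairs, the simplest linear representation."""
    phi = [0.0] * (N_STATES * len(ACTIONS))
    phi[state * len(ACTIONS) + ACTIONS.index(action)] = 1.0
    return phi


def q_value(theta, state, action):
    """Linear approximation: Q(s, a) = theta . phi(s, a)."""
    return sum(t * f for t, f in zip(theta, features(state, action)))


def step(state, action):
    """Deterministic chain dynamics; moves past either end are clipped."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, max_steps=200, seed=0):
    """Q-Learning with linear function approximation and a random behaviour policy."""
    rng = random.Random(seed)
    theta = [0.0] * (N_STATES * len(ACTIONS))
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target uses the greedy value of the next state.
            best_next = 0.0 if done else max(
                q_value(theta, next_state, a) for a in ACTIONS)
            td_error = reward + GAMMA * best_next - q_value(theta, state, action)
            # Gradient of a linear Q with respect to theta is just phi(s, a).
            phi = features(state, action)
            theta = [t + alpha * td_error * f for t, f in zip(theta, phi)]
            state = next_state
            if done:
                break
    return theta


def greedy_policy(theta):
    """Greedy action in each non-terminal state under the learned weights."""
    return [max(ACTIONS, key=lambda a: q_value(theta, s, a))
            for s in range(N_STATES - 1)]
```

Calling `greedy_policy(q_learning())` should yield the action `+1` (move right) in every non-terminal state of this toy chain. With one-hot features the method reduces to tabular Q-Learning. Coarser or overlapping features would trade accuracy for memory, which is the kind of trade-off the description above refers to.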

1: Introduction
2: Dynamic Programming and Reinforcement Learning
3: Representations
4: Empirical Results
5: Summary
Acknowledgements
References