Simulation-Based Algorithms for Markov Decision Processes (2nd ed.) [Chang, Hu, Fu & Marcus 2013-02-23].pdf
Communications and Control Engineering
For further volumes:
www.springer.com/series/61
Hyeong Soo Chang · Jiaqiao Hu · Michael C. Fu · Steven I. Marcus

Simulation-Based Algorithms for Markov Decision Processes

Second Edition
Hyeong Soo Chang
Dept. of Computer Science and Engineering
Sogang University
Seoul, South Korea
Jiaqiao Hu
Dept. of Applied Mathematics & Statistics
State University of New York
Stony Brook, NY, USA
Michael C. Fu
Smith School of Business
University of Maryland
College Park, MD, USA
Steven I. Marcus
Dept. Electrical & Computer Engineering
University of Maryland
College Park, MD, USA
ISSN 0178-5354 Communications and Control Engineering
ISBN 978-1-4471-5021-3
ISBN 978-1-4471-5022-0 (eBook)
DOI 10.1007/978-1-4471-5022-0
Springer London Heidelberg New York Dordrecht
Library of Congress Control Number: 2013933558
© Springer-Verlag London 2007, 2013
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of
this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations
are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of pub-
lication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any
errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect
to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
To Jung Won and three little rascals, Won,
Kyeong & Min, who changed my days into
a whole world of wonders and joys
– H.S. Chang
To my family – J. Hu
To my mother, for continuous support, and to
Lara & David, for mixtures of joy & laughter
– M.C. Fu
To Shelley, Jeremy, and Tobin – S. Marcus
Preface to the 2nd Edition
Markov decision process (MDP) models are widely used for modeling sequential decision-making problems that arise in engineering, computer science, operations research, economics, and other social sciences. However, it is well known that many real-world problems modeled by MDPs have huge state and/or action spaces, leading to the curse of dimensionality, which makes solution of the resulting models intractable. In other cases, the system of interest is complex enough that it is not feasible to specify some of the MDP model parameters explicitly, but simulated sample paths can be readily generated (e.g., for random state transitions and rewards), albeit at a non-trivial computational cost. For these settings, we have developed various sampling and population-based numerical algorithms to overcome the computational difficulties of computing an optimal solution in terms of a policy and/or value function. Specific approaches include multi-stage adaptive sampling, evolutionary policy iteration and random policy search, and model reference adaptive search. The first edition of this book brought together these algorithms and presented them in a unified manner accessible to researchers with varying interests and backgrounds. In addition to providing numerous specific algorithms, the exposition included both illustrative numerical examples and rigorous theoretical convergence results.

This second edition reflects the latest developments in the theory and the algorithms that the authors have contributed to the MDP field, integrating them into the first edition and presenting an updated account of topics that have emerged since its publication over six years ago. Specifically, new approaches include a stochastic approximation framework for a class of simulation-based optimization algorithms and its application to MDPs, as well as a population-based on-line simulation-based algorithm called approximate stochastic annealing. These simulation-based approaches are distinct from, but complementary to, computational approaches for solving MDPs based on explicit state-space reduction, such as neuro-dynamic programming or reinforcement learning; in fact, the computational gains achieved through approximations and parameterizations that reduce the size of the state space can be incorporated into most of the algorithms in this book.
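To make the setting above concrete, the following is a minimal sketch (not an algorithm from this book) of the kind of problem the preface describes: a hypothetical MDP available only as a generative "black-box" model, where transition probabilities are never written down explicitly, and value estimates are obtained by simulating sample paths. The MDP, the rollout policy, and all parameter values here are illustrative assumptions.

```python
import random

# A hypothetical two-state, two-action MDP given only as a generative
# model: we can sample next states and rewards, but the transition
# matrix is never constructed explicitly.
def sample_transition(state, action, rng):
    """Simulate one step; return (next_state, reward)."""
    p_stay = 0.7 if action == 0 else 0.3
    next_state = state if rng.random() < p_stay else 1 - state
    reward = 1.0 if next_state == 0 else 0.0  # state 0 is rewarding
    return next_state, reward

def mc_q_estimate(state, action, horizon, n_paths, gamma, rng):
    """Monte Carlo estimate of the finite-horizon discounted Q-value of
    (state, action), rolling out with a uniform random policy."""
    total = 0.0
    for _ in range(n_paths):
        s, a, ret, disc = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r = sample_transition(s, a, rng)
            ret += disc * r
            disc *= gamma
            a = rng.randrange(2)  # rollout policy: uniform over actions
        total += ret
    return total / n_paths

rng = random.Random(0)
q0 = mc_q_estimate(0, 0, horizon=20, n_paths=2000, gamma=0.9, rng=rng)
q1 = mc_q_estimate(0, 1, horizon=20, n_paths=2000, gamma=0.9, rng=rng)
print(q0, q1)  # action 0 keeps the system in the rewarding state more often
```

Plain Monte Carlo rollouts like this treat every state-action pair identically; the adaptive-sampling and population-based methods developed in this book instead allocate simulation effort non-uniformly toward promising actions and policies.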