Deep Reinforcement Learning Hands-On – Maxim Lapan – 1st Edition

Description

This practical guide will teach you how deep learning (DL) can be used to solve complex real-world problems.

Key Features

  • Explore deep reinforcement learning (RL), from the first principles to the latest algorithms
  • Evaluate high-profile RL methods, including value iteration, deep Q-networks, policy gradients, TRPO, PPO, DDPG, D4PG, evolution strategies, and genetic algorithms
  • Keep up with the latest industry developments, including AI-powered chatbots

Book Description

Recent developments in reinforcement learning (RL), combined with deep learning (DL), have produced unprecedented progress in training agents to solve complex problems in a human-like way. Google's use of algorithms to play and beat the well-known Atari arcade games propelled the field to prominence, and researchers are generating new ideas at a rapid pace. Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools and their limitations. You will evaluate methods including cross-entropy and policy gradients, before applying them to real-world environments.
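To give a flavor of that hands-on approach, here is a minimal sketch of the agent-environment loop the book builds on: a random agent playing one episode of CartPole through OpenAI Gym, in the spirit of Chapter 2's "The random CartPole agent". It is an illustrative sketch, not code from the book, and it assumes the classic gym API of the book's era (env.reset() returns an observation; env.step() returns a four-tuple).

    import gym

    # Create the CartPole environment and play one episode with random actions.
    # Assumes the classic gym API (pre-0.26), as used at the book's publication.
    env = gym.make("CartPole-v0")
    obs = env.reset()
    total_reward = 0.0

    while True:
        action = env.action_space.sample()        # sample a random action
        obs, reward, done, _ = env.step(action)   # apply it and observe the result
        total_reward += reward
        if done:                                  # episode ends when the pole falls
            break

    print("Episode finished, total reward: %.2f" % total_reward)

Every method in the book, from cross-entropy to AlphaGo Zero, refines how the action in this loop is chosen.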

Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents to take on a formidable array of practical tasks. Discover how to implement Q-learning on "grid world" environments (a minimal sketch follows this section), teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.

What You Will Learn

  • Understand the DL context of RL and implement complex DL models
  • Learn the foundations of RL: Markov decision processes
  • Evaluate RL methods including cross-entropy, DQN, actor-critic, TRPO, PPO, DDPG, D4PG, and others
  • Discover how to deal with discrete and continuous action spaces in various environments
  • Defeat Atari arcade games using the value iteration method
  • Create your own OpenAI Gym environment to train a stock trading agent
  • Teach your agent to play Connect4 using AlphaGo Zero
  • Explore the very latest deep RL research on topics including AI-driven chatbots

Who This Book Is For

Some fluency in Python is assumed. Basic deep learning (DL) approaches should be familiar to readers, and some practical experience in DL will be useful. This book is an introduction to deep reinforcement learning (RL) and requires no background in RL.
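The Q-learning sketch promised above, in the spirit of Chapter 5's "Q-learning for FrozenLake": a tabular agent that explores at random and refines a Q-table with the Bellman update. The hyperparameter values (GAMMA, ALPHA, the episode count) and the pure-random exploration policy are illustrative assumptions, not the book's code; the classic gym API is assumed as before.

    import gym
    import collections

    GAMMA = 0.9       # discount factor (illustrative value)
    ALPHA = 0.2       # learning rate for blending in new estimates (illustrative)
    EPISODES = 5000

    env = gym.make("FrozenLake-v0")
    q_table = collections.defaultdict(float)   # (state, action) -> estimated Q-value

    def best_value(state):
        # Value of the best action available from `state` under the current table
        return max(q_table[(state, a)] for a in range(env.action_space.n))

    for _ in range(EPISODES):
        state = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()            # explore at random
            new_state, reward, done, _ = env.step(action)
            # Bellman update: blend the old estimate with the bootstrapped target;
            # terminal states contribute no future value
            target = reward + (0.0 if done else GAMMA * best_value(new_state))
            q_table[(state, action)] = (1 - ALPHA) * q_table[(state, action)] + ALPHA * target
            state = new_state

Once trained, a greedy policy simply plays the argmax action from the table in each state; the book extends this same idea to deep Q-networks when the state space is too large for a table.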

  • Deep Reinforcement Learning Hands-On
    Why subscribe?
    PacktPub.com
    Contributors
    About the author
    About the reviewers
    Packt is Searching for Authors Like You
    Preface
    Who this book is for
    What this book covers
    To get the most out of this book
    Download the example code files
    Download the color images
    Conventions used
    Get in touch
    Reviews
    1. What is Reinforcement Learning?
    Learning – supervised, unsupervised, and reinforcement
    RL formalisms and relations
    Reward
    The agent
    The environment
    Actions
    Observations
    Markov decision processes
    Markov process
    Markov reward process
    Markov decision process
    Summary
    2. OpenAI Gym
    The anatomy of the agent
    Hardware and software requirements
    OpenAI Gym API
    Action space
    Observation space
    The environment
    Creation of the environment
    The CartPole session
    The random CartPole agent
    The extra Gym functionality – wrappers and monitors
    Wrappers
    Monitor
    Summary
    3. Deep Learning with PyTorch
    Tensors
    Creation of tensors
    Scalar tensors
    Tensor operations
    GPU tensors
    Gradients
    Tensors and gradients
    NN building blocks
    Custom layers
    Final glue – loss functions and optimizers
    Loss functions
    Optimizers
    Monitoring with TensorBoard
    TensorBoard 101
    Plotting stuff
    Example – GAN on Atari images
    Summary
    4. The Cross-Entropy Method
    Taxonomy of RL methods
    Practical cross-entropy
    Cross-entropy on CartPole
    Cross-entropy on FrozenLake
    Theoretical background of the cross-entropy method
    Summary
    5. Tabular Learning and the Bellman Equation
    Value, state, and optimality
    The Bellman equation of optimality
    Value of action
    The value iteration method
    Value iteration in practice
    Q-learning for FrozenLake
    Summary
    6. Deep Q-Networks
    Real-life value iteration
    Tabular Q-learning
    Deep Q-learning
    Interaction with the environment
    SGD optimization
    Correlation between steps
    The Markov property
    The final form of DQN training
    DQN on Pong
    Wrappers
    DQN model
    Training
    Running and performance
    Your model in action
    Summary
    7. DQN Extensions
    The PyTorch Agent Net library
    Agent
    Agent's experience
    Experience buffer
    Gym env wrappers
    Basic DQN
    N-step DQN
    Implementation
    Double DQN
    Implementation
    Results
    Noisy networks
    Implementation
    Results
    Prioritized replay buffer
    Implementation
    Results
    Dueling DQN
    Implementation
    Results
    Categorical DQN
    Implementation
    Results
    Combining everything
    Implementation
    Results
    Summary
    References
    8. Stocks Trading Using RL
    Trading
    Data
    Problem statements and key decisions
    The trading environment
    Models
    Training code
    Results
    The feed-forward model
    The convolution model
    Things to try
    Summary
    9. Policy Gradients – An Alternative
    Values and policy
    Why policy?
    Policy representation
    Policy gradients
    The REINFORCE method
    The CartPole example
    Results
    Policy-based versus value-based methods
    REINFORCE issues
    Full episodes are required
    High gradients variance
    Exploration
    Correlation between samples
    PG on CartPole
    Results
    PG on Pong
    Results
    Summary
    10. The Actor-Critic Method
    Variance reduction
    CartPole variance
    Actor-critic
    A2C on Pong
    A2C on Pong results
    Tuning hyperparameters
    Learning rate
    Entropy beta
    Count of environments
    Batch size
    Summary
    11. Asynchronous Advantage Actor-Critic
    Correlation and sample efficiency
    Adding an extra A to A2C
    Multiprocessing in Python
    A3C – data parallelism
    Results
    A3C – gradients parallelism
    Results
    Summary
    12. Chatbots Training with RL
    Chatbots overview
    Deep NLP basics
    Recurrent Neural Networks
    Embeddings
    Encoder-Decoder
    Training of seq2seq
    Log-likelihood training
    Bilingual evaluation understudy (BLEU) score
    RL in seq2seq
    Self-critical sequence training
    The chatbot example
    The example structure
    Modules: cornell.py and data.py
    BLEU score and utils.py
    Model
    Training: cross-entropy
    Running the training
    Checking the data
    Testing the trained model
    Training: SCST
    Running the SCST training
    Results
    Telegram bot
    Summary
    13. Web Navigation
    Web navigation
    Browser automation and RL
    Mini World of Bits benchmark
    OpenAI Universe
    Installation
    Actions and observations
    Environment creation
    MiniWoB stability
    Simple clicking approach
    Grid actions
    Example overview
    Model
    Training code
    Starting containers
    Training process
    Checking the learned policy
    Issues with simple clicking
    Human demonstrations
    Recording the demonstrations
    Recording format
    Training using demonstrations
    Results
    TicTacToe problem
    Adding text description
    Results
    Things to try
    Summary
    14. Continuous Action Space
    Why a continuous space?
    Action space
    Environments
    The Actor-Critic (A2C) method
    Implementation
    Results
    Using models and recording videos
    Deterministic policy gradients
    Exploration
    Implementation
    Results
    Recording videos
    Distributional policy gradients
    Architecture
    Implementation
    Results
    Things to try
    Summary
    15. Trust Regions – TRPO, PPO, and ACKTR
    Introduction
    Roboschool
    A2C baseline
    Results
    Videos recording
    Proximal Policy Optimization
    Implementation
    Results
    Trust Region Policy Optimization
    Implementation
    Results
    A2C using ACKTR
    Implementation
    Results
    Summary
    16. Black-Box Optimization in RL
    Black-box methods
    Evolution strategies
    ES on CartPole
    Results
    ES on HalfCheetah
    Results
    Genetic algorithms
    GA on CartPole
    Results
    GA tweaks
    Deep GA
    Novelty search
    GA on Cheetah
    Results
    Summary
    References
    17. Beyond Model-Free – Imagination
    Model-based versus model-free
    Model imperfections
    Imagination-augmented agent
    The environment model
    The rollout policy
    The rollout encoder
    Paper results
    I2A on Atari Breakout
    The baseline A2C agent
    EM training
    The imagination agent
    The I2A model
    The Rollout encoder
    Training of I2A
    Experiment results
    The baseline agent
    Training EM weights
    Training with the I2A model
    Summary
    References
    18. AlphaGo Zero
    Board games
    The AlphaGo Zero method
    Overview
    Monte-Carlo Tree Search
    Self-play
    Training and evaluation
    Connect4 bot
    Game model
    Implementing MCTS
    Model
    Training
    Testing and comparison
    Connect4 results
    Summary
    References
    Book summary
    Other Books You May Enjoy
    Leave a review - let other readers know what you think
    Index

Download Deep Reinforcement Learning Hands-On

  File type:  Book
  Language:   English
  Pages:      827
  Size:       11 MB
  Download:   RAR | PDF
