This article is reposted from the WeChat public account QbitAI (original address).

Little is known about how the brain works, but we do know that it learns through repeated trial and error. When we make the right choices we are rewarded, and when we make inappropriate ones we are punished; this is how we adapt to our environment. Today, we can use powerful computing to model this process in software. That is reinforcement learning.

Recently, an article on the Algorithmia blog introduced reinforcement learning in detail from five angles: the basics, the decision-making process, practical applications, practical challenges, and learning resources. QbitAI has translated the article; the translation follows.

Basics

Video games offer the simplest mental model for understanding reinforcement learning (RL), and they also happen to be one of the areas where RL algorithms are most widely applied. A classic video game contains the following kinds of objects:

  • Agent: free to move around; corresponds to the player;
  • Actions: taken by the agent, such as moving up or selling an item;
  • Rewards: collected by the agent, such as gold coins or credit for defeating another player;
  • Environment: the map or room the agent is in;
  • State: the agent's current situation, such as standing on a particular square of the map or in a corner of the room;
  • Goal: to collect as many rewards as possible.

These objects are the basic components of reinforcement learning, and each can be modeled as a part of the machine-learning setup. Once the environment is set up, we guide the agent through it state by state, giving it a reward whenever it takes the correct action. Familiarity with Markov decision processes (https://en.wikipedia.org/wiki/Markov_decision_process) helps in understanding this process.
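To make this concrete, here is a minimal sketch of that agent-environment loop in Python. Everything in it (the states, the actions, the reward rule, the episode length) is an invented placeholder rather than a real game:

```python
import random

ACTIONS = ["up", "down", "left", "right"]

def step(state, action):
    """Hypothetical environment: return (next_state, reward, done)."""
    next_state = state + 1                       # toy transition: always move on
    reward = 1.0 if action == "right" else 0.0   # toy reward rule
    done = next_state >= 10                      # episode ends after 10 steps
    return next_state, reward, done

state, total_reward, done = 0, 0.0, False
while not done:
    action = random.choice(ACTIONS)              # an untrained agent acts at random
    state, reward, done = step(state, action)
    total_reward += reward                       # goal: collect as much reward as possible
```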

In the maze below, there is a mouse:

Imagine you are the mouse. How do you collect as many rewards (water droplets and cheese) as possible in the maze? In each state, i.e. each position in the maze, you need to figure out which steps lead to a nearby reward. When there are three rewards to the right and one to the left, you go right.

This is how reinforcement learning works. In each state, the agent evaluates every possible action (up, down, left, and right) and selects the one expected to bring the most reward; a minimal sketch of that selection step follows below. After a few such steps, the mouse becomes familiar with the maze.
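In code, that selection step is just a maximization over the values of the available actions; the numbers below are made up for the maze example:

```python
# Hypothetical scores for each action in the mouse's current state.
action_values = {"up": 0.5, "down": 0.1, "left": 1.0, "right": 3.0}

# The agent picks whichever action promises the most reward.
best_action = max(action_values, key=action_values.get)  # -> "right"
```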

But how do you determine which action will give the best results?

Decision-making process

Decision making, i.e. how to get an agent to do the right thing in a reinforcement-learning environment, can be approached in two ways.

Policy learning

Policy learning can be understood as a very detailed set of instructions that tells the agent what to do at each step. A policy might read: when you are close to an enemy, and the enemy is stronger than you, retreat. We can also think of a policy as a function with a single input, the agent's current state (see the sketch after this paragraph). But knowing the right policy in advance is not easy: we need to learn this complex function that maps states to actions.
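Here is a minimal sketch of a policy as such a function, with hypothetical states and actions; a real policy would be learned rather than hard-coded, and a complex one would replace the lookup table with a function approximator:

```python
# A policy maps the agent's current state (the single input) to an action.
policy = {
    "enemy_near_and_stronger": "retreat",   # the rule described above
    "enemy_near_and_weaker": "attack",
    "no_enemy_in_sight": "explore",
}

def act(state):
    return policy[state]

print(act("enemy_near_and_stronger"))  # -> "retreat"
```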

There is some interesting research on using deep learning to learn policies in reinforcement-learning settings. Andrej Karpathy built a neural network that teaches an agent to play Pong (http://karpathy.github.io/2016/05/31/rl/). This is not too surprising, since neural networks can approximate arbitrarily complex functions well.

The Q-Learning algorithm

Another way to direct the agent is to let it act on its own based on the current environment, rather than explicitly telling it what to do in each state. Unlike policy learning, the Q-Learning algorithm takes two inputs, a state and an action, and returns a value for each state-action pair. When the agent faces a choice, the algorithm computes the expected value of taking each possible action (up, down, left, and right).

The innovation of Q-Learning is that it estimates not only the short-term value of an action taken in the current state, but also the potential future value that the action can bring. This is similar to discounted cash flow analysis in corporate finance, which likewise accounts for all potential future value when determining present value. Because future rewards are worth less than immediate ones, the Q-Learning algorithm uses a discount factor to model this; a minimal tabular sketch follows below.
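As a sketch, the tabular form of the Q-Learning update looks like the following, assuming a small discrete toy world; `alpha` is a learning rate and `gamma` is the discount factor just mentioned:

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9            # learning rate and discount factor
Q = defaultdict(float)             # Q[(state, action)] -> expected value

def q_update(state, action, reward, next_state, actions):
    """One Q-Learning step for the transition (state, action) -> (reward, next_state)."""
    best_future = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_future   # immediate reward + discounted future value
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```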

Policy learning and the Q-Learning algorithm are the two main ways of coaching agents in reinforcement learning, but some researchers have used deep learning techniques to combine the two or to propose other innovative approaches. DeepMind proposed a neural network called the Deep Q-Network (DQN) (https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) to approximate the Q-Learning function, with very good results; a toy version is sketched below. Later, they combined Q-Learning with policy learning in a method called A3C (https://arxiv.org/abs/1602.01783).
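To give a flavor of the DQN idea without reproducing DeepMind's architecture, here is a toy sketch in which the convolutional network is replaced by a single linear layer, trained with a semi-gradient step toward the bootstrapped target r + gamma * max Q(s'); the dimensions and learning rate are arbitrary:

```python
import numpy as np

n_features, n_actions = 4, 2
gamma, lr = 0.99, 0.01
W = np.zeros((n_features, n_actions))   # weights of a linear stand-in for the Q-network

def q_values(state):
    return state @ W                    # one estimated value per action

def dqn_style_update(state, action, reward, next_state, done):
    """One gradient step on the squared TD error for (state, action, reward, next_state)."""
    target = reward if done else reward + gamma * np.max(q_values(next_state))
    td_error = target - q_values(state)[action]
    # Gradient of 0.5 * td_error**2 with respect to W[:, action] is -td_error * state,
    # so gradient descent moves the weights in the +td_error * state direction.
    W[:, action] += lr * td_error * state
```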

Combining neural networks with other methods can sound complicated, but keep in mind that these training algorithms share one simple goal: to guide the agent through the environment efficiently so that it collects the most reward.

Practical applications

Although reinforcement learning has been researched for decades, it is reportedly of limited use in today's business environment (https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry). There are many reasons for this, but they share a common thread: on some tasks there is still a gap between the performance of reinforcement learning and that of the algorithms currently deployed.

Over the past decade, most applications of reinforcement learning have been in video games. The latest reinforcement-learning algorithms have achieved strong results on both classic and modern games, in some cases beating human players by a wide margin.

The image above is from DeepMind's DQN paper. In more than half of the games tested, the paper's agents outperformed the human benchmark, often at twice the human level. In some games, however, the algorithm performed worse than humans.

Reinforcement learning has also seen some successful practical applications in robotics and industrial automation: a robot can be treated as an agent in its environment, and reinforcement learning has proven a viable way to guide it. Notably, Google has used reinforcement learning to reduce the operating costs of its data centers.

Reinforcement learning is also expected to be applied in health care and education, but most of the current research is still in the laboratory.

Practical challenges

The prospects for applying reinforcement learning are bright, but the path to practice is winding.

The first is data. Reinforcement learning usually needs a huge amount of training data to reach performance levels that other algorithms achieve far more efficiently. DeepMind's recently proposed RainbowDQN algorithm required about 18 million frames of Atari gameplay, roughly 83 hours of play, to train its model; a human learns the same game in far less time. The same problem appears in gait-learning tasks.

Another practical challenge for reinforcement learning is domain specificity. Reinforcement learning is a general-purpose algorithm that should, in theory, apply to many different kinds of problems. For most of these problems, however, there is a domain-specific method that tends to outperform reinforcement learning, such as online trajectory optimization for the MuJoCo robot. We therefore have to trade generality against performance.

Finally, the most pressing issue in reinforcement learning today is designing the reward function. Algorithm designers inevitably bring subjective judgment to reward design, and even setting that aside, training can get stuck in local optima.

Those are some of the many challenges facing reinforcement learning in practice; with luck, follow-up research will continue to chip away at them.

Learning resources

Libraries

1, RL-Glue: Provides a standard interface for connecting reinforcement-learning agents, environments, and experiment programs, with support for cross-language programming.

http://glue.rl-community.org/wiki/Main_Page

2, Gym: Developed by OpenAI, a toolkit for developing reinforcement-learning algorithms and comparing their performance; it can train agents on many tasks, from walking to playing games such as Pong. A minimal usage sketch follows the link below.

https://gym.openai.com/
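A minimal Gym session looks like the following (this uses the classic reset/step API from the era of this article; later Gym releases changed these signatures):

```python
import gym

env = gym.make("CartPole-v0")
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()                  # sample a random action
    observation, reward, done, info = env.step(action)  # advance the environment
env.close()
```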

3, RL4J: A reinforcement-learning framework integrated with the deeplearning4j library, released under the Apache 2.0 license.

https://github.com/deeplearning4j/rl4j

4, TensorForce: A TensorFlow library for reinforcement learning.

https://github.com/reinforceio/tensorforce

Papers

1, self-play with a general reinforcement learning algorithm to master chess and shogi

Title: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

https://arxiv.org/abs/1712.01815

This paper has 13 authors and proposes the AlphaZero method. The authors generalize the earlier AlphaGo Zero approach into a single AlphaZero algorithm that can surpass human performance in multiple challenging domains through "tabula rasa" reinforcement learning (meaning all knowledge is derived from the senses and experience, i.e. learning from scratch). Starting from random play and given no domain knowledge beyond the rules of the game, AlphaZero achieved superhuman performance in chess, shogi, and Go within 24 hours, convincingly defeating a current world-champion program in each of the three games.

2, an overview of deep reinforcement learning

Title: Deep Reinforcement Learning: An Overview

https://arxiv.org/abs/1701.07274

This paper surveys some of the latest exciting work in deep reinforcement learning, highlighting six core elements, six important mechanisms, and twelve applications. It first introduces the background of machine learning, deep learning, and reinforcement learning, then discusses the core elements of reinforcement learning, including value functions (notably the DQN), policies, rewards, models, planning, and exploration.

3, playing Atari games with deep reinforcement learning

Title: Playing Atari with Deep Reinforcement Learning

https://arxiv.org/abs/1312.5602

This is DeepMind's 2013 NIPS paper. It presents a deep learning method that uses reinforcement learning to learn control policies directly from high-dimensional sensory input. The model is a convolutional neural network trained with a variant of Q-learning; its input is raw pixels and its output is a value function that estimates future rewards. The method was applied to Atari 2600 games without adjusting the architecture or learning algorithm. Of the seven games tested, it outperformed all previous approaches on six and surpassed a human expert on three.

4, using deep reinforcement learning to achieve human-level control

Title: Human-Level Control Through Deep Reinforcement Learning

https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf

This is DeepMind's 2015 Nature paper. Reinforcement-learning theory is rooted in psychological and neuroscientific accounts of animal behavior, and it explains well how agents optimize their control of an environment. To use reinforcement learning successfully in the real, complex physical world, an agent faces a hard task: deriving an effective representation of the environment from high-dimensional sensory input and generalizing past experience to new, unseen situations.

Courses and tutorials

1, reinforcement learning (Georgia Tech, CS 8803)

https://www.udacity.com/course/reinforcement-learning--ud600

Official description: If you are interested in machine learning and want to study it from a theoretical perspective, take this course. It covers classic papers and recent work, exploring automated decision-making from a computer-science perspective. The course examines efficient algorithms for single-agent and multi-agent planning, and for learning near-optimal decisions from experience. By the end of the course, you should be able to reproduce a published reinforcement-learning paper.

2, reinforcement learning (Stanford, CS234)

http://web.stanford.edu/class/cs234/index.html

Official description: To be truly intelligent, a system must be able to learn on its own and make the right decisions. Reinforcement learning is a powerful paradigm that applies to many tasks, including robotics, game playing, consumer modeling, and health care. This course gives a detailed introduction to reinforcement learning; you will learn about the key problems and main approaches, including generalization and exploration.

3, deep reinforcement learning (Berkeley, CS 294, Fall 2017)

http://rll.berkeley.edu/deeprlcourse/

Official description: This course assumes some background knowledge, including reinforcement learning, numerical optimization, and machine learning. Students unfamiliar with these concepts are encouraged to read the references provided below; a brief review will be given before the course begins.

4, deep reinforcement learning in Python (Udemy advanced tutorial)

https://www.udemy.com/deep-reinforcement-learning-in-python/

Official description: This course covers the application of deep learning and neural networks to reinforcement learning. It assumes some background knowledge (reinforcement-learning basics, Markov decision processes, dynamic programming, Monte Carlo methods, and temporal-difference learning) as well as basic deep-learning programming.