Does machine learning depend on tuning parameters? That idea is out of date.
The Google Brain team has released a new study:
A network found by neural network architecture search alone, with no training and no tuning, can perform tasks directly.
Such a network is called a WANN, a Weight Agnostic Neural Network.
On the MNIST digit classification task it reached 92% accuracy with no training and no weight adjustment, which is comparable to the performance of a trained linear classifier.
Besides supervised learning, WANNs can also handle many reinforcement learning tasks.
David Ha, one of the team members, posted the results on Twitter, where they have received 1,300 likes:
So let's take a look at the results first.
Results
Google Brain used WANNs to handle three reinforcement learning tasks.
(All connections in each network share the same weight.)
The first task, Cart-Pole Swing-Up.
This is a classic control task: a rail, a cart, and a pole mounted on the cart.
The cart moves within the range of the rail. It must swing the pole up from its natural hanging position and then keep it upright without letting it fall.
(This task is harder than plain Cart-Pole:
in Cart-Pole the pole starts upright, so the cart does not need to swing it up, only keep it there.)
The difficulty is that the task cannot be solved with a linear controller. The reward at each time step is based on the cart's distance from the edge of the rail and the angle of the pole.
The best WANN (the champion network) looks like this:
It performs well without any training:
With the best-performing shared weight, it gave the team a very satisfying result: it balanced the pole after only a few swings.
The second task, BipedalWalker-v2.
A two-legged "creature" walks forward across randomly generated terrain, stepping over bumps and crossing pits. The reward depends on how far it travels from the start before it falls, as well as on the cost of motor torque (to encourage efficient movement).
The movement of each leg is controlled by a hip joint and a knee joint. There are 24 inputs that guide its movements, including terrain data ahead of it detected by lidar, joint speeds sensed by the body itself, and so on.
Compared with the low-dimensional input of the first task, the possible network connections here are more varied.
The WANN therefore has to choose how to wire the inputs to the outputs.
The WANN also completes this high-dimensional task with high quality.
This is the best architecture the search found, which is much more complex than the one for the low-dimensional task above:
With a shared weight of -1.5, it runs like this:
The third task, CarRacing-v0.
This is a top-down, pixel-based racing game.
A car is controlled by three continuous commands: throttle, steering, and braking. The goal is to pass over as many track tiles as possible within the time limit. The track is randomly generated.
The researchers handed the work of interpreting each pixel to a pre-trained variational autoencoder (VAE) that compresses the pixel representation into 16 latent dimensions.
These 16 dimensions are the inputs to the network. Using learned features tests the WANN's ability to learn abstract associations, rather than to encode explicit geometric relationships between different inputs.
Here is how the best WANN drives, untrained, with a shared weight of -1.4:
Although it drives a little clumsily, it rarely leaves the track.
And with a little fine-tuning of the best network, still without training, it drives more smoothly:
To sum up, in terms of simplicity and modularity, the second and third tasks both did well. The bipedal controller used only 17 of the 25 possible inputs, ignoring many lidar sensors and knee joint speeds.
The WANN architecture not only completes the task without training a single weight, it also uses only 210 network connections, an order of magnitude fewer than the 2804 connections used in the current state-of-the-art model.
After the reinforcement learning tasks, the team turned to MNIST, extending WANNs to a supervised classification task.
An ordinary network with randomly initialized parameters may reach only about 10% accuracy on MNIST.
A WANN architecture found by the new search method, run with random weights, exceeds 80% accuracy.
If it is fed an ensemble of multiple weight values, accuracy reaches 91.6%.
By comparison, a fine-tuned weight gives an accuracy of 91.9%, and trained weights bring 94.2%.
For a further comparison, take a linear classifier with thousands of weights:
it is only about as accurate as a WANN that is completely untrained, not fine-tuned, and simply fed a few random weights.
The paper emphasizes that MNIST handwritten digit classification is a high-dimensional classification task. WANN performed very well on it.
No single weight value is clearly better than the others; they all perform about equally, so random weights are feasible.
However, the network formed by each weight value has its own digits that it is good at telling apart, so a WANN instantiated with multiple weight values can be used as a self-contained ensemble.
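As a rough illustration of the ensemble idea, the sketch below instantiates one fixed architecture with several shared weight values and takes a majority vote over their predictions. The tiny `predict` network, its wiring, and the list of weight values are hypothetical stand-ins chosen for illustration, not the MNIST architecture from the paper.

```python
import math
from collections import Counter

# Assumed shared-weight values; each one turns the same topology
# into a different member of the ensemble.
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]


def predict(inputs, w):
    """Toy two-class network: both connections use the same weight w.

    Output 0 uses a tanh activation, output 1 an absolute-value activation.
    Returns the index of the larger output.
    """
    outputs = [math.tanh(w * inputs[0]), abs(w * inputs[1])]
    return outputs.index(max(outputs))


def ensemble_predict(inputs, weights=SHARED_WEIGHTS):
    """Majority vote over the networks obtained from each shared weight."""
    votes = Counter(predict(inputs, w) for w in weights)
    return votes.most_common(1)[0][0]
```

In this toy, different weight values can disagree on the same input, and the vote settles the final class. A real ensemble would vote over the ten digit classes in the same way.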
Principle of implementation
How does a WANN reach such high accuracy without training any weight parameters?
A neural network has more than its weight and bias parameters: the topology of the network and the choice of activation functions also affect the final result.
The Google Brain researchers ask at the beginning of the paper: how important are the weight parameters of a neural network compared with its architecture? To what extent can a neural network architecture alone solve a given task, without learning any weight parameters?
To answer this, the researchers proposed a neural architecture search method that finds minimal network architectures able to perform reinforcement learning tasks without any weight training.
The Google researchers also applied this method to supervised learning, using only random weights to achieve a much higher accuracy on MNIST than random guessing.
The paper draws inspiration from architecture search, Bayesian neural networks, algorithmic information theory, network pruning, and neuroscience.
To produce a WANN, the influence of the weights on the network must be minimized. Sampling the weights at random ensures that the final network is a product of architecture optimization, but random sampling of weights in a high-dimensional space is too difficult.
The researchers took a "simple and crude" approach: they enforced weight sharing across all connections, reducing the number of weight values to one. This efficient approximation can drive the search toward better architectures.
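To make the single shared weight concrete, here is a minimal sketch of evaluating a small feed-forward network in which every connection uses the same value. The network description format, the node names, and the wiring are all hypothetical and only serve the illustration.

```python
import math

ACTIVATIONS = {
    "linear": lambda x: x,
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
}

# Hypothetical topology: (node name, activation, source nodes),
# listed in feed-forward order.
NETWORK = [
    ("h0", "tanh", ["x0", "x1"]),
    ("h1", "relu", ["x1", "h0"]),
    ("out", "linear", ["h0", "h1"]),
]


def forward(network, inputs, shared_weight):
    """Evaluate the network with one weight value used on every connection."""
    values = dict(inputs)
    for name, activation, sources in network:
        total = sum(shared_weight * values[src] for src in sources)
        values[name] = ACTIVATIONS[activation](total)
    return values["out"]


def outputs_over_weights(network, inputs, weights):
    """Run the same topology once per candidate shared weight."""
    return [forward(network, inputs, w) for w in weights]
```

Because only one number varies, the whole weight space of a topology can be probed by calling `outputs_over_weights` with a handful of values, which is what makes the architecture search tractable in this sketch.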
Steps
With the weight initialization problem solved, the next question is how to search for weight agnostic neural networks. The search has four steps:
1, create an initial population of minimal neural network topologies.
2, evaluate each network over multiple rollouts, assigning a different shared weight value to each rollout.
3, rank the networks by performance and complexity.
4, create a new population by varying the highest-ranked network topologies, which are chosen probabilistically through tournament selection.
The algorithm then repeats from step 2, and over successive iterations it produces weight agnostic topologies of increasing complexity.
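The four steps above can be sketched as a small loop. Everything task-specific here is a toy assumption: a topology is just a set of input indices, `evaluate` is a made-up score standing in for a real rollout, and ranking is a simple "score first, fewer connections on ties" ordering rather than whatever ranking scheme the paper actually uses.

```python
import math
import random

SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # one value per rollout
N_INPUTS = 5
USEFUL = frozenset({0, 2})  # toy assumption: only these inputs carry signal


def evaluate(topology, w):
    """Toy rollout score for one shared weight (stands in for a real task)."""
    n_useful = len(topology & USEFUL)
    n_useless = len(topology - USEFUL)
    return math.tanh(abs(w) * n_useful) - 0.1 * n_useless


def mean_score(topology):
    """Step 2: average the score over rollouts with different shared weights."""
    return sum(evaluate(topology, w) for w in SHARED_WEIGHTS) / len(SHARED_WEIGHTS)


def rank(population):
    """Step 3: best mean score first; fewer connections wins ties."""
    return sorted(population, key=lambda t: (mean_score(t), -len(t)), reverse=True)


def mutate(topology, rng):
    """Grow the topology by one randomly chosen missing connection."""
    missing = sorted(set(range(N_INPUTS)) - topology)
    if not missing:
        return topology
    return topology | {rng.choice(missing)}


def tournament(ranked, rng, size=2):
    """Step 4: draw `size` candidates at random and keep the best-ranked one."""
    return ranked[min(rng.randrange(len(ranked)) for _ in range(size))]


def search(generations=20, seed=0):
    rng = random.Random(seed)
    # Step 1: a population of minimal topologies (one connection each).
    population = [frozenset({i}) for i in range(N_INPUTS)]
    for _ in range(generations):
        ranked = rank(population)
        children = [mutate(tournament(ranked, rng), rng) for _ in range(len(ranked) - 1)]
        population = [ranked[0]] + children  # keep the current best unchanged
    return rank(population)[0]
```

Keeping the current best topology in every generation is a design choice of this sketch: it guarantees the best mean score never gets worse from one generation to the next.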
Topology search
The operators used to search over network topologies are inspired by the neuroevolution algorithm NEAT. In NEAT, topology and weight values are optimized at the same time; here the researchers ignore the weights and search only over topology.
The figure above shows the specific operation of the network topology space search:
At the beginning, the network is the leftmost minimal topology, with only some of the inputs and outputs connected.
Then, the network changes in three ways:
1, Insert node: split an existing connection by inserting a new node into it.
2, Add connection: join two previously unconnected nodes with a new connection.
3, Change activation function: reassign the activation function of a hidden node.
The right side of the figure shows the possible activation functions, plotted over the range [-2, 2], such as linear, step, sine, cosine, ReLU, and so on.
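The three operators can be sketched on a small dictionary-based genome. This representation is an assumption made for the example: each node stores an activation name and an `order` value between 0 (inputs) and 1 (outputs), and connections are only allowed from lower to higher order, which is one simple way to keep the network feed-forward. The activation list is the subset named in this article.

```python
import random

ACTIVATIONS = ["linear", "step", "sin", "cos", "relu"]

# Minimal starting topology: only some inputs are connected to the output.
MINIMAL = {
    "nodes": {"x0": ("linear", 0.0), "x1": ("linear", 0.0), "y": ("linear", 1.0)},
    "conns": [("x0", "y")],
}


def hidden_nodes(genome):
    return [n for n, (_, order) in genome["nodes"].items() if 0.0 < order < 1.0]


def insert_node(genome, rng):
    """Split an existing connection a->b into a->new and new->b."""
    nodes, conns = dict(genome["nodes"]), list(genome["conns"])
    src, dst = conns.pop(rng.randrange(len(conns)))
    new_id = "h%d" % len(hidden_nodes(genome))
    order = (nodes[src][1] + nodes[dst][1]) / 2  # place it between its neighbours
    nodes[new_id] = (rng.choice(ACTIVATIONS), order)
    conns += [(src, new_id), (new_id, dst)]
    return {"nodes": nodes, "conns": conns}


def add_connection(genome, rng):
    """Connect two previously unconnected nodes, lower order to higher order."""
    nodes, conns = genome["nodes"], genome["conns"]
    candidates = [
        (a, b)
        for a in nodes
        for b in nodes
        if nodes[a][1] < nodes[b][1] and (a, b) not in conns
    ]
    if not candidates:
        return genome
    return {"nodes": dict(nodes), "conns": conns + [rng.choice(candidates)]}


def change_activation(genome, rng):
    """Give one hidden node a different activation function."""
    hidden = hidden_nodes(genome)
    if not hidden:
        return genome
    nodes = dict(genome["nodes"])
    name = rng.choice(hidden)
    old_act, order = nodes[name]
    nodes[name] = (rng.choice([a for a in ACTIVATIONS if a != old_act]), order)
    return {"nodes": nodes, "conns": list(genome["conns"])}
```

Each operator returns a new genome and leaves its argument untouched, so a parent topology can be reused to produce several children.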
Weight is still important
Compared with traditional fixed-topology networks, a WANN can achieve better results with a single random shared weight.
Although WANNs achieve the best results on several tasks, they are not completely independent of the weight value, and they sometimes fail when a single weight value is assigned at random.
WANNs work by encoding the relationship between inputs and outputs. Although the magnitude of the weights is not very important, their consistency, especially consistency of sign, is key.
Another benefit of a single shared weight is that tuning this one parameter becomes trivial and does not require a gradient-based method.
The results on the reinforcement learning tasks led the authors to consider widening the scope of the WANN approach. They also tested WANN on the basic image classification task MNIST, where the results were poor when the weight was close to 0.
Reddit users questioned the WANN results: when the random weight is close to 0, the network performs poorly. In the reinforcement learning experiment, this shows up as the cart running out of the allowed range.
In response, the authors explained that as the weight tends to 0, the output of the network also tends to 0, so it is hard for later optimization to reach good performance.
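The authors' explanation can be checked on a toy example. The sketch below assumes a bias-free network whose activations map 0 to 0 (tanh here); the wiring is hypothetical. Under those assumptions, every layer scales its input by the shared weight, so the output shrinks toward 0 together with the weight.

```python
import math


def toy_output(x0, x1, w):
    """Two-layer toy network with one shared weight w on every connection."""
    h0 = math.tanh(w * x0 + w * x1)
    h1 = math.tanh(w * x1)
    return w * h0 + w * h1


def output_magnitudes(x0, x1, weights):
    """Absolute output for each candidate shared weight."""
    return [abs(toy_output(x0, x1, w)) for w in weights]
```

Calling `output_magnitudes(1.0, 0.5, [1.0, 0.1, 0.01, 0.0])` gives a strictly decreasing list that ends at exactly 0, whatever the inputs say, which is why a controller or classifier built this way has nothing to act on near w = 0.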
Portal
Source code:
https://github.com/weightagnostic/weightagnostic.github.io
This article is reproduced in the public intelligence expert,Original address