This article is reposted from "How to Fool Artificial Intelligence".

The original text is in English, and Google's machine translation is not very effective, but it does not affect the overall understanding.

Self-driving cars. Voice-operated smart homes. Automated medical diagnosis. Chatbots that can replace teachers and therapists.

Thanks to artificial intelligence, these technologies no longer exist only in the far-fetched world of science fiction. They are just a few of the hundreds of AI-driven technologies disrupting industries.

Artificial intelligence is rapidly integrating into our daily lives.

However, some questions remain before we hand over a great deal of responsibility to algorithms that seem to do the work for us by magic.

Do they always work? Can we always trust them?

Can we trust them with our lives?

What if bad actors have a way to purposely trick an algorithm into making a wrong decision?

For better or worse, this has actually been studied extensively in a subfield of machine learning called adversarial machine learning. It has given rise to techniques that can "fool" ML models, as well as techniques to defend against these attacks.

Adversarial attacks involve feeding a model adversarial examples: inputs designed by the attacker to deliberately deceive the model into producing a wrong output.

Before delving into how adversarial examples are created, let's look at some examples.

Adversarial examples in action

Let's play a game of spot the difference:

Picture release

If you can't spot anything, don't worry. Your screen-tired eyes have not failed you.

To human eyes, both images should look like exactly the same fluffy panda.

But to an image classification algorithm, there is a big difference between the two. This is what the award-winning visual recognition system GoogLeNet saw:

Picture release
Classification results of GoogLeNet

It got the second image wrong. In fact, it was even more confident in mistaking the panda for a gibbon than it was in its original, correct decision.

What happened?!

As shown above, a filter was applied to the original image. Although the change is imperceptible to us (we still see the same black-and-white coat, blunt snout and classic dark eye patches that tell us we are looking at a panda), what the algorithm sees is completely different.

The panda image combined with a layer of noise (often called a perturbation) is an adversarial example, and it causes the model to misclassify the image.

Some other visual adversarial examples:

Like optical illusions for machines, visual adversarial examples can make a model "hallucinate" and see things that are not there.

Picture release
After a few pieces of tape were stuck on it, the stop sign became a speed limit sign in the eyes of a self-driving car.

Imagine the impact this could have once self-driving cars are on the road. Although we could easily dismiss the tape as careless graffiti, the car would ignore the stop sign and drive straight into an accident.

Picture release
The colorful square makes a person invisible to an object recognition algorithm.

If integrated into a T-shirt design, it could effectively become an "invisibility cloak" against automated surveillance systems.

There is also an audio example...

Picture release
Small changes in the amplitude of the sound wave cause "how are you?" to be heard as "open the door".

As you can imagine, the ability to drastically change how audio input is understood, through changes we cannot even hear ourselves, could have quite serious consequences in a future of voice-controlled smart homes.

All in all, adversarial examples are very cool... and also very worrying.

To understand what "breaks" these state-of-the-art algorithms, let's take a look at how a model learns to make decisions in the first place.

Training machine learning algorithms

At the most basic level, a neural network (the kind of machine learning algorithm discussed here) is composed of artificial neurons. If we think of a neural network as a factory that turns inputs into outputs, neurons are like smaller assembly lines: sub-units that make up a larger processing system.

A neuron takes in inputs (xₙ). Each input is multiplied by a weight (wₙ), the products are summed, a bias (b) is added, and the result is fed into an activation function to produce the output.

Picture release
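As a minimal sketch of the neuron described above, the following plain-Python snippet computes the weighted sum, adds the bias, and applies an activation function. The sigmoid activation and all of the numbers are assumptions chosen for illustration; the article does not specify them.

```python
import math

def sigmoid(z):
    """Activation function (an assumed choice): squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Multiply each input by its weight, sum, add the bias, then activate."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

x = [0.5, -1.0, 2.0]   # illustrative inputs
w = [0.4, 0.3, -0.2]   # illustrative weights
b = 0.1                # illustrative bias

output = neuron(x, w, b)
```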

The weights and biases are the model's parameters: internal variables that allow the model to process input data in a specific way. During training, the accuracy of the model is improved by updating these parameters.

When multiple layers of neurons are stacked on top of each other, so that the output of one layer becomes the input of the next, we get a deep neural network. This allows more complex calculations and data processing.

Each of these nodes has its own weights and biases, all of which contribute to the parameter set of the entire model.
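To make the stacking concrete, here is a small sketch in plain Python. The network shape (two inputs, a hidden layer of three neurons, one output neuron), the sigmoid activation and every number are hypothetical. The last line counts the weights and biases that make up the parameter set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each neuron has its own row of weights and its own bias."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_w)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """The output of each layer becomes the input of the next."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical network: (weights, biases) for each layer.
network = [
    ([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]], [0.0, 0.1, -0.1]),
    ([[0.3, -0.6, 0.9]], [0.05]),
]

prediction = forward([1.0, 2.0], network)

# Every weight and every bias is one model parameter.
n_params = sum(len(b) + sum(len(row) for row in w) for w, b in network)
```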

So, how do we train these model parameters so that they give us the most accurate output?

The answer: the loss function.

Conceptually, the loss function represents the distance between the model's output and the target output. It basically tells us how good our model is. Mathematically speaking, a higher loss value indicates a less accurate output, and a lower loss value indicates a more accurate output.

The loss is computed from the model's inputs, actual outputs, expected outputs and parameters, and it is the key to training the model.
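As an illustration of "distance between output and target", the sketch below uses mean squared error, which is one common loss function and an assumption here, since the article does not name a specific one. The outputs and targets are made up.

```python
def mean_squared_error(outputs, targets):
    """Average squared distance between model outputs and target outputs."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

targets = [1.0, 0.0, 1.0]          # what the model should have said
close_outputs = [0.9, 0.1, 0.8]    # a fairly accurate model
far_outputs = [0.2, 0.7, 0.4]      # a fairly inaccurate model

loss_close = mean_squared_error(close_outputs, targets)
loss_far = mean_squared_error(far_outputs, targets)
```

The more accurate outputs give the lower loss, which is the property training relies on.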

Gradient descent: find the best model parameters

Intuitively, we want to update the value of each parameter so as to minimize the loss function, so that the model's predictions become more accurate. We do this through gradient descent, a process that lets us find a minimum of the loss function.

Picture release

Let us look at an example. Suppose the loss function f(w) is the quadratic function shown on the left. It depends on only one parameter: w.

The algorithm first picks a random initial value w₀ and then computes the derivative f'(w₀) of f(w) at w₀. Since the slope there is negative, we know that we can reduce f(w) by increasing w₀.

Therefore, we nudge the weight to the right, from w₀ to w₁, and our loss decreases from f(w₀) to f(w₁). This simple adjustment is called a learning step. At (w₁, f(w₁)), the derivative is taken again. Since the slope is negative again, another learning step is taken, further to the right and further down.

Picture release

Gradually, we reach the minimum of the curve at wₘ, where the slope is zero. At this point, our loss is minimal.

And... voilà! We end up with a trained weight w that helps our model make more accurate decisions.
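The walk down the curve can be sketched in a few lines of Python. The quadratic f(w) = (w - 3)² + 1, the starting point and the learning rate are all assumptions made for illustration. The starting value is fixed rather than random so that the run is repeatable.

```python
def f(w):
    """Hypothetical quadratic loss with its minimum at w = 3."""
    return (w - 3.0) ** 2 + 1.0

def f_prime(w):
    """Derivative of f: negative left of the minimum, positive right of it."""
    return 2.0 * (w - 3.0)

w = 0.0               # w0, to the left of the minimum, so the slope is negative
learning_rate = 0.1   # controls the size of each learning step
losses = [f(w)]

for _ in range(50):
    # Step against the slope: a negative slope increases w (moves right).
    w -= learning_rate * f_prime(w)
    losses.append(f(w))
```

Each learning step lowers the loss, and w settles close to 3, where the slope is zero.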

When a model has a large number of weights and biases, this process becomes much more complex. Instead of a quadratic curve, the loss function is plotted against all of the weights and biases, in a number of dimensions that we cannot even visualize.

However, the intuition remains the same. We take the derivative of the loss function with respect to each parameter and update the whole parameter set accordingly, step by step.
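The same idea with more than one parameter can be sketched as follows. This assumes a tiny model y = w·x + b with a mean squared error loss, and made-up data generated by y = 2x + 1. Each step takes the derivative of the loss with respect to each parameter and updates both.

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # illustrative data generated by y = 2x + 1

w, b = 0.0, 0.0
learning_rate = 0.05
n = len(xs)

for _ in range(2000):
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Partial derivatives of the mean squared error with respect to w and b.
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    # Update every parameter by one learning step.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
```

After training, w and b end up close to 2 and 1, the values used to generate the data.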

Now that we have a basic understanding of how machine learning models learn, we can delve into how adversarial examples are made to "break" them.

How to create adversarial examples?

This process also depends on the loss function. Essentially, we attack machine learning models in the same way that we train them.

Training updates the model parameters so as to minimize the loss function, whereas adversarial examples are generated by updating the input so as to maximize the loss function.

But wait a moment. Wouldn't this just give us an input that looks very different from the original? How can the adversarial examples we have seen (like the panda at the beginning of this article) look the same as the original pictures?

To answer this question, let's look at the mathematical representation of the adversarial example:

Picture release
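The formula in the original image is not reproduced in this copy. Going by the terms the text defines around it, it is presumably the standard fast gradient sign expression, written here with X_adv as an assumed name for the adversarial input:

```latex
X_{\text{adv}} = X + \epsilon \cdot \operatorname{sign}\bigl(\nabla_X J(X, Y)\bigr)
```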

J(X, Y) represents the loss function, where X is the input and Y is the output. In the case of image data, X is a matrix of values representing each pixel.

The ∇ symbol denotes taking the derivative of the loss function with respect to each of its input pixels. As before, when we decide whether to nudge each pixel value up or down, all that matters is the sign (positive or negative) of each derivative, which is why the sign() function is used.

In order to make the adjustments to the pixel values invisible to our eyes, we multiply these changes by a very small value, ε.

Therefore, the whole value ε·sign(∇ₓJ(X, Y)) is our perturbation: a matrix of values representing the change to each image pixel. The perturbation is added to the original image to create the adversarial image.

Picture release
Adding a layer of perturbation (also called "noise") to an image of a goldfish creates an adversarial example.

This is called the fast gradient sign method (FGSM).
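The following is a minimal sketch of FGSM in plain Python, not the setup behind the images in this article. It assumes a hypothetical, already-trained logistic-regression model with four input values standing in for pixels, and a binary cross-entropy loss. For that model the derivative of the loss with respect to each input works out to (p - y)·w. ε is far larger here than it would be for a real image, because the toy input has only four values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

# Hypothetical trained model; the attacker can read these (white box).
W = [2.0, -3.0, 1.0, 0.5]
B = 0.0

def predict_proba(x):
    return sigmoid(sum(w * v for w, v in zip(W, x)) + B)

def loss(x, y):
    """Binary cross-entropy J(X, Y) for a single example."""
    p = predict_proba(x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def loss_gradient_wrt_input(x, y):
    """Derivative of the loss with respect to each input value."""
    p = predict_proba(x)
    return [(p - y) * w for w in W]

def fgsm(x, y, epsilon):
    """X_adv = X + epsilon * sign(gradient of J with respect to X)."""
    grad = loss_gradient_wrt_input(x, y)
    return [v + epsilon * sign(g) for v, g in zip(x, grad)]

x = [0.2, 0.1, 0.3, 0.4]   # illustrative input
y = 1                      # its correct label
x_adv = fgsm(x, y, epsilon=0.1)

clean_label = int(predict_proba(x) > 0.5)
adv_label = int(predict_proba(x_adv) > 0.5)
```

No input value moves by more than ε, yet the loss goes up and the predicted label flips from 1 to 0.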

One caveat: this assumes that the attacker has full access to the model's gradients and parameters. In the real world, this is usually not the case.

Usually, only the model's developers know the exact parameters of the algorithm. However, there are several types of attack, and some of them get around this problem.

Types of adversarial attacks

Picture release

Attack methods can be classified according to three different criteria:

  1. the amount of knowledge the attacker has about the model
  2. the location of the attack within the model's development and deployment timeline
  3. the attacker's intent or goal.

Let's break it down further.

Knowledge-specific attacks

  • White-box attack: The attacker has full access to the internal structure of the model (including gradients and parameters) and can use it to generate adversarial examples.
  • Black-box attack: The attacker has no information about the internal structure of the model. The model is considered a "black box" because it can only be observed from the outside: we can only see what output it gives for a given input. However, using these inputs and outputs, we can create and train a "surrogate" model from which adversarial examples can be generated.
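The surrogate idea can be sketched as follows. Everything here is hypothetical: the hidden target model, the number of queries, the logistic-regression surrogate and ε are all chosen for illustration. The attacker code only ever calls target_predict and never reads the hidden weights.

```python
import math
import random

random.seed(0)

# The "black box": callable from outside, internals hidden from the attacker.
_HIDDEN_W, _HIDDEN_B = [2.0, -3.0, 1.0, 0.5], 0.0

def target_predict(x):
    z = sum(w * v for w, v in zip(_HIDDEN_W, x)) + _HIDDEN_B
    return 1 if z > 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

# 1. Query the black box on random inputs and record its answers.
queries = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(400)]
answers = [target_predict(q) for q in queries]

# 2. Train a surrogate logistic regression on the (input, answer) pairs.
w, b, lr = [0.0] * 4, 0.0, 0.5
for _ in range(200):
    grad_w, grad_b = [0.0] * 4, 0.0
    for q, label in zip(queries, answers):
        err = sigmoid(sum(wi * qi for wi, qi in zip(w, q)) + b) - label
        grad_w = [g + err * qi for g, qi in zip(grad_w, q)]
        grad_b += err
    w = [wi - lr * g / len(queries) for wi, g in zip(w, grad_w)]
    b -= lr * grad_b / len(queries)

# 3. Run FGSM against the surrogate, whose gradients the attacker knows.
x = [0.2, 0.1, 0.3, 0.4]
y = target_predict(x)   # label according to the black box
p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
epsilon = 0.2
x_adv = [xi + epsilon * sign((p - y) * wi) for xi, wi in zip(x, w)]

# 4. Check whether the adversarial example transfers to the real target.
transferred = target_predict(x_adv) != y
```

In this toy setting the surrogate learns roughly the same decision boundary as the target, so an example crafted against the surrogate also fools the target. Transfer is not guaranteed in general.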

Location-specific attacks

  • Training attack: Manipulated inputs paired with incorrect outputs are injected into the training data, so the model itself is flawed even before deployment.
  • Inference attack: The training data and the model architecture are not tampered with. Adversarial inputs are fed to the model after it has been trained, to prompt incorrect outputs.
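A training attack can be illustrated with a deliberately simple stand-in for a real model. The sketch below assumes a one-dimensional nearest-centroid classifier and made-up data. It is not a neural network, but it shows how injected examples with wrong labels shift what the model learns.

```python
def train_centroids(data):
    """'Train' by averaging the inputs seen for each label."""
    sums, counts = {}, {}
    for value, label in data:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Pick the label whose centroid is closest to the input."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

clean_data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
# Attacker-injected points: inputs that look like class 1, labelled class 0.
poison = [(9.0, 0), (10.0, 0), (11.0, 0), (10.0, 0), (9.0, 0)]

clean_model = train_centroids(clean_data)
poisoned_model = train_centroids(clean_data + poison)

clean_prediction = predict(clean_model, 6.0)
poisoned_prediction = predict(poisoned_model, 6.0)
```

The same input, 6.0, is assigned to class 1 by the clean model and to class 0 by the poisoned one, without the input itself being touched.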

Purpose-specific attacks

  • Targeted attack: The input is manipulated to change the output to a specific wrong answer. For example, the attacker's goal may be to have a stop sign recognized as a speed limit sign.
  • Untargeted attack: The input is manipulated to change the output to anything other than the correct answer. For example, the attacker is satisfied whether the stop sign is recognized as a speed limit sign, a yield sign, or a U-turn sign.
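The difference between the two goals can be sketched on a hypothetical three-class linear classifier. The weights, the input and ε are made up. The targeted variant shown here, stepping downhill on the loss of the desired label, is one common way to adapt the gradient-sign idea and is an assumption, since the article does not spell it out.

```python
import math

LABELS = ["stop", "speed limit", "yield"]
# Hypothetical linear classifier: one row of weights per label.
W = [[1.0, -0.5, 0.0],
     [-0.5, 1.0, 0.0],
     [0.0, -0.5, 1.0]]

def softmax(logits):
    exps = [math.exp(v - max(logits)) for v in logits]
    return [e / sum(exps) for e in exps]

def predict_proba(x):
    return softmax([sum(w * v for w, v in zip(row, x)) for row in W])

def predict(x):
    probs = predict_proba(x)
    return LABELS[probs.index(max(probs))]

def sign(v):
    return (v > 0) - (v < 0)

def loss_gradient_wrt_input(x, label_index):
    """Gradient of the cross-entropy loss for `label_index` with respect to x."""
    probs = predict_proba(x)
    diff = [p - (1.0 if k == label_index else 0.0) for k, p in enumerate(probs)]
    return [sum(diff[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def untargeted_attack(x, true_index, epsilon):
    # Step uphill on the loss of the TRUE label: any wrong answer will do.
    grad = loss_gradient_wrt_input(x, true_index)
    return [v + epsilon * sign(g) for v, g in zip(x, grad)]

def targeted_attack(x, target_index, epsilon):
    # Step downhill on the loss of the TARGET label: aim for one answer.
    grad = loss_gradient_wrt_input(x, target_index)
    return [v - epsilon * sign(g) for v, g in zip(x, grad)]

x = [0.6, 0.2, 0.3]   # illustrative input, classified as "stop"
original = predict(x)
untargeted = predict(untargeted_attack(x, LABELS.index("stop"), 0.3))
targeted = predict(targeted_attack(x, LABELS.index("speed limit"), 0.3))
```

The untargeted attack only guarantees that the answer moves away from "stop", while the targeted attack steers the answer to "speed limit" specifically.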

The method we looked at, FGSM, is a white-box untargeted attack. There are also other mechanisms for the different attack types listed above, such as the Basic Iterative Method (BIM) and the Jacobian-based Saliency Map Attack (JSMA).

In order to keep this article at an introductory level, I will not go into the details of how these other methods work. If you are interested, I suggest reading the comprehensive taxonomy from the National Institute of Standards and Technology: NIST Internal or Interagency Report (NISTIR) 8269 (draft), taxonomy and terminology of...

Applications of adversarial examples

Now that we have a good understanding of what adversarial attacks are and how they work, allow me to zoom out for a moment.

Let's put the technical details in context.

If unleashed on the real world, what dangers do adversarial examples pose?

  1. Self-driving cars: In addition to the stop sign example shown earlier, a few pieces of tape strategically placed on the ground can lead an autonomous driving system to move into the wrong lane or drive in the opposite direction.
  2. Medical diagnosis: Benign tumors may be misclassified as malignant, leading to unnecessary treatment for patients.
  3. Facial recognition: With nothing more than a pair of glasses (costing only $0.22), these people tricked a facial recognition system into recognizing them as celebrities. Even the FBI's facial recognition database no longer seems infallible.
Picture release

  4. Military strikes: As AI algorithms are increasingly integrated into military defense systems, adversarial attacks pose a very obvious threat to national security itself. What if a strike is launched against an unintended target?

  5. Voice commands: More and more Alexas and Echos are entering our homes. Innocent-sounding audio can send "silent" commands to these virtual assistants, disabling alarms and opening doors.

Um, yes. This is very, very worrying.

Fortunately, this does not mean that it is time to give up our precious algorithms.Research is ongoing on defense mechanisms that can protect our models from these hacker attacks.

Defending against attacks

Adversarial training is the most common form of defense.

It involves generating adversarial examples in advance and teaching the model, during the training phase, to match them to the correct outputs. It is like strengthening the model's immune system, preparing it for an attack before it happens.

Although this is an intuitive solution, it is definitely not perfect. Not only is it very cumbersome, it is also almost never foolproof. A large number of adversarial examples must be generated, which is computationally expensive. Not to mention that the model is still defenseless against any examples it was not trained on.
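The training loop behind this defense can be sketched as follows. This assumes a hypothetical logistic-regression model, a tiny made-up dataset, and FGSM as the way adversarial examples are generated. It shows the mechanics only and says nothing about how well the defense holds up on a real model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, epsilon):
    """FGSM against the current model; dJ/dx is (p - y) * w for this model."""
    p = predict_proba(w, b, x)
    return [xi + epsilon * sign((p - y) * wi) for xi, wi in zip(x, w)]

# Hypothetical toy dataset: (input, correct label).
data = [([1.0, 0.1], 1), ([0.8, 0.1], 1), ([-1.0, -0.1], 0), ([-0.8, -0.1], 0)]
epsilon, learning_rate = 0.3, 0.5
w, b = [0.0, 0.0], 0.0

for _ in range(200):
    # Generate adversarial examples against the current model and pair
    # each one with the CORRECT label.
    adversarial = [(fgsm(w, b, x, y, epsilon), y) for x, y in data]
    batch = data + adversarial

    # One ordinary gradient-descent step on clean + adversarial examples.
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, y in batch:
        err = predict_proba(w, b, x) - y
        grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        grad_b += err
    w = [wi - learning_rate * g / len(batch) for wi, g in zip(w, grad_w)]
    b -= learning_rate * grad_b / len(batch)

def accuracy(examples):
    hits = sum(int(predict_proba(w, b, x) > 0.5) == y for x, y in examples)
    return hits / len(examples)

clean_accuracy = accuracy(data)
robust_accuracy = accuracy([(fgsm(w, b, x, y, epsilon), y) for x, y in data])
```

Note that fresh adversarial examples are generated in every pass, which is where the extra computational cost comes from.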

As leading machine learning researcher Ian Goodfellow puts it, it is like "playing a game of whack-a-mole; it may close some loopholes, but it opens others."

Okay, so... what else can we do?

Visual neuroscience as inspiration

This is where it becomes more interesting.

As you read this article, you may have wondered: why do adversarial examples exist in the first place? Why are our eyes so good at seeing past tiny perturbations in images when machine learning models are not?

Some researchers believe that a model's sensitivity to adversarial examples is not a "bug" but rather a natural consequence of the fundamentally different ways in which humans and algorithms see the world.

In other words, although the noise may seem insignificant to us, it contains meaningful features that machine learning models can use, because they process information at a higher level of complexity.

For more information on this theory, I strongly recommend the following article: "Adversarial Examples Are Not Bugs, They Are Features".

If adversarial examples exist because of major differences in the way the human brain and the model process data, could we solve the problem by making machine learning models more like the brain?

It turns out that this is exactly the question that researchers at the MIT-IBM Watson AI Lab set out to answer. Specifically, they wanted to make convolutional neural networks (algorithms that process visual data) more robust by adding elements that mimic the mammalian visual cortex.

And it worked! The model is called VOneBlock, and it outperformed existing state-of-the-art algorithms.

Bridging the gap between neuroscience and AI (more specifically, integrating discoveries from neuroscience into the design of machine learning model architectures) is an exciting field of research.

Fortunately, we have barely scratched the surface. There is still plenty of hope for the world we can't wait for, with smart homes, self-driving cars, faster and more accurate medical diagnoses, easier access to education and mental health resources, and more.

More powerful machine learning algorithms are under development.

Key takeaways

  • An adversarial attack involves feeding a machine learning model inputs that are purposely designed to induce incorrect outputs, essentially "fooling" the model.
  • The loss function represents the distance between the output of the model and the true output.
  • The machine learning model is trained by updating its internal parameters while minimizing the loss function.
  • Adversarial examples are created by updating the input while maximizing the loss function.
  • Different types of adversarial attacks can be classified based on the attacker's knowledge and intentions and the location of the attack.
  • Adversarial attacks can have catastrophic consequences for audiovisual systems used in autonomous vehicles, facial recognition, and voice assistants.
  • Adversarial attacks can be defended through adversarial training.
  • Recent studies have shown that by deriving inspiration from neuroscience, machine learning models can be made more robust against adversarial examples.