Interpretability remains one of the biggest challenges of modern deep learning applications. Recent advances in computational models and deep learning research have enabled the creation of highly complex models with thousands of hidden layers and tens of millions of neurons. While building incredibly advanced deep neural network models is relatively straightforward, understanding how these models create and use knowledge remains a challenge. Recently, researchers on the Google Brain team proposed a new method called concept activation vectors (CAVs) that provides a new perspective on the interpretability of deep learning models.

Interpretability and accuracy

To understand the CAV technique, it is important to understand the nature of the interpretability challenge in deep learning models. In the current generation of deep learning technology, there is a permanent friction between the accuracy of a model and our ability to interpret its knowledge. This interpretability-accuracy friction is the tension between being able to accomplish complex knowledge tasks and understanding how those tasks are accomplished. Knowledge versus control, performance versus accountability, efficiency versus simplicity: pick your favorite dilemma, and each can be explained as a trade-off between accuracy and interpretability.

Do you care about obtaining the best results, or do you care about understanding how those results were produced? This is a question that data scientists need to answer in every deep learning scenario. Many deep learning techniques are inherently complex, and although they are very accurate in many scenarios, they can be very difficult to interpret. If we plot some of the best-known deep learning models on a chart of accuracy against interpretability, we get the following:

Interpretability in deep learning models is not a single concept; it can be viewed at multiple levels:

Achieving interpretability at each of the levels defined in the figure above requires several basic building blocks. In a recent paper, researchers from Google outlined what they consider to be some of the fundamental building blocks of interpretability.

Google summarized the interpretability principles as follows:

– Understand the role of the hidden layers: Most of the knowledge in a deep learning model is formed in its hidden layers. Understanding the function of the different hidden layers at a macro level is critical to interpreting the model.

– Understand how nodes are activated: The key to interpretability is not to understand the function of individual neurons in the network, but rather the groups of interconnected neurons that fire together at the same spatial location. Segmenting the network into groups of interconnected neurons provides a simpler level of abstraction for understanding its functionality.

– Understand how concepts are formed: Understanding how a deep neural network forms individual concepts that are then assembled into the final output is another key building block of interpretability.

These principles are the theoretical basis behind Google's new CAV technology.

Concept activation vector

Following the ideas discussed in the previous section, the natural approach to interpretability would be to describe the predictions of a deep learning model in terms of the input features it considers. A classic example is a logistic regression classifier, where the coefficient weights are often interpreted as the importance of each feature. However, most deep learning models operate on features such as pixel values that do not correspond to high-level concepts that humans easily understand. Furthermore, a model's internal values (e.g., neural activations) can seem incomprehensible. While techniques such as saliency maps are effective at measuring the importance of particular pixel regions, they still cannot relate that importance to higher-level concepts.

The core idea behind CAVs is to measure the relevance of a concept to a model's output. A CAV for a concept is simply a vector in the direction of the values (e.g., activations) of that concept's set of examples. In their paper, the Google research team outlined a new linear interpretability method called Testing with CAVs (TCAV), which uses directional derivatives to quantify how sensitive a model's predictions are to an underlying high-level concept learned by a CAV. Conceptually, TCAV was designed with four goals in mind:

– Accessibility: Requires little to no ML expertise from users.

– Customization: Adapts to any concept (e.g., gender) and is not limited to the concepts considered during training.

– Plug-in readiness: Works without retraining or modifying the ML model.

– Global quantification: A single quantitative measure can be used to explain an entire class or set of examples, not just a single data input.

To achieve the above objectives, the TCAV method is divided into three basic steps:

1) Define the relevant concepts for the model.

2) Understand the sensitivity of predictions to those concepts.

3) Infer a global quantitative interpretation of the relative importance of each concept to each of the model's prediction classes.

The first step in the TCAV approach is to define a concept of interest and learn its CAV. TCAV does this by choosing a set of examples that represent the concept, or by finding an independent data set in which the concept is labeled. The CAV is then learned by training a linear classifier to distinguish the activations produced at a given layer by the concept's examples from those produced by other examples; the CAV is the vector orthogonal to that classifier's decision boundary.
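As a rough illustration of this first step, the sketch below learns a CAV from synthetic data using only the Python standard library. It is not the Google Brain implementation: the function name train_cav, the three-dimensional "activations", and the choice of logistic regression trained by plain gradient descent are all assumptions made for the example. In a real setting the activations would be read from a chosen layer of the network.

```python
import math
import random


def train_cav(concept_acts, random_acts, epochs=200, lr=0.1):
    """Fit a logistic-regression classifier (concept = 1, random = 0) and
    return its weight vector scaled to unit length. That vector is normal to
    the decision boundary and points toward the concept examples."""
    dim = len(concept_acts[0])
    data = [(a, 1.0) for a in concept_acts] + [(a, 0.0) for a in random_acts]
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


# Synthetic stand-ins for layer activations (assumption): concept examples are
# shifted along the first axis, random examples are centered at the origin.
rng = random.Random(0)
concept_acts = [[rng.gauss(2.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
                for _ in range(50)]
random_acts = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]

cav = train_cav(concept_acts, random_acts)
print(cav)
```

Because the concept examples differ from the random ones mainly along the first axis, the learned vector should point mostly in that direction.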

The second step is to generate a TCAV score that quantifies the sensitivity of predictions to a particular concept. TCAV does this with directional derivatives, which measure how sensitive the model's prediction is to changes in the input toward the direction of the concept at a given neural activation layer.
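A minimal sketch of this second step is given here, under several assumptions. The function class_logit is a made-up stand-in for the part of a network that maps a layer's activations to the logit of one class, the directional derivative is estimated by central finite differences rather than by the automatic differentiation a real framework would use, and the score is taken to be the fraction of a class's examples whose directional derivative along the CAV is positive.

```python
import math


def class_logit(a):
    """Hypothetical network head: maps a 3-d layer activation to one class logit."""
    return math.tanh(1.5 * a[0] - 0.5 * a[1]) + 0.1 * a[2] ** 2


def directional_derivative(f, a, v, eps=1e-5):
    """Central finite-difference estimate of the derivative of f at a along v."""
    plus = [ai + eps * vi for ai, vi in zip(a, v)]
    minus = [ai - eps * vi for ai, vi in zip(a, v)]
    return (f(plus) - f(minus)) / (2.0 * eps)


def tcav_score(f, activations, cav):
    """Fraction of examples whose logit increases when moving along the CAV."""
    sensitivities = [directional_derivative(f, a, cav) for a in activations]
    return sum(1 for s in sensitivities if s > 0) / len(sensitivities)


# Made-up activations for four examples of the class being explained.
class_activations = [
    [0.2, 0.1, 1.0],
    [-0.3, 0.4, -1.0],
    [0.5, -0.2, 2.0],
    [0.0, 0.0, -0.5],
]

# A concept aligned with the first axis always increases this toy logit ...
print(tcav_score(class_logit, class_activations, [1.0, 0.0, 0.0]))
# ... while one aligned with the third axis helps only some of the examples.
print(tcav_score(class_logit, class_activations, [0.0, 0.0, 1.0]))
```

A score near 1 suggests the concept pushes most examples toward the class, a score near 0 suggests it pushes them away, and a score near 0.5 suggests no consistent effect.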

The final step is to assess the global relevance of the learned CAVs, to avoid relying on irrelevant ones. After all, one pitfall of the TCAV technique is the possibility of learning a meaningless CAV: a randomly chosen set of images will still produce a CAV, and a test based on such a random concept is unlikely to be meaningful. To address this, TCAV introduces a statistical significance test that evaluates CAVs across a number of training runs (typically 500) rather than a single one. The idea is that a meaningful concept should lead to TCAV scores that are consistent across training runs.
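One way to picture this check is the sketch below. It assumes the test asks whether the TCAV scores collected over many training runs differ from 0.5, the value expected when a concept has no consistent effect. To stay within the standard library it approximates the t-distribution with a normal distribution, which is only reasonable when there are many runs. The function name tcav_p_value and the simulated scores are invented for the example.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def tcav_p_value(scores, null_score=0.5):
    """Two-sided p-value for the hypothesis that the mean TCAV score equals
    null_score, using a normal approximation to the one-sample t-test."""
    standard_error = stdev(scores) / math.sqrt(len(scores))
    if standard_error == 0.0:
        # Every run gave the same score: either exactly the null or clearly not.
        return 1.0 if mean(scores) == null_score else 0.0
    t = (mean(scores) - null_score) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Simulated scores from 500 training runs (assumption: a meaningful concept
# yields scores clustered well away from 0.5).
rng = random.Random(1)
consistent_scores = [min(1.0, max(0.0, rng.gauss(0.85, 0.05))) for _ in range(500)]

# Scores that swing between extremes and average out to 0.5.
inconsistent_scores = [0.1, 0.9] * 250

print(tcav_p_value(consistent_scores))    # very small: keep the concept
print(tcav_p_value(inconsistent_scores))  # close to 1: discard the concept
```

With SciPy available, scipy.stats.ttest_1samp would give the exact t-test p-value instead of this approximation.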

TCAV in action

The Google Brain team conducted several experiments to evaluate the effectiveness of TCAV compared with other interpretability methods. In one of the most compelling tests, the team used saliency maps to try to predict whether the caption or the image was more relevant to the concept of a taxi. The output of the saliency maps is as follows:

Using these images as the test data set, the Google Brain team ran an experiment with 50 workers on Amazon Mechanical Turk. Each worker performed a series of six tasks (3 object classes x 2 saliency map types), all for a single model. The order of the tasks was randomized. In each task, the worker first saw four images and their corresponding saliency masks. They then rated how important they thought the image was to the model (on a 10-point scale), how important the caption was to the model (on a 10-point scale), and how confident they were in their answers (on a 5-point scale). In total, the workers rated 60 unique images (120 unique saliency maps).

The ground truth of the experiment was that the image concept was more relevant than the caption concept. However, when looking at the saliency maps, participants perceived the caption concept as more important (model with 0% noise) or saw no difference (model with 100% noise). In contrast, the TCAV results correctly indicated that the image concept was more important.

TCAV is one of the most innovative approaches to neural network interpretability of the past few years. The initial code for the technique is available on GitHub, and we should expect to see some of its ideas adopted by mainstream deep learning frameworks.

This article was reposted from Towards Data Science.