Google researchers recently published a paper at RecSys 2019 in Copenhagen, Denmark [1], providing insights into how their video platform YouTube recommends which videos to watch next. In this article, I will try to summarize my findings after reading it.


When a user watches a video on YouTube, a list of recommended videos the user may like is displayed in a particular order. The paper focuses on two goals:
1) Optimizing for different objectives; the exact objective function is not defined, but the objectives are divided into engagement objectives (clicks, time spent) and satisfaction objectives (like rate, dismissal rate).
2) Reducing the implicit selection bias introduced by position: users are more likely to click on the first recommendation, even though a lower-ranked video might lead to higher engagement and satisfaction.

How to effectively learn to reduce this bias is an open question.


Figure 1: The complete architecture of the model

The model described in the paper focuses on the two goals above. It uses the Wide & Deep [2] model architecture, which combines a wide linear model (memorization) with a deep neural network (generalization). The Wide & Deep model generates a prediction for each defined objective (engagement and satisfaction). Each objective is framed either as a binary classification problem (e.g., whether the user likes a video) or as a regression problem (e.g., the rating of a video). On top of this sits a separate ranking model, which is simply a weighted combination of the output vectors for the different prediction objectives. These weights are tuned manually to achieve the best performance on the different objectives. More advanced methods, such as pairwise or listwise approaches, have been tried to improve performance, but were not put into production because of the increased computation time.
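The final ranking step described above can be sketched as a weighted sum of the per-objective predictions. The objective names and weight values below are illustrative, not taken from the paper:

```python
import numpy as np

def ranking_score(predictions: dict, weights: dict) -> float:
    """Combine per-objective model outputs into a single ranking score.

    `predictions` holds the model head outputs for one candidate video
    (e.g., a click probability and a predicted watch time); `weights`
    are the manually tuned combination weights.
    """
    return sum(weights[name] * predictions[name] for name in weights)

# Hypothetical per-objective outputs for two candidate videos.
candidates = {
    "video_a": {"p_click": 0.30, "watch_time": 120.0, "p_like": 0.10},
    "video_b": {"p_click": 0.25, "watch_time": 300.0, "p_like": 0.20},
}
# Manually tuned weights (illustrative values).
weights = {"p_click": 1.0, "watch_time": 0.01, "p_like": 2.0}

ranked = sorted(candidates,
                key=lambda v: ranking_score(candidates[v], weights),
                reverse=True)
print(ranked)  # ['video_b', 'video_a']
```

Tuning these weights by hand trades the cost of a learned ranking loss (pairwise or listwise) for a simple, inspectable final layer, which matches the production constraint mentioned in the paper.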

Figure 2: Replace the shared underlying with MMoE

For the deep part of the Wide & Deep model, a Multi-gate Mixture-of-Experts (MMoE) [3] model is adopted. Its inputs are features of the current video (content, title, topic, upload time, etc.) and of the user watching it (time, user profile, etc.). The idea behind the MMoE model is to share weights efficiently across the different objectives. The shared bottom layer is split into multiple experts, all of which are used to predict the different objectives. Each objective has its own gating function: a softmax over the experts, computed from the shared input layer, that determines which expert layers are important for that objective. As shown in Figure 3, different experts are more important for different objectives.

Figure 3: Experts' utilization of multiple tasks on YouTube
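The expert/gate mechanism can be sketched in a few lines of NumPy. Using random matrices instead of trained layers, and these particular sizes, are simplifications for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_in, d_expert, n_tasks = 4, 8, 16, 2

# Shared input features for one example (video + user features).
x = rng.normal(size=d_in)

# Each expert is sketched as a single linear map with a ReLU.
experts = [rng.normal(size=(d_in, d_expert)) for _ in range(n_experts)]
expert_outputs = np.stack([np.maximum(x @ W, 0.0) for W in experts])  # (n_experts, d_expert)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# One gate per task: a softmax over the experts, computed from the shared input.
gates = [rng.normal(size=(d_in, n_experts)) for _ in range(n_tasks)]

task_inputs = []
for G in gates:
    w = softmax(x @ G)                      # expert weights for this task, sum to 1
    task_inputs.append(w @ expert_outputs)  # weighted sum of expert outputs

# Each task tower then consumes its own mixture of the shared experts.
print([t.shape for t in task_inputs])  # [(16,), (16,)]
```

Because each task mixes the same experts with its own softmax weights, the tasks share parameters where it helps and specialize where it does not, which is what Figure 3 visualizes.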

The wide part of the model focuses on reducing the selection bias introduced by the position of the recommended video. This wide part, called the "shallow tower", can be a simple linear model that uses simple features such as the position of the clicked video and the device used to watch it. The output of the shallow tower is combined with the output of the MMoE model, which is the key component of the Wide & Deep model architecture. In this way, the model pays explicit attention to the position of the video. During training, a 10% dropout rate is applied to the position feature to prevent it from becoming too important in the model. Without the Wide & Deep architecture, simply adding position as another input feature would likely result in the model not attending to it at all.
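A minimal sketch of the shallow-tower idea: a linear model over position and device features whose logit is added to the main model's logit, with the position feature randomly dropped during training. All feature names and weight values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def shallow_tower_logit(position, device_id, w_pos, w_dev,
                        training=False, drop_rate=0.1):
    """Linear bias model over position and device (the 'shallow tower').

    During training, the position feature is dropped 10% of the time so the
    model does not become over-reliant on it.
    """
    pos_feature = float(position)
    if training and rng.random() < drop_rate:
        pos_feature = 0.0  # drop the position feature for this example
    return w_pos * pos_feature + w_dev[device_id]

# Final logit = main (MMoE) logit + shallow-tower bias logit.
main_logit = 0.8                                # hypothetical MMoE output
w_pos, w_dev = -0.2, {"phone": 0.1, "tv": -0.05}  # hypothetical learned weights

logit = main_logit + shallow_tower_logit(position=1, device_id="phone",
                                         w_pos=w_pos, w_dev=w_dev)
print(round(logit, 2))  # 0.7
```

Keeping the position bias in a separate additive tower means it can be isolated (or treated as missing) at serving time, rather than being entangled with the content-quality signal in the deep network.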


The results of the paper show that replacing the shared bottom layer with MMoE improves the model's performance on both engagement (time spent watching the recommended videos) and satisfaction (survey responses). Increasing the number of MMoE experts and multiplication operations improves performance further, but this number cannot be increased in the live setting due to computational limitations.

Further results show that engagement can be improved by reducing selection bias with the shallow tower. This is a significant improvement over simply adding position as an input feature to the MMoE model.

Interesting takeaways

  • Despite Google's massive computing infrastructure, training and serving costs still require careful consideration.
  • By using a Wide & Deep model, you can design your network around features that you already know are important.
  • The MMoE model is very effective when you need a model with multiple objectives.
  • Even with a powerful and complex model architecture, humans still manually tune the weights of the last layer, which determines the actual ranking based on the different objective predictions.

About the author

Tim Elfrink is a data scientist at Vantage AI, a data science consultancy in the Netherlands. If you need help creating machine learning models for your data, feel free to contact us at Info@vantage-ai.com.


[1] Original paper: https://…id=3346997
[2] Wide & Deep learning: https://…
[3] Video explaining MMoE (Multi-gate Mixture-of-Experts): https://…

This article was originally published on Medium.