This article is reposted from the WeChat public account AI Technology Review.

Kuaishou is a short-video social platform where ordinary people record and share their lives. Its daily active user base grew steadily throughout 2018: as of December 2018, Kuaishou had more than 160 million daily active users and more than 300 million monthly active users, with over 15 million short videos uploaded every day.

A user base this large produces a great deal of interesting content every day. Kuaishou's traffic does not favor internet celebrities and is not tilted toward any particular group; every video gets a chance at exposure, so many of its users are very down-to-earth. To lower the barrier to shooting videos and help these down-to-earth users produce better content, Kuaishou applies a great deal of AI technology.

Kuaishou's current special effects include beauty and makeup effects, the dance-machine game, rain-control effects, AR effects, hair-dyeing effects, background replacement, and magic expressions. These effects rely on AI techniques such as facial key-point detection, human-body key-point recognition, gesture recognition, visual-inertial odometry, hair segmentation, and background segmentation.

Among these effects, the "Meng Mian Kmoji" magic expression, launched at the end of December last year, is a particularly fun one. With it, users shoot with the camera to generate their own exclusive AR avatar of their face, and the avatar captures the user's expressions, accurately reproducing fine movements such as blinking, opening the mouth, raising the eyebrows, and sticking out the tongue. This is the first time a short-video platform has offered user-customized AR avatars for shooting.

So what AI technologies are behind the Meng Mian Kmoji magic expression? Kuaishou's technical team shared its solution with us.

A detailed look at the AI technology behind Meng Mian Kmoji

Face attribute and expression recognition based on fusing 3D analysis with 2D information

The foundation of the personalized face-customization system is face attribute and expression recognition, which requires 3D analysis fused with 2D information.

Given an input image, Kuaishou's technical team uses 3D reconstruction to recover the face's 3D structure and fuses it closely with the 2D information. Building on this reconstruction, analysis, and fusion, the system analyzes face attributes and facial features along multiple dimensions and generates a personalized avatar. At the same time, it analyzes facial expressions in real time to drive the generated avatar.

On top of this, the team also analyzes the human body, such as hair and shoulders, to lay the groundwork for blending the avatar into the real scene, and its self-developed mobile photorealistic rendering engine presents the personalized, lifelike face to the user in real time.

3D face reconstruction

3D face reconstruction is a critical part of the whole system. Kuaishou's technical team collected 3D data from tens of thousands of faces spanning different age groups, ethnicities, and face shapes, along with a range of expressions for each person, and built a 3D face database that covers almost the entire space of face shapes and expressions. With this database, any expression on any face can be modeled. The team developed a reconstruction technique that tracks changes in facial expression through more than a hundred key points and reconstructs each person's 3D face under each of their expressions. In addition, efficient neural networks ensure that 3D face reconstruction runs in real time even on low-end phones.
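A face database like this is commonly condensed into a linear model (a mean shape plus identity and expression basis directions), so that reconstruction becomes fitting basis coefficients to detected key points. The article does not describe Kuaishou's actual model; the sketch below assumes a toy linear model with made-up dimensions and solves a regularized least-squares fit in pure Python. `fit_coefficients`, `reconstruct`, and the toy data are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_coefficients(landmarks: Vector, mean: Vector, basis: Matrix, reg: float = 1e-3) -> Vector:
    """Fit w minimizing ||mean + B w - landmarks||^2 + reg * ||w||^2.

    landmarks, mean: flattened key-point coordinates (length D).
    basis: D x K matrix whose columns are hypothetical identity/expression directions.
    """
    d, k = len(basis), len(basis[0])
    residual = [landmarks[i] - mean[i] for i in range(d)]
    # Normal equations: (B^T B + reg * I) w = B^T r
    btb = [[sum(basis[i][p] * basis[i][q] for i in range(d)) + (reg if p == q else 0.0)
            for q in range(k)] for p in range(k)]
    btr = [sum(basis[i][p] * residual[i] for i in range(d)) for p in range(k)]
    return solve(btb, btr)


def reconstruct(mean: Vector, basis: Matrix, w: Vector) -> Vector:
    """Evaluate the linear face model: mean + B w."""
    return [mean[i] + sum(basis[i][j] * w[j] for j in range(len(w))) for i in range(len(mean))]


if __name__ == "__main__":
    # Toy model: 3 key points (x, y) and 2 hypothetical basis directions.
    mean = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
    basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
    observed = reconstruct(mean, basis, [0.5, -0.2])
    print(fit_coefficients(observed, mean, basis, reg=0.0))  # approximately [0.5, -0.2]
```

The small `reg` term keeps the fit stable when key points are noisy or the basis is nearly degenerate; a real system would also estimate head pose and camera projection, which this sketch leaves out.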

Face attribute perception

For face attribute perception, the team uses neural networks to perceive fine-grained facial attributes, including gender, age, skin tone, face shape, eye shape, and other details. Compared with similar products, this makes fully automatic face customization possible. The team also uses massive amounts of face data and multi-task collaborative learning to capture subtle facial features. Telling fine-grained attributes apart is very difficult; some distinctions are hard to make even with the human eye. To handle this, the team put a great deal of careful design into the system, combining classification, regression, and segmentation techniques to improve the accuracy of automatic face customization ("face pinching").
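Multi-task learning of this kind usually trains one shared network with several output heads and optimizes a weighted sum of their losses. The sketch below assumes three hypothetical heads (a classification head, for example gender; a regression head, for example age; and a per-pixel segmentation head) with hand-picked weights. The article does not give Kuaishou's actual heads, losses, or weights.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for one classification head, using a numerically stable log-softmax."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Smooth L1 (Huber-style) loss for one regression head."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def pixel_bce(probs: Sequence[float], mask: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over a flattened segmentation mask."""
    total = 0.0
    for p, y in zip(probs, mask):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def multitask_loss(cls_logits: Sequence[float], cls_label: int,
                   reg_pred: float, reg_target: float,
                   seg_probs: Sequence[float], seg_mask: Sequence[int],
                   weights: Sequence[float] = (1.0, 0.1, 1.0)) -> float:
    """Weighted sum of the three head losses; the weights are illustrative assumptions."""
    w_cls, w_reg, w_seg = weights
    return (w_cls * softmax_cross_entropy(cls_logits, cls_label)
            + w_reg * smooth_l1(reg_pred, reg_target)
            + w_seg * pixel_bce(seg_probs, seg_mask))
```

Because all heads share one backbone, gradients from the segmentation and regression tasks also shape the features used for classification, which is one reason multi-task setups can help with subtle, fine-grained distinctions.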

Facial expression recognition

Facial expressions carry complex and subtle information, and people are especially sensitive to them. Getting a machine to recognize expression signals that range from subtle to exaggerated, while keeping its output both responsive and stable, is a difficult problem.

Kuaishou's technical team tackles it by jointly modeling information from three modalities (1D, 2D, and 3D): the 2D RGB visual information, the facial key points extracted from it, and the 3D model reconstructed in real time. The recognized facial expression then drives the avatar through all kinds of realistic movements. In addition, quantizing the deep neural network models compresses and accelerates them enough to overcome the limits of phone hardware, so the solution can be adapted to virtually all phone models.
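A common way to drive an avatar from recognized expressions is to predict blendshape coefficients (for example "jaw open" or "left eye blink") and add the weighted vertex offsets to a neutral mesh. The sketch below assumes simple concatenation as the fusion step, a single linear layer with a sigmoid as the predictor, and an exponential moving average to keep the output stable between frames. All names, sizes, and weights are hypothetical and not Kuaishou's model.

```python
import math
from typing import List, Sequence

Vec = List[float]


def fuse(feat_1d: Sequence[float], feat_2d: Sequence[float], feat_3d: Sequence[float]) -> Vec:
    """Early fusion by concatenation (an assumption; the article does not specify the fusion)."""
    return list(feat_1d) + list(feat_2d) + list(feat_3d)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_coefficients(x: Sequence[float], weights: Sequence[Sequence[float]],
                         bias: Sequence[float]) -> Vec:
    """Hypothetical linear head + sigmoid: one blendshape weight in [0, 1] per expression."""
    return [_sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


def smooth(prev: Sequence[float], curr: Sequence[float], alpha: float = 0.5) -> Vec:
    """Exponential moving average across frames to damp jitter (higher alpha = more responsive)."""
    return [alpha * c + (1.0 - alpha) * p for p, c in zip(prev, curr)]


def apply_blendshapes(neutral: Sequence[float], deltas: Sequence[Sequence[float]],
                      coeffs: Sequence[float]) -> Vec:
    """Avatar vertices = neutral + sum_k coeff_k * delta_k (flattened xyz)."""
    verts = list(neutral)
    for delta, c in zip(deltas, coeffs):
        for i, d in enumerate(delta):
            verts[i] += c * d
    return verts
```

The `alpha` parameter makes the responsiveness/stability trade-off explicit: raising it follows fast expressions more closely, lowering it suppresses frame-to-frame jitter.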

High quality rendering

The final look of the Meng Mian effect depends heavily on rendering. To achieve high-quality rendering, the team uses physically based rendering (PBR) to deliver PC-game-quality visuals on mobile. Drawing on its accumulated AI expertise, Meng Mian can also intelligently choose the materials best suited to the user based on the surrounding environment and the user's appearance, achieving the best possible rendering result.
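PBR shading is usually built on a microfacet BRDF. As a reference point, here is the widely used Cook-Torrance specular term with the GGX normal distribution, Schlick's Fresnel approximation, and a Schlick-GGX Smith geometry term, written on scalar cosines for brevity. The article does not say which BRDF Kuaishou's engine uses, so treat this as a generic sketch rather than their shader.

```python
import math


def ggx_distribution(n_dot_h: float, roughness: float) -> float:
    """GGX / Trowbridge-Reitz normal distribution, with alpha = roughness^2."""
    a = roughness * roughness
    a2 = a * a
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)


def fresnel_schlick(v_dot_h: float, f0: float) -> float:
    """Schlick approximation of Fresnel reflectance (f0 = reflectance at normal incidence)."""
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5


def geometry_smith(n_dot_v: float, n_dot_l: float, roughness: float) -> float:
    """Smith masking-shadowing, Schlick-GGX form with k = (r + 1)^2 / 8 for direct lights."""
    k = (roughness + 1.0) ** 2 / 8.0

    def g1(cos_theta: float) -> float:
        return cos_theta / (cos_theta * (1.0 - k) + k)

    return g1(n_dot_v) * g1(n_dot_l)


def cook_torrance_specular(n_dot_l: float, n_dot_v: float, n_dot_h: float,
                           v_dot_h: float, roughness: float, f0: float = 0.04) -> float:
    """Specular BRDF: D * F * G / (4 * NdotL * NdotV); zero for back-facing light or view."""
    if n_dot_l <= 0.0 or n_dot_v <= 0.0:
        return 0.0
    d = ggx_distribution(n_dot_h, roughness)
    f = fresnel_schlick(v_dot_h, f0)
    g = geometry_smith(n_dot_v, n_dot_l, roughness)
    return d * f * g / (4.0 * n_dot_l * n_dot_v)
```

Choosing a "material" in this model mostly means choosing parameters such as `roughness` and `f0` (plus albedo for the diffuse term), which is the kind of knob an engine could adjust per user and environment.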

For a more realistic experience, the team introduced a physics engine that simulates the motion of soft bodies such as hair and cloth. To give every user the best experience, the rendering engine selects an appropriate rendering quality for each phone model.
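Real-time hair and cloth are often simulated with position-based methods: particles are advanced by Verlet integration and then pulled back to their rest lengths by iterative distance constraints. The article does not describe Kuaishou's physics engine; the sketch below simulates a single hypothetical 2D hair strand pinned at the root, and its `iterations` parameter is the kind of knob a renderer could lower on weaker phones.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def step_strand(pos: List[Point], prev: List[Point], rest_len: float, dt: float,
                gravity: float = -9.8, iterations: int = 4) -> Tuple[List[Point], List[Point]]:
    """Advance one 2D strand by one frame. pos[0] is the root and stays pinned.

    Returns (new_positions, new_previous_positions).
    """
    root = pos[0]
    new_pos = [root]
    # Verlet integration: x_next = 2x - x_prev + a * dt^2 (velocity is implicit).
    for (x, y), (px, py) in zip(pos[1:], prev[1:]):
        new_pos.append((2 * x - px, 2 * y - py + gravity * dt * dt))
    # Constraint projection: pull each segment back toward its rest length.
    for _ in range(iterations):
        for i in range(len(new_pos) - 1):
            (ax, ay), (bx, by) = new_pos[i], new_pos[i + 1]
            dx, dy = bx - ax, by - ay
            dist = math.hypot(dx, dy) or 1e-9
            corr = (dist - rest_len) / dist
            if i == 0:
                # The root is pinned, so the child absorbs the whole correction.
                new_pos[i + 1] = (bx - dx * corr, by - dy * corr)
            else:
                new_pos[i] = (ax + 0.5 * dx * corr, ay + 0.5 * dy * corr)
                new_pos[i + 1] = (bx - 0.5 * dx * corr, by - 0.5 * dy * corr)
    return new_pos, list(pos)
```

More constraint iterations make the strand stiffer and less stretchy at a higher cost per frame, which maps naturally onto per-device quality tiers.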

Mobile prediction model optimization

To make the AI models run smoothly on phones, the team made three main optimizations:

  1. First, in image preprocessing, the various preprocessing operations are merged, and the image memory they use is allocated and reclaimed in one place, reducing memory consumption and making allocation and use more efficient;
  2. Second, the team makes full use of ARM NEON acceleration and Apple's Accelerate framework, so the entire runtime occupies only 2 MB;
  3. Finally, while preserving prediction accuracy, parts of the AI models are quantized to INT8. After these optimizations, running speed at least doubles, and the AI prediction models shrink to nearly a quarter of their original size.
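The INT8 step can be illustrated with symmetric per-tensor quantization: each float32 weight is divided by a scale, rounded to an integer in [-127, 127], and stored in one byte instead of four, which is where a roughly fourfold size reduction comes from. The article only says the quantization is partial and does not give the scheme; per-tensor symmetric scaling and the hypothetical `keep_float` set of layers left in float32 are assumptions of this sketch.

```python
from array import array
from typing import Dict, List, Set, Tuple


def quantize_int8(weights: List[float]) -> Tuple[array, float]:
    """Symmetric per-tensor quantization: q = clamp(round(w / scale), -127, 127)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize_int8(q: array, scale: float) -> List[float]:
    """Recover approximate float weights: w ~= q * scale."""
    return [v * scale for v in q]


def quantize_model(layers: Dict[str, List[float]], keep_float: Set[str]) -> Dict[str, object]:
    """Partial quantization: layers named in keep_float stay float32 (hypothetical policy)."""
    out: Dict[str, object] = {}
    for name, w in layers.items():
        out[name] = array("f", w) if name in keep_float else quantize_int8(w)
    return out


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 1.0]
    q, scale = quantize_int8(weights)
    fp32 = array("f", weights)
    print(len(fp32) * fp32.itemsize, "bytes ->", len(q) * q.itemsize, "bytes")  # 16 bytes -> 4 bytes
```

Rounding error per weight is at most half a quantization step (`scale / 2`), which is why accuracy-sensitive layers are often the ones left in float when quantizing only part of a model.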

That covers the AI technology behind the Meng Mian Kmoji magic expression. Beyond avatars, Kuaishou's technical team has extended the same system into other applications: "Portrait 3D Lighting" and "The Other You in the World." Leiphone finds "The Other You in the World" especially creative: by analyzing the user's face data, it finds the Kuaishou users who look most like the current user. Leiphone's editors tried it at Kuaishou headquarters, and the people it found closely resembled them in facial features, hairstyle, and face shape, truly "another me in the world." We look forward to AI technology advancing rapidly and creating more fun features for us.
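Finding a look-alike is, in its simplest form, a nearest-neighbour search over face embeddings: a recognition network maps each face to a vector, and candidates are ranked by cosine similarity. The article does not describe how the feature works internally; the embeddings, user IDs, and brute-force search below are hypothetical, and a system at this scale would more likely use an approximate nearest-neighbour index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int = 3,
                 exclude: str = "") -> List[Tuple[str, float]]:
    """Return the k gallery entries (hypothetical user IDs) most similar to the query face."""
    scored = ((cosine(query, emb), uid) for uid, emb in gallery.items() if uid != exclude)
    return [(uid, score) for score, uid in heapq.nlargest(k, scored)]
```

The `exclude` argument lets the query user be skipped when their own face is already in the gallery.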
