Device manufacturers have long valued handwriting input as an important way of interacting with devices. Middle-aged and older users in particular often prefer writing by hand to typing on a keyboard. Handwriting input also plays an indispensable role for some complex scripts, interactive presentations, teaching demonstrations, and similar scenarios.
In 2015, Google introduced handwriting input for 82 languages, and expanded it to 100 languages last year. With the rapid development of machine learning, researchers keep reworking earlier methods to give users a faster and more accurate experience.
The previous model relied on a hand-designed approach that cut the input strokes into single characters and interpreted each one with a corresponding decoder. To improve accuracy and speed, the researchers developed an end-to-end handwriting recognition system based on a recurrent neural network (RNN). The system converts the input strokes into a sequence of Bezier curves, which the RNN analyzes and processes to produce more accurate recognition results. In this article, the researchers use the Latin alphabet as an example to explain how the new handwriting recognizer works.
Touch points, curves and recurrent neural networks
Any handwriting recognition system starts from the touch points produced by a finger or input device. The strokes we draw on a screen or tablet can be thought of as a sequence of touch points with timestamps. Because input devices differ in size and resolution, the researchers first normalize the touch-point coordinates. The touch-point sequence is then described with cubic Bezier curves so that the RNN can better understand the shape of the strokes.
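The normalization step can be illustrated with a minimal sketch. The article does not say how the coordinates are normalized, so the choices here are assumptions: the ink is scaled by the height of its bounding box, and timestamps are shifted to start at zero. The name `normalize_points` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t): position plus timestamp


def normalize_points(points: List[Point]) -> List[Point]:
    """Scale a touch sequence so its height is 1, keeping the aspect ratio.

    Assumption: normalization uses the bounding box of the ink, and
    timestamps are shifted to start at 0. The real system may differ.
    """
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    height = max(ys) - y_min
    scale = height if height > 0 else 1.0  # avoid dividing by zero for flat ink
    t0 = points[0][2]
    return [((x - x_min) / scale, (y - y_min) / scale, t - t0)
            for x, y, t in points]
```

With this kind of scaling, the same word written on a large tablet and on a small phone screen produces similar coordinates, which is the property the normalization step is after.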
Bezier curves have a long history in handwriting recognition. They give a more continuous representation of the input data that is more robust to differences in sampling rate and resolution. Each cubic Bezier curve is a polynomial defined by a start point, an end point and two control points, so an input stroke can be expressed accurately with far fewer parameters.
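To make the "polynomial of four points" concrete, here is a small sketch that evaluates a cubic Bezier curve in the standard Bernstein form. The function name `cubic_bezier` is illustrative, and this shows only how a curve is evaluated, not how the researchers fit curves to touch points.

```python
from typing import Tuple

Vec = Tuple[float, float]


def cubic_bezier(p0: Vec, p1: Vec, p2: Vec, p3: Vec, s: float) -> Vec:
    """Evaluate a cubic Bezier curve at parameter s in [0, 1].

    p0 and p3 are the start and end points; p1 and p2 are the control points.
    """
    u = 1.0 - s
    # Bernstein basis polynomials of degree 3; they sum to 1 for any s.
    b0, b1, b2, b3 = u ** 3, 3 * u ** 2 * s, 3 * u * s ** 2, s ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)
```

A single curve is fully described by its four points, that is, eight coordinates, no matter how many touch points the device sampled along that part of the stroke.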
This approach replaces Google's previous segment-and-decode approach, which first split the input strokes into single characters and then ran a decoder to find the most likely character for each. Another advantage of Bezier curves is that they represent the touch-point sequence more compactly, which makes it easier for the model to pick up temporal dependencies in the input. The figure above shows the word "go" being fitted with Bezier curves. The original input contains 186 touch points. The letter g is represented by a sequence of four cubic Bezier curves, shown by the yellow, blue, pink and green dots, and the letter o by a sequence of three curves, shown in orange, emerald green and white.
Given the input expressed as a sequence of Bezier curves, the next step is to decode that sequence into the characters it represents. RNNs are an effective way to process sequential input, so the researchers use a multi-layer RNN to decode the sequence. For each input sequence the network produces a matrix giving the likelihood of each letter, from which the characters represented by the handwritten strokes are computed.
In practice, the researchers chose a bidirectional quasi-recurrent neural network (QRNN) as the model. The alternating convolutional and recurrent layers in this model can, in theory, be processed in parallel, and they keep the number of network weights small while preserving the model's predictive ability. Since handwriting recognition mostly runs on mobile devices, a small model is key to keeping recognition fast.
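The recurrent part of a QRNN layer is very light, which is where the potential for parallelism comes from. The sketch below shows the "f-pooling" recurrence from the QRNN literature, h_t = f_t * h_{t-1} + (1 - f_t) * z_t. Whether the production model uses exactly this pooling variant is an assumption, and the convolutions that would produce `z` and `f` are left out.

```python
from typing import List


def f_pooling(z: List[List[float]], f: List[List[float]]) -> List[List[float]]:
    """Apply h_t = f_t * h_{t-1} + (1 - f_t) * z_t element-wise over time.

    z and f are indexed [time][channel]. In a QRNN both come from
    convolutions over the input (tanh for z, sigmoid for f), which can be
    computed for all time steps at once; only this cheap loop is sequential.
    """
    if not z:
        return []
    h_prev = [0.0] * len(z[0])  # initial hidden state
    outputs = []
    for z_t, f_t in zip(z, f):
        h_prev = [ft * hp + (1.0 - ft) * zt
                  for zt, ft, hp in zip(z_t, f_t, h_prev)]
        outputs.append(h_prev)
    return outputs
```

A bidirectional layer would run the same recurrence a second time over the reversed sequence and combine the two results, so that each curve sees context from both earlier and later curves.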
To decode the sequence of curves and identify the corresponding characters, the recurrent neural network generates a decoding matrix that gives the likelihood of each letter. Each column of the matrix corresponds to one Bezier curve, and each row holds the probability of one letter given the input curves. In the decoding matrix in the figure above, each column is a probability distribution over the 26 letters, computed from the current curve together with the preceding ones. The first three curves map to blank (the CTC algorithm's symbol for "no character recognized yet"). At the fourth curve the network assigns a high probability to the letter g, meaning the RNN has recognized g from the first four curves. Similarly, the letter o receives a high probability at the eighth curve. Processing this sequence of columns decodes the curves into the corresponding characters.
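One simple way to turn such a decoding matrix into characters is greedy CTC decoding: pick the most likely label in each column, merge consecutive repeats, and drop blanks. The sketch below assumes this greedy rule for illustration only; the article's system uses a language-model decoder rather than plain greedy decoding. The `BLANK` symbol and the function name are hypothetical.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical symbol standing in for the CTC blank label


def ctc_greedy_decode(columns: Sequence[Sequence[float]],
                      labels: Sequence[str]) -> str:
    """Greedy CTC decoding.

    columns[t][k] is the probability of labels[k] after curve t, so each
    element of `columns` is one column of the decoding matrix.
    """
    decoded: List[str] = []
    previous = None
    for column in columns:
        best_index = max(range(len(column)), key=lambda k: column[k])
        best = labels[best_index]
        # Emit a label only when it changes and is not the blank.
        if best != previous and best != BLANK:
            decoded.append(best)
        previous = best
    return "".join(decoded)
```

For a matrix shaped like the one described above, with blanks on the first three curves, a peak for g on the fourth, blanks again, and a peak for o on the eighth, this procedure yields "go".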
Two other interesting details are worth noting. When the letter g is recognized, the activation for the letter y (the second-to-last row) in the fourth column is also fairly high, because g and y look similar. For the letter o, the probability of o increases steadily with each additional curve, which matches intuition: the more complete the circle, the more likely it is to be an o.
The researchers also introduced a finite-state language model decoder to combine the outputs of the network. It assigns higher likelihood to common character combinations, so that the decoded characters can be quickly turned into word output.
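The effect of a language model on the final output can be sketched with a much simpler stand-in than a finite-state decoder: rescoring a few candidate words by adding a word log-prior to the recognizer's log-score. Everything here (the function `rescore`, the weight `lm_weight`, the fallback `unknown_log_prior`) is a hypothetical illustration, not the decoder the researchers built.

```python
from typing import Dict


def rescore(candidates: Dict[str, float],
            word_log_prior: Dict[str, float],
            lm_weight: float = 1.0,
            unknown_log_prior: float = -20.0) -> str:
    """Pick the candidate with the best combined score.

    candidates maps each candidate string to the recognizer's log-score.
    word_log_prior maps known words to their log-probability under a
    (here: unigram) language model; unseen strings get unknown_log_prior.
    """
    def total(word: str) -> float:
        prior = word_log_prior.get(word, unknown_log_prior)
        return candidates[word] + lm_weight * prior

    return max(candidates, key=total)
```

If the recognizer slightly prefers "yo" over "go" because g and y look alike, a prior that considers "go" far more common can flip the decision, which is the kind of correction a language model decoder provides.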
To sum up, the new method has three main steps. First, the touch-point sequence is converted into a compact sequence of Bezier curves. Next, the curves are decoded by the QRNN. Finally, the character results are combined into words. Although it seems simple, this method reduces the recognition error rate by 20%-40% compared with the original method.
About model training
Model training is divided into two parts: training the network with the connectionist temporal classification (CTC) loss, and tuning the decoder with Bayesian optimization. The training data mainly comes from the following data sets: the IAM-OnDB online handwriting data set, the IBM-UB-1 English data set, the ICDAR 2013 Chinese Isolated Characters data set, and the Vietnamese data set from ICFHR 2018. For links to the data sets, please refer to the end of the article.
For handwriting recognition, users will not tolerate a model that is accurate but slow. To reduce the latency of handwriting input, the researchers implemented the model in TensorFlow Lite and reduced the size of both the model and the final application package through a series of techniques such as quantization. A sophisticated model with a small footprint makes it easier for mobile phones to understand our handwriting. For more details, please refer to the original article.
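As a rough illustration of why quantization shrinks a model, the sketch below maps floating-point weights to 8-bit integers with a simple affine scheme. This is a generic textbook scheme written in plain Python, not the TensorFlow Lite implementation, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integers in [0, 255] using an affine scale and offset.

    Assumes `weights` is non-empty.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # all-equal weights: any scale works
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo


def dequantize(q: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate float weights from their 8-bit codes."""
    return [lo + scale * v for v in q]
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step per weight.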
This article is reposted from the public account "to the door venture"; see the original address.