Abstract
Cheating in handwritten academic essays has become a more significant problem over the last several years. One form of cheating is resubmitting the same paper, either photographed under different conditions (for example, from another angle, in different lighting, or at lower quality) or altered by automatic augmentation. Existing methods are not designed to work on large collections of handwritten documents. The proposed approach consists of three stages: embedding generation, retrieval of the closest candidates from the collection of handwritten documents, and similarity estimation between the query image and each candidate obtained in the previous stage. Our solution achieves Recall@1 of 80% and 59% at FPR of 4.8% and 5.5% on the Synthetic and Real data, respectively. The search latency is 5.5 seconds per query for a collection of 10,000 images. These results indicate that the developed method is robust enough to work on large collections of handwritten documents.
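The retrieval and verification stages can be illustrated with a minimal sketch. It is not the implementation evaluated here: it assumes embeddings have already been computed (stage one), uses cosine similarity for candidate search, and takes the pairwise scorer as a parameter. The names `top_k_candidates`, `find_duplicate`, `pair_score`, `k`, and `threshold` are hypothetical, and in this toy version the pairwise scorer operates on embeddings rather than on images.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_candidates(query: Vector, collection: Dict[str, Vector], k: int) -> List[str]:
    """Stage two (assumed form): ids of the k embeddings closest to the query."""
    scored = [(doc_id, cosine(query, emb)) for doc_id, emb in collection.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


def find_duplicate(
    query: Vector,
    collection: Dict[str, Vector],
    pair_score: Callable[[Vector, Vector], float],
    k: int = 5,
    threshold: float = 0.9,
) -> Optional[str]:
    """Stage three (assumed form): score each candidate against the query and
    return the best one if its score reaches the threshold, else None."""
    best_id, best_score = None, float("-inf")
    for doc_id in top_k_candidates(query, collection, k):
        score = pair_score(query, collection[doc_id])
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id if best_score >= threshold else None


# Toy embeddings (made up for illustration only).
collection = {"essay_a": [1.0, 0.0], "essay_b": [0.0, 1.0], "essay_c": [0.7, 0.7]}
print(find_duplicate([0.9, 0.1], collection, pair_score=cosine, k=2, threshold=0.95))
print(find_duplicate([-1.0, 0.0], collection, pair_score=cosine, k=2, threshold=0.95))
```

In this sketch the first query is matched to `essay_a`, while the second returns `None` because no candidate reaches the threshold. The threshold is what trades recall against false positive rate: raising it rejects more non-duplicates at the cost of missing more true resubmissions.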