Methods for solving the coreference problem and searching for noun phrases in natural languages
- Authors: A. A. Kozlova, I. D. Kudinov, D. V. Lemtyuzhnikova
Affiliations:
- V.A. Trapeznikov Institute of Control Sciences of RAS
- Issue: No 1 (2025)
- Pages: 149-162
- Section: ARTIFICIAL INTELLIGENCE
- URL: https://rjmseer.com/0002-3388/article/view/684564
- DOI: https://doi.org/10.31857/S0002338825010122
- EDN: https://elibrary.ru/AIJIND
- ID: 684564
Abstract
Coreference resolution is a natural language processing task that links words and phrases in a text which refer to the same extra-linguistic object, or referent. It is applicable in text summarization, question answering, information retrieval, and dialogue systems. This paper reviews existing methods for coreference resolution and proposes a method based on a two-stage machine learning model. First, a language model converts the text tokens into vector representations. Then, for each pair of tokens, these representations are used to estimate the probability that the tokens belong either to the same noun phrase or to two coreferent noun phrases. The method thus simultaneously searches for noun phrases and predicts the coreference relation between them.
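The pairwise scoring step described in the abstract can be sketched in a few lines. The sketch below is illustrative only: the embedding dimension, the random linear scorer, the example tokens, and the three-class label set (same noun phrase / coreferent noun phrases / no relation) are assumptions, standing in for the paper's trained language model and classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a language model: map each token to a dense vector.
# In the paper a trained language model produces these representations.
tokens = ["Anna", "opened", "her", "laptop"]
dim = 8
embeddings = rng.normal(size=(len(tokens), dim))

# Hypothetical pairwise scorer over three classes:
# 0 = same noun phrase, 1 = two coreferent noun phrases, 2 = no relation.
n_classes = 3
W = rng.normal(size=(2 * dim, n_classes)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pair_probabilities(i, j):
    """Score a token pair from the concatenation of their two vectors."""
    features = np.concatenate([embeddings[i], embeddings[j]])
    return softmax(features @ W)

# Score every ordered pair of distinct tokens.
probs = {(i, j): pair_probabilities(i, j)
         for i in range(len(tokens))
         for j in range(len(tokens)) if i != j}

p = probs[(0, 2)]  # e.g. the pair ("Anna", "her")
print(p.sum())     # each pair's class distribution sums to 1
```

Aggregating these pairwise decisions over all token pairs is what lets a single model both delimit noun phrases and link coreferent ones, rather than running mention detection and linking as separate pipelines.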
About the authors
A. A. Kozlova
V.A. Trapeznikov Institute of Control Sciences of RAS
Author for correspondence.
Email: sankamoro@mail.ru
Russian Federation, Moscow
I. D. Kudinov
V.A. Trapeznikov Institute of Control Sciences of RAS
Email: ilja@kdsli.ru
Russian Federation, Moscow
D. V. Lemtyuzhnikova
V.A. Trapeznikov Institute of Control Sciences of RAS
Email: darabbt@gmail.com
Russian Federation, Moscow
References
- Girutsky A.A. Vvedenie v yazykoznanie [Introduction to Linguistics]. Minsk: Vysheishaya Shkola, 2022. ISBN 978-985-06-3430-6.
- Chomsky N. Aspects of the Theory of Syntax. Cambridge: MIT Press, 2014. № 11.
- Nivre J., Zeman D., Ginter F., Tyers F. Universal Dependencies // 15th Conf. of the European Chapter of the Association for Computational Linguistics. Valencia, 2017.
- Sukthanker R., Poria S., Cambria E., Thirunavukarasu R. Anaphora and Coreference Resolution: A Review // Information Fusion. 2020. V. 59. P. 139–162; https://doi.org/10.1016/j.inffus.2020.01.010
- Soon W.M., Lim D.C.Y., Ng H.T. A Machine Learning Approach to Coreference Resolution of Noun Phrases // Computational Linguistics. 2001. V. 27. № 4. P. 521–544; https://doi.org/10.1162/089120101753342653
- Toldova S., Ionov M. Coreference Resolution for Russian: The Impact of Semantic Features // Computational Linguistics and Intellectual Technologies. 2017. V. 1. № 16. P. 339–348.
- Haghighi A., Klein D. Simple Coreference Resolution with Rich Syntactic and Semantic Features // Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2009. P. 1152–1161; https://doi.org/10.3115/1699648.1699661
- Lee K., He L., Lewis M., Zettlemoyer L. End-to-end Neural Coreference Resolution // Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, 2017. P. 188–197; https://doi.org/10.18653/v1/d17-1018
- Hochreiter S., Schmidhuber J. Long Short-Term Memory // Neural Computation. 1997. V. 9. № 8. P. 1735–1780; https://doi.org/10.1162/neco.1997.9.8.1735.
- Olah C. Understanding LSTM Networks. 2015. [Online] URL: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Lee K., He L., Zettlemoyer L. Higher-order Coreference Resolution with Coarse-to-fine Inference // Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). 2018. V. 2. P. 687–692; https://doi.org/10.18653/v1/n18-2108
- Le T.A., Petrov M.A., Kuratov Y.M., Burtsev M.S. Sentence Level Representation and Language Models in the Task of Coreference Resolution for Russian // Computational Linguistics and Intellectual Technologies. 2019. V. 2. № 18. P. 364–373.
- Shen T., Zhou T., Long G., Jiang J., Pan S., Zhang C. DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding // 32nd AAAI Conference on Artificial Intelligence (AAAI). 2018. P. 5446–5455.
- Peng H., Khashabi D., Roth D. Solving Hard Coreference Problems // Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). 2015. P. 809–819; https://doi.org/10.3115/v1/n15-1082
- Sysoev A.A. Coreference Resolution in Russian: State-of-the-Art Approaches Application and Evolvement. 2017. V. 16. P. 327–347.
- Toldova S.Ju., Roytberg A., Ladygina A.A. et al. RU-EVAL-2014: Evaluating Anaphora and Coreference Resolution for Russian // Computational Linguistics and Intellectual Technologies. 2014. № 13. P. 681–694.
- Bogdanov A.V., Dzhumaev S.S., Skorinkin D.A., Starostin A.S. Anaphora Analysis Based on ABBYY Compreno Linguistic Technologies // Computational Linguistics and Intellectual Technologies. 2014; https://doi.org/10.13140/2.1.2600.7688
- Anisimovich K.V., Druzhkin K.Y., Zuev K.A. Syntactic and Semantic Parser Based on ABBYY Compreno Linguistic Technologies // Computational Linguistics and Intellectual Technologies. 2012. V. 11. № 18. P. 90–103.
- Ionov M., Kutuzov A. The Impact of Morphology Processing Quality on Automated Anaphora Resolution for Russian. Moscow, 2014. № 13. P. 232–241.
- Kamenskaya M., Khramoin I., Smirnov I. et al. Data-driven Methods for Anaphora Resolution of Russian Texts // Computational Linguistics and Intellectual Technologies. 2014. P. 241–250.
- Protopopova E.V., Bodrova A.A., Volskaya S.A. et al. Anaphoric Annotation and Corpus-based Anaphora Resolution: An Experiment // Computational Linguistics and Intellectual Technologies. 2014. № 13. P. 562–571.
- Budnikov A.E., Toldova S.Y., Zvereva D.S. et al. Ru-eval-2019: Evaluating Anaphora and Coreference Resolution for Russian // Computational Linguistics and Intellectual Technologies. 2019.
- Vilain M., Burger J.D., Aberdeen J. et al. A Model-Theoretic Coreference Scoring Scheme // Conference on Message Understanding. Columbia: Association for Computational Linguistics, 1995. P. 45–52; https://doi.org/10.3115/1072399
- Bagga A., Baldwin B. Algorithms for Scoring Coreference Chains // The First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference. Citeseer. 1998. V. 1. P. 563–566.
- Luo X. On Coreference Resolution Performance Metrics // Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language. Vancouver: Association for Computational Linguistics, 2005. P. 25–32; https://doi.org/10.3115/1220575.1220579
- Pradhan S., Moschitti A., Xue N. et al. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes // Joint Conference on EMNLP and CoNLL - Shared Task. Jeju Island, 2012. P. 1–40.
- Moosavi N.S., Strube M. Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric // Proc. 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016. V. 1. P. 632–642; https://doi.org/10.18653/v1/P16-1060
- Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // ArXiv preprint arXiv:1301.3781. 2013.
- Bahdanau D., Cho K., Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate // ArXiv preprint arXiv:1409.0473. 2014.
- Luong M.-T., Pham H., Manning C.D. Effective Approaches to Attention Based Neural Machine Translation // ArXiv preprint arXiv:1508.04025. 2015.
- Abadi M., Agarwal A., Barham P. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. [Online] URL: https://www.tensorflow.org/
- Abdaoui A., Pradel C., Sigel G. Load What You Need: Smaller Versions of Multilingual BERT // SustaiNLP / EMNLP. ArXiv:2010.05609. 2020.