Methods for solving the coreference problem and searching for noun phrases in natural languages
- Authors: A. A. Kozlova, I. D. Kudinov, D. V. Lemtyuzhnikova
Affiliations:
- V.A. Trapeznikov Institute of Control Sciences of RAS
- Issue: No 1 (2025)
- Pages: 149-162
- Section: ARTIFICIAL INTELLIGENCE
- URL: https://rjmseer.com/0002-3388/article/view/684564
- DOI: https://doi.org/10.31857/S0002338825010122
- EDN: https://elibrary.ru/AIJIND
- ID: 684564
Abstract
Coreference resolution is a natural language processing task that links words and phrases in a text which refer to the same extra-linguistic object, or referent. It is applicable in text summarization, question answering, information retrieval, and dialogue systems. This paper reviews existing methods for coreference resolution and proposes a method based on a two-stage machine learning model. First, a language model converts the text tokens into vector representations. Then, for each pair of tokens, these representations are used to estimate the probability that the tokens belong either to the same noun phrase or to two coreferent noun phrases. The method thus simultaneously searches for noun phrases and predicts the coreference relation between them.
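The pairwise scoring step described in the abstract can be sketched in a few lines. The sketch below is illustrative only: the embedding dimension, the random linear scorer, the example tokens, and the three-class label set (same noun phrase / coreferent noun phrases / no relation) are assumptions, standing in for the paper's trained language model and classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a language model: map each token to a dense vector.
# In the paper a trained language model produces these representations.
tokens = ["Anna", "opened", "her", "laptop"]
dim = 8
embeddings = rng.normal(size=(len(tokens), dim))

# Hypothetical pairwise scorer over three classes:
# 0 = same noun phrase, 1 = two coreferent noun phrases, 2 = no relation.
n_classes = 3
W = rng.normal(size=(2 * dim, n_classes)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pair_probabilities(i, j):
    """Score a token pair from the concatenation of their two vectors."""
    features = np.concatenate([embeddings[i], embeddings[j]])
    return softmax(features @ W)

# Score every ordered pair of distinct tokens.
probs = {(i, j): pair_probabilities(i, j)
         for i in range(len(tokens))
         for j in range(len(tokens)) if i != j}

p = probs[(0, 2)]  # e.g. the pair ("Anna", "her")
print(p.sum())     # each pair's class distribution sums to 1
```

Aggregating these pairwise decisions over all token pairs is what lets a single model both delimit noun phrases and link coreferent ones, rather than running mention detection and linking as separate pipelines.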
About the authors
A. A. Kozlova
V.A. Trapeznikov Institute of Control Sciences of RAS
Author for correspondence.
Email: sankamoro@mail.ru
Russian Federation, Moscow
I. D. Kudinov
V.A. Trapeznikov Institute of Control Sciences of RAS
Email: ilja@kdsli.ru
Russian Federation, Moscow
D. V. Lemtyuzhnikova
V.A. Trapeznikov Institute of Control Sciences of RAS
Email: darabbt@gmail.com
Russian Federation, Moscow
References
- Girutsky A.A. Vvedenie v yazykoznanie [Introduction to Linguistics]. Minsk: Vysheishaya Shkola, 2022. ISBN 978-985-06-3430-6.
- Chomsky N. Aspects of the Theory of Syntax. Cambridge: MIT Press, 2014. № 11.
- Nivre J., Zeman D., Ginter F., Tyers F. Universal Dependencies // 15th Conf. of the European Chapter of the Association for Computational Linguistics. Valencia, 2017.
- Sukthanker R., Poria S., Cambria E., Thirunavukarasu R. Anaphora and Coreference Resolution: A Review // Information Fusion. 2020. V. 59. P. 139–162; https://doi.org/10.1016/j.inffus.2020.01.010
- Soon W.M., Lim D.C.Y., Ng H.T. A Machine Learning Approach to Coreference Resolution of Noun Phrases // Computational Linguistics. 2001. V. 27. № 4. P. 521–544; https://doi.org/10.1162/089120101753342653
- Toldova S., Ionov M. Coreference Resolution for Russian: The Impact of Semantic Features // Computational Linguistics and Intellectual Technologies. 2017. V. 1. № 16. P. 339–348.
- Haghighi A., Klein D. Simple Coreference Resolution with Rich Syntactic and Semantic Features // Conference on Empirical Methods in Natural Language Processing (EMNLP). Singapore, 2009. P. 1152–1161; https://doi.org/10.3115/1699648.1699661
- Lee K., He L., Lewis M., Zettlemoyer L. End-to-end Neural Coreference Resolution // Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, 2017. P. 188–197; https://doi.org/10.18653/v1/d17-1018
- Hochreiter S., Schmidhuber J. Long Short-Term Memory // Neural Computation. 1997. V. 9. № 8. P. 1735–1780; https://doi.org/10.1162/neco.1997.9.8.1735.
- Olah C. Understanding LSTM Networks. 2015. [Online] URL: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Lee K., He L., Zettlemoyer L. Higher-order Coreference Resolution with Coarse-to-fine Inference // Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). 2018. V. 2. P. 687–692; https://doi.org/10.18653/v1/n18-2108
- Le T.A., Petrov M.A., Kuratov Y.M., Burtsev M.S. Sentence Level Representation and Language Models in the Task of Coreference Resolution for Russian // Computational Linguistics and Intellectual Technologies. 2019. V. 2. № 18. P. 364–373.
- Shen T., Zhou T., Long G., Jiang J., Pan S., Zhang C. DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding // 32nd AAAI Conference on Artificial Intelligence (AAAI). 2018. P. 5446–5455.
- Peng H., Khashabi D., Roth D. Solving Hard Coreference Problems // Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). 2015. P. 809–819; https://doi.org/10.3115/v1/n15-1082
- Sysoev A.A. Coreference Resolution in Russian: State-of-the-Art Approaches Application and Evolvement. 2017. V. 16. P. 327–347.
- Toldova S.Ju., Roytberg A., Ladygina A.A. et al. RU-EVAL-2014: Evaluating Anaphora and Coreference Resolution for Russian // Computational Linguistics and Intellectual Technologies. 2014. № 13. P. 681–694.
- Bogdanov A.V., Dzhumaev S.S., Skorinkin D.A., Starostin A.S. Anaphora Analysis Based on ABBYY Compreno Linguistic Technologies // Computational Linguistics and Intellectual Technologies. 2014; https://doi.org/10.13140/2.1.2600.7688
- Anisimovich K.V., Druzhkin K.Y., Zuev K.A. Syntactic and Semantic Parser Based on ABBYY Compreno Linguistic Technologies // Computational Linguistics and Intellectual Technologies. 2012. V. 11. № 18. P. 90–103.
- Ionov M., Kutuzov A. The Impact of Morphology Processing Quality on Automated Anaphora Resolution for Russian. Moscow, 2014. № 13. P. 232–241.
- Kamenskaya M., Khramoin I., Smirnov I. et al. Data-driven Methods for Anaphora Resolution of Russian Texts // Computational Linguistics and Intellectual Technologies. 2014. P. 241–250.
- Protopopova E.V., Bodrova A.A., Volskaya S.A. et al. Anaphoric Annotation and Corpus-based Anaphora Resolution: An Experiment // Computational Linguistics and Intellectual Technologies. 2014. № 13. P. 562–571.
- Budnikov A.E., Toldova S.Y., Zvereva D.S. et al. Ru-eval-2019: Evaluating Anaphora and Coreference Resolution for Russian // Computational Linguistics and Intellectual Technologies. 2019.
- Vilain M., Burger J.D., Aberdeen J. et al. A Model-Theoretic Coreference Scoring Scheme // Conference on Message Understanding. Columbia: Association for Computational Linguistics, 1995. P. 45–52; https://doi.org/10.3115/1072399
- Bagga A., Baldwin B. Algorithms for Scoring Coreference Chains // The First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference. Citeseer. 1998. V. 1. P. 563–566.
- Luo X. On Coreference Resolution Performance Metrics // Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language. Vancouver: Association for Computational Linguistics, 2005. P. 25–32; https://doi.org/10.3115/1220575.1220579
- Pradhan S., Moschitti A., Xue N. et al. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes // Joint Conference on EMNLP and CoNLL - Shared Task. Jeju Island, 2012. P. 1–40.
- Moosavi N.S., Strube M. Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric // Proc. 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016. V. 1. P. 632–642; https://doi.org/10.18653/v1/P16-1060
- Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // ArXiv preprint arXiv:1301.3781. 2013.
- Bahdanau D., Cho K., Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate // ArXiv preprint arXiv:1409.0473. 2014.
- Luong M.-T., Pham H., Manning C.D. Effective Approaches to Attention Based Neural Machine Translation // ArXiv preprint arXiv:1508.04025. 2015.
- Abadi M., Agarwal A., Barham P. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. [Online] URL: https://www.tensorflow.org/
- Abdaoui A., Pradel C., Sigel G. Load What You Need: Smaller Versions of Multilingual BERT // SustaiNLP / EMNLP. ArXiv:2010.05609. 2020.