The Role of NLP in Coreference Resolution in Sindhi Text
Keywords:
Natural language processing, coreference, inflected language, machine translation
Abstract
Identifying expressions in a text that refer to the same entity is a significant challenge in natural language processing (NLP). This task is called coreference resolution. It is crucial for many NLP applications, such as information extraction, text summarization, and machine translation. Although coreference resolution has been studied extensively for English and other widely used languages, the difficulties presented by Sindhi call for strategies tailored to its distinctive linguistic and grammatical traits. Sindhi is a highly inflected language with rich derivational and inflectional morphology, flexible word order, and intricate agreement patterns. These features introduce complexities that affect traditional coreference resolution techniques. Sindhi also varies across dialects, which further complicates the task because dialects differ in syntactic structures and lexical choices.
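Coreference resolution groups mentions that refer to the same entity into clusters, and agreement is one of the cues a resolver can exploit in an inflected language. The sketch below is a minimal rule-based baseline, not the method of this paper. It assumes every mention already carries grammatical gender (masculine or feminine) and number; in a real Sindhi pipeline these features would have to come from a morphological analyzer or from verb and adjective agreement. Each pronoun is linked to the closest preceding mention that agrees in both features, and the links are merged into clusters with union-find. The `Mention` class, the `agrees` and `resolve` functions, and the placeholder mention strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A mention with hypothetical, pre-computed agreement features."""
    text: str         # surface form (placeholder strings in the demo)
    gender: str       # "m" (masculine) or "f" (feminine)
    number: str       # "sg" (singular) or "pl" (plural)
    is_pronoun: bool  # only pronouns are resolved to an antecedent


def agrees(a: Mention, b: Mention) -> bool:
    """Agreement filter: gender and number must both match."""
    return a.gender == b.gender and a.number == b.number


def resolve(mentions: list[Mention]) -> list[list[int]]:
    """Link each pronoun to the closest preceding agreeing mention,
    then return clusters of mention indices (union-find)."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for j, mention in enumerate(mentions):
        if not mention.is_pronoun:
            continue
        # Scan right-to-left so the nearest agreeing candidate wins.
        for i in range(j - 1, -1, -1):
            if agrees(mentions[i], mention):
                parent[find(j)] = find(i)
                break
        # A pronoun with no agreeing candidate stays a singleton cluster.

    # Group indices by root; clusters come out ordered by their first mention.
    clusters: dict[int, list[int]] = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


if __name__ == "__main__":
    doc = [
        Mention("Ali", "m", "sg", False),
        Mention("Sara", "f", "sg", False),
        Mention("PRON_1", "f", "sg", True),  # features assumed from agreement
        Mention("PRON_2", "m", "sg", True),
        Mention("PRON_3", "f", "pl", True),  # no plural antecedent available
    ]
    for cluster in resolve(doc):
        print([doc[i].text for i in cluster])
    # ['Ali', 'PRON_2']
    # ['Sara', 'PRON_1']
    # ['PRON_3']
```

Agreement filtering of this kind is only a baseline: given the flexible word order and dialect variation described above, a practical Sindhi resolver would combine such hard constraints with learned features rather than rely on recency alone.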
License
Copyright (c) 2023 ILMA University
This work is licensed under a Creative Commons Attribution 4.0 International License.