As NLP (natural language processing) tools are expanded to include new languages, one of the major bottlenecks is the availability of labelled data. This issue is particularly acute for low-resource languages, so the level of detail and the quality of annotation both matter. How much detail is needed for supervised learning? Is there a minimum number of labels needed to capture linguistic patterns?
In this project, you will explore the importance of labels/tags for word-level modelling (morphology and phonology) by performing an ablation study. You will train several ML (machine learning) models for word-level phenomena and contextualise your findings within the existing literature. You may also choose to use information-theoretic metrics to quantify the informativeness of each tag.
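As a rough illustration of what an information-theoretic measure of tag informativeness could look like, the sketch below estimates the mutual information between each tag and a prediction target. It is only one possible starting point, not a prescribed method for the project. The toy data, the tag names (`number`, `pos`), the suffix targets, and the helper `tag_informativeness` are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def tag_informativeness(rows):
    """Map each tag name to its mutual information with the target.

    `rows` is a list of (tags, target) pairs, where `tags` is a dict
    from tag name to tag value. This layout is an assumption made for
    the example, not a format taken from any particular dataset.
    """
    targets = [target for _, target in rows]
    tag_names = sorted({name for tags, _ in rows for name in tags})
    return {
        name: mutual_information([tags.get(name) for tags, _ in rows], targets)
        for name in tag_names
    }


# Hypothetical toy data: morphological tags paired with the suffix to predict.
rows = [
    ({"number": "PL", "pos": "N"}, "-s"),
    ({"number": "SG", "pos": "N"}, "-0"),
    ({"number": "PL", "pos": "N"}, "-s"),
    ({"number": "SG", "pos": "N"}, "-0"),
]

scores = tag_informativeness(rows)
for name, bits in scores.items():
    print(f"{name}: {bits:.3f} bits")
```

In this toy data the `number` tag fully determines the suffix while `pos` is constant, so `number` receives a high score and `pos` a score of zero. In an ablation study, scores like these could be compared against the drop in model performance when the corresponding tag is removed from the training data.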
- Experience with Python.
- Strong interest and skills in linguistics, NLP, and language.
- Experience in NLP/computational linguistics is preferable.
- Completed coursework in Document Analysis (COMP4650), machine learning, AI, or data science.
machine learning, natural language processing, computational linguistics, language