Ablation study: What level of linguistic detail is needed for word-level modelling?
People
Supervisory Chair
External Member

Description
As natural language processing (NLP) tools are extended to new languages, one of the biggest bottlenecks is the availability of labelled data. This issue is particularly acute for low-resource languages, so the level of annotation detail and quality becomes an important question. How much detail is needed for supervised learning? Is there a minimum number of labels needed to capture linguistic patterns?
Goals
In this project, you will explore the importance of labels/tags for word-level modelling (morphology and phonology) by performing an ablation study. You will train several machine learning (ML) models of word-level phenomena and contextualise your findings within the existing literature. You may also choose to use information-theoretic metrics to quantify the informativeness of each tag.
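As one illustration of what an information-theoretic measure of tag informativeness could look like, the sketch below estimates the mutual information between each tag and a target to be predicted. The toy data, the tag names (`number`, `case`), the target (`suffix`), and the helper functions are all hypothetical, chosen only for illustration; the project itself does not prescribe this metric or this data format.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def tag_informativeness(rows, tag_names, target):
    """Score each tag by its mutual information with the target column."""
    ys = [row[target] for row in rows]
    return {
        tag: mutual_information([row[tag] for row in rows], ys)
        for tag in tag_names
    }


# Hypothetical toy data: in this made-up example the suffix depends only
# on number, never on case.
toy_rows = [
    {"number": "sg", "case": "nom", "suffix": "-a"},
    {"number": "sg", "case": "acc", "suffix": "-a"},
    {"number": "pl", "case": "nom", "suffix": "-i"},
    {"number": "pl", "case": "acc", "suffix": "-i"},
]

scores = tag_informativeness(toy_rows, ["number", "case"], "suffix")
for tag, bits in scores.items():
    print(f"{tag}: {bits:.2f} bits")
```

On this toy data, `number` scores 1 bit and `case` scores 0 bits, suggesting that dropping the `case` tag would lose no information about the suffix. In an actual ablation study, such scores could be compared against the change in model accuracy when each tag is removed from the training data.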
Requirements
- Experience with Python.
- Strong interest and skills in linguistics, NLP, and language.
- Experience in NLP or computational linguistics is preferable.
- Completed coursework in Document Analysis (COMP4650), machine learning, AI, or data science.
Background Literature
Keywords
machine learning, natural language processing, computational linguistics, language