High and Low Dimensional Relationships in Machine Learning

People

External Member

Tommy Liu, Primary Supervisor

Description

Transforming data between low and high dimensions is an essential part of Data Analytics and Machine Learning as a whole. Popular examples of these transformations include Matrix Factorisation (e.g. PCA, KPCA), Graph Embeddings (e.g. t-SNE, UMAP), and many more.
 
This project seeks to draw connections between transformed and untransformed data, so that insights gained from the transformed data can inform our understanding of the untransformed data. This is particularly useful for very high-dimensional data, where it may be difficult to build the desired models directly; ideally, a lower-dimensional transformed version of the data would yield similar insights. Alternatively, insight into the relative effects of the data transformation on the analysis outcomes would also be valuable.
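As a rough illustration of the link between transformed and untransformed data, the sketch below uses PCA (computed via an SVD in NumPy) to map synthetic 10-dimensional data to 2 dimensions and back again. The data, the helper names `pca_fit`, `pca_transform` and `pca_inverse`, and the choice of 2 components are all assumptions made for this example; they are not part of the project specification.

```python
import numpy as np

def pca_fit(X, k):
    """Return the data mean, the top-k principal axes (k x d),
    and the fraction of total variance those axes explain."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the centred data.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var = s ** 2
    return mean, Vt[:k], var[:k].sum() / var.sum()

def pca_transform(X, mean, components):
    """Project data from d dimensions down to k dimensions."""
    return (X - mean) @ components.T

def pca_inverse(Z, mean, components):
    """Map k-dimensional points back into the original d dimensions."""
    return Z @ components + mean

# Synthetic data (an assumption for illustration): 200 samples in
# 10 dimensions that lie close to a 2-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

mean, components, ratio = pca_fit(X, k=2)
Z = pca_transform(X, mean, components)      # low-dimensional view
X_hat = pca_inverse(Z, mean, components)    # back in the original space

# Relative reconstruction error: how much structure the transform lost.
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mean)
print(f"explained variance ratio: {ratio:.4f}")
print(f"relative reconstruction error: {rel_error:.4f}")
```

A small reconstruction error suggests that analyses run on `Z` are likely to reflect structure present in `X`; for non-linear methods such as t-SNE or UMAP no such simple inverse map is generally available, which is part of what makes the problem interesting.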
 

Goals

  • Develop a principled approach for choosing a dimension transformation and for determining the effects it may have on a model.
  • Develop one or more schemes to transfer effects (e.g. feature importance) from one dimension to another based on the transformation scheme.
  • Assess the quality of these insights on several real-world datasets.
  • Implement code to carry out the above tasks.
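
One simple case of transferring an effect between dimensions can be sketched for a linear transformation. Assuming a PCA projection and a linear model fitted in the reduced space, the reduced-space coefficients can be mapped back to the original features through the PCA loadings. The synthetic data, the variable names, and the use of normalised absolute coefficients as an "importance" score are assumptions for this example only; the project may well adopt different schemes.

```python
import numpy as np

# Synthetic regression problem (an assumption for illustration):
# six features with different scales, of which only three drive y.
rng = np.random.default_rng(1)
n, d, k = 300, 6, 3
X = rng.normal(size=(n, d)) * np.array([5.0, 4.0, 3.0, 0.5, 0.3, 0.1])
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=n)

# PCA via SVD: keep the top-k principal axes (rows of V_k).
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
V_k = Vt[:k]                    # shape (k, d)
Z = (X - mean) @ V_k.T          # reduced data, shape (n, k)

# Ordinary least squares with an intercept in the reduced space.
A = np.column_stack([Z, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_low, intercept = coef[:k], coef[k]

# Since Z = (X - mean) V_k^T, the model Z w_low equals
# (X - mean) (V_k^T w_low), so V_k^T w_low acts on original features.
w_high = V_k.T @ w_low          # shape (d,)

# One possible importance score: normalised absolute coefficients.
importance = np.abs(w_high) / np.abs(w_high).sum()

# Both parameterisations give the same predictions.
pred_low = Z @ w_low + intercept
pred_high = (X - mean) @ w_high + intercept
print(np.round(importance, 3))
```

This identity holds only because both the transformation and the model are linear; for kernel PCA, t-SNE or UMAP a different transfer scheme would be needed.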

Requirements

  • Background and experience in basic Machine Learning (e.g. COMP3670/4670/4660/4650, STAT3040/4040) is required.
  • Understanding of the fundamentals of Linear Algebra (e.g. MATH1014/1115/1116/+ or equivalents) is required.
  • Experience with Python/R/Matlab is strongly desirable.

Background Literature

  • Transferring information between dimensions: doi.org/10.1088/2632-2153/ac0167
  • Generalising dimension reduction/matrix factorisation: doi.org/10.48550/arXiv.1410.0342
  • Explaining the effects that dimension reduction may have: doi.org/10.1016/j.eswa.2021.115020
 

Gain

  • Experience in Dimension Reduction/Expansion - a vital step in many Data Analysis workflows
  • Possible publications in an unexplored domain
  • Coding Experience with Machine Learning Libraries

Keywords

Dimension Reduction/Augmentation, Machine Learning, Data Science/Analytics

Updated:  10 August 2021/Responsible Officer:  Dean, CECS/Page Contact:  CECS Marketing