BagAnything: Improve outcomes by bagging multiple models
People
Supervisor
Description
In statistics and machine learning, ensemble methods combine multiple learning algorithms to obtain better predictive performance than any of the constituent algorithms could achieve alone. This includes bootstrap aggregating (bagging) of multiple models trained using the same method on different data subsets, or bagging different methods on the same (or different) subsets. A prominent example of ensemble learning is the highly successful random forest method, which improves upon individual decision trees, but the same approach can be applied to any base regressor or classifier. There are often advantages to mixing methods, and to training on different subsets of features to generate feature importance profiles. In this project you will develop a master Python module that enables researchers to undertake ensemble learning based on the bagging of any regressor or classifier, and/or any subset of features. The module will be parallel, compatible with scikit-learn (sklearn), and thoroughly tested on an extensive selection of models to demonstrate its utility. Because the module will follow the sklearn template, the expectation is that “BagAnything” will be submitted to the scikit-learn developers for consideration for inclusion in the platform. The data sets will be provided.
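As a rough illustration of the idea, the sketch below bags a mix of base regressors: each replica is a fresh copy of a base estimator fitted on a bootstrap resample of the rows, and predictions are averaged. The helper names (`_fit_one`, `bag_fit`, `bag_predict`), the synthetic data, and the choice of a thread-based joblib backend are assumptions made for this example only; they are not the project's design. scikit-learn already provides `BaggingRegressor` and `BaggingClassifier` for bagging a single base estimator.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor


def _fit_one(estimator, X, y, seed):
    """Fit a fresh copy of `estimator` on one bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, X.shape[0], size=X.shape[0])  # rows drawn with replacement
    return clone(estimator).fit(X[idx], y[idx])


def bag_fit(base_estimators, X, y, n_rounds=10, seed=0, n_jobs=1):
    """Fit `n_rounds` bootstrap replicas of every estimator in `base_estimators`."""
    n_models = n_rounds * len(base_estimators)
    # One independent seed per replica, so results do not depend on scheduling.
    seeds = np.random.SeedSequence(seed).generate_state(n_models)
    jobs = zip(base_estimators * n_rounds, seeds)
    # Threads are an assumption made to keep the sketch portable.
    return Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(_fit_one)(est, X, y, int(s)) for est, s in jobs
    )


def bag_predict(fitted, X):
    """Aggregate fitted regressors by averaging their predictions."""
    return np.mean([m.predict(X) for m in fitted], axis=0)


# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Mix two different methods in one bag: 5 rounds x 2 estimators = 10 models.
models = bag_fit([DecisionTreeRegressor(max_depth=4), Ridge()], X, y,
                 n_rounds=5, n_jobs=2)
y_hat = bag_predict(models, X)
```

For classifiers, the averaging step would typically be replaced by majority voting or by averaging predicted class probabilities.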
Goals
To implement and test a new parallel “BagAnything” Python module for bagging existing regressors and classifiers, and for generating feature importance profiles from statistical sampling of feature sets.
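One possible way to build a feature importance profile from sampled feature sets is sketched below. Each round fits a copy of the estimator on a random subset of columns and a bootstrap resample of rows, then scores it on the out-of-bag rows. A feature's importance is taken as the mean score of the models that included it minus the mean score of those that did not. This scoring rule, the function name `feature_profile`, and the synthetic data are assumptions for illustration; the project may define importance differently.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def feature_profile(estimator, X, y, n_rounds=200, subset_size=2, seed=0):
    """Per feature: mean out-of-bag score with the feature minus without it."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    scores = np.empty(n_rounds)
    used = np.zeros((n_rounds, n_features), dtype=bool)
    for r in range(n_rounds):
        # Random feature subset (no repeats) and bootstrap rows (with repeats).
        cols = rng.choice(n_features, size=subset_size, replace=False)
        rows = rng.integers(0, n_samples, size=n_samples)
        oob = np.setdiff1d(np.arange(n_samples), rows)  # rows never drawn
        model = clone(estimator).fit(X[np.ix_(rows, cols)], y[rows])
        # For scikit-learn regressors, `score` returns R^2.
        scores[r] = model.score(X[np.ix_(oob, cols)], y[oob])
        used[r, cols] = True
    return np.array([
        scores[used[:, j]].mean() - scores[~used[:, j]].mean()
        for j in range(n_features)
    ])


# Hypothetical synthetic data in which only feature 0 carries signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

profile = feature_profile(LinearRegression(), X, y)
```

The rounds are independent of one another, so the loop could be parallelised in the same way as the model fitting.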
Requirements
Python programming and experience in data science and machine learning are essential (for example, through courses such as COMP3720, COMP4660, COMP4670, COMP6670 or COMP8420). Familiarity with platforms such as scikit-learn, PyTorch, TensorFlow and Keras is desirable.
Background Literature
Breiman, Leo (1996). "Bagging predictors". Machine Learning. 24 (2): 123–140.
Gain
This is a 24 cp (two-semester) project.
Keywords
machine learning, materials informatics