BagAnything: Improve outcomes by bagging multiple models

Description

In statistics and machine learning, ensemble methods combine multiple learning algorithms to obtain better predictive performance than any of the constituent algorithms could achieve alone. This can include bootstrap aggregating (bagging) of multiple models trained using the same method on different data subsets, or bagging different methods on the same (or different) subsets. A prominent example of ensemble learning in action is the highly successful random forest method, which improves upon decision trees, but the same approach can apply to any base regressor or classifier. There are often advantages to mixing methods, and to training on different subsets of features to generate feature importance profiles.

In this project you will develop a master Python module to enable researchers to undertake ensemble learning based on the bagging of any regressor or classifier, and/or any subset of features. The module will be parallel, compatible with scikit-learn (sklearn), and thoroughly tested on an extensive selection of models to demonstrate its utility. Because the module will follow the sklearn template, the expectation is that the “BagAnything” module will be submitted to the sklearn developers for consideration for inclusion in the platform. The data sets will be provided.
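To make the bagging idea described above concrete, the following is a minimal sketch using only the Python standard library. It is not the BagAnything module: the names `bag_fit`, `bag_predict` and the toy `NearestNeighbourRegressor` are hypothetical, chosen for illustration. The only assumption about the base model is that it exposes sklearn-style `fit(X, y)` and `predict(X)` methods.

```python
import copy
import random
from statistics import mean


class NearestNeighbourRegressor:
    """Toy 1-nearest-neighbour regressor (hypothetical stand-in for any base model)."""

    def fit(self, X, y):
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        # For each query row, return the target of the closest training row.
        return [
            self.y_[min(range(len(self.X_)), key=lambda i: sq_dist(row, self.X_[i]))]
            for row in X
        ]


def bag_fit(base_estimator, X, y, n_estimators=10, seed=0):
    """Fit n_estimators copies of base_estimator, each on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        model = copy.deepcopy(base_estimator)  # leave the original unfitted
        model.fit([X[i] for i in idx], [y[i] for i in idx])
        models.append(model)
    return models


def bag_predict(models, X):
    """Aggregate by averaging member predictions (the regression case)."""
    per_model = [m.predict(X) for m in models]
    return [mean(column) for column in zip(*per_model)]


X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0]
models = bag_fit(NearestNeighbourRegressor(), X, y, n_estimators=25, seed=1)
predictions = bag_predict(models, [[0.2], [3.9]])
```

For classifiers, the averaging step would typically be replaced by a majority vote or by averaging predicted class probabilities. Because the sketch relies only on `fit` and `predict`, mixing different base methods amounts to passing a different estimator for each ensemble member.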

Goals

To implement and test a new parallel “BagAnything” Python module for bagging existing regressors and classifiers, and for generating feature importance profiles from statistical sampling of feature sets.
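One possible reading of "feature importance profiles from statistical sampling of feature sets" is sketched below, again with the standard library only. The scoring rule is an assumption, not something the project specifies: each randomly drawn feature subset is scored by negative mean squared error on held-out data, and a feature's importance is the mean score of the subsets that contained it. The names `score_subset` and `feature_importance_profile`, and the toy regressor, are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


class NearestNeighbourRegressor:
    """Toy 1-nearest-neighbour regressor (hypothetical stand-in for any base model)."""

    def fit(self, X, y):
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: sq_dist(row, self.X_[i]))]
            for row in X
        ]


def score_subset(features, X_train, y_train, X_test, y_test):
    """Fit on the chosen feature columns; return negative mean squared error."""
    def take(X):
        return [[row[j] for j in features] for row in X]

    model = NearestNeighbourRegressor().fit(take(X_train), y_train)
    preds = model.predict(take(X_test))
    return -mean([(p - t) ** 2 for p, t in zip(preds, y_test)])


def feature_importance_profile(X_train, y_train, X_test, y_test,
                               subset_size=2, n_subsets=30, seed=0, n_jobs=4):
    """Mean held-out score of the random subsets in which each feature appeared."""
    rng = random.Random(seed)
    n_features = len(X_train[0])
    subsets = [sorted(rng.sample(range(n_features), subset_size))
               for _ in range(n_subsets)]
    # Each subset is scored independently, so the work can be farmed out.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        scores = list(pool.map(
            lambda s: score_subset(s, X_train, y_train, X_test, y_test), subsets))
    profile = {}
    for j in range(n_features):
        hits = [score for s, score in zip(subsets, scores) if j in s]
        if hits:  # features that were never sampled are left out
            profile[j] = mean(hits)
    return profile


# Synthetic data: the target depends on feature 0 only; features 1 and 2 are noise.
X_train = [[0.0, 5.0, 2.0], [1.0, 1.0, 7.0], [2.0, 8.0, 4.0],
           [3.0, 3.0, 0.0], [4.0, 6.0, 9.0], [5.0, 0.0, 3.0]]
y_train = [2.0 * row[0] for row in X_train]
X_test = [[0.0, 7.0, 1.0], [2.0, 2.0, 8.0], [5.0, 4.0, 5.0]]
y_test = [2.0 * row[0] for row in X_test]
profile = feature_importance_profile(X_train, y_train, X_test, y_test)
```

Scores closer to zero indicate features whose presence tends to reduce held-out error. A thread pool is used here only to keep the sketch dependency-free; for CPU-bound model fitting, a process pool or joblib (which scikit-learn uses behind its `n_jobs` parameter) would be the more usual choice.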

Requirements

Python programming and experience in data science and machine learning are essential (such as COMP3720, COMP4660, COMP4670, COMP6670, COMP8420). Familiarity with platforms such as scikit-learn, PyTorch, TensorFlow and Keras is desirable.

Background Literature

Breiman, Leo (1996). "Bagging predictors". Machine Learning. 24 (2): 123–140.

Gain

This is a 24cp (2 semester) project.

Keywords

machine learning, materials informatics

Updated:  10 August 2021/Responsible Officer:  Dean, CECS/Page Contact:  CECS Marketing