Comparing DL and BERT Methods for Commit Sentiment Analysis




Developers must be able to constantly learn new technologies, adapt to new environments, and overcome challenges when learning and practicing their craft. However, failure to overcome obstacles in these situations can introduce a sense of mounting frustration in developers that can negatively impact learning outcomes. Prior research (by other authors) using biometric sensors demonstrated a correlation between bugs (and how long developers remain "stuck" on a bug) and feelings and expressions of frustration, anger, and toxicity.

Version Control Systems (VCS) are an essential aspect of implementing software, both commercial and Open-Source (OSS). Given that OSS is community-driven, commit messages are even more relevant as they are used to communicate changes, kickstart discussions between developers, document refactoring activities, and identify security issues. Thus, commit messages are often used as developer-centred documentation, highlighting multiple changes such as work in progress and even actionable insights. Moreover, because of how VCSs are used, a commit message typically summarises a group of changes.
In prior work, we analysed 2.1M commits of software projects in three different programming languages (Java, Python, C) to detect expressions of frustration using traditional machine learning models. However, despite the size and diversity of the training sample (over 12K manually classified commits), traditional machine learning models are not able to accurately detect frustration due to the nuances of language, namely expressions of sarcasm, frustration directed at the self (personal expressions), typos (better <> bitter), and so on.
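To illustrate why traditional models struggle here, the sketch below shows the kind of bag-of-words pipeline the prior work refers to (the exact models and data are not specified in this description; the TF-IDF + logistic regression setup, the toy messages, and the labels are all illustrative assumptions). Such models score surface words, so sarcasm, self-directed frustration, and typos like better/bitter carry little or no signal.

```python
# Hypothetical traditional-ML baseline (not the authors' actual pipeline):
# TF-IDF features + logistic regression over toy commit messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative commit messages and labels (1 = frustration, 0 = neutral)
messages = [
    "fix stupid null pointer bug again",
    "why does this build keep failing",
    "add unit tests for parser module",
    "update README with install steps",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(messages, labels)

# A bag-of-words model has no notion of context, so a typo such as
# "bitter" (for "better") is just an unseen token, not a cue.
predictions = model.predict(["refactor bitter handling of edge cases"])
```

This is precisely the gap that contextual deep learning and BERT-style models are expected to narrow.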
About this project: You will work with the existing, already cleaned-up datasets to test multiple deep learning classifiers (CNN, LSTM, bi-LSTM, RNN) and pre-trained models such as BERT variants (ALBERT, RoBERTa, DistilBERT), and perform an extensive cross-programming-language and cross-classifier analysis. You will be expected to calculate word embeddings, use F1 measures, compare to manually validated datasets, and produce an analysis of specific cases leading to type-I and type-II errors.
  • Note that you will be given an existing dataset, cleaned up and organised.
  • The datasets are large. Your computer should be able to handle them.
  • You will be working with Python, Keras + Tensorflow implementations, plus the BERTs' own implementations.
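As a sense of the Keras + TensorFlow work involved, here is a minimal sketch of one of the classifiers to be compared (a bi-LSTM over word embeddings). All hyperparameters (vocabulary size, embedding dimension, sequence length, layer sizes) are illustrative assumptions, not values from the project.

```python
# Minimal bi-LSTM sentiment classifier sketch in Keras/TensorFlow.
# Assumes commit messages are already tokenised to integer ids and
# padded/truncated to MAX_LEN; all sizes below are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000   # assumed tokenizer vocabulary size
EMBED_DIM = 100      # assumed word-embedding dimensionality
MAX_LEN = 50         # assumed padded commit-message length

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),          # learned or pre-loaded embeddings
    layers.Bidirectional(layers.LSTM(64)),            # reads the message in both directions
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),            # binary: frustration vs neutral
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The CNN, LSTM, and RNN variants swap the `Bidirectional(LSTM(...))` layer for the corresponding architecture; the BERT-family models instead use their own pre-trained tokenisers and fine-tuning APIs.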



This project has been completed in S2, 2022.


The student must have experience with:

  • Deep Learning implemented in Python with Keras and Tensorflow
  • Setup of CNN, LSTM, BERT models
  • Setup of WordEmbeddings
  • Ability/hardware to handle a very large dataset (2.1 million commits)
  • Demonstrated academic writing skills.
  • Excellent attention to detail to compare results across models and across programming languages.
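The cross-model, cross-language comparison rests on consistent metrics. The sketch below shows the intended style of comparison: F1 scores plus counts of type-I errors (false positives) and type-II errors (false negatives) per model. The prediction vectors here are invented for illustration only.

```python
# Hedged sketch: comparing classifiers on the same manually validated labels.
# y_true and the per-model predictions are illustrative, not real results.
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]          # manually validated labels
preds = {
    "bi-LSTM":    [1, 0, 1, 0, 0, 0, 1, 1],
    "DistilBERT": [1, 0, 1, 1, 0, 0, 1, 0],
}

for name, y_pred in preds.items():
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    # fp = type-I errors (neutral commits flagged as frustration)
    # fn = type-II errors (frustration the model missed)
    print(f"{name}: F1={f1_score(y_true, y_pred):.2f}, "
          f"type-I={fp}, type-II={fn}")
```

Repeating this per programming language (Java, Python, C) yields the cross-language half of the analysis, and inspecting the individual false positives/negatives yields the case analysis of type-I and type-II errors.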



  • Empirical Software Engineering and Natural Language Processing
  • Mining Software Repositories with Cross-Programming Language Comparison
  • Investigation of deep learning and BERT methods
  • Deep Learning
  • BERT, pre-trained models

Updated:  10 August 2021/Responsible Officer:  Dean, CECS/Page Contact:  CECS Marketing