Software Defect Prediction Using AWEIG+ADACOST Bayesian Algorithm for Handling High Dimensional Data and Class Imbalance Problem

  • Joko Suntoro, Universitas Semarang
  • Febrian Wahyu Christanto, Universitas Semarang
  • Henny Indriyawati, Universitas Semarang
Keywords: Software Defect, Prediction, Class Imbalance, High-Dimensional Data


Software defect prediction is one of the most important parts of software engineering. It is defined as the process of predicting errors and failures in a software system. Researchers apply machine learning methods, including estimation, association, classification, clustering, and dataset analysis, to predict software defects. The NASA Metrics Data Program (NASA MDP) datasets are among the software metrics datasets that researchers use for this purpose. However, NASA MDP datasets contain imbalanced classes and high-dimensional data, both of which lower classification performance. In this research, class imbalance is addressed with the AdaCost method, high-dimensional data is handled with the Average Weight Information Gain (AWEIG) method, and the Naïve Bayes algorithm is used as the classifier. The proposed method is named AWEIG + AdaCost Bayesian. In the experiments, AWEIG + AdaCost Bayesian is compared with the Naïve Bayes algorithm alone. The results show that AWEIG + AdaCost Bayesian achieves a higher mean Area Under the Curve (AUC) than Naïve Bayes, with mean AUC values of 0.752 and 0.696, respectively.
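The three building blocks named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. It assumes that AWEIG keeps the features whose information-gain weight is at or above the mean weight, that features are already discretized, and that AdaCost uses the cost-adjustment function β from the original AdaCost formulation with labels in {-1, +1}. The Naïve Bayes base learner is omitted; its predictions are simply passed in as `h`. The function names (`aweig_select`, `adacost_round`, `auc`) are hypothetical, and AUC is computed in its pairwise (Mann-Whitney) form.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Information gain of one discretized feature with respect to the class."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def aweig_select(columns, labels):
    """Hypothetical AWEIG step: keep features whose IG weight >= the mean IG weight.

    Assumption: the 'average weight' is read as a mean-IG threshold.
    Returns (selected feature indices, all IG weights).
    """
    weights = [information_gain(col, labels) for col in columns]
    mean_weight = sum(weights) / len(weights)
    selected = [j for j, w in enumerate(weights) if w >= mean_weight]
    return selected, weights


def adacost_round(dist, y, h, cost):
    """One AdaCost boosting round (sketch).

    dist: current sample weights summing to 1; y, h: true and predicted labels
    in {-1, +1} (h would come from the Naive Bayes base learner, omitted here);
    cost: per-sample misclassification cost in [0, 1].
    Returns (alpha, normalized new sample weights).
    """
    def beta(i):
        # Misclassified: beta_- = 0.5c + 0.5; correct: beta_+ = -0.5c + 0.5.
        if y[i] != h[i]:
            return 0.5 * cost[i] + 0.5
        return -0.5 * cost[i] + 0.5

    r = sum(w * y[i] * h[i] * beta(i) for i, w in enumerate(dist))
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep the log argument finite
    alpha = 0.5 * math.log((1 + r) / (1 - r))
    new = [w * math.exp(-alpha * y[i] * h[i] * beta(i)) for i, w in enumerate(dist)]
    z = sum(new)
    return alpha, [w / z for w in new]


def auc(scores, labels):
    """AUC: probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, `aweig_select([[0, 0, 1, 1], [5, 5, 5, 5]], [0, 0, 1, 1])` keeps only feature 0, because its gain of 1 bit is above the mean gain of 0.5 bit, and `auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` returns 0.75. In `adacost_round`, a misclassified high-cost (defective) module receives a larger weight increase than a misclassified low-cost one, which is how cost sensitivity counters class imbalance.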



Author Biographies

Joko Suntoro, Universitas Semarang

Informatics Engineering Study Program

Febrian Wahyu Christanto, Universitas Semarang

Informatics Engineering Study Program

Henny Indriyawati, Universitas Semarang

Information Systems Study Program