Acta Scientific Medical Sciences (ASMS) (ISSN: 2582-0931)

Research Article Volume 6 Issue 2

Assessing the Impact of Unbalance in Data on Predicting Breast Cancer Occurrence Using Machine Learning Models

Shuning Yin, Raji Sundararajan* and Gaurav Nanda

School of Engineering Technology, Purdue University, West Lafayette, USA

*Corresponding Author: Raji Sundararajan, School of Engineering Technology, Purdue University, West Lafayette, USA.

Received: December 03, 2021; Published: January 25, 2022

Abstract

With over 2 million new cases and over 600,000 deaths each year, breast cancer is the most common cancer among women worldwide. Early detection and treatment can save thousands of lives. Toward this goal, we used the machine learning library WEKA and the Breast Cancer Surveillance Consortium (BCSC) dataset, which contains 154,899 screening records. The dataset has twelve variables, and we treated the variable “breast_cancer_history” as the target variable to be predicted. The machine learning (ML) classifiers examined were Naïve Bayes, Logistic Regression, Multilayer Perceptron, and Support Vector Machine, each under five different test conditions. Because the dataset contained far fewer cancer cases (“Class 1”) than non-cancer cases (“Class 0”), making it heavily unbalanced, we examined the prediction performance of these algorithms on both “balanced” and “unbalanced” datasets. A stratified sampling method was used to create the unbalanced and balanced datasets. The unbalanced dataset included all the original screening data (154,899 cases), and the balanced dataset included the same number of cases for class 0 and class 1 (13,279 each). Of the four ML classifiers, Multilayer Perceptron had the best prediction performance on both the unbalanced and the balanced datasets. Overall, the balanced dataset gave better prediction results than the unbalanced dataset for all four classifiers. For medical decision support, the prediction outputs of ML models trained on both balanced and unbalanced data can be used.
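As an illustration of the balancing step described above, the following is a minimal Python sketch, not the authors' WEKA workflow. It assumes that balancing is done by randomly undersampling each class down to the size of the smallest class, which the abstract does not state explicitly. The function name `balance_by_undersampling` and the toy records are hypothetical; only the column name “breast_cancer_history” comes from the abstract.

```python
import random
from collections import defaultdict


def balance_by_undersampling(records, label_key, seed=0):
    """Return a shuffled subset with equally many records per class.

    Assumption: every class is randomly undersampled to the size of the
    smallest class. This is one common way to build a balanced dataset,
    not necessarily the procedure used in the article.
    """
    rng = random.Random(seed)

    # Group records by their class label.
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)

    if not by_class:
        return []

    # The smallest class determines the per-class sample size.
    n_min = min(len(group) for group in by_class.values())

    # Draw n_min records without replacement from each class.
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))

    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 200 records, 20 of them labelled class 1.
toy = [
    {"id": i, "breast_cancer_history": 1 if i % 10 == 0 else 0}
    for i in range(200)
]

balanced = balance_by_undersampling(toy, "breast_cancer_history")

counts = {0: 0, 1: 0}
for rec in balanced:
    counts[rec["breast_cancer_history"]] += 1

print(counts)  # {0: 20, 1: 20}
```

Applied to the study's data under the same assumption, this kind of procedure would yield the 13,279 records per class reported for the balanced dataset, while the unbalanced dataset would simply be the full set of 154,899 records.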

Keywords: Breast Cancer; Machine Learning; Data Analysis; WEKA; Prediction

Citation

Citation: Shuning Yin., et al. “Assessing the Impact of Unbalance in Data on Predicting Breast Cancer Occurrence Using Machine Learning Models”. Acta Scientific Medical Sciences 6.2 (2022): 159-170.

Copyright

Copyright: © 2022 Shuning Yin., et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.