






Chapter One:


1.1 Background of the Study
Breast cancer is now one of the most common cancers affecting humans, particularly women, and early detection can go a long way towards reducing the harm it causes.

Breast cancer has numerous risk factors, including family history, obesity, hormones, radiation therapy, and reproductive factors. Every year, one million women are newly diagnosed with breast cancer, and according to a World Health Organisation report, half of them die because the cancer is often detected late (Aaltonen et al., 1998).

Breast tumours fall into two types: malignant and benign. A tumour, lump, or other abnormality detected in the breast can be classified as malignant or benign by scientifically examining its characteristics.

Benign tumours carry a lower risk and are generally not life-threatening, whereas malignant tumours are life-threatening. Malignant tumours invade nearby tissue and can spread to other parts of the body; benign masses, by contrast, do not spread to other tissues and grow only locally (Aaltonen et al., 1998; Huang et al., 2017).

To diagnose breast cancer effectively as benign or malignant, researchers have applied machine learning, a branch of artificial intelligence (AI). Machine learning techniques are used to build models that take as input the attributes describing a breast cancer case and output a class label: label 1 for benign and label 2 for malignant.
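
A minimal Python sketch of this input/output contract follows, assuming a case is represented as a list of numeric attributes. The linear scoring rule, its weights, and its bias are hypothetical placeholders rather than a trained model; only the label convention (1 = benign, 2 = malignant) comes from the text.

```python
from typing import Callable, Sequence

# Label convention used in this study.
BENIGN, MALIGNANT = 1, 2

# A classifier maps a case's attribute vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def make_linear_classifier(weights: Sequence[float], bias: float) -> Classifier:
    """Build a hypothetical classifier that labels a case malignant
    when the weighted sum of its attributes plus the bias is positive."""
    def classify(case: Sequence[float]) -> int:
        score = bias + sum(w * x for w, x in zip(weights, case))
        return MALIGNANT if score > 0 else BENIGN
    return classify


# Made-up parameters and cases, for illustration only.
clf = make_linear_classifier(weights=[0.5, 0.5, 0.5], bias=-6.0)
print(clf([1.0, 2.0, 1.0]))  # 1 (benign): score = -6.0 + 2.0 = -4.0
print(clf([8.0, 9.0, 7.0]))  # 2 (malignant): score = -6.0 + 12.0 = 6.0
```

A real model learns the weights and bias from labelled data; the classifiers discussed below differ mainly in how they turn attributes into a label.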

Various machine learning models, including Neural Network, Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Decision Tree, Naïve Bayes (NB), and Logistic Regression (LR), have previously been used to classify breast cancer. Accurate classification enables early detection, diagnosis, and treatment and, where possible, complete eradication of the cancer.
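
To make one of these models concrete, here is a minimal sketch of a K-Nearest Neighbour classifier in plain Python. It assumes numeric attribute vectors, Euclidean distance, and majority voting; the function name `knn_predict` and the toy data are illustrative and are not taken from any of the cited studies.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Each training example is (attribute vector, label), with 1 = benign, 2 = malignant.
Example = Tuple[Sequence[float], int]


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> int:
    """Label `query` by majority vote among its k nearest training examples."""
    # Rank training examples by Euclidean distance to the query; keep the k closest.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) returns [(label, count)] for the most frequent label.
    return votes.most_common(1)[0][0]


train = [
    ([1.0, 1.0], 1), ([1.0, 2.0], 1), ([2.0, 1.0], 1),
    ([8.0, 8.0], 2), ([8.0, 9.0], 2), ([9.0, 8.0], 2),
]
print(knn_predict(train, [1.5, 1.5]))  # 1
print(knn_predict(train, [8.5, 8.5]))  # 2
```

With two classes, an odd `k` avoids tied votes. Unlike logistic regression, KNN learns no parameters; it defers all work to prediction time.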

1.1.1 Data Mining

Data mining can be viewed as “mining knowledge from data”, that is, extracting useful information from a large dataset. Kaymak, Helwan, and Uzun (2017) describe it as the most significant part of machine learning, while Jothi, Rashid, and Hussain (2015) note that its primary emphasis is pattern recognition.

Applied to medical datasets, data mining techniques can identify and predict significant patterns that help save lives, improve treatment accuracy, lower treatment costs, and reduce human error (Manjusha, Sankaranarayanan, & Seena, 2015).

Data mining techniques include anomaly detection, regression, clustering, summarisation, and association rule mining. Identifying relevant patterns involves several processes, including the following:

i. Pre-processing – This covers cleaning, feature extraction, feature selection, and dimensionality reduction.

ii. Clustering – This is an unsupervised learning technique that groups related data items together.

iii. Classification – This is a supervised machine learning technique that learns the relationship between data items and their class labels from a labelled data set (training data). When test data is submitted, it is classified according to the learned relationship. This research focuses on classification.
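
The cleaning step and one simple form of feature scaling can be sketched as follows. This assumes missing values are marked with `?` and that features are rescaled to [0, 1] with min-max scaling; both choices, and the function names `clean` and `min_max_scale`, are illustrative. Feature selection and dimensionality reduction are not shown.

```python
from typing import List


def clean(rows: List[List[str]], missing: str = "?") -> List[List[float]]:
    """Drop any row containing the missing-value marker and convert the rest to floats."""
    return [[float(v) for v in row] for row in rows if missing not in row]


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Rescale each column to [0, 1]; a constant column is mapped to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical raw records; the second one has a missing value.
raw = [["5", "1", "3"], ["?", "4", "4"], ["10", "7", "3"], ["1", "1", "3"]]
data = clean(raw)
print(data)                 # [[5.0, 1.0, 3.0], [10.0, 7.0, 3.0], [1.0, 1.0, 3.0]]
print(min_max_scale(data))  # every value now lies in [0, 1]
```

Dropping incomplete rows is the simplest cleaning strategy; imputing missing values is a common alternative when data are scarce.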


1.1.2 Classification

In data mining, classification consists of two processes: first, training the model on labelled training data so that it can determine the class labels of unknown test instances; second, evaluating performance to check the accuracy of the classifier, which entails comparing the predicted and actual class labels for each tuple in the test dataset (Jouni, Issa, Harb, Jacquemod, & Leduc, 2016; Kaymak et al., 2017).
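
The two processes can be illustrated with a short sketch: split labelled data into training and test sets, train a model on the training set only, then measure accuracy on the held-out test set. For brevity the model is a nearest-centroid classifier, a deliberately simple stand-in rather than the logistic regression model studied in this work; the function names and toy data are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def train_test_split(data: List[Example], test_fraction: float = 0.25, seed: int = 0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def train_centroids(train: List[Example]) -> Dict[int, List[float]]:
    """Process 1 (training): learn the mean attribute vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def accuracy(centroids: Dict[int, List[float]], test: List[Example]) -> float:
    """Process 2 (evaluation): share of test cases whose predicted label equals the actual label."""
    correct = sum(predict(centroids, x) == y for x, y in test)
    return correct / len(test)


data = [([1.0, 1.0], 1), ([1.0, 2.0], 1), ([2.0, 1.0], 1), ([2.0, 2.0], 1),
        ([8.0, 8.0], 2), ([8.0, 9.0], 2), ([9.0, 8.0], 2), ([9.0, 9.0], 2)]
train, test = train_test_split(data)
model = train_centroids(train)
print(accuracy(model, test))  # 1.0 on this well-separated toy data
```

Keeping the test set out of training is what makes the evaluation step an honest estimate of performance on unseen cases.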

1.2 Statement of the Problem

A key challenge in classification is choosing an approach that fits the nature of the data. Which machine learning model performs best when data features are interdependent, classes are unbalanced, or feature values are sparse is still an open research question.
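
One common way to account for unbalanced data, offered here only as an illustration and not as this study's method, is to weight each class inversely to its frequency: w_c = n / (k · n_c), where n is the number of samples, k the number of classes, and n_c the number of samples in class c. This is the heuristic scikit-learn uses for `class_weight="balanced"`; the function name and the label counts below are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * n_class_samples),
    so that errors on the rarer class cost more during training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


labels = [1] * 6 + [2] * 2  # hypothetical: 6 benign cases, 2 malignant cases
print(balanced_class_weights(labels))  # {1: 0.666..., 2: 2.0}
```

These weights would multiply each sample's contribution to the training loss, so the minority class is not ignored by the model.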

1.3 Research Aim and Objectives

The aim of this research is to develop a method for classifying breast cancer as benign or malignant.

The specific objectives are to:

1. Study and apply logistic regression to classify breast cancer.

2. Compare logistic regression with other existing machine learning classification models on the same dataset.

3. Analyse performance and draw conclusions.

1.4 Limitations of the Study

This research is limited to the use of logistic regression to classify breast cancer on the Wisconsin Breast Cancer Dataset (WBCD) from the UCI Machine Learning Repository. The model’s performance is evaluated solely with precision, recall, and F1-score.
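
For reference, the three metrics are computed from true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The sketch below assumes malignant (label 2) is treated as the positive class; the function name and example labels are made up.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 2) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one positive class (default: 2 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard each ratio against division by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [2, 2, 2, 1, 1, 1, 1, 2]
y_pred = [2, 2, 1, 1, 2, 1, 1, 2]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75): TP=3, FP=1, FN=1
```

In this setting, recall on the malignant class measures how many malignant cases the model catches, which is why it is usually reported alongside precision rather than accuracy alone.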

1.5 Paper Organisation

This paper is divided into five chapters. Chapter One introduces the background, aim, and objectives of the research, and Chapter Two reviews prior work related to this research.

Chapter Three describes the materials and methodology used. Chapter Four presents the performance analysis and discussion, and Chapter Five summarises the work and presents the conclusion and recommendations for future work.


