Data mining

Please find the attached document for question details.


You have been asked by management (manufacturing, healthcare, retail, financial, etc.) to create a research report using a data mining, data analytics, or BI tool. It is your responsibility to search for, download, and produce outputs using one of the tools. You will need to focus your results on the data set you select.

The paper should include the following as Header sections.

Example topics (these are just examples; please find your own title involving data mining):

1. Using data mining techniques for learning systems….

2. How to improve Health Care System using data mining techniques…

3. Design and develop Network/Information Security using data mining techniques…

4. How to efficiently extract knowledge from big data using data mining techniques…

5. Using data mining techniques to improve the financial/stock information systems…

Types of Data Analytic Tools:

https://www.octoparse.com/blog/top-30-big-data-tools-for-data-analysis/

Excel with Solver, but has limitations

R Studio

Tableau Public has a free trial

Microsoft Power BI

Search for others with trial options

Examples of Dataset:

http://www.rdatamining.com/

https://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#4b3e96f1b54d

Example: Project Construction Format:

You should follow the following content format:

Introduction

Background [Discuss tool, benefits, or limitations]

Review of the Data [What are you reviewing?]

Exploring the Data with the tool

Classification: Basic Concepts and Decision Trees

Other Alternative Techniques

Summary of Results

References

(Be sure to cite the author of any outside content using APA citations.)

Assignment Instructions:

1. No ZIP file

2. The assignment must be typed and submitted as ONE single MS Word or PDF file.

3. At least 12 pages (not including heading and content list pages) and 7 references.

4. Use 12-point font and 1.5 line spacing

5. No more than 4 figures and 3 tables

6. Follow APA style and content format: UC follows the APA (American Psychological Association) for writing style in all its courses which require a Paper or Essay.

http://www.apastyle.org/

HAL Id: hal-02196156
https://hal.archives-ouvertes.fr/hal-02196156

Submitted on 26 Jul 2019


HEART DISEASE PREDICTION USING DATA
MINING TECHNIQUES

S Anitha, N Sridevi

To cite this version:
S Anitha, N Sridevi. HEART DISEASE PREDICTION USING DATA MINING TECHNIQUES.
Journal of Analysis and Computation, 2019. ⟨hal-02196156⟩


Journal of Analysis and Computation (JAC)

(An International Peer Reviewed Journal), www.ijaconline.com, ISSN 0973-2861

Volume XIII, Issue II, February 2019

Dr. S. Anitha and Dr. N. Sridevi 48

HEART DISEASE PREDICTION USING DATA MINING TECHNIQUES

Dr. S. Anitha (1), Dr. N. Sridevi (2)

(1) Assistant Professor, Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India

(2) Assistant Professor, College of Administrative and Financial Sciences, AMA International University, Salmabad, Kingdom of Bahrain

ABSTRACT

Heart disease has received a great deal of attention in medical research because of its impact on human health. Heart disease is among the nation's leading causes of death. Data mining has become a vital approach for computing applications in medical informatics. Numerous data mining algorithms have considerably helped in understanding medical data more clearly. In this work, the supervised machine learning algorithms SVM, KNN and Naive Bayes are used to predict heart disease. The algorithms are implemented in the R programming language, and their performance is measured in terms of accuracy. The behaviour of the algorithms is examined and the outcomes are discussed.

Keywords – KNN, SVM, Naïve Bayes, Heart diseases, Data mining

[1] INTRODUCTION

The volume of electronic health records collected by healthcare facilities is growing rapidly. Accuracy is particularly important in patient care, and computerizing this enormous amount of data improves the quality of the whole system. But how do healthcare providers sift through all this information efficiently? This is where data mining has proved to be extremely effective. Data mining combines statistical analysis, machine learning and database technology to extract hidden patterns and relationships from large databases [3].

Data mining is an interdisciplinary subfield of computer science and statistics whose overall goal is to extract information from a data set and transform it into an understandable structure for further use. This work therefore applies data mining techniques to healthcare data in order to predict outcomes.

Cardiovascular diseases (CVDs) have now become the leading cause of death in India. Heart disease and stroke are the predominant causes and are responsible for >80% of CVD deaths [1]. A major challenge facing healthcare organizations is the provision of quality services at reasonable cost. Quality service means diagnosing patients correctly and administering treatments that are effective. Clinical decisions are often made on the basis of doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service delivered to patients [2].

Machine learning is commonly divided into supervised learning, which trains a model on known input and output data so that it can predict future outputs, and unsupervised learning, which finds hidden patterns or intrinsic structures in input data. This work uses supervised machine learning algorithms to predict heart disease. Supervised methods attempt to discover the relationship between the input attributes and a target attribute. The relationship discovered is represented in a structure referred to as a model.

Classification models and regression models are the two main kinds of model in supervised learning. This work concentrates on classification. Classification deals with assigning observations to discrete classes rather than estimating continuous quantities. This work uses the classification algorithms SVM, Naïve Bayes and KNN to predict heart disease and compares their performance.

[2] RELATED WORK

Several studies have addressed the diagnosis of heart disease. They used different data mining techniques and achieved different results with different methods.

Chaitrali S. Dangare et al. [4], in “Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques”, used neural network, decision tree and Naïve Bayes algorithms. Among the three, the neural network predicted heart disease with the highest accuracy.

Sellappan Palaniappan et al. [5], in “Intelligent Heart Disease Prediction System Using Data Mining Techniques”, used Decision Trees, Naïve Bayes and Neural Network. Their experimental results show that each technique has its own strength in realizing the objectives of the defined mining goals.

Poornima Singh et al. [6], in “Effective heart disease prediction system using data mining techniques”, developed a prediction system that uses a neural network to predict the risk level of heart disease. Their results show that the designed diagnostic system can effectively predict the risk level of heart disease.

Era Singh Kajal and Nishika [7], in “Prediction of Heart Disease using Data Mining Techniques”, used K-means clustering and the MAFIA algorithm in a heart disease prediction system and achieved an accuracy of 89%.

Mirpouya Mirmozaffari et al. [8], in “Data Mining Classification Algorithms for Heart Disease Prediction”, reported that the Random Tree algorithm gives the highest accuracy and the lowest error among the best-performing algorithms.

Aditya Methaila et al. [9], in “Early Heart Disease Prediction Using Data Mining Techniques”, propose using the data mining classification techniques Decision Trees, Naïve Bayes and Neural Network, along with the weighted association Apriori algorithm and the MAFIA algorithm.

A. Sheik Abdullah et al. [10], in “A Data mining Model for predicting the Coronary Heart Disease using Random Forest Classifier”, developed a data mining model using a Random Forest classifier to improve prediction accuracy and to examine several events related to coronary heart disease (CHD).

[3] DATA SET

The data set used in this work is from the UCI Machine Learning Repository [11]. It is a collection of clinical records with values for 76 attributes, but all published experiments use a subset of 14 of them. This work therefore also uses only those 14 attributes. The attributes and their descriptions are shown in Table 1.

TABLE 1: Attribute Information

Name       Description
Age        Age in years
Sex        1 = male, 0 = female
Trestbps   Resting blood pressure (in mm Hg)
Cp         Chest pain type
Chol       Serum cholesterol in mg/dl
Fbs        Fasting blood sugar > 120 mg/dl
Restecg    Resting electrocardiography results
Thalach    Maximum heart rate achieved
Exang      Exercise-induced angina
Oldpeak    ST depression induced by exercise relative to rest
Slope      The slope of the peak exercise ST segment
Ca         Number of major vessels coloured by fluoroscopy (ranges between 0 and 3)
Thal       3 = normal, 6 = fixed defect, 7 = reversible defect

[4] DATA MINING TECHNIQUES USED FOR PREDICTION

Three different data mining classification techniques namely KNN, Naive Bayes and Support Vector

Machine are used to analyse the dataset.

K-NEAREST NEIGHBOR (KNN) ALGORITHM

Let $(x_i, c_i)$, $i = 1, 2, \ldots, n$, be the data points, where $x_i$ denotes the feature values and $c_i$ denotes the label of $x_i$. Assuming the number of classes is $c$,

$c_i \in \{1, 2, 3, \ldots, c\}$ for all values of $i$.

Let $x$ be a point whose label is unknown and whose class we would like to determine using the k-nearest neighbor algorithm.

PSEUDO CODE

• Calculate $d(x, x_i)$ for $i = 1, 2, \ldots, n$, where $d$ denotes the Euclidean distance between the points, calculated over the $m$ feature values as
  $d(x, x_i) = \sqrt{\sum_{j=1}^{m} (x_j - x_{ij})^2}$

• Arrange the $n$ computed Euclidean distances in increasing order.

• Let $k$ be a positive integer; take the first $k$ distances from this sorted list.

• Find the $k$ points corresponding to these $k$ distances.

• Let $k_i$ denote the number of points belonging to the $i$-th class among the $k$ points, so $k_i \ge 0$.

• If $k_i > k_j$ for all $i \ne j$, then put $x$ in class $i$.
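The steps above can be sketched in a few lines of code. The paper's implementation is in R and is not shown, so the following is only a minimal Python illustration of the pseudocode; the function names `euclidean` and `knn_predict` and the toy data are assumptions made for this example, not the authors' code or data.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train, x, k):
    """Predict the label of x from (features, label) pairs by majority vote."""
    # Sort the training points by distance to x and keep the k nearest.
    nearest = sorted(train, key=lambda pair: euclidean(pair[0], x))[:k]
    # Count the labels among the k neighbours (the k_i of the pseudocode).
    votes = Counter(label for _, label in nearest)
    # The label with the largest count wins.
    return votes.most_common(1)[0][0]


# Toy two-feature data (hypothetical, not the heart disease records).
train = [
    ((1.0, 1.0), 0),
    ((1.2, 0.8), 0),
    ((4.0, 4.2), 1),
    ((3.8, 4.0), 1),
    ((4.1, 3.9), 1),
]
print(knn_predict(train, (1.1, 0.9), k=3))  # prints 0
```

The pseudocode requires a strict majority; when counts tie, this sketch simply returns the tied label that appears first among the sorted neighbours, which is one possible convention. Choosing an odd k for two classes avoids ties altogether.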

NAIVE BAYES ALGORITHM

Bayesian reasoning is useful for decision making. A Naive Bayes model is represented by probabilities. It applies Bayes' theorem to predict the class of an unknown instance. A learned naive Bayes model is stored as a list of probabilities, which includes:

• Class probabilities: the probability of each class in the training dataset.

• Conditional probabilities: the conditional probability of each input value given each class value.

PSEUDO CODE

Learning Phase: Learning a naive Bayes model from training data is fast, because only the probability of each class and the probability of each input value given each class need to be calculated. Given a training set $S$ with $F$ features and $L$ classes:

For each target value $c_i$ ($c_i = c_1, \ldots, c_L$)

    $\hat{P}(c_i) \leftarrow$ estimate $P(c_i)$ with examples in $S$;

    For every feature value $x_{jk}$ of each feature $x_j$ ($j = 1, \ldots, F$; $k = 1, \ldots, N_j$)

        $\hat{P}(x_j = x_{jk} \mid c_i) \leftarrow$ estimate $P(x_{jk} \mid c_i)$ with examples in $S$;

Output: $F \times L$ conditional probabilistic models

Testing Phase: Given an unknown instance $x' = (a'_1, \ldots, a'_n)$, look up the tables to assign the label $c^*$ to $x'$ if

$[\hat{P}(a'_1 \mid c^*) \cdots \hat{P}(a'_n \mid c^*)]\,\hat{P}(c^*) > [\hat{P}(a'_1 \mid c_i) \cdots \hat{P}(a'_n \mid c_i)]\,\hat{P}(c_i), \quad c_i \ne c^*, \; c_i = c_1, \ldots, c_L$
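The two phases can be illustrated with a small sketch for categorical features. This is a Python illustration only (the paper's R code is not shown); the function names, the feature values and the class labels below are hypothetical, and the probabilities are plain relative frequencies without smoothing, which is an assumption of this example.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Learning phase: estimate P(c) and P(x_j = v | c) by relative frequency."""
    class_counts = Counter(label for _, label in samples)
    n = len(samples)
    priors = {c: count / n for c, count in class_counts.items()}
    # value_counts[(j, c)][v] = number of class-c samples whose feature j equals v
    value_counts = defaultdict(Counter)
    for features, label in samples:
        for j, value in enumerate(features):
            value_counts[(j, label)][value] += 1
    likelihoods = {
        (j, c): {v: cnt / class_counts[c] for v, cnt in counter.items()}
        for (j, c), counter in value_counts.items()
    }
    return priors, likelihoods


def predict_naive_bayes(priors, likelihoods, features):
    """Testing phase: return the class maximising P(c) * prod_j P(a_j | c)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(features):
            # A value never seen with class c gets probability 0 (no smoothing).
            score *= likelihoods.get((j, c), {}).get(value, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy records: (chest pain type, exercise-induced angina) -> class.
samples = [
    (("typical", "yes"), "disease"),
    (("typical", "yes"), "disease"),
    (("atypical", "yes"), "disease"),
    (("atypical", "no"), "healthy"),
    (("atypical", "no"), "healthy"),
    (("typical", "no"), "healthy"),
]
priors, likelihoods = train_naive_bayes(samples)
print(predict_naive_bayes(priors, likelihoods, ("typical", "yes")))  # prints disease
```

Because a single unseen value drives the whole product to zero, practical implementations usually add some form of smoothing; that refinement is left out here to stay close to the pseudocode.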

SUPPORT VECTOR MACHINE

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification. SVMs have been extensively researched in the data mining and machine learning communities and applied in various domains. SVMs are most commonly used in classification problems. Two special properties of SVMs are that they (1) achieve high generalization by maximizing the margin and (2) support efficient learning of nonlinear functions through the kernel trick.

SVM CLASSIFICATION

The basic form of an SVM is a binary classifier, where the output of the learned function is either true or false. Binary SVMs are classifiers that distinguish between data points of two classes. Each data point is represented by an n-dimensional vector. A linear classifier separates the two classes with a hyperplane, and there are many linear classifiers that correctly classify the two groups of data.

To achieve maximum separation between the two classes, the SVM picks the hyperplane that has the largest margin. The margin is the sum of the shortest distances from the separating hyperplane to the nearest data point of each class. Such a hyperplane is likely to generalize well, meaning that it correctly classifies “unseen” or testing data points.

PSEUDO CODE

Step: 1 The data points D or training set can be expressed mathematically as follows.

D = {(x1,y1),(x2,y2),…,(xm,ym)}

Here, xi is an n-dimensional real vector and yi is either 1 or -1, denoting the class to which the point xi belongs.

Step: 2 The SVM classification function F(x) takes the following form where w is the weight vector

and b is the bias, which will be computed by SVM in the training process.

F(x) = w.x-b (1)

Step: 3 To correctly classify the training set, F(x) must return positive numbers for positive data

points and negative numbers otherwise, that is, for every point xi in D,

w.xi – b > 0 if yi =1 , and

w.xi – b < 0 if yi = -1 (2)

These conditions can be revised such that

yi ( w.xi – b ) > 0, ∀( xi, yi) ∈ D (3)

If there exists such a linear function F that correctly classifies every point in D or satisfies Eq.(3) then

D is called linearly separable.

Step: 4 F needs to maximize the margin, where the margin is the distance from the hyperplane to the closest data points. To maximise the margin, Eq.(3) is revised into the following Eq.(4).

$y_i (w \cdot x_i - b) \ge 1, \quad \forall (x_i, y_i) \in D$ (4)

The distance from the hyperplane to a vector $x_i$ is formulated as $\frac{|F(x_i)|}{\|w\|}$. Thus, the margin becomes

$\text{margin} = \frac{1}{\|w\|}$ (5)

because when the $x_i$ are the closest vectors, $|F(x_i)| = 1$ according to Eq.(4). The closest vectors, which satisfy Eq.(4) with equality, are called support vectors.

Thus the objective of the SVM is to find the optimal separating hyper plane which maximizes the

margin of the training data.
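To make Eqs. (1), (4) and (5) concrete, the sketch below evaluates them for a hand-picked hyperplane on four toy points. It is a Python illustration only and does not train an SVM; the weight vector, bias, data points and function names are all hypothetical values chosen for this example.

```python
import math


def decision(w, b, x):
    """F(x) = w . x - b, as in Eq. (1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def margin(w):
    """Distance from the hyperplane to the closest points, 1 / ||w||, as in Eq. (5)."""
    return 1.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(w, b, data, tol=1e-9):
    """Points that satisfy Eq. (4) with equality: y_i (w . x_i - b) = 1."""
    return [(x, y) for x, y in data if abs(y * decision(w, b, x) - 1.0) <= tol]


# Hypothetical linearly separable data: (x_i, y_i) with y_i in {1, -1}.
data = [((3.0, 0.0), 1), ((4.0, 1.0), 1), ((1.0, 0.0), -1), ((0.0, 2.0), -1)]
# Hand-picked hyperplane x_1 = 2, written as w . x - b = 0 (not learned here).
w, b = (1.0, 0.0), 2.0

# Every training point satisfies the constraint of Eq. (4).
print(all(y * decision(w, b, x) >= 1.0 for x, y in data))  # prints True
print(margin(w))                                           # prints 1.0
print(support_vectors(w, b, data))  # prints [((3.0, 0.0), 1), ((1.0, 0.0), -1)]
```

In an actual SVM, w and b are obtained by solving the optimization problem that maximizes this margin subject to Eq. (4); a library routine would normally do that step.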

[5] EXPERIMENTAL RESULTS

The classification algorithms KNN, Naïve Bayes and Support Vector Machine are implemented in the R programming language. The heart disease dataset consists of 302 records. For the experiments, the dataset is divided into a training set and a testing set in the ratio 70:30.
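A 70:30 split of this kind can be sketched as follows. The paper does not say how the records were shuffled or how the split was rounded, so this Python sketch is an assumption-laden illustration: the function name `train_test_split`, the fixed `seed` and the integer rounding are choices made for the example only.

```python
import random


def train_test_split(records, train_percent=70, seed=0):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    # A seeded generator makes the shuffle repeatable (the seed is arbitrary).
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


# Ten placeholder record ids stand in for the real dataset rows.
train_set, test_set = train_test_split(range(10))
print(len(train_set), len(test_set))  # prints 7 3
```

Shuffling before splitting keeps any ordering in the source file (for example, records sorted by class) from leaking into the two subsets.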

In machine learning, a confusion matrix, also called an error matrix, is a table layout that allows the performance of an algorithm to be visualized. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. The confusion matrix has the following entries. TP (True Positive): the number of records classified as true that were actually true. FN (False Negative): the number of records classified as false that were actually true. FP (False Positive): the number of records classified as true that were actually false. TN (True Negative): the number of records classified as false that were actually false.
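The four entries can be counted directly from paired lists of actual and predicted labels. The following Python sketch is illustrative only; the function name `confusion_counts` and the toy label lists are assumptions for the example and are not taken from the experiments.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FN, FP and TN for one class treated as the positive class."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the record really is positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the record really is positive.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical labels for eight records (1 = positive class).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(actual, predicted, positive=1))
# prints {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```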

The confusion matrices obtained with the three algorithms are given in Table 2, where Class 0 represents heart disease and Class 1 represents no heart disease.

Table 2: Confusion Matrix obtained using the Classification Algorithms

Confusion matrix of KNN Algorithm

           Class 0   Class 1
Class 0       25         7
Class 1       14        44

Confusion matrix of Naïve Bayes Algorithm

           Class 0   Class 1
Class 0       30         5
Class 1        7        48

Confusion matrix of SVM Algorithm

           Class 0   Class 1
Class 0       26         7
Class 1       13        44

Table 3 shows that, among the three algorithms used to predict heart disease from the clinical data, the Naïve Bayes algorithm achieves the highest accuracy, 86.6%, compared with KNN and SVM.
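Accuracy is the share of correctly classified records, that is, the sum of the diagonal of the confusion matrix divided by the total number of records. As a check, the short Python sketch below recomputes it from the entries of Table 2; the function name `accuracy` and the dictionary layout are choices made for this example.

```python
def accuracy(matrix):
    """Share of records on the main diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Entries copied from Table 2 (rows and columns in Class 0, Class 1 order).
table2 = {
    "KNN": [[25, 7], [14, 44]],
    "Naive Bayes": [[30, 5], [7, 48]],
    "SVM": [[26, 7], [13, 44]],
}
for name, matrix in table2.items():
    print(f"{name}: {100 * accuracy(matrix):.2f}%")
# prints:
# KNN: 76.67%
# Naive Bayes: 86.67%
# SVM: 77.78%
```

These values agree with the reported accuracies of 76.67%, 86.6% and 77.7% up to rounding, and each matrix sums to 90 test records.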

Table 3: Accuracy of the classification algorithms

Classification Algorithm   Accuracy in %
KNN                        76.67
Naïve Bayes                86.6
SVM                        77.7

[6] CONCLUSION

Heart disease is one of the most common diseases in India. Early detection of heart disease increases the survival rate; this work therefore aims to predict whether or not a patient has heart disease from clinical data, which will assist the diagnosis process. Three supervised machine learning algorithms, KNN, Naive Bayes and SVM, are compared in terms of accuracy on the heart disease dataset. The experimental results show that the Naïve Bayes algorithm predicts heart disease with an accuracy of 86.6%. In future, the performance of the Naïve Bayes algorithm can be compared with other classification algorithms such as random forest and decision tree.

REFERENCES

[1] Prabhakaran et al., “Cardiovascular Disease in India”, Circulation, Vol. 133, No. 16, pp. 1605-1620, 2016.

[2] G. Subbalakshmi et al., “Decision Support in Heart Disease Prediction System using Naive Bayes”, Indian Journal of Computer Science and Engineering (IJCSE), Vol. 2, No. 2, pp. 170-176, 2011.

[3] Thuraisingham, B., “A Primer for Understanding and Applying Data Mining”, IT Professional, pp. 28-31, 2000.

[4] Chaitrali S. Dangare et al., “Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques”, International Journal of Computer Applications, Vol. 47, No. 10, pp. 44-48, 2012.

[5] Sellappan Palaniappan et al., “Intelligent Heart Disease Prediction System Using Data Mining Techniques”, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 8, pp. 343-350, 2008.

[6] Poornima Singh et al., “Effective heart disease prediction system using data mining techniques”, International Journal of Nano medicine, pp. 121-124, 2018.

[7] Era Singh Kajal and Nishika, “Prediction of Heart Disease using Data Mining Techniques”, International Journal of Advance Research, Ideas and Innovations in Technology, Vol. 2, No. 3, pp. 1-7, 2016.

[8] Mirpouya Mirmozaffari et al., “Data Mining Classification Algorithms for Heart Disease Prediction”, International Journal of Computing, Communications & Instrumentation Engg. (IJCCIE), Vol. 4, No. 1, pp. 11-15, 2017.

[9] Aditya Methaila et al., “Early Heart Disease Prediction Using Data Mining Techniques”, Computer Science and Information Technology, pp. 53-59, 2014.

[10] A. Sheik Abdullah et al., “A Data mining Model for predicting the Coronary Heart Disease using Random Forest Classifier”, International Conference on Recent Trends in Computational Methods, Communication and Controls, pp. 22-25, 2012.

[11] https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart-disease.names
