The common way for diabetes educators to inform diabetic patients about their nutrition therapy is by introducing food substitution. The existing food categorization is not efficient for classifying food for diabetic patients. Clustering, a data mining (DM) technique, can be a very useful tool for grouping food items with similar nutrient content. This paper examines the use of K-means to cluster a food data set into groups based on food nutrients, using the RapidMiner tool. The output of the clustering algorithm can help other recommendation systems provide patients with good recommendations for their diabetes diet.
Keywords
data mining; diabetes; data set; K-means.
1. Introduction
Food and nutrition are key to good health. A healthy diet is important for everyone, and especially for diabetic patients, who face several dietary restrictions. Nutrition therapy is a major means of preventing, managing and controlling diabetes: it manages nutrition on the premise that food provides vital medicine and maintains good health. Typically, diabetic patients need to avoid added sugar and fat by finding substitutes from the same food group [4]. An effective clustering of foods based on their actual nutrient content is therefore needed. Such clustering will encourage diabetics to eat the widest possible variety of permitted foods, ensuring that they get the full range of trace elements and other nutrients. This paper is organized as follows. Section 2 introduces related work on data mining and the diabetic diet. Section 3 describes the data set used and summarizes its main features. The data preparation process is presented in Section 4. Section 5 describes the materials and methods used in this study. Section 6 gives the conclusion.
2. Literature Review
Li et al. [1] proposed an automatically constructed food ontology for diabetes diet care. Their method generates an ontology skeleton with hierarchical clustering algorithms (HCA), uses intersection naming for class naming, and ranks instances by granular ranking and positioning. The study is based on the food nutrition composition database of the Department of Health. Phanich et al. [2] proposed a Food Recommendation System (FRS) that uses food clustering analysis for diabetic patients. The system recommends suitable substitute foods in terms of nutrition and food characteristics. They used a Self-Organizing Map (SOM) and K-means clustering for the food clustering analysis, based on the similarity of eight nutrients that are significant for diabetic patients. Their study is based on the data set “Nutritive values for Thai food” provided by the Nutrition Division, Department of Health, Ministry of Public Health (Thailand).
3. Dataset Description
This study is based on the data set provided by the USDA National Nutrient Database for Standard Reference (SR) [3]. The values in the database are based on the results of laboratory analyses or are calculated using appropriate algorithms, factors, or recipes, as indicated by the source in the Nutrient Data file. Not every food item has a complete nutrient profile. The data set used here is the abbreviated file, which contains fewer nutrients but includes all the food items. It contains 7540 records and 52 attributes. Tables 1, 2 and 3 list the data set attributes and their descriptions. I used the RapidMiner tool to check for missing values. Table 4 presents a sample of the data set.
4. Data Preparation
The quality of the results of the mining process depends directly on the quality of the data, so I first prepared the data set by applying data preprocessing strategies. Data preprocessing is a critical step in the data mining process and has a large impact on the success of a data mining project. Its purpose is to clean dirty or noisy data. Fig. 1 shows the different strategies in the data preprocessing phase. In this study I focused on data cleaning and data reduction, followed by data normalization.
Figure 1 Strategies in data preprocessing
Table 1 Description of data set attributes 1-24
Table 2 Description of data set attributes 25-48
Table 3 Description of data set attributes 49-52
Table 4 Sample of the data set
Shrt_Desc | Water | Energ_Kcal | Protein | Lipid_Tot | Ash | Carbohydrt | Sugar_Tot | others…
BUTTER,WITH SALT | 15.87 | 717 | 0.85 | 81.11 | 2.11 | 0.06 | 0.06 | …
BUTTER,WHIPPED,WITH SALT | 15.87 | 717 | 0.85 | 81.11 | 2.11 | 0.06 | 0.06 | …
BUTTER OIL,ANHYDROUS | 0.24 | 876 | 0.28 | 99.48 | 0 | 0 | 0 | …
CHEESE,BLUE | 42.41 | 353 | 21.4 | 28.74 | 5.11 | 2.34 | 0.5 | …
CHEESE,BRICK | 41.11 | 371 | 23.24 | 29.68 | 3.18 | 2.79 | 0.51 | …
Data Cleaning
Data cleaning, also called data cleansing or scrubbing, deals with detecting and removing errors and inconsistencies from data in order to improve its quality [6]. The aim of data cleaning here is to raise the data quality to a level suitable for clustering analysis. The data cleaning methods used are filling in missing values and eliminating data redundancy.
Missing Values
It is common for a data set to have fields that contain unknown or missing values, and there are a variety of legitimate reasons why this can happen. There are a number of methods for treating records that contain missing values [7]:
1. Omit the incorrect field(s)
2. Omit the entire record that contains the incorrect field(s)
3. Automatically enter/correct the data with default values e.g. select the mean from the range
4. Derive a model to enter/correct the data
5. Replace all values with a global constant
In this study, both missing and unknown values were set to zero, i.e., replaced with a global constant.
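As a minimal sketch of the zero-fill rule, assuming each record has already been read into a Python dict whose missing cells are empty strings or None, the function below counts the missing cells per attribute and replaces them with 0.0. The function name and the second example row are hypothetical (it is an invented item, not a USDA record), and this is not the RapidMiner process used in the study.

```python
def fill_missing_with_zero(records, attributes):
    """Return (cleaned_records, missing_counts).

    A value of None or a blank string counts as missing/unknown and is
    replaced by 0.0; every other value is converted to float.
    """
    missing_counts = {name: 0 for name in attributes}
    cleaned = []
    for record in records:
        row = dict(record)  # copy, so the caller's records are left unchanged
        for name in attributes:
            value = row.get(name)
            if value is None or str(value).strip() == "":
                missing_counts[name] += 1
                row[name] = 0.0
            else:
                row[name] = float(value)
        cleaned.append(row)
    return cleaned, missing_counts


# First row: values from Table 4. Second row: hypothetical item with a blank cell.
rows = [
    {"Shrt_Desc": "BUTTER,WITH SALT", "Protein": "0.85", "Sugar_Tot": "0.06"},
    {"Shrt_Desc": "EXAMPLE FOOD (hypothetical)", "Protein": "3.1", "Sugar_Tot": ""},
]
cleaned, counts = fill_missing_with_zero(rows, ["Protein", "Sugar_Tot"])
print(counts)  # {'Protein': 0, 'Sugar_Tot': 1}
```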
Duplicated Records
Duplicate records often do not share a common key, and they may contain errors that make duplicate matching difficult. Such errors are introduced by transcription mistakes, incomplete information, lack of standard formats, or any combination of these factors [7]. The data set used in this study includes duplicate data objects. I used RapidMiner to remove the duplicates; as a result, the number of records decreased from 7540 to 7139.
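The duplicate-removal step can be sketched as keeping the first record for each distinct combination of a chosen set of key attributes. The study does not state which attributes were compared; here, as an assumption, duplicates are detected on the nutrient columns of Table 4 only, in which case BUTTER,WITH SALT and BUTTER,WHIPPED,WITH SALT collapse into one record. `remove_duplicates` is a hypothetical helper, not a RapidMiner operator.

```python
def remove_duplicates(records, key_attributes):
    """Keep the first record for each distinct combination of key_attributes."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[name] for name in key_attributes)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Assumption: duplicates are judged on nutrient values, not on Shrt_Desc.
nutrients = ["Water", "Energ_Kcal", "Protein", "Lipid_Tot", "Ash", "Carbohydrt", "Sugar_Tot"]
table4 = [
    ("BUTTER,WITH SALT", 15.87, 717, 0.85, 81.11, 2.11, 0.06, 0.06),
    ("BUTTER,WHIPPED,WITH SALT", 15.87, 717, 0.85, 81.11, 2.11, 0.06, 0.06),
    ("BUTTER OIL,ANHYDROUS", 0.24, 876, 0.28, 99.48, 0, 0, 0),
    ("CHEESE,BLUE", 42.41, 353, 21.4, 28.74, 5.11, 2.34, 0.5),
    ("CHEESE,BRICK", 41.11, 371, 23.24, 29.68, 3.18, 2.79, 0.51),
]
records = [dict(zip(["Shrt_Desc"] + nutrients, row)) for row in table4]
unique = remove_duplicates(records, nutrients)
print(len(records), "->", len(unique))  # 5 -> 4
```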
Data Reduction
Data reduction can be achieved in many ways; one of them is feature selection [5]. The data set used here contains many irrelevant features that carry almost no useful information for the data mining task. Following [2], I focus on only eight of the fifty-two attributes, as these are the ones important for a diabetes diet.
The eight nutrients include:
Carbohydrate
Energy
Fat
Protein
Fiber
Vitamin E
Vitamin B1 (also known as thiamine)
Vitamin C
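Selecting the eight nutrients amounts to projecting every record onto those attributes. In this sketch, Carbohydrt, Energ_Kcal, Lipid_Tot and Protein are the names shown in Table 4, while Fiber_TD, Vit_E, Thiamin and Vit_C are assumed names for fiber, vitamin E, thiamine and vitamin C; they should be checked against Tables 1 to 3 before use. The food description is kept only as an identifier, not as a clustering feature.

```python
# Carbohydrt, Energ_Kcal, Lipid_Tot and Protein appear in Table 4;
# Fiber_TD, Vit_E, Thiamin and Vit_C are assumed (hypothetical) names.
SELECTED = ["Carbohydrt", "Energ_Kcal", "Lipid_Tot", "Protein",
            "Fiber_TD", "Vit_E", "Thiamin", "Vit_C"]


def select_features(records, selected=SELECTED, id_attribute="Shrt_Desc"):
    """Keep only the identifier and the selected attributes of each record.

    The identifier is carried along for labeling results; it is not meant
    to be used as a clustering feature.
    """
    keep = [id_attribute] + list(selected)
    return [{name: record[name] for name in keep} for record in records]


# Hypothetical record with two extra attributes (Water, Ash) that get dropped.
example = {"Shrt_Desc": "EXAMPLE FOOD (hypothetical)", "Water": 50.0, "Ash": 1.0,
           **{name: 0.0 for name in SELECTED}}
reduced = select_features([example])
print(len(reduced[0]))  # 9: the identifier plus eight nutrients
```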
Data Normalization
Data normalization is one of the preprocessing procedures in data mining, where the attribute data are scaled so as to fall within a small specified range such as -1.0 to 1.0 or 0.0 to 1.0.
Normalization before clustering is especially needed for distance metrics, such as Euclidean distance, that are sensitive to differences in the magnitudes or scales of the attributes.
K-means typically uses Euclidean distance to measure the distortion between a data object and its cluster centroid. However, the clustering results can be greatly affected by differences in scale among the dimensions from which the distances are computed. Data normalization is a linear transformation of the data to a specific range, so it is worthwhile to enhance clustering quality by normalizing the dynamic range of the input data objects into a specific range [8]. In this study I normalized the data to the range [0, 1]. Figure 2 shows the result of the data preprocessing.
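Min-max normalization to [0, 1] maps each value x of an attribute to x' = (x - min) / (max - min), where min and max are taken over that attribute. The sketch below applies this per column and, using Energ_Kcal and Protein for three foods from Table 4, illustrates why it matters: before scaling, the Euclidean distance between butter and blue cheese is almost entirely the energy gap (hundreds of kcal), while a protein gap of about 20 barely registers. The helper names are hypothetical, and mapping a constant column to 0.0 is an assumption made to avoid division by zero.

```python
import math


def min_max_normalize(rows):
    """Scale each column of a list of numeric rows to [0, 1].

    x' = (x - min) / (max - min) per column; a constant column is mapped
    to 0.0 (an assumption, to avoid dividing by zero).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# (Energ_Kcal, Protein) for three foods from Table 4.
raw = [[717, 0.85],   # BUTTER,WITH SALT
       [876, 0.28],   # BUTTER OIL,ANHYDROUS
       [353, 21.4]]   # CHEESE,BLUE
scaled = min_max_normalize(raw)

# Butter vs. blue cheese: the raw distance is almost entirely the energy gap,
# while after scaling both attributes lie on the same 0-1 scale.
print(round(euclidean(raw[0], raw[2]), 2), round(euclidean(scaled[0], scaled[2]), 2))
```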
Figure 2 Result of preprocessing (data cleaning, data reduction, data normalization)
5. Data Analysis Methodology
After data preparation, the second step is to cluster the food data set with K-means. To choose the optimal value of k, I followed [2] and used the Davies-Bouldin index [9]: the optimal k is the one for which the index is smallest. For this study I used k = 19, since it gave the smallest index value.
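For reference, in its common Euclidean form the Davies-Bouldin index of a clustering with k clusters is DB = (1/k) * Σ_i max_{j≠i} (s_i + s_j) / d(c_i, c_j), where c_i is the centroid of cluster i, s_i is the mean distance of its members to c_i, and d is Euclidean distance; lower values indicate compact, well-separated clusters. The sketch below computes this index for candidate labelings and picks the k with the smallest value, mirroring the selection rule described above. The toy points and labelings are invented; in the study, each candidate labeling would come from a K-means run on the normalized eight-nutrient data, and RapidMiner's own implementation may differ in detail.

```python
import math


def _centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labeling (lower is better).

    Needs at least two clusters with distinct centroids, otherwise the
    max() below is empty or divides by zero.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    ids = sorted(clusters)
    centroids = {c: _centroid(clusters[c]) for c in ids}
    # s_i: mean distance from the members of cluster i to its centroid
    scatter = {c: sum(_dist(p, centroids[c]) for p in clusters[c]) / len(clusters[c])
               for c in ids}
    worst_ratios = [
        max((scatter[i] + scatter[j]) / _dist(centroids[i], centroids[j])
            for j in ids if j != i)
        for i in ids
    ]
    return sum(worst_ratios) / len(ids)


def best_k(points, labelings):
    """Return the k whose labeling has the smallest Davies-Bouldin index."""
    return min(labelings, key=lambda k: davies_bouldin(points, labelings[k]))


# Invented toy data: two tight groups, and two candidate labelings.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labelings = {2: [0, 0, 0, 1, 1, 1],   # the two natural groups
             3: [0, 0, 1, 2, 2, 2]}   # first group split in two
print(best_k(points, labelings))  # 2
```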
The final result is a set of food clusters in which the foods in the same group provide approximately the same amounts of the eight nutrients. The data analysis platform RapidMiner was used to analyze the data set and cluster the food items. Figure 3 shows the whole process sequence, and Figures 4, 5 and 6 show the final result.
Figure 3 Data analysis process
Figure 4 Food items clustered into 19 clusters
Figure 5 Distribution of the 8 nutrients across clusters 0 to 12
Figure 6 Distribution of the 8 nutrients across clusters 13 to 18
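The K-means clustering step can be sketched with Lloyd's algorithm, the standard K-means procedure: assign each point to its nearest centroid by Euclidean distance, move each centroid to the mean of its members, and repeat until the assignments stop changing. This is a plain-Python illustration, not RapidMiner's k-Means operator; the seeded random initialization and the `max_iter` cap are assumptions and may differ from the tool's defaults. It uses `math.dist`, available from Python 3.8. With the normalized eight-nutrient data, it would be called with k = 19.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means on a list of equal-length numeric lists.

    Returns (labels, centroids). The initial centroids are k data points
    drawn with a seeded RNG (an assumption), so runs are reproducible.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are as well
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 2-D data with two obvious groups; the study would use the normalized
# eight-nutrient vectors and k = 19 instead.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Because K-means only finds a local optimum, results can depend on the initial centroids; fixing the seed makes a run repeatable but does not guarantee the best partition.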
5.1 K-means Evaluation
The clustering was also evaluated with a performance measure based on the number of clusters. This measure builds a derived index from the number of clusters using the formula 1 - (k / n), where k is the number of clusters and n is the number of covered examples; it is used to optimize the coverage of a clustering result with respect to the number of clusters. Applying the K-means model to this data set gives a cluster number index of 0.997, which indicates good coverage.
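As an arithmetic check of the reported value, assuming that n is the 7139 records left after duplicate removal and that all of them are covered by the k = 19 clusters, the index is 1 - 19/7139 ≈ 0.9973, which rounds to the reported 0.997. The function name below is hypothetical.

```python
def cluster_number_index(k, n):
    """1 - (k / n): k = number of clusters, n = number of covered examples."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return 1 - k / n


# Assumption: all 7139 de-duplicated records are covered by the 19 clusters.
print(round(cluster_number_index(19, 7139), 3))  # 0.997
```

Because the index depends only on k and n, it approaches 1 whenever k is much smaller than n; on its own it does not measure how compact or well separated the clusters are.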
6. Conclusion
Data mining has been widely used in many health care fields, and diabetes diet care is one of the health problems in which it can play a role. This experiment was conducted on the USDA National Nutrient Database data set. The results demonstrate that K-means is effective and can successfully create food groups that can support many recommendation systems.