Author: Ed Nelson
Department of Sociology M/S SS97
California State University, Fresno
Fresno, CA 93740
Email: ednelson@csufresno.edu
Note to the Instructor: This is the seventh in a series of 13 exercises that were written for an introductory research methods class. The first exercise focuses on the research design, which is your plan of action that explains how you will try to answer your research questions. Exercises two through four focus on sampling, measurement, and data collection. The fifth exercise discusses hypotheses and hypothesis testing. The last eight exercises focus on data analysis. In these exercises we’re going to analyze data from one of the Monitoring the Future Surveys (i.e., the 2017 survey of high school seniors in the United States). This data set is part of the collection at the Inter-university Consortium for Political and Social Research at the University of Michigan. This data set is freely available to the public and you do not have to be a member of the Consortium to use it. We’re going to analyze the data with SDA (Survey Documentation and Analysis), an online statistical package written by the Survey Methods Program at UC Berkeley that is available without cost wherever one has an internet connection. A weight variable is automatically applied to the data set so it better represents the population from which the sample was selected. You have permission to use this exercise and to revise it to fit your needs. Please send a copy of any revision to the author so I can see how people are using the exercises. Included with this exercise (as separate files) are more detailed notes to the instructors and the exercise itself. Please contact the author for additional information.
This page is attached in MS Word format.
The goal of this exercise is to explore measures of central tendency (mode, median, and mean) and dispersion (range, standard deviation, and variance). The exercise also gives you practice in using FREQUENCIES in SDA.
Data analysis always starts with describing variables one-at-a-time. Sometimes this is referred to as univariate (one-variable) analysis. Central tendency refers to the center of the distribution.
There are three commonly used measures of central tendency – the mode, median, and mean of a distribution. The mode is the most common value or values in a distribution. [1] The median is the middle value of a distribution. [2] The mean is the sum of all the values divided by the number of values.
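The three definitions above can be checked on a small list of numbers. The sketch below uses Python's standard `statistics` module. The values are invented for illustration and are not taken from the MTF data.

```python
import statistics

# Hypothetical sample of nine values (not MTF data).
values = [0, 0, 0, 1, 1, 2, 2, 3, 9]

modes = statistics.multimode(values)  # most common value or values
median = statistics.median(values)    # middle value of the sorted list
mean = statistics.mean(values)        # sum of the values / number of values

print("mode(s):", modes)
print("median:", median)
print("mean:", mean)
```

`multimode` returns a list because a distribution can have more than one mode.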
We’re going to use the Monitoring the Future (MTF) Survey of high school seniors for this exercise. The MTF survey is a multistage cluster sample of all high school seniors in the United States. The survey of seniors started in 1975 and has been an annual survey ever since. To access the MTF 2017 survey follow the instructions in the Appendix. Your screen should look like Figure 7-1. Notice that a weight variable has already been entered in the WEIGHT box. This will weight the data so the sample better represents the population from which the sample was selected.
Figure 7-1
MTF is an example of a social survey. The investigators selected a sample from the population of all high school seniors in the United States. This particular survey was conducted in 2017 and is a relatively large sample of a little more than 12,000 students. In a survey we ask respondents questions and use their answers as data for our analysis. The answers to these questions are used as measures of various concepts. In the language of survey research these measures are typically referred to as variables.
Run FREQUENCIES in SDA for the variable v2196. This variable is the number of miles per week that students drive. Here’s the question from the survey – “During an average week, how much do you usually drive a car, truck, or motorcycle?” To run the frequency distribution, enter the variable name, v2196, in the ROW box. The WEIGHT box is already filled in. Click on RUN THE TABLE to get the frequency distribution. Your screen should look like Figure 7-2.
Figure 7-2
The responses to this question were divided into a set of six categories – none, 1 to 10, 11 to 50, 51 to 100, 101 to 200, and more than 200. This was done to make the question easier to answer. It’s difficult for respondents to remember the precise number of miles they drove per week. It’s a lot easier to select one of these categories. But this means that we don’t have the exact number of miles driven. Keep that in mind as we think about measures of central tendency.
Rerun the table but this time check the box for SUMMARY STATISTICS under TABLE OPTIONS and click on the drop-down arrow next to TYPE OF CHART and select BAR CHART. Below the frequency distribution you should see the statistics that SDA computes for you and the bar chart. The summary statistics should look like Figure 7-3.
Figure 7-3
Your output will display a number of summary statistics. Three of these statistics are commonly used measures of central tendency – mode, median, and mean.
Mean = 3.02. Clearly this is wrong. The mean number of miles driven can’t be 3.02 miles. SDA has computed the mean of the categorical values for this variable. In other words, SDA has added up all the 1’s, 2’s, 3’s, 4’s, 5’s, and 6’s and divided that sum by the total number of cases. Notice that SDA gives you the sum, which is 36,749.77. To get the mean, SDA divided that sum by the weighted number of cases, 12,169.1. The result, 3.02, is the mean of the categorical values. [3]
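The arithmetic behind that 3.02 can be reproduced directly from the two figures SDA reports. This is only a check of the division described above; the variable names are illustrative.

```python
# Figures reported in the SDA summary statistics for v2196.
weighted_sum_of_codes = 36749.77  # sum of the category codes 1 through 6
weighted_n = 12169.1              # weighted number of cases

mean_of_codes = weighted_sum_of_codes / weighted_n
print(round(mean_of_codes, 2))
```

The result is a mean of category codes, not a mean number of miles.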
Let’s see if we can figure out a way to get SDA to compute the actual mean and not just the mean of the categorical values.
We can do this by changing the categorical values so they are the midpoint of the miles driven for each category. That would mean we would have to do the following.
How are we going to tell SDA to make these changes? By the way, this is called recoding. We’re recoding the categorical values of 1, 2, 3, 4, 5, and 6 into the values above. Follow these steps to recode in SDA.
Now tell SDA to compute the summary statistics for the recoded variable. The mean should be 60.00 this time. Notice that the mode is now 0 since that is the value for the first category and the median is 30 which is in the third category. Remember that this is based on the assumption that all the cases in each category fall at the midpoint of that category.
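To see what recoding to midpoints does to the summary statistics, here is a minimal sketch in Python. Both the midpoints and the weighted counts are assumptions made up for illustration. They are not the recode values or the MTF results that SDA will show you, so the numbers printed here will differ from your output.

```python
# Assumed recode: category code -> midpoint in miles (illustrative only).
midpoints = {1: 0, 2: 5, 3: 30, 4: 75, 5: 150, 6: 250}

# Hypothetical weighted counts per category (NOT the MTF results).
counts = {1: 300.0, 2: 150.0, 3: 250.0, 4: 150.0, 5: 100.0, 6: 50.0}

total = sum(counts.values())

# Mean: every case in a category is treated as sitting at the midpoint.
mean = sum(midpoints[code] * n for code, n in counts.items()) / total

# Mode: midpoint of the category with the largest count.
mode = midpoints[max(counts, key=counts.get)]

# Median: midpoint of the first category whose cumulative count
# reaches half of the total.
running = 0.0
median = None
for code in sorted(counts):
    running += counts[code]
    if running >= total / 2:
        median = midpoints[code]
        break

print("mean:", mean, "mode:", mode, "median:", median)
```

As in the exercise, every result rests on the assumption that all cases in a category fall at its midpoint.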
One of the variables in the data set is v2197 which is the number of driving tickets respondents received in the last twelve months. The response categories are 0, 1, 2, 3, and 4 or more. The only problem is the last open-ended category. Let’s assume that no one received more than six tickets. So the last category would be 4 to 6 with a midpoint of 5. Follow the procedure described in Part I and compute the mode, median, and mean. Write a paragraph discussing what these measures of central tendency mean.
The first thing to consider is the level of measurement (nominal, ordinal, interval, ratio) of your variable (see exercise 6RM).
How skewed is your distribution? Go back and look at the bar chart for v2196. Notice that there is a long tail to the right of the distribution. The category with the largest number of cases is the first category which represents those who did not drive at all. But there are quite a few respondents who report driving quite a bit. For example, 11.5% report driving between 101 and 200 miles and 8.3% said they drive more than 200 miles per week. That’s what we call a positively skewed distribution where there is a long tail towards the right or the positive direction. Now look at the median and mean for the recoded variable. The mean (60.00) is larger than the median (30.0). The respondents who drove a lot of miles pull the mean up. That’s what happens in a skewed distribution. The mean is pulled in the direction of the skew. The opposite would happen in a negatively skewed distribution. The long tail would be towards the left and the mean would be lower than the median. In a heavily skewed distribution the mean is distorted and pulled considerably in the direction of the skew. So consider reporting only the median in a heavily skewed distribution. That’s why you almost always see median income reported and not mean income. Imagine what would happen if your sample happened to include Bill Gates. The income distribution would have this very, very large value which would pull the mean up but not affect the median.
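The effect of one extreme value can be sketched with a few invented incomes. The figures below are hypothetical and chosen only to make the contrast between mean and median visible.

```python
import statistics

# Five hypothetical annual incomes in dollars (not survey data).
incomes = [30_000, 40_000, 50_000, 60_000, 70_000]

# The same sample with one extremely large income added.
with_outlier = incomes + [1_000_000_000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```

In this made-up sample the median moves from 50,000 to 55,000 while the mean jumps to more than 166 million, which is why the median is the safer summary for a heavily skewed variable.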
Is there more than one clearly defined peak in your distribution? For example, consider a hypothetical distribution of 100 cases in which there are 50 cases with a value of 2 and 50 cases with a value of 8. The median and mean would both be 5, but there are really two centers of this distribution – 2 and 8. The median and the mean aren’t telling the correct story about the center. You’re better off reporting the two clearly defined peaks of this distribution and not reporting the median and mean.
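The hypothetical two-peak distribution described above is easy to build and check. This sketch uses Python's standard `statistics` module.

```python
import statistics

# 50 cases with a value of 2 and 50 cases with a value of 8.
values = [2] * 50 + [8] * 50

mean = statistics.mean(values)
median = statistics.median(values)
peaks = statistics.multimode(values)

print("mean:", mean, "median:", median, "peaks:", peaks)
```

Both the mean and the median land on 5, a value that no case actually has, while the list of modes shows the two real peaks.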
Run FREQUENCIES for the following variables. Once you have entered the variable names in the ROW box, ask for the SUMMARY STATISTICS and a BAR CHART. For each variable write a sentence or two indicating which measure(s) of central tendency (i.e., mode or median) would be appropriate to use to describe the center of the distribution and what the values of those statistics mean. For some variables there will be more than one appropriate measure of central tendency.
Dispersion or variation refers to the degree that values in a distribution are spread out or dispersed. The most commonly used measures – range, standard deviation, variance – are only appropriate for interval and ratio level variables (see exercise 6RM). The variables in the MTF survey are entirely nominal and ordinal variables but as you have seen in this exercise we can recode some of these variables so they are ratio variables.
The range is the difference between the highest and the lowest values in the distribution. We don’t actually know the highest value for v2196 since the last category is more than 200 miles. Earlier in this exercise we assumed that the largest value was 300. If that is the case, what would the range be for the recoded variable?
The range is not a very stable measure since it depends on the two most extreme values – the highest and lowest values. These are the values most likely to change from sample to sample.
The variance is the sum of the squared deviations from the mean divided by the number of cases minus 1 and the standard deviation is just the square root of the variance. Your instructor may want to go into more detail on how to calculate the variance by hand. Look back at the summary statistics for your recode of v2196. The variance equals 5,458.65. What will the standard deviation equal?
The variance and the standard deviation can never be negative. A value of 0 means that there is no variation or dispersion at all in the distribution. All the values are the same. The more variation there is, the larger the variance and standard deviation.
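The range, variance, and standard deviation described above can be computed for a small list of values as a check on the definitions. The values are hypothetical. `statistics.variance` divides by the number of cases minus 1, which matches the definition given in this exercise.

```python
import math
import statistics

# Hypothetical values (not MTF data).
values = [0, 5, 30, 75, 150, 250]

value_range = max(values) - min(values)  # highest minus lowest
variance = statistics.variance(values)   # squared deviations / (n - 1)
std_dev = statistics.stdev(values)       # square root of the variance

# A distribution in which every value is the same has no dispersion.
constant_variance = statistics.variance([7, 7, 7, 7])

print("range:", value_range)
print("variance:", variance)
print("standard deviation:", std_dev)
print("variance of identical values:", constant_variance)
```

The same square-root relationship applies to the variance SDA reports for your recode of v2196.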
So what do the variance and the standard deviation for v2196 mean? That’s hard to answer because you don’t have anything to compare them to. But if you knew the standard deviation for both men and women you would be able to determine whether men or women have more variation. Instead of comparing the standard deviations for men and women you would compute a statistic called the Coefficient of Relative Variation (CRV). CRV is equal to the standard deviation divided by the mean of the distribution. A CRV of 2 means that the standard deviation is twice the mean and a CRV of 0.5 means that the standard deviation is one-half of the mean. You would compare the CRVs for men and women to see whether men or women have more variation relative to their respective means.
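The CRV calculation is a single division, sketched here as a small Python function. The function name `crv` and the means and standard deviations for the two groups are hypothetical, chosen to reproduce the CRV values of 0.5 and 2 mentioned above. They are not results from the MTF data.

```python
def crv(std_dev: float, mean: float) -> float:
    """Coefficient of Relative Variation: standard deviation / mean."""
    if mean == 0:
        raise ValueError("CRV is undefined when the mean is 0")
    return std_dev / mean


# Hypothetical groups (not MTF results).
group_a = crv(std_dev=40.0, mean=80.0)  # std dev is half the mean
group_b = crv(std_dev=60.0, mean=30.0)  # std dev is twice the mean

print("group A CRV:", group_a)
print("group B CRV:", group_b)
```

In this invented comparison, group B has more variation relative to its mean than group A.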
How do we get SDA to compute the means and standard deviations for both men and women? Click on ANALYSIS and then on COMPARISON OF MEANS in the blue horizontal bar at the top of your screen. Enter the variable for which you want to compute the mean and standard deviation in the DEPENDENT box. We’re going to use the same variable we used in Part I (v2196). Be sure to enter the recode that you used in Part I. Enter the variable (V2150) that you want to use to divide the sample into men and women in the ROW box. SDA will automatically calculate the mean number of miles driven for both men and women. To get the standard deviations, check the STD DEV box under TABLE OPTIONS. Uncheck the STD ERRORS box under TABLE OPTIONS since you won’t need this statistic. The mean will be the top number in each box of your output and the standard deviation will be right below the mean. Compute the Coefficient of Relative Variation for both men and women and write a sentence or two discussing whether men or women have more variation.
By the way, you might also have wondered why you need both the variance and the standard deviation when the standard deviation is just the square root of the variance. You’ll just have to take my word for it that you will need both as you go further in statistics.
[1] Frequency distributions can be grouped or ungrouped. Think of age. We could have a distribution that lists all the ages in years of the respondents to our survey. We could also divide age into a series of categories such as under 30, 30 to 39, 40 to 49, 50 to 59, 60 to 69, and 70 and older. In a grouped frequency distribution the mode would be the most common category or categories.
[2] In a grouped frequency distribution the median would be the category that contains the middle value.
[3] We need to clear something up. Why is the total number of cases 12,169.1 and not a whole number? When you weight the cases by the weight variable, you will get a fractional number of cases. Don’t worry about this. It’s a technical issue and not important to us in this discussion.