4 discussions due in 48 hours


ALL ARE ATTACHED


INSTRUCTIONS ARE ATTACHED

Qualitative Methodologies

Qualitative methodologies involve collecting non-numerical data, usually through interviews or observation. There are many approaches to qualitative research, and there is no fully agreed-upon “list” of methodologies. The text (Malec & Newman, 2013) describes six approaches in Section 3.1. The Frank and Polkinghorne (2010) article also describes three main qualitative approaches. The best way to learn about a variety of qualitative research methods is to read reports or articles of research on a topic you are interested in.

Instructions:

For your initial post, choose two articles that use a qualitative research method to answer a research question on your topic of interest. Remember that qualitative research is exploratory in nature and is used to go deeper into issues of interest and explore nuances related to the problem at hand. Common data collection methods used in qualitative research include group discussions, focus groups, in-depth interviews, and uninterrupted observations. Data analysis typically involves identifying themes or categories, or providing in-depth descriptions of the data. Use the Anderson (2006) and Lee (1992) articles to obtain a better understanding of what qualitative research includes.

· Briefly describe the particular qualitative research approach/methodology utilized in each of the two articles you selected (e.g., case study, ethnographic study, phenomenological study).

· Refer to the week’s readings (or recommended articles) to help you explain.

· Compare and contrast the two qualitative methods used:

· What is the same, what is different, and why?

· Does either method seem like a good fit to explore your topic of interest?

· Why/why not?

Post should be at least 300 words.

Qualitative Validity

Many researchers, particularly those from the hard sciences like mathematics or physics, consider quantitative research, with its ability to determine “statistical significance,” to be more rigorous than qualitative research. Qualitative research does not lend itself to such a mathematical determination of validity; rather, it is focused on providing descriptive and/or exploratory results. However, this does not relieve the qualitative researcher from designing studies that are rigorous and high in “trustworthiness,” the word often used to describe validity in a qualitative study. There is no agreed-upon set of criteria for ensuring a quality qualitative study, but there are a number of models of quality criteria.

Instructions:

· After reading the assigned articles by Shenton (2004) and Freeman, deMarrais, Preissle, Roulston, and St. Pierre (2007), discuss at least three things a qualitative researcher can consider to increase the validity of a study’s results.

· Give at least one example from one of the qualitative study articles you have found on your own topic of how a claim (reported result) is supported.

· How does that article report on the validity of the study’s results?

· Do the authors do a good job of demonstrating validity? If not, what could/should they have done differently?

Post should be at least 300 words. 

Measurement Scales

Conducting psychological research generally involves collecting a large amount of data. In order to summarize and draw conclusions about the data, we must first have some knowledge about the various statistical concepts. Descriptive statistics and inferential statistics are what we use to describe or summarize the data and make conclusions and predictions about a population. How variables are defined and measured is critically important in order to evaluate the validity of the research.
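
To make these ideas concrete before you post, here is a minimal sketch in Python of the four classic measurement scales (nominal, ordinal, interval, and ratio), assuming the pandas library is available; the variable names and values are invented for illustration:

```python
import pandas as pd

# Each column illustrates one measurement scale; the values are invented.
df = pd.DataFrame({
    "party": ["Dem", "Rep", "Ind", "Dem"],      # nominal: unordered categories
    "grade": ["B", "A", "C", "A"],              # ordinal: ordered categories
    "temp_f": [68.0, 72.5, 59.0, 65.0],         # interval: equal units, no true zero
    "study_hrs": [3.0, 0.0, 7.5, 2.0],          # ratio: true zero point
})

# Tell pandas the ordinal column has an order (C < B < A).
df["grade"] = pd.Categorical(df["grade"], categories=["C", "B", "A"], ordered=True)

# Summaries that respect each scale:
print(df["party"].mode()[0])    # nominal: only the mode is meaningful
print(df["grade"].max())        # ordinal: order-based summaries are meaningful
print(df["temp_f"].mean())      # interval: means and differences are meaningful
print(df["study_hrs"].mean())   # ratio: ratios ("twice as long") are also meaningful
```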

Instructions:

· Provide one example (your own) of each of the four measurement scales (nominal, ordinal, interval, and ratio), and provide an example (your own) of how a single variable might be measured on different scales.

· Explain which of these you found most challenging to identify and why.

Your post should be at least 300 words.

Hypothesis Testing

Hypothesis testing is a method for testing a prediction or hypothesis about a measurable variable in a population. Hypothesis testing involves five steps: stating the hypotheses, collecting data, calculating the statistics, comparing the statistical findings with a critical value, and making a decision to reject or fail to reject the null hypothesis.
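
As a minimal sketch of these five steps in Python (assuming SciPy is available; the sample scores, population mean, and alpha level are invented for illustration):

```python
from scipy import stats

# Step 1: State the hypotheses (non-directional).
#   H0: the population mean exam score is 75.  H1: it is not 75.
mu0, alpha = 75, 0.05

# Step 2: Collect data (an invented sample of exam scores).
scores = [82, 74, 79, 88, 71, 77, 85, 80, 76, 83]

# Step 3: Calculate the statistic (one-sample t-test).
t_stat, p_value = stats.ttest_1samp(scores, popmean=mu0)

# Step 4: Compare the finding with the criterion (here, the p-value vs. alpha).
# Step 5: Decide whether to reject the null hypothesis.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.3f} -> {decision}")
```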

Instructions:

· Present a research topic and research question you are interested in studying and develop a null hypothesis and alternative hypothesis to answer the research question.

· Discuss whether your alternative hypothesis is directional or non-directional.

· What would be a Type I error and a Type II error in the examples you provided? Explain how both of these could be minimized in your example.

· What do you consider to be the most challenging aspect of defining hypotheses and why?

Your post should be at least 300 words.

Required Texts

You can find more helpful items for Constellation at the following site: https://content.rockies.edu/support/tutorials/

Constellation: Malec, T., & Newman, M. (2013). Research methods: Building a knowledge base. San Diego, CA: Bridgepoint Education, Inc.

Required References

American Psychological Association. (2010). Ethical principles of psychologists and code of conduct: 2010 amendment. Standard 8: Research and Publication. Available at http://www.apa.org/ethics/code/

Anderson, J. D. (2006). Qualitative and quantitative research. Available at http://web20kmg.pbworks.com/w/file/fetch/82037432/QualitativeandQuantitativeEvaluationResearch

Benedict, K. (2014, April 11). Correlation – The Basic Idea Explained [Video file].

Conway, A. (2014). Circuit court involved youth in Virginia: A descriptive, cross-sectional, quantitative research study. London: SAGE Publications Ltd. doi: 10.4135/978144627305014535709

Diem, K. G. (2002). A step-by-step guide to developing effective questionnaires and survey procedures for program evaluation & research. Available at http://njaes.rutgers.edu/pubs/publication.asp?pid=FS995

Edwards, K., & Dardis, C. (2014). Conducting mixed-methodological dating violence research: Integrating quantitative survey and qualitative data. London: SAGE Publications Ltd. doi: 10.4135/978144627305013516582

Explorable (2010). Experimental research. Available at https://explorable.com/experimental-research

Explorable (2010). The scientific method. Available at https://explorable.com/scientific-method

Frank, G., & Polkinghorne, D. (2010). Qualitative research in occupational therapy: From the first to the second generation. OTJR: Occupation, Participation and Health, 30(2), 51-57. doi:10.3928/15394492-20100325-02

Freeman, M., deMarrais, K., Preissle, J., Roulston, K., & St. Pierre, E. A. (2007). Standards of evidence in qualitative research: An incitement to discourse. Educational Researcher, 36(1), 25-32. doi: 10.3102/0013189X06298009

Ijalba, E. (2014). Using qualitative and quantitative methods to conduct research in parent education with immigrant families of children with autism spectrum disorders. London: SAGE Publications Ltd. doi: 10.4135/978144627305014533926

Mariampolski, H. (2001). Qualitative vs. quantitative. Qualitative Market Research, 22-25. SAGE Publications Ltd. doi: 10.4135/9781412985529.n13

Onwuegbuzie, A. & Leech, N. L. (2005). On becoming a pragmatic researcher: The importance of combining quantitative and qualitative research methodologies. International Journal of Social Research Methodology, 8(5), 375-387. doi: 10.1080/13645570500402447

Park, J., & Park, M. (2016). Qualitative versus quantitative research methods: Discovery or justification? Journal of Marketing Thought, 3(1), 1-7.

Polkinghorne, D. E. (2005). Language and meaning: Data collection in qualitative research. Journal of Counseling Psychology, 52(2), 137-145. doi:10.1037/0022-0167.52.2.137

Rice, G. T. (2005). Developing high quality multiple-choice test questions. Available at http://circle.adventist.org/files/jae/en/jae200567043006

Shenton, A.K. (2004). Strategies for ensuring trustworthiness in qualitative research projects. Education for Information, 22(2), 63-75.

Smith, L. (2013, November 18). Correlation Basics [Video file].

Stoltenberg, C. D., & Pace, T. M. (2007). The scientist-practitioner model: Now more than ever. Journal of Contemporary Psychotherapy, 37(4), 195-203. doi:10.1007/s10879-007-9054-0

Svensson, C. (2014). Qualitative methodology in unfamiliar cultures: Relational and ethical aspects of fieldwork in Malaysia. London: SAGE Publications Ltd. doi: 10.4135/978144627305014533923

Trochim, W. M. K. (2006). Research methods: Knowledge base. Available at http://www.socialresearchmethods.net/kb/

Tsene, L. (2016). Qualitative multi-method research: Media social responsibility. London: SAGE Publications Ltd. doi: 10.4135/978144627305015595393


Chapter 4: Survey Research—Describing and Predicting Behavior

Chapter Contents

• Introduction to Survey Research
• Designing Questionnaires
• Sampling From the Population
• Analyzing Survey Data
• Ethical Issues in Survey Research



In a highly influential book published in the 1960s, the sociologist Erving Goffman (1963) defined stigma as an unusual characteristic that triggers a negative evaluation. In his words, “The stigmatized person is one who is reduced in our minds from a
whole and usual person to a tainted, discounted one” (1963, p. 3). People’s beliefs about
stigmatized characteristics exist largely in the eye of the beholder but have substantial
influence on social interactions with the stigmatized (see Snyder, Tanke, & Berscheid,
1977). A large research tradition in psychology has been devoted to understanding both
the origins of stigma and the consequences of being stigmatized. According to Goffman
and others, the characteristics associated with the greatest degree of stigma have three
features in common: They are highly visible, they are perceived as controllable, and they
are misunderstood by the public.

Recently, researchers have taken considerable interest in people’s attitudes toward mem-
bers of the gay and lesbian community. Although these attitudes have become more posi-
tive over time, this group still encounters harassment and other forms of discrimination
on a regular basis (see National Gay Task Force, 1984). One of the top recognized experts
on this subject is Gregory Herek, professor of psychology at the University of Califor-
nia at Davis (http://psychology.ucdavis.edu/herek/). In a 1988 article, Herek conducted
a survey of heterosexuals’ attitudes toward both lesbians and gay men, with the goal
of understanding the predictors of negative attitudes. Herek approached this research
question by constructing a scale to measure attitudes toward these groups. In three stud-
ies, participants were asked to complete this attitude measure, along with other existing
scales assessing attitudes about gender roles, religion, and traditional ideologies.

Herek’s (1988) research revealed that, as hypothesized, heterosexual males tended to hold
more negative attitudes about gay men and lesbians than heterosexual females. However,
the same psychological mechanisms seemed to explain the prejudice in both genders. That
is, negative attitudes were associated with increased religiosity, more traditional beliefs
about family and gender, and fewer experiences actually interacting with gay men and
lesbians. These associations meant that Herek could predict people’s attitudes toward gay
men and lesbians based on knowing their views about family, gender, and religion, as well
as their past interactions with the stigmatized group. Herek’s primary contribution to the
literature in this paper was the insight that reducing stigma toward gay men and lesbians
“may require confronting deeply held, socially reinforced values” (1988, p. 473). And this
insight was possible only because people were asked to report these values directly.

4.1 Introduction to Survey Research

Whether you are aware of it or not, you have been encountering survey research for most of your life. Every time your telephone rings during dinnertime and the person on the other end of the line insists on knowing your household
income and favorite brand of laundry detergent, he or she is helping to conduct survey
research. When news programs try to predict the winner of an election two weeks early,
these reports are based on survey research of eligible voters. In both cases, the researcher
is trying to make predictions about the products people buy or the candidates they will
elect based on people’s reports of their own attitudes, feelings, and behaviors.


Surveys can be used in a variety of contexts and are most appropriate for questions
that involve people describing their attitudes, their behaviors, or a combination of the
two. For example, if you wanted to examine the predictors of attitudes toward the death
penalty, you could ask people their opinions on this topic and also ask them about their
political party affiliation. Based on these responses, you could test whether political affil-
iation predicted attitudes toward the death penalty. Or, imagine you wanted to know
whether students who spent more time studying were more likely to do well on their
exams. This question could be answered using a survey that asked students about their
study habits and then tracked their exam grades. We will return to this example near the
end of the chapter as we discuss the process of analyzing survey data to test our hypoth-
eses about predictions.

The common thread running through these two examples is that they require people to
report either their thoughts (e.g., opinions about the death penalty) or their behaviors
(e.g., the hours they spend studying). Thus, in deciding whether a survey is the best fit
for your research question, the key is to consider whether people will be both able and
willing to report these things accurately. We will expand on both of these issues in the
next section.

In this chapter, we continue our journey along the continuum of control, moving on to
survey research, in which the primary goal is either describing or predicting attitudes and
behavior. For our purposes, survey research refers to any method that relies on people’s
reports of their own attitudes, feelings, and behaviors. So, for example, in Herek’s (1988)
study, the participants reported their attitudes toward lesbians and gay men, rather than
these attitudes being somehow directly observed by the researchers. Compared with the
qualitative and descriptive designs for observing behavior we discussed in Chapter 3,
survey research tends to yield more control over both data collection and question con-
tent. Thus, survey research falls somewhere between quantitative descriptive research
(Chapter 3) and the explanatory research involved in experimental designs (Chapter 5).
This chapter provides an overview of survey research from conceptualization through
analysis. We will cover the types of research questions that are best suited to survey
research and provide an overview of the decisions to consider in designing and conduct-
ing a survey study. We will then cover the process of data collection, with a focus on
selecting the people who will complete your survey. Finally, we will cover the three most
common approaches for analyzing survey data, bringing us back full circle to addressing
our research questions.

Distinguishing Features of Surveys

Survey research designs have three distinguishing features that set them apart from other
designs. First, all survey research relies on either written or verbal self-reports of peo-
ple’s attitudes, feelings, and behaviors. This means that researchers will ask participants
a series of questions and record their responses. This approach has several advantages,
including being relatively straightforward and allowing access to psychological processes
(e.g., “Why do you support candidate X?”). However, researchers are also cautious in their
interpretation of self-reported data because participants’ responses can reflect a combina-
tion of their true attitude and their concern over how this attitude will be perceived. Scien-
tists refer to this as social desirability, which means that people may be reluctant to report
unpopular attitudes. So if you were to ask people their attitudes about different racial
groups, their answers might reflect both their true attitude and their desire not to appear
racist. We return to the issue of social desirability and discuss some tricks for designing
questions that can help to sidestep these concerns and capture respondents’ true attitudes.

The second distinguishing feature of survey research is that it has the ability to access
internal states that cannot be measured through direct observation. In our discussion of
observational designs in Chapter 3, we learned that one of the limitations of these designs
was a lack of insight into why people do what they do. Survey research is able to address
this limitation directly: By asking people what they think, how they feel, and why they
behave in certain ways, researchers come closer to capturing the underlying psychologi-
cal processes.

However, people’s reports of their internal states should be taken with a grain of salt, for two reasons. First, as mentioned, these reports may be biased by social desirability concerns, particularly when unpopular attitudes are involved. Second, there is a large literature in social psychology suggesting that people may not be very accurate at understanding the true reasons for their behavior. In a highly cited review paper, psychologists Richard Nisbett and Tim Wilson (1977) argued that we make poor guesses about why we do things, and those guesses are based more on our assumptions than on any real introspection. Thus, survey questions can provide access to internal states, but these should always be interpreted with caution.

The third distinguishing feature, on a more practical note, is that survey research allows us to collect large amounts of data with relatively little effort and few resources. However, a survey’s actual efficiency depends on the decisions made during the design process. In reality, efficiency is often in a delicate balance with the accuracy and completeness of the data.

Broadly speaking, survey research can be conducted using either verbal or written self-
reports (or a combination of the two). Before we dive into the details of writing and for-
matting a survey, it is important to understand the pros and cons of administering your
survey as an interview (i.e., an oral survey) or a questionnaire (i.e., a written survey).

Interviews

An interview involves an oral question-and-answer exchange between the researcher and
the participant. This exchange can take place either face-to-face or over the phone. So
our telemarketer example from earlier in the chapter represents an interview because the
questions are asked orally, via phone. Likewise, if you are approached in a shopping mall
and asked to answer questions about your favorite products, you are experiencing a sur-
vey in interview form because the questions are administered out loud. And, if you have
ever taken part in a focus group, in which a group of people gives their reactions to a new
product, the researchers are essentially conducting an interview with the group. (For a
more in-depth discussion of focus groups and other interview techniques, see Chapter 3,
Section 3.2, Qualitative Research Interviews.)

Interview Schedules
Regardless of how the interview is administered, the interviewer (i.e., the researcher) has a
predetermined plan, or script, for how the interview should go. This plan, or script, for the
progress of the interview is known as an interview schedule. When conducting an inter-
view—including those telemarketing calls—the researcher/interviewer has a detailed
plan for the order of questions to be asked, along with follow-up questions depending on
the participant’s responses.

Broadly speaking, there are two types of interview schedules. A linear (also called “struc-
tured”) schedule will ask the same questions in the same order for all participants. In
contrast, a branching schedule unfolds more like a flowchart, with the next question
dependent on participants’ answers. A branching schedule is typically used in cases with
follow-up questions that make sense only for some of the participants. For example, you
might first ask people whether they have children; if they answer “yes,” you could then
follow up by asking how many.
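
The difference between the two schedules is easy to see in code. Here is a minimal Python sketch (the question wording is illustrative) in which the branching version asks the follow-up question only when it applies:

```python
def linear_schedule():
    # Linear (structured): every participant gets the same questions in the same order.
    answers = [input(q + " ") for q in (
        "What is your age?",
        "Do you have children? (yes/no)",
        "How many children do you have?",  # asked even if the answer above was "no"
    )]
    return answers

def branching_schedule():
    # Branching: the next question depends on the previous answer.
    age = input("What is your age? ")
    has_children = input("Do you have children? (yes/no) ").strip().lower()
    if has_children == "yes":
        input("How many children do you have? ")

branching_schedule()
```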

One danger in using a branching schedule is that it is based partly on your assumptions
about the relationships between variables. Granted, it is fairly uncontroversial to ask only
people with children to indicate how many children they have. But imagine the follow-
ing scenario in which you first ask participants for their household income, and then ask
about their political donations:

• “How much money do you make? $18,000? Okay, how likely are you to donate
money to the Democratic Party?”

• “How much money do you make? $250,000? Okay, how likely are you to donate
money to the Republican Party?”

The assumption implicit in the way these questions branch is that wealthier people are
more likely to be Republicans and less wealthy people to be Democrats. This might be
supported by the data or it might not. But by planning the follow-up questions in this
way, you are unable to capture cases that do not fit your stereotypes (i.e., the wealthy
Democrats and the poor Republicans). The lesson here is to be careful about letting your
biases shape the data collection process, as this can create invalid or inaccurate findings.

Advantages and Disadvantages of Interviews
[Image: Conducting interviews may allow a researcher to gather more detailed and richer responses.]

Spoken interviews have a number of advantages over written surveys. For one, people are often more motivated to talk than they are to write. Let’s say that an undergraduate research assistant is dispatched to a local shopping mall to interview people about their experiences in romantic relationships. The researcher may have no trouble at all recruiting participants, many of whom will be eager to divulge the personal details about their recent relationships. But for better or for worse, these experiences will be more difficult for the researcher to capture in writing. Related to this observation, people’s oral responses are typically richer and more detailed than their written responses. Think of the difference between asking someone to “Describe your views on
gun control” versus “Indicate on a scale of 1 to 7 the degree to which you support gun con-
trol.” The former is more likely to capture the richness and subtlety involved in people’s
attitudes about guns.

On a practical note, using an interview format also allows you to ensure that respondents
understand the questions. If written questionnaire items are poorly worded, people are
forced to guess at your meaning, and these guesses introduce a big source of error vari-
ance (variance from random sources that are irrelevant to the trait or ability the question-
naire is purporting to measure). But if an interview question is poorly asked, people find
it much easier to ask the interviewer to clarify. Finally, using an interview format allows
you to reach a broader cross-section of people and to include those who are unable to read
and write—or, perhaps, unable to read and write the language of your survey.

Interviews also have three clear disadvantages compared with written surveys. First,
interviews are more costly in terms of both time and money. It certainly used more of my
time to go to a shopping mall than it would have taken to mail out packets of surveys
(but no more money—these research assistant gigs tend to be unpaid!). Second, the inter-
view format allows many opportunities for personal bias to creep into the exchange. These
biases are unlikely to be deliberate, but participants can often pick up on body language
and subtle facial expressions when the interviewer disagrees with their answers. These
cues may lead them to shape their responses in order to make the interviewer happier
(the influence of social desirability again). Third, interviews can be difficult to score and
interpret, especially with open-ended questions. Although administering them may be
easy, scoring them is relatively more complicated, often involving subjectivity or bias in
the interpretation. Because the researcher often has to make judgments based on personal
beliefs about the quality of the response, multiple raters are generally used to score the
responses in order to minimize bias.

The best way to understand the pros and cons of interviewing is that both are a con-
sequence of personal interactions. The interaction between interviewer and interviewee
allows for richer responses but also the potential for these responses to be biased. As a
researcher, you have to weigh these pros and cons and decide which method is the best fit
for your survey. In the next section, we turn our attention to the process of administering
surveys in writing.

Questionnaires

A questionnaire is a survey that involves a written question-and-answer exchange
between the researcher and the participant. The questionnaire can be in open-ended for-
mat (e.g., the participant writes in his or her answer) or forced-choice response format
(e.g., the participant selects from a set of responses, such as with multiple choice ques-
tions, rating scales, or true/false questions), which will be discussed later in this chapter.
The exchange is a bit different from what we saw with interview formats. In written sur-
veys, the questions are designed ahead of time and then given to participants, who write
their responses and return the questionnaire to the researcher. In the next section, we will
discuss details for designing these questions. But before we get there, let’s take a quick
look at the process of administering written surveys.


Distribution Methods
Questionnaires can be distributed in three primary ways, each with its own pattern of
advantages and disadvantages.

Distributing by Mail

Until recently, one common way to distribute surveys was to send paper copies through
the mail to a group of participants (see Section 4.3, Sampling From the Population, for
more discussion on how this group is selected). Mailing surveys is relatively inexpensive
and relatively easy to do, but is unfortunately one of the worst methods when it comes to
response rates. People tend to ignore questionnaires they receive in the mail, dismissing
them as one more piece of junk. There are a few tricks available to researchers to increase
response rates, including providing incentives, making the survey interesting, and mak-
ing it as easy as possible to return the results (e.g., with a postage-paid envelope). How-
ever, even using all of these tricks, researchers consider themselves extremely lucky to
get a 30% response rate from a mail survey. That is, if you mail 1,000 surveys, you will be
doing well to receive 300 back. Because of this low return on investment, researchers have
begun using other methods for their written surveys.

Distributing in Person

Another option is to distribute a written survey in person, simply handing out copies
and asking participants to fill them out on the spot. This method is certainly more time-
consuming, as a researcher has to be stationed for long periods of time in order to collect
data. In addition, people are less likely to answer the questions honestly because the
presence of a researcher makes them worry about social desirability. Last, the sample
for this method is limited to people who are in the physical area at the time that ques-
tionnaires are being handed out. As we will discuss later, this might lead to problems in
the composition of the sample. On the plus side, however, this method tends to result in
higher compliance rates because it is harder to say no to someone face-to-face than it is
to ignore a piece of mail.

Distributing Online

Over the past 20 years, Internet, or Web–based, surveys have become increasingly
common. In Web-based survey research, the questionnaire is designed and posted on a
Web page, to which participants are directed in order to complete the questionnaire. The
advantages of online distribution are clear: This method is easiest for both researchers
and participants and may give people a greater sense of anonymity, thereby encouraging
more honest responses. In addition, response times are faster and the data are easier to
analyze because they are already in digital format. The disadvantages include the following: specific groups may be underrepresented because they do not have access to the Internet; the researcher has little to no control over sample selection; and the researcher receives responses only from those who are interested in the topic—so-called self-selection bias. All these limitations could raise questions about the validity and reliability
of the data collected. In addition, several ethical issues might arise regarding informed con-
sent and the privacy of participants. So when considering conducting Web-based surveys,
researchers should evaluate all the advantages and disadvantages, as well as any ethical or
legal implications.


For readers interested in more information on designing and conducting Internet research,
Sam Gosling and John Johnson’s 2010 book Advanced Methods for Conducting Online
Behavioral Research is an excellent resource. In addition, several groups of psychological
researchers have been attempting to understand the psychology of Internet users. (You
can read about recent studies on this website: http://www.spring.org.uk/2010/10/internet-psychology.php.)

Advantages and Disadvantages of Questionnaires
Just as with interview methods, written questionnaires have their own set of advantages
and disadvantages. Written surveys allow researchers to collect large amounts of data
with little cost or effort, and they can offer a greater degree of anonymity than interviews.
Anonymity can be a particular advantage in dealing with sensitive or potentially embar-
rassing topics. That is, people may be more willing to answer a questionnaire about their
alcohol use or their sexual history than they would be to discuss these things face-to-face
with an interviewer. On the downside, written surveys miss out on the advantages of
interviews because no one is available to clarify confusing questions or to gather more
information as needed. Fortunately, there is one relatively easy way to minimize this prob-
lem: Write questions (and response choices, if using multiple choice or forced choice for-
mats) that are as clear as possible. In the next section, we turn our attention to the process
of designing questionnaires.

4.2 Designing Questionnaires

One of the most important elements when conducting survey research is deciding how to construct and assemble the questionnaire items. In some cases, you will be able to use questionnaires that other researchers have developed in order to
answer your research questions. For example, many psychology researchers use standard
scales that measure behavior or personality traits, such as self-esteem, prejudice, depres-
sion, or stress levels. The advantage of these ready-made measures is that other people
have already gone to the trouble of making sure they are valid and reliable. So if you
are interested in the relationship between stress and depression, you could distribute the
Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983) and the Beck Depression
Inventory (Beck, Steer, Ball, & Ranieri, 1996) to a group of participants and move on to
the fun part of data analyses. For further discussion, see Chapter 2, Section 2.2, Reliability
and Validity.

However, in many cases there is no perfect measure for your research question—
either because no one has studied the topic before or because the current measures do
not accurately assess the construct of interest you are investigating. When this happens,
you will need to go through the process of designing your own questionnaire. In this
section, we discuss strategies for writing questions and choosing the most appropriate
response format.


Five Rules for Designing Better Questionnaires

Each of the following rules is designed to make your questions as clear and easy to under-
stand as possible in order to minimize the potential for error variance. We discuss each
one and illustrate them with contrasting pairs of items, consisting of “bad” items that do
not follow the rule, and then “better” items that do.

1. Use simple language. One of the simplest and most important rules to keep in
mind is that people have to be able to understand your questions. This means
you should avoid jargon and specialized language whenever possible.

BAD: “Have you ever had an STD?”

BETTER: “Have you ever had a sexually transmitted disease?”

BAD: “What is your opinion of the S-CHIP?”

BETTER: “What is your opinion of the State Children’s Health Insurance
Program?”

It is also a good idea to simplify the
language as much as possible so that
people spend time answering the ques-
tion rather than trying to decode your
meaning. For example, words like
assist and consider can be replaced with
words like help and think. This may
seem odd—or perhaps even conde-
scending to your participants—but it is
always better to err on the side of sim-
plicity. Remember, if people are forced
to guess at the meaning of your ques-
tions, these guesses add error variance
to their answers.

In addition, when developing and
administering surveys to various pop-
ulations, it is important to remember
to use language and examples that are

not culturally or linguistically biased. For example, participants from various cultures
may not be familiar with questions involving the historical development of the United
States or the nature of gender roles in the United States. Thus, with the changing U.S.
population and the different languages being spoken in the home, it is important to use
language that does not discriminate against particular populations. Also, it is advisable to
avoid slang at all costs.

[Image: Simple language is one characteristic of an effective questionnaire.]


2. Be precise. Another way to ensure that people understand the question is to be
as precise as possible in your wording. Questions that are ambiguous in their
wording will introduce an extra source of error variance into your data because
people may interpret these questions in varying ways.

BAD: “What kind of drugs do you take?” (Legal drugs? Illegal drugs? Now?
In college?)

BETTER: “What kind of prescription drugs are you currently taking?”

BAD: “Do you like sports?” (Playing? Watching? Which sports?)

BETTER: “How much do you like watching basketball on television?”

3. Use neutral language. It is important that your questions be designed to mea-
sure your participants’ attitudes, feelings, or behaviors rather than to manipu-
late their responses. That is, you should avoid leading questions, which are
written in such a way that they suggest an answer.

BAD: “Do you beat your children?” (Clearly, beating isn’t good; who would
say yes?)

BETTER: “Is it acceptable to use physical forms of discipline?”

BAD: “Do you agree that the president is an idiot?” (Hmmm . . . I wonder
what the researcher thinks. . . .)

BETTER: “How would you rate the president’s job performance?”

This guideline can also be used to sidestep social desirability concerns. If you suspect
that people may be reluctant to report holding an attitude such as using corporal punish-
ment with their children, it helps to phrase the question in a nonthreatening way—“using
physical forms of discipline” versus “beating your children.” Many current measures of
prejudice adopt this technique. For example, McConahay’s Modern Racism Scale contains
items such as “Discrimination against Blacks is no longer a problem in the United States”
(McConahay, 1986). People who hold prejudicial attitudes are more likely to agree with
statements like this one than with more blunt ones like “I hate people from Group X.”

4. Ask one question at a time. One remarkably common error that people make
in designing questions is to include a double-barreled question, which fires off
more than one question at a time. When you fill out a new patient question-
naire at your doctor’s office, these forms often ask whether you suffer from
“headaches and nausea.” What if you suffer from only one of these? Or what if
you have a lot of nausea and only an occasional headache? The better approach
is to ask about each of these symptoms separately.

BAD: “Do you suffer from pain and numbness?”

BETTER: “How often do you suffer from pain?” “How often do you suffer from
numbness?”

BAD: “Do you like watching football and boxing?”

BETTER: “How much do you enjoy watching professional football on TV?”
“How much do you enjoy watching boxing on TV?”


5. Avoid negations. One final and simple way to clarify your questions is to
avoid questions with negative statements because these can often be difficult
to understand. In the following examples, the first is admittedly a little silly,
but the second comes from a real survey of voter opinions.

BAD: “Do you never not cheat on your exams?” (Wait—what? Do I cheat?
Do I not cheat? What is this asking?)

BETTER: “Have you ever cheated on an exam?”

BAD: “Are you against rejecting the ban on pesticides?” (Wait—so am I for
the ban? Against the ban? What is this asking?)

BETTER: “Do you support the current ban on pesticides?”

Participant Response Options

In this section, we turn our attention to the issue of deciding how participants should
respond to your questions. The decisions you make at this stage affect the type of data
you ultimately collect, so it is important to choose carefully. We also review the primary
decisions you will need to make about response options, as well as the pros and cons of
each one.

One of the first choices you have to make is whether to collect open-ended or fixed-
format responses. As the names imply, fixed-format responses require participants
to choose from a list of options (e.g., “Pick your favorite color”), whereas open-ended
responses ask participants to provide unstructured responses to a question or statement
(e.g., “How do you feel about legalizing marijuana?”). Open-ended responses tend to
be richer and more flexible but harder to translate into quantifiable data—analogous to
the trade-off we discussed in comparing written versus oral survey methods. To put it
another way, some concepts are difficult to reduce to a seven-point fixed-format scale, but
these scales are easier to analyze than a paragraph of free-flowing text.

Another reason to think carefully about this decision is that fixed-format responses will,
by definition, restrict people’s options in answering the question. In some cases, these
restrictions can even act as leading questions. In a study of people’s perceptions of his-
tory, James Pennebaker and his colleagues (2006) asked respondents to indicate the “most
significant event over the last 50 years.” When this was asked in an open-ended way (i.e.,
“list the most significant event”), 2% of participants listed the invention of computers. In
another version of the survey, this question was asked in a fixed-format way (i.e., “choose
the most significant event”). When asked to select from a list of four options (World War II,
Invention of Computers, Tiananmen Square, or Man on the Moon), 30% chose the inven-
tion of computers! In exchange for having easily coded data, the researchers accidentally
forced participants into a smaller number of options and ended up with a skewed sense
of the importance of computers in people’s perceptions of history.

Fixed-Format Options
Although fixed-format responses can sometimes constrain or skew participants’ answers,
the reality is that researchers tend to use them more often than not. This decision is
largely practical; fixed-format responses allow for more efficient data collection from a
much larger sample. (Imagine the chore of having to hand-code 2,000 essays!) But once
you have decided on this option for your questionnaire, the decision process is far from
over. In this section, we discuss three possibilities you can use to construct a fixed-format
response scale.

True/False

One option is to ask questions using a true/false format, which asks participants to
indicate whether they endorse a statement. For example:

“I attended church last Sunday” True False

“I am a U.S. citizen” True False

“I am in favor of abortion” True False

This last example may strike you as odd, and in fact illustrates an important point about
the use of true/false formats: They are best used for statements of fact rather than attitudes.
It is relatively straightforward to indicate whether you attended church or whether you
are a U.S. citizen. However, people’s attitudes toward abortion are often complicated—
one can be “pro-choice” but still support some common-sense restrictions, or “pro-life”
but support exceptions (e.g., in cases of rape). The point is that a true/false question can-
not even come close to capturing this complexity. However, for survey items that involve
simple statements of fact, the true/false format can be a good option.

Multiple Choice

A second option is to use a multiple-choice format, which asks participants to select
from a set of predetermined responses.

“Which of the following is your favorite fast-food restaurant?”
a. McDonald’s
b. Burger King
c. Wendy’s
d. Taco Bell

“Who did you vote for in the 2008 presidential election?”
a. John McCain
b. Barack Obama

“How do you travel to work on most days? (Select all that apply.)”
a. drive alone
b. carpool
c. take public transportation

As you can see in these examples, multiple-choice questions give you quite a bit of free-
dom in both the content and the response scaling of your questions. You can ask partici-
pants either to select one answer or, as in the last example, to select all applicable answers.
You can cover everything from preferences (e.g., favorite fast-food restaurant) to behav-
iors (e.g., how you travel to work).


In addition to assessing preferences or behaviors, as in the preceding examples, multiple-
choice questions are often used to examine knowledge and abilities. For example, intel-
ligence and achievement tests utilize multiple-choice questions to assess the cognitive and
academic abilities of individuals. In these cases, there is only one correct response choice
and three to four incorrect or “distracter” response choices. Most research indicates that
there should be at least three but no more than four incorrect response choices. Providing
more than four incorrect response choices can make the question too difficult. It is also
important that the correct response choice not obviously differ from the incorrect choices.
Thus, all response choices, both correct and incorrect, should be approximately the same
length and be plausible choices. If the incorrect response choices are obviously wrong, this
will make the item too easy, which will affect its validity. In addition, including obviously
wrong answers allows those individuals who do not know the answer to deduce the cor-
rect response.

Regardless of whether you are developing multiple-choice questions to assess behaviors,
knowledge, or abilities, all questions should be clear, unambiguous, and brief so that they
can be easily understood. In addition, as with all types of question formats, it is best to
avoid negative questions such as, “Which of the following is not correct?” because they
can be confusing to test-takers and cause the question to be invalid (i.e., not measure what
it is supposed to be measuring). Finally, it is best practice to avoid response choices that
include “All of the above” or “None of the above.” Most individuals know that when such
response choices are provided, they are usually the correct answer.

You may have already spotted a downside to multiple-choice formats. Whenever you pro-
vide a set of responses, you are restricting participants to those choices. This is the problem
that Pennebaker and colleagues encountered when asking people about the most signifi-
cant events of the last century. In each of the preceding examples, the categories fail to cap-
ture all possible responses. What if your favorite restaurant is In-and-Out Burger? What if
you voted for Ralph Nader? What if you telecommute or ride your bicycle to work? There
are two relatively easy ways to avoid (or at least minimize) this problem. First, plan care-
fully when choosing the response options. During the design process, it helps to brain-
storm with other people to ensure you are capturing the most likely responses. However,
in many cases, it is almost impossible to provide every option that people might think of.
The second solution is to provide an “other” response to your multiple-choice question.
This allows people to write in an option that you neglected to include. For example, our
last question about traveling to work could be rewritten as follows:

“How do you travel to work on most days? (Select all that apply.)”
a. drive alone
b. carpool
c. take public transportation
d. other (please specify):__________________

This way, people who telecommute, bicycle, or even ride their trained pony to work will
have a way to respond rather than skipping the question. And, if you start to notice a pat-
tern in these write-in responses (e.g., 20% of people adding “bicycle”), then you will have
gained valuable knowledge to improve the next iteration of the survey.
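
Spotting such a pattern is simple once the write-in answers are in digital form. A minimal Python sketch, with invented responses:

```python
from collections import Counter

# Invented write-in answers to the "other (please specify)" option.
other_responses = ["bicycle", "telecommute", "Bicycle", "walk", "bicycle"]

counts = Counter(r.strip().lower() for r in other_responses)
for answer, n in counts.most_common():
    print(f"{answer}: {n} of {len(other_responses)} ({n / len(other_responses):.0%})")
# A frequent write-in (here, "bicycle") is a candidate for a fixed option next time.
```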


Rating Scales

Last, but certainly not least, another option is to use a rating scale format, which asks
participants to respond on a scale that represents a continuum.

“Sometimes it is necessary to sacrifice liberty in the name of security.”

1 = strongly agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree, 5 = strongly disagree

“I would vote for a candidate who supported the death penalty.”

1 = always, 2 = often, 3 = about half of the time, 4 = seldom, 5 = never

“The political party in power right now has messed things up.”

1 = strongly agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree, 5 = strongly disagree

This format is well suited to capturing attitudes and opinions and, in fact, is one of the
most common approaches to attitude research. Rating scales are easy to score, and they
give participants some flexibility in indicating their agreement with or endorsement of
the questions. As a researcher, you have two critical decisions to make about the construc-
tion of rating scale items. Both have implications for how you will analyze and interpret
your results.

First, you’ll need to decide on the anchors, or labels, for your response scale. Rating scales
offer a good deal of flexibility in these anchors, as you can see in the preceding examples.
You can frame questions in terms of “agreement” with a statement or “likelihood” of a
behavior; alternatively, you can customize the anchors to match your question (e.g., “not
at all necessary”). Scales that use anchors of strongly agree and strongly disagree are also
referred to as Likert scales. At a fairly simple level, the choice of labels affects the interpre-
tation of the results. For example, if you asked the “political party” question, you would
have to be aware that the anchors were phrased in terms of agreement with the state-
ment. In discussing these results, you would be able to discuss how much people agreed
with the statement, on average, and whether agreement correlated with other factors. If
this seems like an obvious point, you would be amazed at how often researchers (or the
media) will take an item like this and spin the results to talk about the “likelihood of vot-
ing” for the party in power—confusing an attitude with a behavior! So, in short, make
sure you are being honest when presenting and interpreting research data.


At a more conceptual level, you need to decide whether the anchors for your rating scale
make use of a bipolar scale, which has polar opposites at its endpoints and a neutral point
in the middle, or a unipolar scale, which assesses the presence or absence of a single con-
struct. The difference between these scales is best illustrated by an example:

Bipolar: How would you rate your current mood?

1 = very happy, 2 = happy, 3 = slightly happy, 4 = neither happy nor sad, 5 = slightly sad, 6 = sad, 7 = very sad

Unipolar: How would you rate your current mood?

1 = not at all sad, 2 = slightly sad, 3 = moderately sad, 4 = very sad, 5 = completely sad

1 = not at all happy, 2 = slightly happy, 3 = moderately happy, 4 = very happy, 5 = completely happy

With the bipolar option, participants are asked to place themselves on a continuous scale
somewhere between sad and happy, which are polar opposites. The assumption in using a
bipolar scale is that the endpoints represent the only two options—participants can be sad,
happy, or somewhere in between. In contrast, with the unipolar option, participants are
asked to rate themselves on a continuous scale, indicating their level of either sadness or
happiness. The assumption in using a pair of unipolar scales is that it is possible to experi-
ence varying degrees of each item: For example, participants can be moderately happy but
also a little bit sad. The decision to use a bipolar or a unipolar scale comes down to the
context. What is the most logical way to think about these constructs? What have previous
researchers done?

In the 1970s, Sandra Lipsitz Bem revolutionized the way researchers thought about gen-
der roles by arguing against a bipolar approach. Previously, gender role identification had
been measured on a bipolar scale from “masculine” to “feminine,” the assumption being
that a person could be one or the other. Bem (1974) argued instead that people could eas-
ily have varying degrees of masculine and feminine traits. Her scale, the Bem Sex Role
Inventory, asks respondents to rate themselves on a set of 60 unipolar traits. Someone
with mostly feminine and hardly any masculine traits would be described as “feminine.”
Someone with high ratings on both masculine and feminine traits would be described as
“androgynous.” And, someone with low ratings on both masculine and feminine traits
would be described as “undifferentiated.” You can view and complete Bem’s scale online
at this website: http://garote.bdmonkeys.net/bsri.html.
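
The fourfold classification described above can be sketched in a few lines of Python. The fixed cutoff used here is a simple illustrative stand-in; published scoring typically uses sample median splits of the masculinity and femininity scores rather than a fixed value:

```python
def classify_gender_role(masculinity, femininity, cutoff=4.0):
    """Fourfold classification from two unipolar trait scores (1-7 scales).

    The fixed cutoff is illustrative, not Bem's published scoring procedure.
    """
    high_m = masculinity >= cutoff
    high_f = femininity >= cutoff
    if high_m and high_f:
        return "androgynous"
    if high_f:
        return "feminine"
    if high_m:
        return "masculine"
    return "undifferentiated"

print(classify_gender_role(5.8, 2.1))  # masculine
print(classify_gender_role(5.5, 5.9))  # androgynous
print(classify_gender_role(2.0, 2.4))  # undifferentiated
```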


The second critical decision in constructing a rating scale item is to decide on the number
of points in the response scale. You may have noticed that all the examples in this section
have an odd number of points (e.g., five or seven). This is usually preferable for rating
scale items because the middle of the scale (e.g., “3” or “4”) allows respondents to give
a neutral, middle-of-the-road answer. That is, on a scale from strongly disagree to strongly
agree, the midpoint can be used to indicate “neither” or “I’m not sure.” However, in some
cases, you may not want to allow a neutral option in your scale. By using an even number
of points (e.g., four or six), you can essentially force people to either agree or disagree with
the statement; this type of scaling is referred to as forced choice.

So how many points should your scale have? As a general rule, more points will translate
into more variability in responses—the more choice people have, the more likely they are
to distribute their responses among those choices. From a researcher’s perspective, the big
question is whether this variability is meaningful. For example, if you wanted to assess
college students’ attitudes about a student fee increase, opinions will likely vary depend-
ing on the size of the fee and the ways in which it will be used. Thus, a five- or seven-
point scale would be preferable to a two-point (yes or no) scale. However, past a certain
point, increasing the scale range ceases to be linked to meaningful variation in attitudes.
In other words, the difference between a 5 and a 6 on a seven-point scale is fairly intuitive
for your participants to grasp. But what is the real difference between an 80 and an 81 on a
100-point scale? When scales become too large, you risk introducing another source of
error variance as participants impose their interpretations on the scaling. In sum, more
points do not always translate into a better scale.

Back to our question: How many points should you have? The ideal compromise sup-
ported by many statisticians is to use a seven-point scale for bipolar scales. The reason
has to do with the differences between scales of measurement. As you’ll remember from
our discussion in Chapter 2, the way variables are measured has implications for data
analyses. For the most popular statistical tests to be legitimate, variables need to lie on
either an interval scale (i.e., with equal intervals between points) or a ratio scale (i.e., with
a true zero point). Based on mathematical modeling research, statisticians have concluded
that the variability generated by a seven-point scale is most likely to mimic an interval
scale (see, e.g., Nunnally, 1978). So a seven-point scale is often preferable because it allows
us the most flexibility in data analyses. Note, however, that whereas seven-point scales
produce reliable results with bipolar scales, unipolar scales tend to perform the best with
five-point scales.
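
As a minimal illustration of what the interval-scale assumption buys you, the following Python sketch (with invented responses) treats the seven points as equally spaced and averages several items into a single scale score:

```python
# One participant's answers to five seven-point Likert items (invented data).
items = [6, 7, 5, 6, 7]

# Treating the points as equally spaced (an interval scale) licenses the mean.
scale_score = sum(items) / len(items)
print(f"Scale score: {scale_score:.1f} on a 1-7 scale")
```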

Finalizing the Questionnaire

Once you have finished constructing the questionnaire items, one last important step
remains before beginning to collect data. This section discusses a few guidelines for assem-
bling the items into a coherent questionnaire. The main issues at this stage are to think carefully about the order of the individual items; how many items to include; whether to use open-ended, fixed-format, or multiple-choice questions; and how to write the instructions in clear and concise language.


First, keep in mind that the first few questions will set the tone for the rest of the question-
naire. It is best to start with questions that are both interesting and nonthreatening to help
ensure that respondents complete the questionnaire with open minds. For example:

BAD OPENING: “Do you agree that your child’s teacher is incompetent?”
(threatening and also a leading question)

BETTER OPENING: “How would you rate the performance of your child’s
teacher?”

BAD OPENING: “Would you support a 1% sales tax increase?” (boring)

BETTER OPENING: “How do you feel about raising taxes to help fund education?”

Second, strive whenever possible to have continuity in the different sections of your ques-
tionnaire. Imagine you are constructing a survey to give to college freshmen—you might
have questions on family background, stress levels, future plans, campus engagement,
and so on. It is best to have the questions grouped by topic on the survey. So, for instance,
students would fill out a set of questions about future plans on one page and then a set
of questions about campus engagement on another page. This approach makes it eas-
ier for participants to progress through the questions without having to mentally switch
between topics.

Third, remember that individual questions are always read in context. This means that if
you start your college student survey with a question about plans for the future and then
ask about stress, respondents will likely have their future plans in mind when they think
about their level of stress. Another example is a graduate school that administered a
gigantic survey packet to every student enrolled in its Introductory Psychology course.
One year, a faculty member included a measure of identity, asking participants to com-
plete the statements “I am ______” and “I am not ______.” As the researchers started to ana-
lyze data from this survey, they found that an astonishing 60% of students had filled in the
blank with “I am not a homosexual.” This response seemed pretty unusual until they real-
ized that the questionnaire immediately preceding this one in the packet was a measure of
prejudice toward gay and lesbian individuals. So, as these students completed the identity
measure, they had homosexuality on their minds and felt compelled to point out that they
were not homosexual. This example illustrates once again how context can skew results.

Finally, once you have assembled a draft version of your questionnaire, do a test run.
This test run, called pilot testing, involves giving the questionnaire to a small sample of
people, getting their feedback, and making any necessary changes. One of the best ways
to pilot test is to find a patient group of friends to complete your questionnaire because
this group will presumably be willing to give more extensive feedback. Another effective
way to pilot test a questionnaire is to administer it to individuals in the target group. In
soliciting their feedback, you should ask questions like the following:

Was anything confusing or unclear?

Was anything offensive or threatening?


How long did the questionnaire take you to complete?

Did it get repetitive or boring? Did it seem too long?

Were there particular questions that you liked or disliked? Why?

The answers to these questions will give you valuable information to revise and clarify
your questionnaire before devoting the resources for a full round of data collection. In the
next section, we turn our attention to the question of how to find and select participants
for this stage of the research.

Research: Thinking Critically

“Beautiful People Convey Personality Traits Better During First Impressions”

Medical News Today

A new University of British Columbia study has found that people identify the personality traits of
people who are physically attractive more accurately than others during short encounters.

The study, published in the December [2010] edition of Psychological Science, suggests people pay
closer attention to people they find attractive, and is the latest scientific evidence of the advantages
of perceived beauty. Previous research has shown that individuals tend to find attractive people
more intelligent, friendly, and competent than others.

The goal of the study was to determine whether a person’s attractiveness impacts others’ ability to
discern their personality traits, says Prof. Jeremy Biesanz, UBC Dept. of Psychology, who coauthored
the study with PhD student Lauren Human and undergraduate student Genevieve Lorenzo.

For the study, researchers placed more than 75 male and female participants into groups of five to
11 people for three-minute, one-on-one conversations. After each interaction, study participants
rated partners on physical attractiveness and five major personality traits: openness, conscientious-
ness, extraversion, agreeableness, and neuroticism. Each person also rated his or her own personality.

Researchers were able to determine the accuracy of people’s perceptions by comparing participants’
ratings of others’ personality traits with how individuals rated their own traits, says Biesanz, adding
that steps were taken to control for the positive bias that can occur in self-reporting.

Despite an overall positive bias towards people they found attractive (as expected from previous
research), study participants identified the “relative ordering” of personality traits of attractive par-
ticipants more accurately than others, researchers found.

“If people think Jane is beautiful, and she is very organized and somewhat generous, people will see
her as more organized and generous than she actually is,” says Biesanz. “Despite this bias, our study
shows that people will also correctly discern the relative ordering of Jane’s personality traits—that
she is more organized than generous—better than others they find less attractive.”

The researchers say this is because people are motivated to pay closer attention to beautiful people
for many reasons, including curiosity, romantic interest, or a desire for friendship or social status.
“Not only do we judge books by their covers, we read the ones with beautiful covers much closer
than others,” says Biesanz, noting the study focused on first impressions of personality in social situ-
ations, like cocktail parties.

Although participants largely agreed on group members’ attractiveness, the study reaffirms that
beauty is in the eye of the beholder. Participants were best at identifying the personalities of people
they found attractive, regardless of whether others found them attractive.

According to Biesanz, scientists spent considerable effort a half-century ago seeking to determine
what types of people perceive personality best, with largely mixed results. With this study, the team
chose to investigate this long-standing question from another direction, he says, focusing not on who
judges personality best, but rather whether some people’s personalities are better perceived.

Think about it:

1. Suppose the following questions were part of the questionnaire given after the 3-minute
one-on-one conversations in this study. Based on the goals of the study and the rules dis-
cussed in this chapter, identify the problem with each of the following questions and suggest
a better item.

a. Jane is very neat.

1 2 3 4 5

strongly agree    agree    neither agree nor disagree    disagree    strongly disagree

main problem:

better item:

b. Jane is generous and organized.

1 2 3 4 5
strongly agree    agree    neither agree nor disagree    disagree    strongly disagree
main problem:
better item:

c. Jane is extremely attractive TRUE FALSE

main problem:
better item:

2. What are the strengths and weaknesses of using a fixed-format questionnaire in this study
versus open-ended responses?

3. The researchers state that they took steps to control for the “positive bias that can occur in self-
reporting.” How might social desirability influence the outcome of this particular study? What
might the researchers have done to reduce the effect of social desirability?

George, R. (2010, December 31). Beautiful people convey personality traits better during first impressions. Medical News Today.
Retrieved from http://www.medicalnewstoday.com/articles/212245.php

4.3 Sampling From the Population

By now, you should have a good feel for how to construct survey items. Once you have finalized your measures, the next step is to find a group of people to fill out the survey. But where do you find this group? And how many of them do you need? On
the one hand, you want as many people as possible in order to capture the full range of
attitudes and experiences. On the other hand, researchers have to conserve time and other


resources, which often means choosing a smaller sample of people. In this section, we will
examine the strategies researchers can use in selecting samples for their studies.

Researchers refer to the entire collection of people who could possibly be relevant for
a study as the population. For example, if you were interested in the effects of prison
overcrowding in this country, you would want to study the population of prisoners in the
United States. If you wanted to study voting behavior in the next U.S. presidential elec-
tion, your population would be United States residents eligible to vote. And if you wanted
to know how well college students cope with the transition from high school, your popu-
lation would include every college student who graduated from high school and is
now enrolled in any college in the country.

You may have spotted an obvious practical complication with these populations. How on
earth are you going to get every college student, much less every prisoner, in the coun-
try to fill out your questionnaire? You can’t; instead, researchers will collect data from
a sample, a subset of the population. Instead of trying to reach all prisoners, you might
sample inmates from a handful of state prisons. Rather than attempt to survey all college
students in the country, researchers might restrict their studies to a collection of students
at one university.

The goal in choosing a sample for quantitative research is to make it as representative as
possible of the larger population, though this is not always practical. That is, if you
choose students at one university, they need to be reasonably similar to college
students elsewhere in the country. If the phrase “reasonably similar” sounds vague, this
is because the basis for evaluating a sample varies depending on the hypothesis and the
key variables of your study. For example, if you wanted to study the relationship between
family income and stress levels, you would need to make sure that your sample mirrored
the population in the distribution of income levels. Thus, a sample of students from a state
university might be a better choice than students from, say, Harvard (which costs about
$50,000 per year). On the other hand, if your research question dealt with the pressures
faced by students in selective private schools, then Harvard students could be a represen-
tative sample for your study.

Figure 4.1 is a conceptual illustration of both a representative and nonrepresentative sam-
ple, drawn from a larger population. The population in this case consists of 144 individu-
als, split evenly between Xs and Os. Thus, we would want our sample to come as close
as possible to capturing this 50/50 split. The sample of 20 individuals on the left is rep-
resentative of the population because it is split evenly between Xs and Os. But the sample of
20 individuals on the right is nonrepresentative because it contains 75% Xs. Because there
are far fewer Os than we would expect, this right-hand sample does not
accurately represent the population. This failure of the sample to represent the population
is also referred to as sampling bias.


Figure 4.1: Representative and nonrepresentative samples of a population

So, where do these samples come from? As a researcher, you have two broad categories of
sampling strategies at your disposal: probability sampling and nonprobability sampling.

Probability Sampling

Probability sampling is used when each person in the population has a known chance
of being in the sample. This is possible only in cases where you know the exact size of
the population. For instance, the 2010 population of the United States was 308,745,538
(United States Census Bureau, 2010). If you were to have selected a U.S. resident at ran-
dom, each resident would have had a one in 308,745,538 chance of being selected. When-
ever you have information about total population, probability-sampling strategies are the
most powerful approach because they greatly increase the odds of getting a representative
sample. Within this broad category of probability sampling are three specific strategies:
simple random sampling, stratified random sampling, and cluster sampling.

Simple Random Sampling
Simple random sampling, the most straightforward approach, involves randomly pick-
ing study participants from a list of everyone in the population. The term for this list is
a sampling frame (e.g., imagine a list of every resident of the United States). To have a
truly representative random sample, several criteria need to be met: You must have a sam-
pling frame, you must choose from it randomly, and you must have a 100% response rate
from those you select. (As we discussed in Chapter 2, it can threaten the validity of your
hypothesis test if people drop out of your study.)
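
To make this concrete, here is a minimal sketch of simple random sampling in Python. The
sampling frame below is a hypothetical roster invented purely for illustration:

    import random

    # Hypothetical sampling frame: a roster of everyone in the population.
    sampling_frame = [f"student_{i}" for i in range(25000)]

    random.seed(42)  # fixed seed so the draw is reproducible
    # Simple random sampling: every member of the frame has an equal
    # chance of ending up in the sample.
    sample = random.sample(sampling_frame, k=100)
    print(len(sample), sample[:3])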



Stratified Random Sampling
Stratified random sampling, a varia-
tion of simple random sampling, is
used when subgroups of the popu-
lation might be left out of a purely
random sampling process. Imagine
a city with a population that is 80%
Caucasian, 10% Hispanic, 5% Afri-
can American, and 5% Asian. If you
were to choose 100 residents at ran-
dom, the chances are very good that
your entire sample would consist
of Caucasian residents. As a result,
you would inadvertently ignore the
perspective of all ethnic minority
residents. To prevent this problem,
researchers use stratified random
sampling—breaking the sampling
frame into subgroups and then sam-
pling a random number from each
subgroup. In the preceding city
example, you could divide your list of residents into four ethnic groups and then pick a
random 25 from each of these groups. The result would be a sample of 100 people who
captured opinions from each ethnic group in the population.
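
In code, stratified random sampling amounts to grouping the frame by subgroup and then
drawing randomly within each group. This sketch uses the hypothetical city described above;
the group labels and counts are illustrative only:

    import random
    from collections import defaultdict

    # Hypothetical frame: (ethnic group, resident ID) pairs for a city of 10,000.
    frame = ([("Caucasian", i) for i in range(8000)]
             + [("Hispanic", i) for i in range(1000)]
             + [("African American", i) for i in range(500)]
             + [("Asian", i) for i in range(500)])

    strata = defaultdict(list)
    for group, resident in frame:
        strata[group].append(resident)  # break the frame into subgroups

    random.seed(1)
    # Draw a random 25 from each stratum for a total sample of 100.
    sample = {group: random.sample(members, k=25)
              for group, members in strata.items()}
    print({group: len(members) for group, members in sample.items()})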

Cluster Sampling
Cluster sampling, another variation of random sampling, is used when you do not have
access to a full sampling frame (i.e., a full list of everyone in the population). Imagine that
you wanted to do a study of how cancer patients in the United States cope with their ill-
ness. Because there is not a list of every cancer patient in the country, you have to get a little
creative with your sampling. The best way to think about cluster sampling is as “samples
within samples.” Just as with stratified sampling, you divide the overall population into
groups; however, cluster sampling is different in that you are dividing into groups based
on more than one level of analysis. In the cancer example, you could start by dividing the
country into regions, then randomly selecting cities from within each region, and then
randomly selecting hospitals from within each city, and finally randomly selecting cancer
patients from each hospital. The result would be a random sample of cancer patients from,
say, Phoenix, Miami, Dallas, Cleveland, Albany, and Seattle; taken together, these patients
would constitute a fairly representative sample of cancer patients around the country.
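
Here is a rough sketch of the "samples within samples" idea. The region, city, and hospital
structure below is invented for illustration; a real study would draw from actual
administrative lists:

    import random

    random.seed(7)

    # Hypothetical nested structure: each region holds cities, each city holds
    # hospitals, and each hospital holds a list of patient IDs.
    regions = {
        f"region_{r}": {
            f"city_{r}_{c}": {
                f"hospital_{r}_{c}_{h}":
                    [f"patient_{r}_{c}_{h}_{p}" for p in range(50)]
                for h in range(3)
            }
            for c in range(4)
        }
        for r in range(5)
    }

    sample = []
    for region, cities in regions.items():
        city = random.choice(list(cities))            # random city per region
        hospital = random.choice(list(cities[city]))  # random hospital per city
        sample += random.sample(cities[city][hospital], k=5)  # random patients

    print(len(sample), "patients drawn across", len(regions), "regions")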

Nonprobability Sampling

The other broad category of sampling strategies is known as nonprobability sampling.
These strategies are used in the (remarkably common) case in which you do not know the
odds of any given individual being in the sample. This is an obvious shortcoming—if you
do not know the exact size of the population and do not have a list of everyone in it, there
is no way to know that your sample is representative! But despite this limitation, research-
ers use nonprobability sampling on a regular basis.



Nonprobability sampling is used in qualitative, quantitative, and mixed methods research
whose focus is on selecting relatively small samples in a purposeful manner rather than
selecting samples that are representative of the entire population. Unlike probability sam-
pling, nonprobability sampling does not include randomly selected, stratified samples
that provide for generalization. Rather, the procedures used to select samples are purpose-
ful; that is, the samples are selected to obtain information-rich cases that yield in-depth
insights. The sampling techniques used for quantitative and qualitative research probably
make up one of the biggest differences between the two methods. The following sections
will discuss some of the most common nonprobability strategies. All of the nonprobability
strategies described (except convenience sampling) are considered categories of purpo-
sive sampling, or sampling with a purpose. Convenience sampling is considered a form
of so-called accidental or haphazard (serendipitous) sampling, since the cases are selected
based on ready availability. Even so, convenience sampling can be purposeful, in that
researchers do their best to recruit in ways that are convenient and yet attract individuals
that meet meaningful criteria.

In many cases, it is not possible to obtain a sampling frame. When researchers study rare or
hard-to-reach populations, or investigate potentially stigmatizing conditions, they often
recruit by word of mouth. The term for this is snowball sampling—imagine a snowball
rolling down a hill, picking up more snow (or participants) as it goes. If you wanted to
study how often homeless people took advantage of social services, you would be hard-
pressed to find a sampling frame that listed the homeless population. Instead, you could
recruit a small group of homeless people and ask each of them to pass the word along to
others, and so on. The resulting sample is unlikely to be representative, but researchers
often have to compromise for the sake of obtaining access to a population.

One of the most popular nonprobability strategies is known as convenience sampling, or
simply enrolling people who show up for the study. Any time you see results of a viewer
poll on your favorite 24-hour news station, the results are likely based on a convenience
sample. CNN and Fox News do not randomly select from a list of their viewers; they post
a question on-screen or online, and people who are motivated enough to respond will do
so. For that matter, a large majority of psychology research studies are based on conve-
nience samples of undergraduate college students. Often, experimenters in psychology
departments advertise their studies on a bulletin board or website, and students sign up
for studies for extra cash or to fulfill a research requirement for a course. Students often
pick a particular study based on whether it fits their busy schedules or whether the adver-
tisement sounds interesting. Another example of convenience sampling is a researcher
seeking to collect information from human resources managers to study the issue of bul-
lying in organizations. The researcher might use a database of potential participants who
belong to one or more chapters of the Society of Human Resource Management (SHRM),
an organization for human resources professionals. In this example, the convenience com-
ponent of the sample is using a body of existing data; although these data are not repre-
sentative of all human resources managers or all organizations, they are a useful proxy for
a broad sample of human resource managers across industries.

Other popular nonprobability strategies include extreme or deviant case sampling, typ-
ical case sampling, heterogeneous sampling, expert sampling, criterion sampling, and
theory-based sampling. Whatever strategy you decide on, keep in mind that every
decision you make as a researcher carries inherent strengths and weaknesses.


Extreme or deviant case sampling is used when we want information-rich data on unusual
or special cases. For example, if we are interested in studying university diversity plans,
it would be important to examine plans that work exceptionally well, as well as plans
that have high expectations but are not working for some reason or another. The focus is
on examining outstanding successes and failures to learn lessons about those conditions.

On the other hand, sometimes researchers are not interested in learning about unusual
cases but rather want to learn about typical cases. Typical case sampling involves sam-
pling the most frequent or “normal” case. It is often used to describe typical cases to
people unfamiliar with a setting, program, or process. For example, a typical case sample
may be used to study the practicum experiences of psychology students from universities
that are rated average. One of the biggest drawbacks of this method involves the difficulty
of knowing how to identify or define a typical or normal case.

Heterogeneous sampling, which is also called maximum variation sampling, may include
both extreme and typical cases. It is used to select a wide variety of cases in relation to the
phenomenon being investigated. The idea here is to obtain a sample that includes diverse
characteristics from multiple dimensions. For example, if the researchers are interested in
examining the experiences of community health clinic patients, they may wish to conduct
a focus group and then select 10 different groups that represent various demographics
from several health clinics in the area. Heterogeneous sampling is beneficial when it is
desirable to view patterns across a diverse set of individuals. It is also valuable in describ-
ing experiences that may be central or core to most individuals. Heterogeneous sampling
can be problematic in small samples, though, because it is difficult to obtain a wide variety
of cases using this method.

Expert sampling involves sampling a panel of individuals who have known expertise
(knowledge and training) in a particular area. While expert sampling can be used in both
qualitative and quantitative research, it is used more often in qualitative research. Expert
sampling involves a process of identifying individuals with known expertise in an area
of interest, obtaining informed consent from each expert, and then collecting information
from them either individually or as a group.

The goal of criterion sampling is to select cases “that meet some predetermined criterion
of importance” (Patton, 2002, p. 238). A criterion could be a particular illness or experience
that is being investigated, or even a program or a situation. For example, criterion sam-
pling could be used to investigate a range of topics: the experiences of individuals who
have attempted suicide, why students are chronically absent from online classrooms,
or cases that have exceeded the standard waiting time at a doctor’s office. The idea with
this method is that the participants meet a particular criterion of interest.

Theory-based sampling is a version of criterion sampling that focuses on obtaining cases
that represent theoretical constructs. As Patton (2002) discussed, theory-based sampling
involves the researcher obtaining “sampling incidents, slices of life, time periods, or peo-
ple on the basis of their potential manifestation or representation of important theoretical
constructs” (p. 238). This type of sampling arose from grounded theory research and fol-
lows a more deductive or theory-testing approach. As Glaser (1978) described, theory-
based sampling is “the process of data collection for generating theory whereby the analyst


jointly collects, codes, and analyzes his data and decides which data to collect next and
where to find them” (p. 36). Thus, data collection is driven by the emerging theory, and
participants are selected based on their knowledge of the topic.

Choosing a Sampling Strategy

Although quantitative researchers strive for representative samples, there is no such thing
as a perfectly representative one. There is always some degree of sampling error, defined
as the degree to which the characteristics of the sample differ from the characteristics of
the population. Instead of aiming for perfection, then, researchers aim for an estimate of
how far from perfection their samples are. These estimates are known as the error of esti-
mation, or the degree to which the data from the sample are expected to deviate from the
population as a whole.

One of the main advantages of a probability sample is that we are able to calculate these
errors of estimation (or margins of error). In fact, you have likely encountered errors of
estimation every time you see the results of an opinion poll. For example, CNN may
report that “Candidate A is leading the race with 60% of the vote, ±3%.” This means
Candidate A’s percentage in the sample is 60%, but based on statistical calculations, her
real percentage is between 57% and 63%. The smaller the error (3% in this example), the
more closely the results from the sample match the population. Naturally, researchers
conducting these opinion polls want the error of estimation to be as small as possible;
imagine how nonpersuasive it would be to learn that “Candidate A has a 10-point lead,
±20 points.” In general, these errors are minimized when three conditions are met: The
overall population is smaller; the sample itself is larger; and there is less variability in the
sample data. When samples are created using a probability method, all of this information
is available because these methods require knowing the population.
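
To see where numbers like ±3% come from, here is a back-of-the-envelope sketch using the
normal approximation for a sample proportion. It deliberately omits the finite-population
correction and the weighting a real polling operation would apply, so treat it as
illustrative only:

    import math

    def margin_of_error(p, n, z=1.96):
        """Approximate 95% margin of error for a sample proportion."""
        return z * math.sqrt(p * (1 - p) / n)

    # A hypothetical poll: Candidate A at 60% in a sample of 1,000 voters.
    moe = margin_of_error(0.60, 1000)
    print(f"60% +/- {100 * moe:.1f} points")  # roughly +/- 3 points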

If probability sampling is so powerful, why are nonprobability strategies so popular? One
reason is that convenience samples are more practical; they are cheaper, easier, and almost
always possible to enroll with relatively few resources because you can avoid the costs of
large-scale sampling. A second reason is that convenience is often a good starting point for
a new line of research. For example, if you wanted to study the predictors of relationship
satisfaction, you could start by testing hypotheses in a controlled setting using college
student participants, and then you could extend your research to the study of adult mar-
ried couples. Finally, and relatedly, in many types of qualitative research, it is acceptable
to have a nonrepresentative sample because you do not need to generalize your results.
If you want to study the prevalence of alcohol use in college students, it may be perfectly
acceptable to use a convenience sample of college students. Even in this case, though,
you would have to keep in mind that you were studying drinking behaviors among stu-
dents who volunteered to complete a study on drinking behaviors.

There are also cases, however, where it is critical to use probability sampling despite
the extra effort it requires. Specifically, researchers use probability samples any time it
is important to generalize and any time it is important to predict behavior of a popula-
tion. The best example for understanding these criteria is to think of political polls. In
the lead-up to an election, each campaign is invested in knowing exactly what the voting
public thinks of its candidate. In contrast to a CNN poll, which is based on a convenience


sample of viewers, polls conducted by a campaign will be based on randomly selected
households from a list of registered voters. The resulting sample is much more likely to be
representative, much more likely to tell the campaign how the entire population views its
candidate, and therefore, much more likely to be useful.

Determining a Sufficient Sample Size

Several factors come into play when determining what a sufficient sample size is for a
research study. Probably the most important factor is whether the study is a quantitative
or a qualitative one. In quantitative research, there is one basic rule for sample size: The
larger, the better. This is because a larger sample will be more representative of the popu-
lation being studied. Having said that, even small samples using quantitative data can
yield meaningful correlations if the researcher makes efforts to achieve a representative
sample or a meaningful convenience sample.

How does one make a practical decision, though, regarding the exact sample size a par-
ticular research situation requires? Gay, Mills, and Airasian (2009) provided the following
guidelines for determining sufficient sample size based on the size of the population:

• For populations that include 100 or fewer individuals, the entire population
should be sampled.

• For populations that include 400–600 individuals, 50% of the population should
be sampled.

• For populations that include 1,500 individuals, 20% of the population should be
sampled.

• For populations larger than 5,000, about 8% of the population should be sampled.

Thus, as we can see, the larger the population size, the smaller the percentage required to
obtain a representative sample. This does not mean that the sample shrinks as the popula-
tion increases, but rather the opposite: The sample size required actually increases when
studying larger populations.
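
These rules of thumb are easy to capture in a small helper function. Note that the published
guidelines skip the ranges between their anchor points (for example, populations between 600
and 1,500), so the cutoffs below are our own interpolation, not part of Gay et al.'s guidance:

    def suggested_sample_size(population):
        """Rule-of-thumb sample sizes after Gay, Mills, & Airasian (2009).
        The cutoffs between published anchor points are assumptions."""
        if population <= 100:
            return population                # sample the entire population
        if population <= 600:
            return round(population * 0.50)  # 50% of the population
        if population <= 1500:
            return round(population * 0.20)  # 20% of the population
        return round(population * 0.08)      # about 8% for large populations

    for n in (80, 500, 1500, 10000):
        print(n, "->", suggested_sample_size(n))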

Although larger sample sizes are generally preferred in quantitative studies, the size of an
adequate sample also depends on the similarities and dissimilarities among members of
the population. If the population is fairly diverse for the construct being measured, then
a larger sample will likely be required in order to obtain a representative sample. Popula-
tions whose members have similar characteristics will require smaller samples. Research-
ers have developed fairly sophisticated methods for determining sufficient sample sizes
through the use of statistical power analysis; however, such methods are beyond the scope
of this book. Gay et al.’s (2009) guidelines are sufficient for most types of research.

As discussed in Chapter 3, recommended samples for qualitative research are typically
much smaller than for quantitative research, which strives to be representative of larger
populations. In qualitative research, sample sizes are based not on numerical rules but
on how well the variables of interest are represented (Houser, 2009). Unlike quantitative
approaches, which require large samples, qualitative techniques have no fixed rules
regarding sample size.
know, the purpose of the inquiry, what the findings will be useful for, how credible they


will be, and what can be done with available time and resources. Qualitative research can
be very costly and time-consuming, so choosing information-rich cases will yield the great-
est return on investment. As noted by Patton (2002), “The validity, meaningfulness, and
insights generated from qualitative inquiry have more to do with the information-richness
of the cases selected and the observational/analytical capabilities of the researcher than
with sample size” (p. 245).

Nonresponse Bias in Survey Research

Sometimes participants do not submit a survey or do not fill it out completely. When this
occurs, the result is nonresponse bias, which can affect the size and characteristics of
the sample, as well as the external validity of the study. External validity refers to how well
the results obtained from a sample can be extended to make predictions about the entire
population, which will be further discussed in Chapter 5. Nonresponses are particularly
problematic if a large number of participants or a specific group of participants fails to
respond to particular questions. For example, perhaps all females skip a certain question,
or several participants skip certain questions because they are too personal. This omission
creates bias not only in the characteristics of the sample (e.g., the sample may no longer be
representative) but also in the size of the sample that is required for the study.

Nonresponses can occur for many reasons, including the survey being too long, the ques-
tions being worded awkwardly, the survey topic being uninteresting, or, in the case of
Web-based surveys, the participants not knowing how to access or log into the website to
complete the survey.

There are a few ways to minimize the threat of nonresponse bias. These include increas-
ing the sample size to account for the possibility of nonresponses, making sure that
survey directions and questions are worded clearly, making sure the survey is not too
long, providing rewards or incentives for completing the survey, sending out reminders
to complete the survey, and providing a cover letter that describes the exact reasons for
conducting the survey.

4.4 Analyzing Survey Data

Once you have designed a survey, chosen an appropriate sample, and collected some data, now comes the fun part. As with the quantitative descriptive designs covered in Chapter 3, the goal of analyzing survey data is to subject your hypothe-
ses to a statistical test. Surveys can be used both to describe and predict thoughts, feelings,
and behaviors. However, since we have already covered the basics of descriptive analysis
in Chapter 3, this section will focus on predictive analyses, which are designed to assess
the associations between and among variables.

Researchers typically use three approaches to test predictive hypotheses: correlational
analyses, chi-square analyses, and regression analyses. Each one has its advantages and
disadvantages, and each is most appropriate for a different kind of data. Correlational
analysis allows one to examine the strength, direction, and statistical significance of a


relationship; chi-square analysis determines whether two nominal variables are indepen-
dent from or related to one another; and simple linear regression is the method used to
“predict” scores for one variable based on another. In this section, we will walk through
the basics of each analysis.

Correlational Analysis

In the beginning of this chapter, we encountered an example of a survey research ques-
tion: What is the relationship between the number of hours that students spend studying
and their grades in the class? In this case, the hypothesis claims that we can predict some-
thing about a student’s grades by knowing how many hours he or she spends studying.

Imagine we collected a small amount of data to test this hypothesis, shown in Table 4.1.
(Of course, if we really wanted a good test of this hypothesis, we would need more than
10 people in the sample, but this will do as an illustration.)

Table 4.1: Data for quiz grade/hours studied example

Participant     Hours Studied     Quiz Grade

     1                1                2
     2                1                3
     3                2                4
     4                3                5
     5                3                6
     6                3                6
     7                4                7
     8                4                8
     9                4                9
    10                5                9

The Logic of Correlation
The important question here is whether and to what extent we can predict grades based
on study time. One common statistic for testing these kinds of hypotheses is a correla-
tion, which assesses the linear relationship between two variables. A stronger correlation
between two variables translates into a stronger association between them. Or, to put it
differently, the stronger the correlation between study time and quiz grade, the more accu-
rately you can predict grades based on knowing how long the student spends studying.

Before we calculate the correlation between these variables, it is always a good idea to
visualize the data on a graph. The scatterplot in Figure 4.2 displays our sample data from
the studying/quiz grade study.


Figure 4.2: Scatterplot for quiz grade/hours studied example

Each point on the graph represents one participant. For example, the point in the top right
corner represents a student who studied for 5 hours and earned a 9 on the quiz. The two
points in the bottom left represent students who studied for only 1 hour and earned a 2
and a 3 on the quiz.

There are two reasons to graph data before conducting statistical tests. First, a graph con-
veys a general sense of the pattern—in this case, students who study less appear to do
worse on the quiz. As a result, we will be better informed going into our statistical calcula-
tions. Second, the graph ensures that there is a linear relationship between the variables.
This is a very important point about correlations: The math is based on how well the data
points fit a straight line, which means nonlinear relationships might be overlooked.
Figure 4.3 demonstrates a robust nonlinear finding in psychology regarding the relation-
ship between task performance and physiological arousal. As this graph shows,
people tend to perform their best on just about any task when they have a moderate level
of arousal.

When arousal is too high, it is difficult to
calm down and concentrate; when arousal
is too low, it is difficult to care about the
task at all. If we simply ran a correlation
with data on performance and arousal,
the correlation would be zero because the
points do not fit a straight line. Thus, it is
critical to visualize the data before jump-
ing ahead to the statistics. Otherwise, you
risk overlooking an important finding in
the data.

Figure 4.3: Curvilinear relationship between arousal and performance


Interpreting Coefficients
Once we are satisfied that our data look linear, it is time to calculate our statistics. This is
typically done using a computer software program, such as SPSS (IBM), SAS/STAT (SAS),
or Microsoft Excel. The number used to quantify our correlation is called the correlation
coefficient. This number ranges from −1 to +1 and contains two important pieces of
information:

• The size of our relationship is based on the absolute value of our correlation coef-
ficient. The farther our coefficient is from zero in either direction, the stronger the
relationship between variables. For example, both a +.80 and a −.80 indicate
strong relationships.

• The direction of the relationship is based on the sign of our correlation coefficient.
A +.80 would indicate a positive correlation, meaning that as one variable
increases, so does the other variable. A −.80 would indicate a negative correlation,
meaning that as one variable increases, the other variable decreases. (Refer back to
Section 2.1, Overview of Research Designs, for a review of these two terms.)

So, for example, a +.20 is a weak positive relationship and a −.70 is a strong negative
relationship.

When we calculate the correlation for our quiz grade study, we get a coefficient of .96,
indicating a strong positive relationship between studying and quiz grade. What does this
mean in plain English? Students who spend more hours studying tend to score higher on
the quiz.

How do we know whether to get excited about a correlation of .96? As with all of our sta-
tistical analyses, we look up this value in a critical value table, or, more commonly, let the
computer software do this for us. This gives us a p value representing the odds that our
correlation is the result of random chance. In this case, the p value is less than .001. This
means that the chance of our correlation being a random fluke is less than 1 in 1,000, so we
can feel pretty confident in our results. Note, however, that because the p value associated
with r depends on sample size, even a tiny, unimportant correlation can be statisti-
cally significant in a sufficiently large sample.
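
The whole calculation is straightforward to reproduce with standard software. The sketch
below uses Python's SciPy library on the data from Table 4.1; the output matches the
chapter's reported correlation up to rounding:

    from scipy import stats

    # Data from Table 4.1: hours studied and quiz grade for 10 students.
    hours = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5]
    grades = [2, 3, 4, 5, 6, 6, 7, 8, 9, 9]

    r, p = stats.pearsonr(hours, grades)
    # Prints r(8) = 0.962, p < .001 (degrees of freedom = n - 2 = 8).
    print(f"r({len(hours) - 2}) = {r:.3f}, p = {p:.6f}")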

We now have all the information we need to report this correlation in a research paper.
The standard way of reporting a correlation coefficient includes information about the
sample size and p value, as well as the coefficient itself. Our quiz grade study would be
reported as shown in Figure 4.4.

Figure 4.4: Correlation coefficient diagram. The reported result reads r(8) = .964, p < .001,
where r is the statistical symbol for the correlation coefficient (always italicized and
lowercase), 8 is the degrees of freedom (always n − 2 for a correlation), .964 is the
coefficient itself, and p < .001 is the p value.

So where does this leave our hypothesis? We started by predicting that students who spend
more time studying would perform better on their quizzes than those who spend less time
studying. We then designed a study to test this hypothesis by collecting data on study
habits and quiz grades. Finally, we analyzed these

data and found a significant, strong, positive correlation between hours studied and quiz
grade. Based on this study, our hypothesis has been supported—students who study
more have higher quiz grades! Of course, because this is a correlational study, we are
unable to make causal statements. It could be that studying more for an exam helps you
to learn more. Or, it could be the case that previous low quiz grades make students give
up and study less. Or, the third variable of motivation could cause students to both study
more and perform better on the quizzes. To tease these explanations apart and determine
causality, we will need an experimental type of research design, which we will cover in
Chapter 5.

Regression Analysis

Correlations are the best tool for testing the linear relationship between pairs of quantita-
tive variables. However, in many cases, we are interested in comparing the influence of
several variables at once. Imagine that you wanted to expand the investigation about hours
studying and quiz grade by looking at other variables that might predict students’ quiz
grades. We have already learned that the hours students spend studying positively cor-
relate with their grades. But what about SAT scores? We might predict that students with
higher standardized test scores will do better in all of their college classes. Or what about
the number of classes that students have previously taken in the subject area? We might
predict that increased familiarity with the subject would be associated with higher scores.
In order to compare the influence of all three variables, we will use a slightly different
analytic approach. Multiple regression analysis is a variation on correlational analysis;
in it, more than one predictor variable is used to predict a single outcome variable. In this
example, we would be attempting to predict the outcome variable of quiz scores based on
three predictor variables: SAT scores, number of previous classes, and hours studied.

Multiple regression analysis requires an extensive set of calculations; consequently, it is
always done using computer software. A detailed look at these calculations is beyond the
scope of this book, but a conceptual overview will help you understand the unique advan-
tages of this form of analysis. Essentially, the calculations for multiple regression are based
on the correlation coefficients between each of our predictor variables, and between each
of these variables and the outcome variable. These correlations for our revised quiz grade
study are shown in Table 4.2. If we scan the top row, we can see the correlations between
quiz grade and the three predictor variables: SAT score (r = .14), previous classes (r = .24),
and hours studied (r = .25). The remainder of the table shows correlations between the
various predictor variables; for example, hours studied and previous classes correlate at
r = .24. When we conduct multiple regression analysis using computer software, the soft-
ware will use all of these correlations in performing its calculations.

Table 4.2: Correlations for a multiple regression analysis

                    Quiz Grade    SAT Score    Previous Classes    Hours Studied

Quiz Grade              —            .14             .24*               .25*
SAT Score                             —              .02               −.02
Previous Classes                                      —                 .24*
Hours Studied                                                            —


The advantage of multiple regression is that it considers both the individual and the
combined influence of the predictor variables. Figure 4.5 is a visual diagram of the indi-
vidual predictors of quiz grades. The numbers along each line are known as regression
coefficients, or beta weights. These are standardized coefficients that allow comparison
across predictors. Their values are very similar to correlation coefficients but differ in an
important way: They represent the effects of each predictor variable while controlling for
the effects of all the other predictors. That is, the value of b = .21 linking hours studied
with quiz grades is the independent contribution of hours studied, controlling for SAT
scores and previous classes. If we compared the size of these regression coefficients, we
would see that, in fact, hours spent studying were still the largest predictor of quiz grades
(b = .21) compared with both SAT scores (b = .14) and previous classes (b = .19).

Figure 4.5: Predictors of quiz grades

Even if individual variables have only a small influence, they can add up to a larger com-
bined influence. So, if we were to analyze the predictors of quiz grades in this study, we
would find a combined multiple correlation coefficient of r = .34. The multiple correla-
tion coefficient represents the combined association between the outcome variable and
the full set of predictor variables. Note that in this case, the combined r of .34 is larger
than any of the individual correlations (r) in Table 4.2, which ranged from .14 to .25. These
numbers mean that we are better able to predict quiz grades from examining all three
variables than we are from examining any single variable. Or, as the saying goes, the
whole is greater than the sum of its parts!
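
For readers curious about the mechanics, here is a sketch of how such an analysis might look
in Python using the statsmodels library. The raw scores behind Table 4.2 are not published,
so the data below are simulated stand-ins; only the workflow (standardize the variables, fit
the model, read off the beta weights and the multiple correlation) is the point:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Simulated stand-ins for the chapter's variables (not the real data).
    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "sat": rng.normal(1100, 150, n),
        "prev_classes": rng.integers(0, 5, n).astype(float),
        "hours": rng.uniform(0, 6, n),
    })
    df["quiz"] = (0.002 * df["sat"] + 0.30 * df["prev_classes"]
                  + 0.50 * df["hours"] + rng.normal(0, 1.5, n))

    # Standardizing every variable makes the fitted slopes beta weights,
    # directly comparable across predictors.
    z = (df - df.mean()) / df.std()
    predictors = sm.add_constant(z[["sat", "prev_classes", "hours"]])
    model = sm.OLS(z["quiz"], predictors).fit()
    print(model.params.round(2))  # beta weights for each predictor
    print(f"multiple R = {np.sqrt(model.rsquared):.2f}")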

Multiple regression analysis is an incredibly useful and powerful analytic approach, but
it can also be a tough concept to grasp. Before we move on, let’s revisit the concept in the
form of an analogy. Imagine you’ve just eaten the most delicious hamburger of your life
and are determined to understand what made it so good. Lots of factors will contribute
to the taste of your hamburger: the quality of the meat, the type and amount of cheese,
the freshness of the bun, perhaps the smoked chili peppers layered on top. If you were to
approach this investigation using multiple regression analysis, you would be able to sepa-
rate out the influence of each variable (e.g., How important is the cheese compared with
the smoked peppers?) as well as take into account the full set of ingredients (e.g., Does
the freshness of the bun really matter when the other elements taste so good?). Ultimately,
you would be armed with the knowledge of which elements are most important in craft-
ing the perfect hamburger. And you would understand more about the perfect hamburger
than if you had examined each ingredient in isolation.

(Figure 4.5 shows each predictor pointing to Quiz Grade with its beta weight: SAT Score,
b = .14, p > .05; Previous Classes, b = .19, p < .05; Hours Studied, b = .21, p < .05.)


Chi-Square Analyses

Both correlations and regressions are well suited to testing hypotheses about prediction,
as long as it is possible to demonstrate a linear relationship between two variables. But
linear relationships require that variables be measured on one of the quantitative scales—
that is, ordinal, interval, or ratio scales (see Section 2.3, Scales and Types of Measurement,
for a review). What if we wanted to test the association between nominal, or categori-
cal, variables? In these cases, we would need an alternative statistic called the chi-square
statistic, which determines whether two nominal variables are independent from or
related to one another. Chi-square is often abbreviated with the symbol χ², which shows
the Greek letter chi with a superscript 2 for squared. (This statistic is also referred to as
the chi-square test for independence—a slightly longer but more descriptive synonym.)

The idea behind this test is similar to that of the correlation coefficient. If two variables are
independent, then knowing the value of one variable does not tell you anything about the
value of the other. As we will see in the following examples, a larger chi-square reflects a
larger deviation from what we would expect by chance and is thus an index of statistical
significance.

The Logic of Chi-Square
To determine whether two variables are associated, the chi-square works by comparing
the observed frequencies (collected data) with the expected frequencies if the variables
were unrelated. If the results significantly deviate from these expected frequencies, then
we conclude that our variables are related. And, consequently, we are able to predict one
variable based on knowing the values of the other. Let’s look at a couple of examples to
make this more concrete.

First, let’s say we wanted to know whether gender is related to political party affiliation.
We might randomly select 100 men and 100 women and ask them whether they identified
as Republican or Democrat. Because both of these variables are nominal—that is, they
identify only categories, not quantitative measures—chi-square will be our best choice to
test the association between them. The first step in conducting this analysis is to arrange
our data in a contingency table, which displays the number of individuals in each of the
combinations of our nominal variables. We encountered these tables before in our exam-
ples of observational studies in Chapter 3 (Section 3.1) but stopped short of conducting
the statistical analyses. So imagine we get the results shown in Table 4.3a from our survey
of gender and party affiliation.

Table 4.3a: Gender and party affiliation

                Male    Female

Democrat         60       60
Republican       40       40

In this case, there is no association between sex and party affiliation. It does not matter
that the sample consists of 60% Democrats and 40% Republicans. What matters for our


hypothesis test is that the pattern for males is the same as the pattern for females: Our
sample contains 1.5 times as many Democrats as Republicans for both sexes. In other words,
knowing a person’s sex does not tell us anything about their political affiliation.

For illustration purposes, imagine we tested the same hypothesis again but could recruit
only 50 women, compared with 100 men. Now, if we found the same 60%/40% split
among men again, and assuming that the variables were still not related, here’s the ques-
tion: What would we expect the split to look like among women? If the ratio of Democrats
to Republicans remains at 1.5 to 1, we would expect to see the women divided into 30
Democrats and 20 Republicans (i.e., the same ratio; shown in Table 4.3b). This concept is
referred to as the expected frequency, or the frequency you would expect to see if the vari-
ables were not related. In this example, we have a 60/40 split among men. If gender were
unrelated to party affiliation, we would expect to see the same pattern among women.

Table 4.3b: Gender and party affiliation with unequal ns

                Male    Female

Democrat         60       30
Republican       40       20

The chi-square statistic is calculated by comparing our observed data with these expected
frequencies. In our gender and party affiliation example, the observed data match the
expected frequencies, meaning that the variables are not related. Let’s walk through
another example and see how these calculations work.

Calculating Chi-Square
In this second example, imagine we wanted to know whether people in rural or urban
areas were more likely to support a sales tax increase. It would be easy to speculate why
either group might be more likely to do so—perhaps people living in cities are more politi-
cally liberal or perhaps people living in small towns are better able to see the benefits of
higher local taxes. So, once again, imagine we surveyed a sample of 100 people, asking
them to indicate both where they live (rural or urban) and their support for a sales tax proposal. We
get the following contingency table of results (in Table 4.4a). Notice that we have more
urban than rural residents, reflecting the higher population density in cities. But, as with
our preceding gender and political affiliation example, the raw numbers are less impor-
tant than the ratios within each group.


Table 4.4: Chi-square example: Support for a sales tax increase

4.4a: Observed data

                    Rural    Urban    Total

Support               10       45       55
Don't support         30       15       45
Total                 40       60      100

4.4b: Expected frequencies (shown in parentheses)

                    Rural      Urban      Total

Support             10 (22)    45 (33)      55
Don't support       30 (18)    15 (27)      45
Total               40         60          100

4.4c: Calculating deviations between observed and expected values

                    Rural              Urban

Support             (10 − 22)²/22      (45 − 33)²/33
Don't support       (30 − 18)²/18      (15 − 27)²/27

4.4d: Deviations between observed and expected values

                    Rural    Urban

Support              6.55     4.36
Don't support        8.00     5.33

The first stage in calculating our chi-square is to determine the expected frequencies. We
begin by calculating the sums across each row and column, as shown in Table 4.4a. This
gives us a sense of the overall patterns in the data. Overall, 40% of the sample consisted
of rural residents, compared with 60% urban residents. And, overall, 55% supported the
sales tax increase, while 45% did not. But, as in our previous example, these descriptive
statistics do not tell us anything about the relationship between the two variables. If the
variables are independent, then the 55/45 split in support for the sales tax will not differ
based on where people live.

So we need to determine how much these observed data differ from what would be
expected under independence. That is, what would these cells look like if there were no
relationship? These expected frequencies are calculated using the following formula:

Expected Frequency = (R × C) / Total N


For each of the four cells, we multiply the row total (R) by the column total (C), and then
divide by the total N in the sample. For example, in the rural resident/support cell, we
would multiply the row total (55) by the column total (40), and then divide by the total
sample size (100); (55 × 40) ÷ 100 = 22. Conceptually, this makes perfect sense: If the two
variables are unrelated, we can guess the value of each cell using the overall totals. Table
4.4b shows expected frequencies for each cell in parentheses.
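
In code, all four expected frequencies can be computed at once: the outer product of the row
totals and column totals, divided by N, is a direct translation of the formula above.

    import numpy as np

    # Observed frequencies from Table 4.4a
    # (rows: support / don't support; columns: rural / urban).
    observed = np.array([[10, 45],
                         [30, 15]])

    row_totals = observed.sum(axis=1)  # [55, 45]
    col_totals = observed.sum(axis=0)  # [40, 60]
    n = observed.sum()                 # 100

    # Expected frequency for each cell = (row total x column total) / N.
    expected = np.outer(row_totals, col_totals) / n
    print(expected)  # [[22. 33.], [18. 27.]]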

The second stage in calculating chi-square is to determine the extent to which our observed
data deviate from these expected frequencies. We will need to calculate this deviation in
each cell and then add them up for a total chi-square. So, for each cell we (1) subtract the
expected from the observed value; (2) square the difference to remove any negative num-
bers; and (3) divide by the expected frequency in order to standardize the deviation. The
final chi-square value is obtained by adding up all four of these deviation scores, which
translates into the following formula:

χ² = Σ [ (observed − expected)² / expected ]

For example, in our rural resident/support cell, we calculated an expected frequency of 22,
representing the number we would expect under independence. But in our sample, there
were 10 people in this cell. To calculate how much this deviates from what is expected,
we (1) subtract 22 from 10 (= −12), (2) square this difference to remove the negative num-
ber (= 144), and (3) divide this by the expected frequency to standardize the deviation
(144 ÷ 22 = 6.55). Tables 4.4c and 4.4d illustrate the steps for obtaining these deviation
scores in each of the four cells.

Finally, we add up all four of our deviation scores (one for each cell) to get the total chi-
square value:

χ² = Σ [ (observed − expected)² / expected ] = 6.55 + 4.36 + 8.00 + 5.33 = 24.24
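
Continuing the Python sketch from above (again, purely illustrative), the deviation scores and their sum can be computed directly from the observed and expected dictionaries:

    # Each cell contributes (observed - expected)^2 / expected; chi-square
    # is the sum of the four deviation scores.
    chi_square = sum(
        (observed[cell] - expected[cell]) ** 2 / expected[cell]
        for cell in observed
    )
    print(round(chi_square, 2))  # 24.24 (6.55 + 4.36 + 8.00 + 5.33, rounded)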

Our final chi-square value, 24.24, represents the sum of our deviations from the expected
value. The larger this number, the more our observed data differ from the expected fre-
quencies. Remember that these expected frequencies represent our null hypothesis—we
would expect these frequencies only if the variables were unrelated. So the greater our
chi-square value, the more our variables are related to one another. In the present exam-
ple, this means we can predict a person’s support for a sales tax increase based on where
he or she lives, which is consistent with our initial hypothesis.

But how do we know whether our value of 24.24 is meaningful? As with the other statisti-
cal tests we have discussed, this requires looking up our result in a critical value table to
determine whether the calculated value is above threshold. In this case, the critical value
for a chi-square with a 2 × 2 table is 3.84, so we can feel confident in our value of 24.24—
more than six times the threshold value!
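
For readers who want a programmatic cross-check, the same result can be obtained with SciPy, assuming that library is available (correction=False disables Yates’ continuity correction so the output matches the hand calculation above):

    from scipy.stats import chi2, chi2_contingency

    table = [[10, 45],   # support: rural, urban
             [30, 15]]   # don't support: rural, urban

    stat, p_value, dof, expected = chi2_contingency(table, correction=False)
    critical = chi2.ppf(0.95, dof)  # critical value at alpha = .05; 3.84 when df = 1

    print(round(stat, 2), round(critical, 2), stat > critical)
    # 24.24 3.84 True -> the variables are not independent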

However, unlike correlation and regression coefficients, our chi-square results cannot
tell us anything about the direction or magnitude of the relationship. A larger chi-square
reflects a larger deviation from what we would expect by chance and is thus an index of
statistical significance. In order to interpret the patterns of our data, we need to visually
inspect the numbers in our data table. Better yet, we can create a bar graph like we did in
Chapter 3 to visually display these frequencies.


As Figure 4.6 shows, the cell frequencies suggest a fairly clear interpretation: People who
live in urban settings are much more likely than people who live in rural settings to sup-
port a sales tax increase. In fact, urban residents support the increase by a 3-to-1 margin,
while rural residents oppose the increase by a 3-to-1 margin.

Figure 4.6: Graph of chi-square results. The bar graph plots the four cell frequencies, with support and opposition counts shown separately for rural and urban residents.

4.5 Ethical Issues in Survey Research

Like all research, surveys should be carried out in ways that protect participants and avoid causing them harm. Informed consent, as we discussed in Chapter 1 (Section 1.7, Ethics in Research), is a requirement for survey research and should include a clear
description of the survey content and the purpose for conducting the research. Whether
the survey is completed in person or online, a cover letter should be included that con-
tains this information. Although not always required for surveys (as there is minimal risk
of harm to participants taking questionnaires and surveys), some researchers also like to
obtain signed consent forms from the participants. This is especially important to institu-
tional review boards (IRBs), which want to ensure that participants were fully informed of any
sensitive information that may be collected, any potential limits to the confidentiality of
the data, or access to private records (such as medical records) that are being sought in
addition to the survey. In any case, a researcher should never administer a survey if the
participant has not provided verbal or written consent and should never use the data
other than for reasons for which the participant provided consent.

Assurances of participant anonymity and confidentiality are also important, especially
with respect to response rates to sensitive questions and response rates in general. Partici-
pants are likely to feel more comfortable responding to sensitive subjects when they know
that they are participating anonymously. Anonymity ensures that there is no way for their
responses to be linked back to them. Conducting survey research in a completely anony-
mous manner is much easier with online surveys; however, researchers can take steps
to protect the anonymity of participants in individual or group administrations as well.
For example, anyone who has access to the surveys and completed data must commit in
writing to preserving their confidentiality. Any links between the answers and the par-
ticipants’ personal identifying information, such as names, email addresses, and phone
numbers, should be minimized by removing such identifiers from the data and replacing
them with codes (e.g., numbering each participant with an ID code rather than using his or her name). If
personal information must be kept, it should be separated from the survey responses and
destroyed as soon as the study is over. Any person who can identify the participant by
looking at the pattern of responses, such as a supervisor, should not be permitted to view
the survey responses. It is also important to be careful when reporting results for a small
subpopulation of the sample, whose personal information may be identifiable. Finally,
when the study is completed, researchers must ensure that they either destroy all survey
responses and personal information or store them securely.

Another important consideration for researchers is to be aware of their own biases and
the impact that those might have on the testing process. This is especially important in
face-to-face inquiries. For example, if a researcher has a strong opinion or bias toward a
particular topic, the questions might be worded in a way that persuades the participants
to answer in a specific manner. Additionally, if the researcher has a strong belief about the
topic, he or she may make this evident during the testing process, which may encourage
the participant to respond in a way that is consistent with what the researcher believes. As
we discussed in Chapter 3 (Section 3.1), participant-expectancy bias can occur during obser-
vations and interviews, as well as during survey research, and needs to be considered
when analyzing and interpreting data.

Finally, when using standardized questionnaires or surveys that have been developed by
other researchers, professional training and competence in the tests being used are essen-
tial. It is unethical for any researchers to administer, score, and interpret a test that they
have not been trained on. In addition, it is unethical for researchers to select tests based on
limited knowledge and experience and assume that these tests are reliable and valid for
the purpose for which they are using them.

Summary

This chapter has covered the process of survey research from conceptualization through analysis. We first discussed the types of research questions that are best suited to survey research—essentially, those that can be answered based on people’s
observations of their own behavior and characteristics. Survey research can involve either
verbal reports (i.e., interviews) or written reports (i.e., questionnaires). In both cases, sur-
veys are distinguished by their reliance on people’s self-reports of their attitudes, feelings,
and behaviors.

This chapter covered several key points for writing survey items. The take-home point to
our Five Rules for Designing Better Questionnaires is that your questions should be writ-
ten as clearly and unambiguously as possible. This helps to minimize the error variance

that might result from participants imposing their own guesses and interpretations on the
material. In designing survey items, you also have a broad choice between open-ended
and fixed-format responses. The former provide richer and more extensive data but are
harder to score and code; the latter are easier to code but can constrain people’s responses
to your choice of categories. If and when you settle on a fixed-format response, you have
another set of decisions to make regarding the response scaling, labels, and general format.

Once you have constructed the scale, it is time to begin collecting the data. This chap-
ter discussed the concept of sampling, or choosing a portion of the population to use for
your study. Broadly speaking, sampling can be either “probability” or “nonprobability,”
depending on whether you have a known population size from which you sample ran-
domly. Probability sampling is more likely to result in a representative sample, but this
approach is not possible in all studies. In fact, a significant proportion of psychology
research studies uses a form of nonprobability sampling called convenience sampling,
meaning that the sample consists of those who show up for the study.

This chapter also covered three approaches to analyzing survey data and testing hypoth-
eses about prediction. The first, correlational analysis, is a very popular way to analyze survey
data. The correlation is a statistical test that gives an assessment of the linear relationship
between two variables. The stronger the correlation between variables, the more we can
accurately predict one based on knowing the other. Second, regression analyses allow us
to expand our investigations into multiple predictors. The advantage of multiple regres-
sion analysis is that it considers both the individual and the combined influence of the
predictor variables. However, both correlation and regression require the variables to be
quantitative—that is, measured on an ordinal, interval, or ratio scale. In cases where our
survey produces nominal or categorical data, we use an alternative called the chi-square
statistic, which determines whether two nominal variables are independent or related.
The chi-square works by examining the extent to which our observed data deviate from
the pattern we would expect if the variables were unrelated—that is, the null hypothesis.
The common thread running through these analyses is that they measure the association
between variables and do not tell us anything about the causal relationship between them.
To make causal statements, we have to conduct experiments, which we will cover in the
next chapter.

Finally, this chapter discussed the ethical issues that arise in survey research. In addition
to the concepts discussed in Chapter 1 regarding informed consent and confidentiality, it
is important that researchers conducting survey research ensure anonymity so that partic-
ipants feel safe responding to sensitive topics and comfortable participating in the overall
study. Participant response rates tend to be higher when participants know that there is
no way to link them to their responses. Additionally, the researcher should be aware of
his or her own biases and the impact that these might have on the collection, analysis, and
interpretation of the data, as well as on how participants may respond. Also, when using
standardized tests or tests that have been developed by other researchers, it is imperative
that researchers be trained and competent in administering and scoring the results, as well
as in interpreting those results.


Key Terms

anchors Labels, or endpoints, for a rating
scale.

beta weights See regression coefficients.

bipolar scale Rating scale that has polar
opposites as its anchors.

branching schedule An interview format
in which questions take different directions
depending on participants’ answers.

chi-square statistic A statistical test
similar to the correlation coefficient; deter-
mines whether two nominal variables are
independent or related.

cluster sampling A variation of simple
random sampling that involves dividing
the sample into groups based on more
than one level of analysis.

contingency table A data summary table
that shows the number of individuals in
each combination of the nominal vari-
ables; used as the first step in calculating
chi-square.

convenience sampling A nonprobability
sampling strategy that involves simply
enrolling people who show up for the
study.

correlation Statistical test that assesses the
linear relationship between two variables;
the stronger the correlation between vari-
ables, the more accurately one can be pre-
dicted based on knowing the other.

correlation coefficient The number used
to quantify a correlation; this coefficient (r)
ranges from −1 to +1 and contains infor-
mation about both the size and direction of
the correlation.

criterion sampling A qualitative sampling
method used to select cases that meet a
predetermined criterion of importance.

double-barreled question A flawed sur-
vey item that asks more than one question
at a time.

error of estimation The degree to which
the data from the sample are expected to
deviate from the population as a whole.

error variance Variance from random
sources that are irrelevant to the trait or
ability that a questionnaire is purporting to
measure.

expected frequency The frequency one
would expect to see if the variables were
not related.

expert sampling Sampling of a panel of
individuals who have known expertise
(knowledge and training) in a particular
area.

extreme or deviant case sampling A
form of sample selection used when the
researcher wants information-rich data on
unusual or special cases; the focus is on
examining both ends of the spectrum of
outcomes (e.g., outstanding successes and
failures of the phenomenon being studied).

fixed-format response Answer to a lim-
iting question or statement, involving
choosing from a list of options; on a sur-
vey, fixed-format responses are easier to
code but can constrain the data into nar-
row categories.

forced choice A rating scale that requires
respondents to agree or disagree with a
statement, usually through the use of an
even number of scale points.


heterogeneous (maximum variation)
sampling A qualitative sampling method
that may include both extreme and typi-
cal cases; used to select a wide variety of
cases in relation to the phenomenon being
investigated.

interview A verbally administered survey.

interview schedule A plan, or script, for
the progress of an interview, describing
the list of questions and the order in which
they should be asked.

leading question A flawed survey item
worded in a way that suggests an answer.

Likert scale Format that uses anchors of
“strongly agree” and “strongly disagree”
to rate responses to a survey question.

linear schedule An interview format that
asks the same questions in the same order
for all participants.

multiple-choice format A fixed-format-
response survey format that asks partici-
pants to select from a set of predetermined
responses.

multiple correlation coefficient A number
that represents the combined association
between the outcome variable and the full
set of predictor variables.

multiple regression analysis A variation
on correlational analysis in which more
than one predictor variable is used to pre-
dict a single outcome variable.

nonprobability sampling A group of
sampling strategies used when the odds of
any given individual’s being in the sample
are unknown.

nonresponse bias Bias introduced when
certain questions or entire surveys are not
completed by participants, affecting the
characteristics and size of the sample.

open-ended response Unstructured
answer to a question or statement; on a
survey, open-ended responses provide rich
data but are difficult to code.

pilot testing A “test run” of a survey that
involves giving the questionnaire to a
small sample of people, getting their feed-
back, and making any necessary changes.

population The entire collection of people
who could possibly be relevant for a study.

probability sampling A group of data-
collection strategies used when each per-
son in the population has a known chance
of being in the sample.

purposive sampling A qualitative sam-
pling method that includes selecting
relatively small samples in a purposeful
manner.

questionnaire A survey that is adminis-
tered in writing.

rating scale A fixed-format response that
asks participants to place responses on a
continuum.

regression coefficients (beta weights) Val-
ues that represent the effects of each
predictor variable while controlling for the
effects of all the other predictors.

sampling bias The failure of the sample
to represent the underlying distribution in
the population.

sampling error The degree to which the characteristics of the sample differ from the characteristics of the population.

sampling frame A list of all members of a particular population (e.g., a list of every resident of the United States) and a necessary requirement for probability sampling strategies.

self-reports Participants’ reports of their own attitudes, feelings, and behaviors.

self-selection bias Bias introduced into a survey study when the researcher receives responses only from those who are interested in the topic.

simple random sampling A probability sampling strategy that involves randomly picking participants from a list of everyone in the population.

snowball sampling A nonprobability sampling strategy that involves recruiting by word-of-mouth referrals.

social desirability Participants’ reluctance to give unpopular answers to survey questions; concern over how their attitude will be perceived.

stratified random sampling A variation of simple random sampling, used when subgroups of the population might be left out of a purely random sampling process; breaking the sampling frame into subgroups and then sampling a random number from each subgroup.

survey research Any method that relies on people’s observations of their own behavior.

theory-based sampling A qualitative sampling method (a version of criterion sampling) that focuses on obtaining cases that represent theoretical constructs.

true/false format A fixed-format survey response that asks participants to indicate whether they endorse a statement.

typical case sampling Research method that involves sampling the most frequent or “normal” case from a population.

unipolar scale Rating scale that assesses a single construct.

Apply Your Knowledge

1. For each of the following poorly written questionnaire items, identify the major problem and then rewrite it so that the problem is resolved.

a. How much do you like cats and ponies?

main problem:
better item:

b. Do you think that John McCain’s complete lack of personality proved that he would have been a terrible president?

main problem:
better item:


c. Do you dislike not playing basketball?

main problem:
better item:

d. Do you support SB 1070?

main problem:
better item:

e. How often do you take drugs?

main problem:
better item:

2. Dr. Truxillo is interested in Arizona residents’ thoughts and feelings about global warming. For each of the following examples, state the sampling method used by her research assistants.

a. Reese sets up a table in the mall and hands a survey to people who approach her.
b. Catherine randomly chooses 5 cities, then chooses 3 neighborhoods in each, then randomly samples 5,000 households for a phone survey.
c. Jason starts with a list of the entire population of Arizona and selects participants by dialing random phone numbers.
d. Anna gets the master list from Jason and divides the population according to education level. She then randomly chooses 500 high school dropouts, 500 college graduates, and 500 people with some postgraduate education.

3. Based on each of the following study descriptions, choose whether the best analysis would be a correlation, a multiple regression, or a chi-square.

a. Jim is interested in the relationship between annual income and self-reported happiness.
b. Shelia is interested in whether some ethnic groups are more likely to use counseling services (a yes-or-no question).
c. Angela is interested in knowing the best predictors of recovery from depression, comparing the influence of drugs, therapy, and family resources.
d. Adam is interested in whether high school dropouts or college graduates are more likely to vaccinate their children.
e. Nicole is interested in understanding the best predictors of weight loss.
f. Samantha is interested in the relationship between self-esteem and prejudice.

Critical Thinking & Discussion Questions

1. In survey research, explain the trade-off between the “richness” of people’s
responses and the ease of analyzing their responses.

2. When doing interviews, the researcher has a personal interaction with the
subject. Why is this both good and bad?


Chapter 3: Qualitative and Descriptive Designs—Observing Behavior

Chapter Contents

• Qualitative and Descriptive Research Designs
• Qualitative Research Interviews
• Critiquing a Qualitative Study
• Writing the Qualitative Research Proposal
• Describing Data in Descriptive Research


Introduction

In the fall of 2009, Phoebe Prince and her family relocated from Ireland to South Hadley, Massachusetts. Phoebe was immediately singled out by bullies at her new high school and subjected to physical threats, insults about her Irish heritage, and harassing posts
on her Facebook page. This relentless bullying continued until January of 2010, ending
only when Phoebe took her own life to escape her tormentors (United
Press International, 2011). Tragic stories like this one are all too common, and it should
come as no surprise that the Centers for Disease Control and Prevention (CDC) have iden-
tified bullying as a serious problem facing our nation’s children and adolescents (Centers
for Disease Control and Prevention [CDC], 2012).

Scientific research on bullying began in Norway in the late 1970s in response to a wave of
teen suicides. Work begun by psychologist Dan Olweus—and since continued by many
others—has documented both the frequency and the consequences of bullying in the
school system. Thus, we know that approximately one third of children are victims of bul-
lying at some point during development, with between 5% and 10% bullied on a regular
basis (Griffin & Gross, 2004; Nansel et al., 2001). Victimization by bullies has been linked
to a wide range of emotional and behavioral problems, including depression, anxiety, self-
reported health problems, and an increased risk of both violent behavior and suicide (for
a detailed review, see Griffin & Gross, 2004). Recent research even suggests that bullying
during adolescence may have a lasting impact on the body’s physiological stress response
(Hamilton et al., 2008).

But most of this research has a common limitation: It has studied the phenomenon of bul-
lying using self-report survey measures. That is, researchers typically ask students and
teachers to describe the extent of bullying in the schools or have students fill out a col-
lection of survey measures, describing in their own words both bullying experiences and
psychological functioning. These studies are conducted rigorously, and the measures they
use certainly meet the criteria of reliability and validity that we discussed in Chapter 2
(Section 2.2, Reliability and Validity). However, as Wendy Craig, Professor of Psychology
at Queen’s University, and Debra Pepler, a Distinguished Professor at York University,
suggested in a 1997 article, this questionnaire approach is unable to capture the full con-
text of bullying behaviors. And, as we have already discussed, self-report measures are
fully dependent on people’s ability to answer honestly and accurately.

In order to address this limitation, Craig and Pepler (1997) decided to observe bully-
ing behaviors as they occurred naturally on the playground. Among other things, the
researchers found that acts of bullying occurred approximately every 7 minutes, lasted
only about 38 seconds, and tended to occur within 120 feet of the school building. They
also found that peers intervened to try to stop the bullying more than twice as often as
adults did (11% versus 4%, respectively). These findings add significantly to scientific
understanding of when and how bullying occurs. And for our purposes, the most nota-
ble thing about them is that none of the findings could have been documented without
directly observing and recording bullying behaviors on the playground. By using this
technique, the researchers were able to gain a more thorough understanding of the phe-
nomenon of bullying and thus able to provide real-world advice to teachers and parents.
Qualitative research is valuable when the nature of a phenomenon such as bullying (its
signs, symptoms, dynamics, and emotional consequences) is not well understood.

One recurring theme in this book is that it is absolutely critical to pick the right research
design to address your hypothesis. Over the next three chapters, we will be discussing

three specific categories of research designs, proceeding in order of increasing control
over elements of the design: descriptive designs, quasi-experimental designs, and true
experimental designs. This chapter will also focus on qualitative research designs that
have levels of control similar to those of the case study, in which the primary goal is to examine
phenomena of interest in great detail. We will begin by discussing qualitative designs,
including the ethnographic study, phenomenological study, and grounded theory study. We
will then discuss three prominent examples of descriptive designs that can be used in
either qualitative or quantitative approaches—case studies, archival research, and obser-
vational research—covering the basic concepts, the pros and cons, and contrasting quali-
tative and quantitative approaches of each design (see Figure 3.1). We go on to discuss
interview techniques and then offer guidelines for presenting descriptive data in graphi-
cal, numerical, and narrative form. Finally, we show how to critique a study and write a
proposal for qualitative research projects.

Figure 3.1: Qualitative and descriptive research on the continuum of control. In order of increasing control, the figure groups methods as qualitative and descriptive (ethnographic study, phenomenological study, grounded theory study, case study, archival research, observational research), predictive (survey research), and experimental (pre-experiments, quasi-experiments, and “true” experiments).

3.1 Qualitative and Descriptive Research Designs

We learned in Chapter 1 that researchers generally take one of two broad approaches to answering their research questions. Quantitative research is a systematic, empirical approach that attempts to generalize results to other con-
texts, whereas qualitative research is a more descriptive approach that attempts to gain a
deep understanding of particular cases and contexts. Before we discuss specific examples
of both qualitative and descriptive designs, it is important to understand that descriptive
designs can represent either quantitative or qualitative perspectives, whereas qualitative
designs represent only qualitative perspectives. In this section, we examine the qualitative
and descriptive approaches in more detail.

In Chapter 1, we used the analogy of studying traffic patterns to contrast qualitative and
quantitative methods—a quantitative researcher would do a “flyover” and perform a sta-
tistical analysis, whereas a qualitative researcher would likely study a single busy inter-
section in detail. This illustrates a key point about the latter approach. All qualitative
approaches have two characteristics in common: (1) focusing on phenomena that occur
in natural or real-world settings; and (2) studying those phenomena in their complexity.


Qualitative researchers focus on interpreting and making sense out of what they observe
rather than trying to simplify and quantify these observations. In general, qualitative
research involves collecting interviews, recordings, and observations made in a natural
setting. Regardless of the overall approach (qualitative or quantitative), however, collect-
ing data in the real world results in less control and structure than does collecting data
in a laboratory setting. But whereas quantitative researchers might view reduced control
as a threat to reliability and validity, qualitative researchers view it as a strength of the
study because the phenomenon of interest is being studied in its natural environment.
By conducting observations in a natural setting, it is possible to capture people’s natural
and unfiltered responses. The concepts of reliability and validity for both qualitative and
quantitative approaches are discussed further in Chapter 5.

As an example, consider two studies on the ways people respond to traumatic events.
In a 1993 paper, psychologists James Pennebaker and Kent Harber took a quantitative
approach to examining the community-wide impact of the 1989 Loma Prieta earthquake
(centered in the San Francisco Bay Area). These researchers conducted phone surveys
of 789 area residents, asking people to indicate, using a 10-point scale, how often they
“thought about” and “talked about” the earthquake during the 3-month period after its
occurrence. In analyzing these data, Pennebaker and Harber discovered that people tend
to stop talking about traumatic events about 2 weeks after they occur but keep thinking
about the event for approximately 4 more weeks. That is, the event is still on people’s
minds, but they decide to stop discussing it with other people. In a follow-up study after
the 1991 Gulf War, these researchers found that this conflict between thoughts and their
verbalization leads to an increased risk of illness (Pennebaker & Harber, 1993). Thus, the
goal of the study was to gather data in a controlled manner and test a set of hypotheses
about community responses to trauma.

Contrast this approach with the more qualitative one taken by the developmental psy-
chologist Paul Miller and colleagues (2012), who used a qualitative approach to study the
ways that parents model coping behavior for their children. These researchers conducted
semistructured interviews of 24 parents whose families had been evacuated following the
2007 wildfires in San Diego County and an additional 32 parents whose families had been
evacuated following a 2008 series of deadly tornadoes in Tennessee. Owing to a lack of
prior research on how parents teach their children to cope with trauma, Miller and col-
leagues approached their interviews with the goal of “documenting and describing” (p. 8)
these processes. That is, rather than attempt to impose structure and test a strict hypoth-
esis, the researchers focused on learning from these interviews and letting the interview-
ees’ perspectives drive the acquisition of knowledge.

Qualitative research is undertaken in many academic disciplines, including psychology,
sociology, anthropology, biology, education, history, and medicine (Leedy & Ormrod,
2010). Although once frowned upon in psychology and education because of their
subjective nature, qualitative techniques have gained wide acceptance as legitimate
research. In fact, many researchers argue that qualitative research is the beginning step
to all types of inquiry. Thus, qualitative research can explore unknown topics, unknown
variables, and inadequate theory bases and thereby assist in generating hypotheses
for future quantitative studies.

Unlike quantitative studies, qualitative studies do not allow the researcher to iden-
tify cause-and-effect relationships among variables. Rather, the focus is on describing,
interpreting, verifying, and evaluating phenomena, such as personal experiences, events,
and behaviors, in their natural environment. The most common forms of qualitative data
collection techniques are observations, interviews, videotapes, focus groups, and docu-
ment review. Creswell (2009) lists the following characteristics as generally present in
most types of qualitative research:

• Data collection occurs in the natural or real-world setting where participants experience the issue or problem being investigated.
• The researcher is the key instrument used to collect data through means of examining documents, observing behavior, or interviewing participants.
• Multiple sources of data are collected and reviewed.
• As discussed in Chapter 1, qualitative researchers use inductive data analysis and build patterns and themes from the bottom up.
• Focus is on understanding the participants’ experiences, not on what the researcher believes those experiences mean.
• The research process is emergent and can change after the researcher enters the field and begins collecting data.
• Researchers as well as participants and readers interpret what they see, hear, and understand. This results in multiple views of the problem.
• Researchers attempt to develop a complex picture of the problem under investigation, utilizing multiple methods of data collection.

Descriptive research does not fit neatly into the categories of either qualitative or quanti-
tative methodologies; instead, it can utilize qualitative, quantitative, or a mixture of both
methods to describe and interpret events, conditions, behaviors, feelings, and situations.
In all cases, descriptive research investigates situations as they are, and similar to quali-
tative designs, does not involve changing (controlling) the situation under investigation
or attempting to determine cause-and-effect relationships. However, unlike qualitative
designs, descriptive designs usually yield quantitative data that can be analyzed using
statistical analyses. That is, descriptive research gathers data that describe events and then
organizes, tabulates, depicts, and describes the collected data, often using visual aids such
as graphs, tables, and charts.

Collecting data for descriptive research can be done with a single method or a variety
of methods, depending upon the research questions. The most common data collection
methods utilized in descriptive research include surveys, interviews, observations, and
portfolios. In general, descriptive research often yields rich data that can lead to important
recommendations and findings.

In the following six sections, we examine six specific examples of qualitative and descrip-
tive designs: ethnography, phenomenological studies, grounded theory studies, case
studies, archival research, and observational research. The sections on ethnography, phe-
nomenological studies, and grounded theory studies will focus specifically on the quali-
tative uses of these methods, since these are qualitative-only research methods. Because
case studies, archival research, and observational research share the goals of describing
attitudes, feelings, and behaviors, each one can be undertaken from either a quantitative
or a qualitative perspective. In other words, qualitative and quantitative researchers use
many of the same general methods but do so with different ends in mind. To illustrate this
flexibility, we will end these three sections with a paragraph that contrasts qualitative and
quantitative uses of the particular method.


Ethnography Study (Qualitative Design)

Ethnographies were first developed by anthropologists to examine human society and
various cultural groups but are now frequently used in the sociology, psychology, and
education fields. In fact, today ethnographies are probably the most widely used qualita-
tive method for researching social and cultural conditions. Unlike case studies (which will
be discussed later in this chapter) that examine a particular person or event, ethnogra-
phies focus on an entire cultural group or a group that shares a common culture. Although
culture has various definitions, it usually refers to “the beliefs, values and attitudes that
shape the behavior of a particular group of people” (Merriam & Associates, 2002, p. 8).
The concept of what a culture is has also changed over time. Recently, more research has

focused on smaller groups, such as classrooms
and work offices, than on larger groups, such as
northwest Alaskan Natives.

Regardless of whether the cultural group is a
classroom or an entire ethnic group in a particu-
lar region of the world, ethnographic research
involves studying an entire community in order
to obtain a holistic picture of it. For example, in
addition to studying behaviors, researchers will
examine the economic, social, and cultural con-
texts that shape the community or were formed
by the community.

In order to thoroughly study a particular cultural
group, researchers will often immerse themselves
in the community. That is, the researcher will live
in the study community for a prolonged period
and participate in the daily routine and activities
of those being studied. This is called participant
observation. Such prolonged involvement is nec-
essary in order to observe and record processes
that occur over time. Participant observation is
an important data collection procedure in ethno-
graphic research; thus, it is imperative that the
researcher establish rapport and build trusting

relationships with the individuals he or she is studying (Hennink, Hutter, & Bailey, 2011).
Establishing trusting relationships can be a quite lengthy process, which is why ethno-
graphic studies usually span long periods of time.

Steps in Ethnographic Research
Several steps are involved in conducting site-based research and data collection. First, the
researcher must select a site or community that will address the research questions being
asked. Because researchers should not have any expectations regarding the outcome of
the study, it is best if the researcher selects a site that he or she is not affiliated with. Select-
ing sites that the researcher is acquainted with may make it difficult for him or her to
study the group in an unbiased manner.

(Photo: Employees who are part of an office culture are an example of those who might be studied in an ethnography.)


The next step involves gaining entry into the site. This can be a difficult task, as some
researchers may not be well received. Therefore, a successful entrance into a site requires
having access to a gatekeeper, an individual “who can provide a smooth entrance into
the site” (Leedy & Ormrod, 2010, p. 139). Gatekeepers may include a principal of a school,
a leader of a community, a director of a company, a tribal shaman, or any other well-
respected leader of a particular cultural group.

Once inside the site, the researcher must take several delicate steps, including establishing
rapport with individuals and forming trusting relationships. As mentioned previously,
establishing rapport is one of the most critical aspects of participant observation and pro-
vides a foundation for the quality and quantity of data that will be collected. Initially,
establishing trust will involve interacting with everyone. At some point, however, the
researcher will generally select key “informants” who can assist him or her in collecting
the data. Finally, similar to all types of research, the researcher will need to inform indi-
viduals about why he or she is there and the purpose of the study.

As with case studies, data collection and data analysis tend to occur simultaneously. Data
collection may include making observations, obtaining recordings, conducting inter-
views, and/or collecting records from the group. As the information is being collected,
the researcher will read through it in great detail to obtain a general sense of what has
been collected and to reflect on what all the data mean.

The next step is to organize the data based on events, issues, opinions, behaviors, and
other factors and begin to analyze it by sorting the data into categories. The categorized
information will allow the researcher to observe any potential patterns or commonalities
that may exist, as well as to identify any key or critical events.

In addition to categorizing and observing patterns, the researcher will generally develop
thick descriptions of the data, which “involves reading the data and delving deeper into
each issue by exploring its context, meaning, and the nuances that surround it” (Hennink,
Hutter, & Bailey, 2011, p. 239). For example, thick descriptions answer questions about
the data such as, What is the issue? Why does it occur? When does it occur? What are the
perceptions about the issue? What are some explanations about the issue? and, Is the issue
related to other data? Thick descriptions provide additional information on potential con-
nections and relationships that will be useful during data interpretation.

Pros and Cons of Ethnography
Through extensive and expansive investigation that is often personally involving for the
researcher, ethnography allows the examination of a particular cultural group in great
detail. This method provides a holistic picture and understanding of the group as well as
diverse aspects of it. It also allows great flexibility in the types of data collection methods
that can be used. However, as we have seen, ethnographic research requires a long process
of obtaining data and, therefore, can be quite expensive and time consuming. In addi-
tion, if one is not familiar with the various data collection methods, immersing oneself
into a group without a clear idea of how to collect data from it can be overwhelming and
distracting.


As with all forms of participant observation, researcher bias and participant-expectancy
bias (or the participant-observer effect) should be considered when examining the results
of ethnographic research, and in all qualitative research for that matter. Researcher bias
occurs when the researcher influences the results in order to portray a certain outcome.
This type of bias can influence how the data are collected, as well as how it is analyzed
and interpreted. It can also impact what type of data is collected, how the data are catego-
rized, and what types of conclusions are drawn from the data analysis. For example, if a
researcher is not able to lay aside his or her beliefs or assumptions, the type of data col-
lected and the conclusions that are drawn could be biased or misleading. Also, we must
take into account the influence that the researcher has on the participants’ behaviors and
actions. Human nature being what it is, participants sometimes alter their normal behav-
iors to be consistent with what they think the researcher is expecting from them or act
differently simply because they are being observed.

Phenomenological Study (Qualitative Design)

In the same way that ethnography focuses on cultural groups and their behaviors and experiences, a phenomenological study focuses on the person’s perceptions and understandings of an experience. A phenomenological study is one that attempts to understand the inner experiences of an event, such as a person’s perceptions, perspectives, and understandings (Leedy & Ormrod, 2010). Phenomenological studies are concerned primarily with understanding what it is like to experience certain events. For example, researchers might be interested in studying the experiences of military spouses whose partners are deployed, wounded soldiers coming back from war, juvenile offenders’ perceptions of the therapeutic relationship in counseling, or elderly individuals being placed into a nursing home. In any situation, the idea is to better understand the subjective or personal perspectives of different people as they experience a particular event.

Some researchers conduct phenomenological studies to obtain a more thorough under-
standing of an experience that they have personally gone through. Looking at an experi-
ence or phenomenon from multiple perspectives can allow them to generalize about what
it is like to experience that phenomenon. However, regardless of the reason for wanting to
conduct the research, it is important that the researcher set aside his or her personal beliefs
and attitudes toward the experience in order to see and fully understand the essence of
the phenomenon being studied (Merriam & Associates, 2002).

(Photo: Phenomenological studies attempt to understand what it is like to experience a certain event, such as returning home from war.)


Steps in Phenomenological Research
Phenomenological research is generally conducted through in-depth, unstructured, and
recorded interviews with a select participant sample (see Section 3.2, Qualitative Research
Interviews). The sample size is usually between 5 and 25 participants who have directly
experienced the phenomenon being studied (Creswell, 1998). Unstructured interviews are
conducted individually with each participant, which allows the researcher to follow the
participant’s experiences thoroughly and ask spontaneous questions based on what is
being discussed. Generally, unstructured interviews do not contain any predetermined
questions, although some researchers develop a few questions to guide the interview,
which is acceptable in phenomenological research. Thus, a typical phenomenological
interview is more like an informal conversation, although the participant does most of
the talking and the researcher does most of the listening. In addition to listening, the
researcher should also note any meaningful facial expressions or body language, as these
can provide additional information regarding the intensity of a feeling or thought.

In phenomenological studies, data are usually analyzed by identifying common themes
across people’s experiences. Themes are created by first transcribing the information from
the interview in full and then editing to remove any unnecessary content. The next step
is to group common statements from the interviews into categories that reflect the vari-
ous aspects of the experience as well as to examine any divergent perspectives among
subjects. The final step is to develop an overall description of how people experience the
phenomenon (Leedy & Ormrod, 2010).
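
As a loose illustration of the grouping step only, the bookkeeping might look like the following Python sketch. The statements and theme labels are entirely hypothetical, and real thematic analysis is an interpretive judgment made by the researcher, not string handling:

    from collections import defaultdict

    # Hypothetical (theme label, transcribed excerpt) pairs assigned by the researcher.
    coded_statements = [
        ("isolation", "I felt like no one around me understood."),
        ("isolation", "Even at home, I was alone with it."),
        ("adjustment", "It took months before daily routines felt normal."),
        ("adjustment", "Going back to work was the hardest part."),
    ]

    themes = defaultdict(list)
    for theme, excerpt in coded_statements:
        themes[theme].append(excerpt)

    # How many excerpts support each candidate theme.
    for theme, excerpts in sorted(themes.items()):
        print(theme, len(excerpts))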

Pros and Cons of Phenomenological Studies
Phenomenological studies give researchers a comprehensive view of a particular phenom-
enon, which is experienced by many but illuminated by studying the subjective responses
of a few. Unstructured interviews provide a wealth of data while allowing participants to
describe their experiences in their own way and under their own terms. Phenomenologi-
cal studies are rich in personal experiences and provide a more complete or holistic view
of what people experience.

Phenomenological studies can also be flawed if the interviews veer off topic or commu-
nication misunderstandings crop up. For example, some recorded information may be
difficult to understand. In addition, interviews, data analysis, and data interpretation can
be influenced by researcher bias regarding the experience. As mentioned previously, if
a researcher has personally experienced the phenomenon being studied (rape would be
an emotionally charged example), it is possible that he or she may bring preconceived
notions or prejudices to the study, which will in turn influence how the data are collected
and interpreted.

Grounded Theory Study (Qualitative Design)

Unlike most qualitative research, grounded theory does not begin from a theoretical
perspective or theory but rather utilizes data that are collected to develop new theories
or hypotheses. According to Smith and Davis (2010), “A grounded theory is one that is
uncovered, developed, and conditionally confirmed through collecting and making sense
of data related to the issue at hand” (p. 54). Thus, theories are built from “grounded”
data that have been systematically analyzed and reanalyzed. Grounded theory is typi-
cally used in qualitative research; however, grounded theory can utilize either qualitative
or quantitative data (Glaser, 2008), or a mixture of the two. As Glaser posits, grounded
theory is considered not only a qualitative method but also a general research method. For
example, you may use grounded theory as the only method for your qualitative study,
or you may choose to use it as the first step toward identifying constructs and generat-
ing hypotheses about their relationships to one another. You may then want to employ a
quantitative, cause-and-effect design to further test your hypotheses that were developed
from your grounded theory study.

Grounded theory is especially useful for exploring the relationships and behaviors of
groups that either have not been previously studied or have been inadequately studied.
Grounded theory has been used to study a wide variety of topics, such as stress manage-
ment in Olympic champions (Fletcher & Sakar, 2012), the role of leaders in knowledge
management (Lakshman, 2007), reflections of therapists during role-playing sessions
(Rober, Elliot, Buysse, Loots, & Corte, 2008), normalizing risky sexual behaviors in female
adolescents (Weiss, Jampol, Lievano, Smith, & Wurster, 2008), and team leadership dur-
ing trauma resuscitation (Xiao, Seagull, Mackenzie, & Klein, 2004), to name a few. When
choosing to utilize the grounded theory approach, the idea is to select a topic that has been
minimally explored.

Steps in Grounded Theory Research
In grounded theory research, data are simultaneously collected, coded, and analyzed.
This procedure differs from quantitative methods because during the research process,
data collection and analysis do not occur sequentially. Rather, in grounded theory, data
analysis begins almost immediately when data collection starts. Grounded theory can
utilize a variety of data collection techniques, including interviews, observations, focus
groups, historical records, videotapes, diaries, news reports, and any other form of data
that is relevant to the research question (Leedy & Ormrod, 2010), although in-depth inter-
views are the most commonly used method.

One of the most widely used approaches to data analysis in grounded theory is the
one suggested by Strauss and Corbin (1990). In this approach, data analysis begins by
developing categories to classify the data. This process, called open coding, involves the
researcher labeling and organizing the data into categories or themes and smaller sub-
categories that describe the phenomenon being investigated. In this step, initial coding is
generally guided by some of the literature review, as well as by topic guides developed
by the researcher that direct the coding of themes and categories, based upon the study’s
research questions. Glaser (1978) suggests three questions to be used in generating and
identifying open codes:

1. What is this data a study of?
2. What category does this incident indicate?
3. What is actually happening in the data?

The next step in data analysis is axial coding, which involves finding connections or rela-
tionships between the categories and subcategories (Smith & Davis, 2010). Strauss (1987)
indicates that axial coding should involve the examination of antecedent conditions,
interactions among subjects, strategies, tactics, and consequences. The idea here is to fit
together all the pieces, similar to a jigsaw puzzle. Strauss and Corbin (1990) further sug-
gest that axial coding focus on asking the questions Who, When, Where, Why, How, and
With what consequences. As new data are collected, the researcher will move constantly
between data collection, open coding, and axial coding to refine the categories. Also dur-
ing this process, hypotheses are generated and continually tested, based on new data
coming in. Data collection and analysis continue until the categories are completely satu-
rated. Saturation occurs when no additional supporting or disconfirming data are being
found to develop a category. Thus, saturation occurs when we have learned everything
that we can about a category.

The final step, selective coding, involves the researcher combining the categories and their
interrelationships into theoretical constructs or a “story line that describes what happens
in the phenomenon being studied” (Leedy & Ormrod, 2010, p. 143). In other words, the
researcher is integrating and refining the categories so that the categories can be related
to the core categories, or categories that lie at the core of the theory being generated. It is
from this process that theories are generated.

To illustrate the process of grounded theory research, consider an investigation of the active
or passive roles played by companions who accompany patients to their dental appoint-
ments. To examine how these companions affect the interactions between the patient and
the dental provider, we could begin collecting data using field notes and audio recordings.
Next we would compare the interactions among companions, patients, and dentists by
assessing their similarities and differences. We would then identify codes from the initial
data collected, and develop categories to organize the codes. The following step would
be to develop hypotheses about the patterns we observed. Next, we would continue to
collect and analyze data for an extended period to test those hypotheses and develop
more patterns. We would continue collecting data and refining hypotheses until we were
able to account for and explain all examples (the saturation point). We would then gener-
ate a theory from the data regarding the roles that companions play when attending den-
tal appointments with patients.
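
The bookkeeping behind such a study could be sketched in Python as follows; the category names, excerpts, and saturation rule here are invented for illustration, since the actual coding decisions are interpretive judgments made by the researcher:

    # Open coding: label excerpts and group them into candidate categories.
    open_codes = {
        "companion_speaks_for_patient": [
            "Companion answers the dentist's question about medication.",
        ],
        "provider_recruits_companion": [
            "Dentist asks companion to reinforce home care.",
        ],
    }

    # Axial coding: record proposed relationships between categories.
    axial_links = [
        ("provider_recruits_companion", "companion_speaks_for_patient"),
    ]

    # Saturation: stop collecting for a category once new interviews add no
    # supporting or disconfirming excerpts.
    def is_saturated(new_excerpts_in_recent_interviews: int) -> bool:
        return new_excerpts_in_recent_interviews == 0

    print(is_saturated(0))  # True -> this category is saturated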

Pros and Cons of Grounded Theory Studies
Grounded theory gives the researcher significant flexibility with respect to the types of
data collection methods and the ability to readjust the investigation as new data are being
collected (Houser, 2009). Grounded theory also provides a thorough analysis of the data,
which can lead to fairly solid theories or hypotheses about a particular phenomenon.
Additionally, through systematic data collection and analysis procedures, the researcher
is able to explore the complexity of the problem, which often produces richer and more
informative results.

Despite the advantages of being able to develop theories from data collected, there are
some disadvantages to grounded theory. Probably the biggest disadvantage involves
the difficulty in managing large amounts of data. Since there are no standard guidelines
regarding how to identify categories, the novice researcher may have difficulty devel-
oping categories and analyzing the data appropriately. Identifying when a category has
become saturated and when a theory has been completely formed can also be difficult
and requires some experience. Additionally, grounded theory research can be very time
consuming and tedious.

Case Studies (Qualitative or Descriptive Design)

At the 1996 meeting of the American Psychological Association (APA), James Pennebaker—
chair of psychology at the University of Texas at Austin—delivered an invited address,
describing his research on the benefits of therapeutic writing. Rather than follow
the expected route of showing graphs and statistical tests to support his arguments,
Pennebaker told a story. In the mid-1980s, when Pennebaker’s lab was starting to study
the effects of structured writing on physical and psychological health, one study partici-
pant was an American soldier who had served in the Vietnam War. Like many others,
this soldier had had difficulty adjusting to what had happened during the war and con-
sequent trouble reintegrating into “normal” civilian life. In Pennebaker’s study, he was
asked to simply spend 15 minutes per day, over the course of a week, writing about a
traumatic experience—in this case, his tour of duty in Vietnam. At the end of this week,
as you might expect, this veteran felt awful; these were unpleasant memories that he had
not relived in over a decade. But during the next few weeks, amazing things started to
happen. He slept better, he made fewer visits to his doctor, and he even reconnected with
his wife after a long separation.

Pennebaker’s presentation is an example of a case study that provides a detailed, in-depth
analysis of one person over a period of time. Although this case study was collected as
part of a larger quantitative experiment, case studies are usually conducted in a therapeu-
tic setting and involve a series of interviews. An interviewer will typically study the sub-
ject in detail, recording everything from direct quotes and observations to his or her own
interpretations. We encountered this technique briefly in Chapter 2 (Section 2.1, Overview
of Research Designs), in discussing Oliver Sacks’s case studies of individuals learning to
live with neurological impairments.

Pros and Cons of Case Studies
In psychology, case studies are a form of qualitative research; thus, they represent the low-
est point on our continuum of control. Because they involve one person at a time, without
a control group, case studies are often unsystematic. That is, the participants are chosen
because they tell a compelling story or because they represent an unusual set of circum-
stances rather than being selected randomly. Studying these individuals allows for a great
deal of exploration, which can often inspire future research. However, it is nearly impos-
sible to generalize from one case study to the larger population. In addition, because the
case study includes both direct observation and the researcher’s interpretation, there is
a risk that a researcher’s biases might influence the interpretations. For example, Pen-
nebaker’s investment in demonstrating that writing has health benefits could have led to
more positive interpretations of the Vietnam veteran’s outcomes. However, in this par-
ticular case study, Pennebaker’s hypothesis about the benefits of writing was supported
because his findings mirror those seen in hundreds of controlled experimental studies
that involved thousands of people. This body of work allows us to feel confident about
the conclusions from the single case.

Case studies have two distinct advantages over other forms of research. First is the simple
fact that anecdotes are persuasive. Despite Pennebaker’s nontraditional approach to a
scientific talk, the audience came away utterly convinced of the benefits of therapeutic
writing. And, despite the fact that Oliver Sacks studied one neurological patient at a time,
the stories in his books shed very convincing light on the ability of humans to adapt to
their circumstances and have a wide appeal to the lay reader. Second, case studies pro-
vide a useful way to study rare populations and individuals with rare conditions. For
example, from a scientific point of view, the ideal might be to gather a random sample
of individuals living with severe memory impairment due to alcohol abuse and conduct
some sort of controlled study in a laboratory environment. This approach could allow
us to make causal statements about the results, as we will discuss in Chapter 5 (Section
5.4, Experimental Designs). However, from a practical point of view, this study would
be nearly impossible to conduct, making case studies such as Sacks’s interviews with
William Thompson the best strategy for understanding this condition in depth.

Examples of Case Studies
Throughout the history of psychology, case studies have been used to address a number
of important questions and to provide a starting point for controlled quantitative studies.
For example, in developing his theories of cognitive development, the Swiss psychologist
Jean Piaget studied the way that his own children developed and changed their thinking
styles. Piaget proposed that children would progress through a series of four stages in
the way that they approached the world—sensorimotor, preoperational, concrete opera-
tional, and formal operational—with each stage involving more sophisticated cognitive
skills than the previous stage. By observing his own children, Piaget noticed preliminary
support for this theory and later was able to conduct more controlled research with larger
populations.

Perhaps one of the most famous case studies in psychology is the story of Phineas Gage,
a 19th-century railroad worker who suffered severe brain damage. In September of 1848,
Gage was working with a team to blast large sections of rock to make way for new rail
lines. After a large hole was drilled into a section of rock, Gage’s job was to pack the hole
with gunpowder, sand, and a fuse and then tamp it down with a long cylindrical iron
rod (known as a “tamping rod”). On this particular occasion, it seems Gage forgot to
pack in the sand. So when the iron rod struck the gunpowder, the powder exploded, sending
the 3-foot-long iron rod through his face, behind his left eye, and out the top of his head.
Against all odds, Gage survived this incident with relatively few physical side effects.
However, everyone around him noticed that his personality had changed—Gage became
more impulsive, violent, and argumentative. Gage’s physician, John Harlow, reported the
details of this case in an 1868 article. The following passage is a great example of the rich
detail that is often characteristic of case studies:

He is fitful, irreverent, indulging at times in the grossest profanity (which
was not previously his custom), manifesting but little deference for his fel-
lows, impatient of restraint or advice when it conflicts with his desires.
A child in his intellectual capacity and manifestations, he has the animal
passions of a strong man. Previous to his injury, although untrained in the
schools, he possessed a well-balanced mind, and was looked upon by those
who knew him as a shrewd, smart businessman, very energetic and per-
sistent in executing all his plans of operation. In this regard his mind was
radically changed, so decidedly that his friends and acquaintances said he
was “no longer Gage.” (Harlow, 1868, pp. 339–342)

Gage’s transformation ultimately
inspired a large body of work in
psychology and neuroscience that
attempts to understand the con-
nections between brain areas and
personality. The area of his brain
destroyed by the tamping rod is
known as the frontal lobe, now
understood to play a critical role
in impulse control, planning, and
other high-level thought processes.
Gage’s story is a perfect illustration
of the pros and cons of case stud-
ies. On the one hand, it is difficult
to determine exactly how much the
brain injury affected his behavior
because he is only one person. On
the other hand, Gage’s tragedy inspired researchers to think about the connections among
mind, brain, and personality. As a result, we now have a vast—and still growing—under-
standing of the brain. This illustrates a key point about case studies: Although individual
cases provide limited knowledge about people in general, they often lead researchers to
conduct additional work that does lead to generalizable knowledge.

Qualitative Versus Quantitative Approaches
Case studies are qualitative more often than not. The goal of this method is to
study a particular case in depth as a way to learn more about a rare phenomenon. In
both Pennebaker’s study of the Vietnam veteran and Harlow’s study of Phineas Gage,
the researcher approached the interview process as a way to gather information and learn
from the bottom up about the interviewee’s experience. However, it is certainly possible
for a case study to represent quantitative research. This is often the case when research-
ers conduct a series of case studies, learning from the first one or the initial few and then
developing hypotheses to test on future cases. For example, a researcher could use the
case of Phineas Gage as a starting point for hypotheses about frontal lobe injury, perhaps
predicting that other cases would show similar changes in personality. Another way in
which case studies can add a quantitative element is for researchers to conduct analyses
within a single subject. For example, a researcher could study a patient with brain dam-
age for several years following an injury, tracking the association between deterioration
of brain regions with changes in personality and emotional responses. At the end of the
day, though, these examples would still suffer from the primary downside of case studies:
Because they study a single individual, it is difficult to generalize their findings.

[Figure: Various views show an iron rod embedded in Phineas Gage’s (1823–1860) skull.]

Research: Thinking Critically

Acupuncture of Benefit to Those with Unexplained Symptoms

By the Peninsula College of Medicine and Dentistry, Exeter, UK

Attending frequently with medically unexplained symptoms is distressing for both patient and doc-
tor. In these settings, effective treatment or management options are limited: One in five patients
has symptoms that remain unexplained by conventional medicine. Studies have shown that the cost
to the National Health Service (NHS, United Kingdom) of managing the treatment of a patient with
medically unexplained symptoms can be twice that of a patient with a diagnosis.

A research team from the Institute of Health Services Research, Peninsula Medical School, University
of Exeter, has carried out a randomised control trial and a linked interview study regarding 80 such
patients from GP (General Practitioner) practices across London to investigate their experiences of
having five-element acupuncture added to their usual care. This is the first trial of traditional acu-
puncture for people with unexplained symptoms.

The results of the research are published in the British Journal of General Practice. They reveal that
acupuncture had a significant and sustained benefit for these patients and, consequently, acupunc-
ture could be safely added to the therapies used by practitioners when treating frequently attending
patients with medically unexplained symptoms.

The patient group was made up of 80 adults, 80% female, with an average age of 50 years and from
a variety of ethnic backgrounds who had consulted their GP at least eight times in the past year.
Nearly 60% reported musculoskeletal health problems, of which almost two thirds had been pres-
ent for a year.

In the 3 months before taking part in the study, the 80 patients had accounted for the following NHS
experiences: 21 inpatient days; 106 outpatient clinic visits; 52 hospital clinic visits (for treatments
such as physiotherapy, chiropody, and counselling); 44 hospital visits for investigations (including 10
magnetic resonance imaging [MRI] scans); and 75 visits to non-NHS practitioners such as opticians,
dentists, and complementary therapists.

The patients were randomly divided into an acupuncture group and a control group. Eight acupunc-
turists administered individual five-element acupuncture to the acupuncture group immediately, up
to 12 sessions over 26 weeks. The same numbers of treatments were made available to the control
group after 26 weeks.

At 26 weeks, the patients were asked to complete a number of questionnaires including the individu-
alized health status questionnaire “Measure Yourself Medical Outcome Profile.”

The acupuncture group registered a significantly improved overall score when compared with the
control group. They also recorded improved well-being but did not show any change in GP and other
clinical visits or in the number of medications they were taking. Between 26 and 52 weeks, the acu-
puncture group maintained their improvement and the control group, now receiving their acupunc-
ture treatments, showed a “catch-up” improvement.

The associated qualitative study, which focused on the patients’ experiences, supported the quan-
titative work. This element identified that the participating patients had a variety of long-standing
symptoms and disability, including chronic pain, fatigue, and emotional problems, which affected
their ability to work, socialize, and carry out everyday tasks. A lack of a convincing diagnosis to explain
their symptoms led to frustration, worry, and low mood.

Participating patients reported that their acupuncture consultations became increasingly valuable.
They appreciated the amount of time they had with each acupuncturist and the interactive and holis-
tic nature of the sessions—there was a sense that the practitioners were listening to their concerns
and, via therapy, doing something positive about them.

As a result, many patients were encouraged to take an active role in their treatment, resulting in
cognitive and behavioural lifestyle changes, such as a new self-awareness about what caused stress
in their lives, and a subsequent ability to deal with stress more effectively, and taking their own initia-
tives based on advice from the acupuncturists about diet, exercise, relaxation, and social activities.

Comments from participating patients included: “The energy is the main thing I have noticed. You
know, yeah, it’s marvellous! Where I was going out and cutting my grass, now I’m going out and cut-
ting my neighbour’s after because he’s elderly”; “I had to reduce my medication. That’s the big help
actually, because medication was giving me more trouble . . . side effects”; and “It kind of boosts
you, somehow or another.”

Dr. Charlotte Paterson, who managed the randomised control trial and the longitudinal study of
patients’ experiences, commented: “Our research indicates that the addition of up to 12 five-element
acupuncture consultations to the usual care experienced by the patients in the trial was feasible and
acceptable and resulted in improved overall well-being that was sustained for up to a year.

This is the first trial to investigate the effectiveness of acupuncture treatment for those with unex-
plained symptoms, and the next development will be to carry out a cost-effectiveness study with a
longer follow-up period. While further studies are required, this particular study suggests that GPs
may recommend a series of five-element acupuncture consultations to patients with unexplained
symptoms as a safe and potentially effective intervention.

Paterson added: “Such intervention could not only result in potential resource savings for the NHS,
but would also improve the quality of life for a group of patients for whom traditional biomedicine
has little in the way of effective diagnosis and treatment.”

Peninsula College of Medicine and Dentistry. (2011, May 27). Acupuncture and those with unexplained symptoms. From Paterson,
C., Taylor, R., Griffiths, P., Britten, N., Rugg, S., Bridges, J., McCallum, B., & Kite, G. (2011). Acupuncture for ‘frequent
attenders’ with medically unexplained symptoms: A randomised controlled trial (CACTUS study). British Journal of General
Practice, 61(587), e295–e305; and Rugg, S., Paterson, C., Britten, N., Bridges, J., & Griffiths, P. (2011). Traditional
acupuncture for people with medically unexplained symptoms: A longitudinal qualitative study of patients’ experiences.
British Journal of General Practice, 61(587), e306–e315.

Think about it:

1. In this study, researchers interviewed acupuncture patients using open-ended questions and
recorded their verbal responses, which is a common qualitative research technique. What
advantages does this approach have over administering a quantitative questionnaire with
multiple-choice items?

2. What are some advantages of adding a qualitative element to a controlled medical trial
like this?

3. What would be some disadvantages of relying exclusively on this approach?

Archival Research (Qualitative or Descriptive Design)

Moving slightly further along the continuum of control, we come to archival research,
which involves drawing conclusions by analyzing existing sources of data, including both
public and private records. Sociologist David Phillips (1997) hypothesized that media cov-
erage of suicides would lead to “copycat” suicides. He tested this hypothesis by gathering
archival data from two sources: front-page newspaper articles devoted to high-profile sui-
cides and the number of fatalities in the 11-day period following coverage of the suicide.
By examining these patterns of data, Phillips found support for his hypothesis. Specifi-
cally, fatalities appeared to peak 3 days after coverage of a suicide, and increased publicity
was associated with a greater peak in fatalities.
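
The logic of this archival test can be captured in a few lines of code. The sketch below, with entirely invented dates and fatality counts, tallies fatalities by the number of days elapsed since the most recent front-page story, mirroring Phillips’s 11-day follow-up window.

    # Hypothetical sketch of a copycat analysis: count fatalities by days
    # elapsed since the nearest preceding front-page story. All numbers
    # below are invented for illustration.
    from collections import Counter

    story_days = [0, 30]                  # days (from study start) with coverage
    daily_fatalities = {0: 4, 1: 5, 2: 6, 3: 9, 4: 5, 30: 3, 31: 4, 33: 8}

    by_offset = Counter()
    for day, deaths in daily_fatalities.items():
        prior = [s for s in story_days if s <= day]
        if prior and (day - max(prior)) <= 11:    # 11-day follow-up window
            by_offset[day - max(prior)] += deaths

    for offset in sorted(by_offset):
        print(f"day +{offset}: {by_offset[offset]} fatalities")
    # With these invented numbers, the tally peaks 3 days after coverage.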

Pros and Cons of Archival Research
It is difficult to imagine a better way to test Phillips’s hypothesis about copycat suicides.
You could never randomly assign people to learn about suicides and then wait to see
whether they killed themselves. Nor could you interview people right before they com-
mitted suicide to determine whether they were being inspired by media coverage. Archi-
val research provides a way to test the hypothesis by examining existing data and thereby
avoids most of the ethical and practical problems of other research designs. Related to this
point, archival research also neatly sidesteps issues of participant reactivity, or the ten-
dency of people to behave differently when they are aware of being observed. Any time
you conduct research in a laboratory, participants are aware that they are in a research
study and may not behave in a completely natural manner. In contrast, archival data
involve making use of records of people’s natural (unstudied) behaviors. The subjects of
Phillips’s study of copycat suicides were individuals who decided to kill themselves and
who had no awareness that they would be part of a research study.

Archival research is also an excellent strategy for examining trends and changes over
time. For example, much of the evidence for global warming comes from observing
upward trends in recorded temperatures around the globe. To gather this evidence,
researchers dig into existing archives of weather patterns and conduct statistical tests
on the changes over time. Psychologists and other social scientists also make use of this
approach to examine population-level changes in everything from suicide rates to voting
patterns over time. These comparisons can sometimes involve a blend of archival and
current data. For example, a great deal of social psychology research has been dedicated
to understanding people’s stereotypes about other groups. In a classic series of stud-
ies known as the “Princeton Trilogy,” researchers documented the stereotypes held by
Princeton students over several decades (1933 to 1969). Social psychologist Stephanie
Madon and her colleagues (2001) collected a new round of data but also conducted a
new analysis of this archival data. These new analyses suggested that, over time, people
have become more willing to use stereotypes about other groups, even as stereotypes
themselves have become less negative.

One final advantage of archival research is that once you manage to gain access to the
relevant archives, it requires relatively few resources. The typical laboratory experiment
involves one participant at a time, sometimes requiring the dedicated attention of more
than one research assistant over a period of an hour or more. But once you have assem-
bled your data from the archives, it is a relatively simple matter to conduct statistical
analyses. In a 2001 article, the psychologists Shannon Stirman and James Pennebaker used
a text-analysis computer program to compare the language of poets who committed sui-
cide (e.g., Sylvia Plath) with the language of similar poets who had not committed suicide
(e.g., Denise Levertov). In total, these researchers examined 300 poems from 20 poets, half
of whom had committed suicide. Consistent with Émile Durkheim’s theory of suicide as a
form of “social disengagement,” Stirman and Pennebaker (2001) found that suicidal poets
used more self-references and fewer references to other people in their poems. But here’s
the impressive part: Once they had assembled their archive of poems, it took only seconds
for their computer program to analyze the language and generate a statistical profile of
each poet’s verbal output.
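
A stripped-down version of this kind of dictionary-based text analysis is easy to sketch. The word lists and sample line below are invented stand-ins, not the actual program or corpus Stirman and Pennebaker used.

    # Toy dictionary-based text analysis: rate of self-references versus
    # references to others. Word lists and sample text are invented.
    import re

    SELF_WORDS = {"i", "me", "my", "mine", "myself"}
    OTHER_WORDS = {"we", "us", "our", "you", "he", "she", "they", "them"}

    def reference_rates(text):
        words = re.findall(r"[a-z']+", text.lower())
        total = len(words) or 1
        return (sum(w in SELF_WORDS for w in words) / total,
                sum(w in OTHER_WORDS for w in words) / total)

    self_rate, other_rate = reference_rates(
        "I walk alone and I speak to my shadow; they never answer me.")
    print(f"self: {self_rate:.2f}  other: {other_rate:.2f}")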

Overall, however, archival research is still relatively low on our continuum of control.
As a researcher, you have to accept the archival data in whatever form they exist, with
no control over the way they were collected. For instance, in Stephanie Madon’s (2001)
reanalysis of the “Princeton Trilogy” data, she had to trust that the original research-
ers had collected the data in a reasonable and unbiased way. In addition, because archi-
val data often represent natural behavior, it can be difficult to categorize and organize
responses in a meaningful and quantitative way. The upshot is that archival research often
requires some creativity on the researcher ’s part—such as analyzing poetry using a text
analysis program. In many cases, as we discuss next, the process of analyzing archives
involves developing a coding strategy for extracting the most relevant information.

Content Analysis—Analyzing Archives
In most of our examples so far, the data have come in a straightforward, ready-to-analyze
form. That is, it is relatively simple to count the number of suicides, track the average tem-
perature, or compare responses to questionnaires about stereotyping over time. In other
cases, the data can come in a sloppy, disorganized mass of information. What do you do
if you want to analyze literature, media images, or changes in race relations on television?
These types of data can yield incredibly useful information, provided you can develop a
strategy for extracting it.

Mark Frank and Tom Gilovich—both psychologists at Cornell University—were inter-
ested in whether cultural associations with the color black would have an effect on behav-
ior. In virtually all cultures, black is associated with evil—the bad guys wear black hats;
we have a “black day” when things turn sour; and we are excluded from social groups
by being blacklisted or blackballed. Frank and Gilovich (1988) wondered whether “a cue
as subtle as the color of a person’s clothing” (p. 74) would influence aggressive behavior.
To test this hypothesis, they examined aggressive behaviors in professional football and
hockey games, comparing teams whose uniforms were black with teams who wore other
colors. Imagine for a moment that this was your research study. Professional sporting
events contain a wealth of behaviors and events. How would you extract information on
the relationship between uniform color and aggressive behavior?

Frank and Gilovich (1988) solved this problem by examining public records of penalty
yards (football) and penalty minutes (hockey) because these represent instances of pun-
ishment for excessively aggressive behavior, as recognized by the referees. And, in both
sports, the size of the penalty increases according to the degree of aggression. These pen-
alty records were obtained from the central offices of both leagues, covering the period
from 1970 to 1986. Consistent with their hypothesis, teams with black uniforms were
“uncommonly aggressive” (p. 76). Most strikingly, two NHL hockey teams changed their
uniforms to black during the period under study and showed a marked increase in pen-
alty minutes while sporting the new uniforms!
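
Because the penalty records arrive already quantified, the core of the analysis is little more than comparing group averages, as the hypothetical sketch below shows (team labels and penalty figures are invented).

    # Sketch of the uniform-color comparison: mean penalty minutes for teams
    # in black versus all other teams. Data are invented for illustration.
    penalty_minutes = {
        "Team A (black)": 1450, "Team B (black)": 1390,
        "Team C": 1100, "Team D": 1180, "Team E": 1050,
    }

    black = [m for team, m in penalty_minutes.items() if "(black)" in team]
    other = [m for team, m in penalty_minutes.items() if "(black)" not in team]
    print(f"black uniforms: {sum(black) / len(black):.0f} minutes")
    print(f"other uniforms: {sum(other) / len(other):.0f} minutes")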

But even this analysis is relatively straightforward in that it involved data that were already
in quantitative form (penalty yards and minutes). In many cases, the starting point is a
messy collection of human behavior. In a pair of journal articles, psychologist Russell Wei-
gel and his colleagues (1980; 1995) examined the portrayal of race relations on prime-time
television. In order to do this, they had to make several critical decisions about what to
analyze and how to quantify it. The process of systematically extracting and analyzing the
contents of a collection of information is known as content analysis. In essence, content
analysis involves developing a plan to code and record specific behaviors and events in a
consistent way. We can break this down into a three-step process:

Step 1—Identify Relevant Archives

Before we develop our coding scheme, we have
to start by finding the most appropriate source
of data. Sometimes the choice is fairly obvious.
If you want to compare temperature trends, the
most relevant archives will be weather records.
If you want to track changes in stereotyping over
time, the most relevant archive will comprise
questionnaire data assessing people’s attitudes.
In other cases, this decision involves careful con-
sideration of both your research question and
practical concerns. Frank and Gilovich decided
to study penalties in professional sports because
these data were both readily available (from the
central league offices) and highly relevant to their
hypothesis about aggression and uniform color.

Because these penalty records were publicly
available, the researchers were able to access them
easily. But if your research question involved sen-
sitive or personal information—such as hospital
records or personal correspondence—you would
need to obtain permission from a responsible
party. Let’s say you wanted to analyze the love
letters written by soldiers serving overseas and
then try to predict relationship stability. Because
these letters would be personal, perhaps rather
intimate, you would need permission from each person involved before proceeding with
the study. Or, say you wanted to analyze the correlation between the length of a person’s
hospital stay and the number of visitors he or she receives. This would most likely require
permission from hospital administrators, doctors, and the patients themselves. How-
ever you manage to obtain access to private records, it is absolutely essential to protect
the privacy and anonymity of the people involved. This would mean, for example, using
pseudonyms and/or removing names and other identifiers from published excerpts of
personal letters.

[Figure: A personal letter is an example of a data source that a researcher would need to obtain permission to use.]

Step 2—Sample From the Archives

In Weigel’s research on race relations, the most obvious choice of archives consisted of
snippets of both television programming and commercials. But this decision was only
the first step of the process. Should they examine every second of every program ever
aired on television? Naturally not; instead, their approach was to take a smaller sample
of television programming. We will discuss sampling in more detail in Chapter 4 (Sec-
tion 4.3, Sampling From the Population), but the basic process involves taking a smaller,
representative collection of the broader population in order to conserve resources. Weigel
and colleagues (1980) decided to sample one week’s worth of prime-time programming
from 1978, assembling videotapes of everything broadcast by the three major networks at
the time (CBS, NBC, and ABC). They narrowed their sample by eliminating news, sports,
and documentary programming because their hypotheses were centered on portrayals of
fictional characters of different races.

Step 3—Code and Analyze the Archives

The third and most involved step is to develop a system for coding and analyzing the
archival data. Even a sample of one week’s worth of prime-time programming contains
a near-infinite amount of information! In the race-relations studies, Weigel et al. elected
to code four key variables: (1) the total human appearance time, or time during which
people were on-screen; (2) the Black appearance time, in which Black characters appeared
on-screen; (3) the cross-racial appearance time, in which characters of two races were on-
screen at the same time; and (4) the cross-racial interaction time, in which cross-racial
characters interacted. In the original (1980) paper, these authors reported that Black char-
acters were shown only 9% of the time, and cross-racial interactions only 2% of the time.
Fortunately, by the time of their 1995 follow-up study, the rate of Black appearances had
doubled, and the rate of cross-racial interactions had more than tripled. However, there
was discouragingly little change in some of the qualitative dimensions that they mea-
sured, including the degree of emotional connection between characters of different races.
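
Once the footage has been coded into timed segments, the four appearance-time measures reduce to simple bookkeeping. The sketch below is a hypothetical rendering of that step; the segment durations and flags are invented, not Weigel’s actual data.

    # Sketch of appearance-time coding: each segment records its length in
    # seconds and which categories it counts toward. Data are invented.
    segments = [
        {"secs": 120, "human": True, "black": False,
         "cross_racial": False, "interaction": False},
        {"secs": 30, "human": True, "black": True,
         "cross_racial": False, "interaction": False},
        {"secs": 15, "human": True, "black": True,
         "cross_racial": True, "interaction": True},
    ]

    total_human = sum(s["secs"] for s in segments if s["human"])
    for category in ("black", "cross_racial", "interaction"):
        time = sum(s["secs"] for s in segments if s[category])
        print(f"{category}: {100 * time / total_human:.0f}% of human appearance time")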

This study also highlights the variety of options for coding complex behaviors. The four
key ratings of “appearance time” consist of simply recording the amount of time that each
person or group is on-screen. In addition, the researchers assessed several abstract quali-
ties of interaction using judges’ ratings. The degree of emotional connection, for instance,
was measured by having judges rate the “extent to which cross-racial interactions were
characterized by conditions promoting mutual respect and understanding” (Weigel et al.,
1980, p. 888). As you’ll remember from Chapter 2 (Section 2.2, Reliability and Validity),
any time you use judges’ ratings, it is important to collect ratings from more than one rater
and to make sure they agree in their assessments.
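
The simplest such check is percent agreement between two judges; published studies often also report a chance-corrected index such as Cohen’s kappa. A minimal sketch with invented ratings:

    # Percent agreement between two judges rating the same scenes.
    # Ratings are invented for illustration.
    judge_1 = ["respectful", "neutral", "respectful", "hostile", "neutral"]
    judge_2 = ["respectful", "neutral", "hostile",    "hostile", "neutral"]

    matches = sum(a == b for a, b in zip(judge_1, judge_2))
    print(f"agreement: {100 * matches / len(judge_1):.0f}%")  # 80%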

Your goal as an archival researcher is to find a systematic way to record the variables most
relevant to your hypothesis. As with any research design, the key is to start with clear
operational definitions that capture the variables of interest. This involves both deciding
the most appropriate variables and the best way to measure these variables. For example,
if you analyze written communication, you might decide to compare words, sentences,
characters, or themes across a sample. A study of newspaper coverage might code the
amount of space or number of stories dedicated to a topic. Also, a study of television news
might code the amount of airtime given to different points of view. The best strategy in
each case will be the one that best represents the variables of interest.

Qualitative Versus Quantitative Approaches
Archival research can represent either qualitative or quantitative research, depending on
the researcher’s approach to the archives. Most of our examples in this section represent the
quantitative approach: Frank and Gilovich (1988) counted penalties to test their hypoth-
esis about aggression; and Stirman and Pennebaker (2001) counted self-referential words
in poetry to test their hypothesis about suicide. But the race-relations work by Weigel and
colleagues (1980; 1995) represents a nice mix of qualitative and quantitative research. In
their initial 1980 study, the primary goal was to document the portrayal of race relations
on prime-time television (i.e., qualitative). But in the 1995 follow-up study, the primary
goal was to determine whether these portrayals had changed over a 15-year period. That
is, they tested the hypothesis that race relations were portrayed in a more positive light
(i.e., quantitative). Another way in which archival research can be qualitative is to study
open-ended narratives without attempting to impose structure upon them. This approach
is commonly used to study free-flowing text such as personal correspondence or letters to
the editor in a newspaper. A researcher approaching these from a qualitative perspective
would attempt to learn from these narratives but without imposing structure via the use
of content analyses.

Observational Research (Qualitative or Descriptive Design)

Moving further along the continuum of control, we come to the descriptive design with
the greatest amount of researcher control. Observational research involves studies that
directly observe behavior and record these observations in an objective and systematic
way. In previous psychology courses, you may have encountered the concept of attach-
ment theory, which argues that an infant’s bond with his or her primary caregiver has
implications for later social and emotional development. Mary Ainsworth, a Canadian
developmental psychologist, and John Bowlby, a British psychologist and psychiatrist,
articulated this theory in the early 1960s, arguing that children can form either a “secure”
attachment or one of a variety of “insecure” attachments with their caregivers (Ainsworth & Bell, 1970;
Bowlby, 1963).

In order to assess these classifications, Ainsworth and Bell (1970) developed an observa-
tional technique called the “strange situation.” Mothers would arrive at their laboratory
with their children for a series of structured interactions, including having the mother
play with the infant, leave him or her alone with a stranger, and then return to the room
after a brief absence. The researchers were most interested in coding the ways in which
the infant responded to the various episodes (eight in total). One group of infants, for
example, showed curiosity when the mother left but then returned to playing with their
toys, trusting that she would return. Another group showed immediate distress when
the mother left and clung to her nervously upon her return. Based on these and other
behavioral observations, Ainsworth and colleagues classified these groups of infants as
“securely” and “insecurely” attached to their mothers, respectively.

Pros and Cons of Observational Research
Observational designs are well suited to a wide range of research questions, provided the
questions can be addressed through directly observable behaviors and events; for exam-
ple, if the researcher is able to observe parent–child interactions, nonverbal cues to emo-
tion, or even crowd behavior. However, if a researcher is interested in studying thought
processes—such as how mothers interpret their interactions—then observation will not
suffice. This harkens back to our discussion of behavioral measures in Chapter 2 (Section
2.2, Reliability and Validity): In exchange for giving up access to internal processes, you
gain access to unfiltered behavioral responses.

To capture these unfiltered behaviors, it is vital for the researcher to be as unobtrusive as
possible. As we have already discussed, people have a tendency to change their behavior
when they are being observed. In the bullying study by Craig and Pepler (1997) discussed

Research: Making an Impact

Harry Harlow

In the 1950s, U.S. psychologist Harry Harlow conducted a landmark series of studies with rhesus
monkeys on the mother–infant bond. While his research would be considered unethical by contem-
porary standards, the results of his work revealed the importance of affection, attachment, and love
on healthy childhood development.

Prior to Harlow’s findings, it was believed that infants attached to their mothers as a part of a drive to
fulfill exclusively biological needs, in this case obtaining food and water and to avoid pain (Herman,
2007; van der Horst & van der Veer, 2008). In an effort to clarify the reasons that infants so clearly
need maternal care, Harlow removed rhesus monkeys from their natural mothers several hours after
birth, giving the young monkeys a choice between two surrogate “mothers.” Both mothers were
made of wire, but one was bare and one was covered in terry cloth. Although the wire mother pro-
vided food via an attached bottle, the monkeys preferred the softer, terry-cloth mother, even though
the latter provided no food (Harlow & Zimmerman, 1958; Herman, 2007).

Further research with the terry-cloth mothers contributed to the understanding of healthy attach-
ment and childhood development (van der Horst & van der Veer, 2008). When the young monkeys
were given the option to explore a room with their terry-cloth mothers and had the cloth mothers in
the room with them, they used the mothers as a safe base. Similarly, when exposed to novel stimuli
such as a loud noise, the monkeys would seek comfort from the cloth-covered surrogate (Harlow &
Zimmerman, 1958). However, when the monkeys were left in the room without their cloth mothers,
they reacted poorly—freezing up, crouching, crying, and screaming.

A control group of monkeys who were never exposed to either their real mothers or one of the sur-
rogates revealed stunted forms of attachment and affection. They were left incapable of forming
lasting emotional attachments with other monkeys (Herman, 2007). Based on this research, Harlow
discovered the importance of proper emotional attachment, stressing the importance of physical
and emotional bonding between infants and mothers (Harlow & Zimmerman, 1958; Herman, 2007).

Harlow’s influential research led to improved understanding of maternal bonding and child develop-
ment (Herman, 2007). His research paved the way for improvements in infant and child care and in
helping children cope with separation from their mothers (Bretherton, 1992; Du Plessis, 2009). In
addition, Harlow’s work contributed to the improved treatment of children in orphanages, hospitals,
day care centers, and schools (Herman, 2007; van der Horst & van der Veer, 2008).

at the beginning of this chapter, the researchers used video cameras to record children’s
behavior unobtrusively; otherwise, the occurrence of bullying might have been artificially
low. If you conduct an observational study in a laboratory setting, there is no way to hide
the fact that people are being observed, but the use of one-way mirrors and video record-
ings can help people to become comfortable with the setting (versus having an experi-
menter staring at them across the table). If you conduct an observational study out in the
real world, there are even more possibilities for blending into the background, includ-
ing using observers who are literally hidden. For example, let’s say you hypothesize that
people are more likely to pick up garbage when the weather is nicer. Rather than station
an observer with a clipboard by the trash can, you could place someone out of sight,
standing behind a tree or perhaps sitting on a park bench pretending to read a magazine.
In both cases, people would be less conscious of being observed and therefore more likely
to behave naturally.

One extremely clever strategy for blending in comes from a study by the social psychol-
ogist Muzafer Sherif, involving observations of cooperative and competitive behaviors
among boys at a summer camp (Sherif et al., 1954). You can imagine that it was particu-
larly important to make observations in this context without the boys realizing they were
part of a research study. Sherif took on the role of camp janitor, allowing him to be a pres-
ence in nearly all of the camp activities. The boys never paid enough attention to the “jani-
tor” to realize his omnipresence—or his discreet note taking. The brilliance of this idea is
that it takes advantage of the fact that people tend to blend into the background once we
become used to their presence.

Types of Observational Research
There are several variations on observational research, according to the amount of control
that a researcher has over the data collection process.

Structured Observation

Structured observation involves creating a standard situation in a controlled setting and
then observing participants’ responses to a predetermined set of events. The “strange sit-
uation” studies of attachment (discussed previously) are a good example of structured
observation—mothers and infants are subjected to a series of eight structured episodes,
and researchers systematically observe and record the infants’ reactions. Even though
these types of studies are conducted in a laboratory, they differ from experimental studies
in an important way: Rather than systematically manipulate a variable to make compari-
sons, researchers present the same set of conditions to all participants.

Another example of structured observation comes from the research of John Gottman,
a psychologist at the University of Washington. For nearly three decades, Gottman and
his colleagues have conducted research on the interaction styles of married couples.
Couples who take part in this research are invited for a 3-hour session in a laboratory
that closely resembles a living room. Gottman’s goal is to make couples feel reasonably
comfortable and natural in the setting, in order to get them talking as they might do at
home. After allowing them to settle in, Gottman adds the structured element by asking
the couple to discuss an “ongoing issue or problem” in their marriage. The researchers
then sit back to watch the sparks fly, recording everything from verbal and nonverbal
communication to measures of heart rate and blood pressure. Gottman has observed
and tracked so many couples over the decades that he is able to predict, with remark-
able accuracy, which couples will divorce in the 18 months following the lab visit
(Gottman & Levenson, 1992).

Naturalistic Observation

Naturalistic observation involves observing and systematically recording behavior out in
the real world. This can be done in two broad ways—with or without intervention on the
part of the researcher. Naturalistic studies that involve researcher intervention consist of
manipulating some aspect of the environment and then observing responses. For exam-
ple, you might leave a shopping cart just a few feet away from the cart return area and
measure whether people move the cart. (Given the number of carts that are abandoned
just inches away from their proper destination, someone must be doing this research all
the time. . . .) In another example you may remember from Chapter 1 (in our discussion of
ethical dilemmas in Section 1.7, Ethics in Research), Harari and associates (1995) used this
approach to study whether people would help in emergency situations. In brief, these
researchers staged what appeared to be an attempted rape in a public park and then
observed whether groups or individual males were more likely to rush to the victim’s aid.

The ABC network has developed a hit reality show that illustrates this type of research.
The show What Would You Do? sets up provocative settings in public and videotapes peo-
ple’s reactions; full episodes are available online at http://abcnews.go.com/WhatWouldYouDo/.
If you were an unwitting participant in one of these episodes, you might see a
customer stealing tips from a restaurant table or a son berating his father for being gay
or a man proposing to his girlfriend who minutes earlier had been kissing another man
at the bar. Of course, these observation “studies” are more interested in shock value than
data collection (or IRB approval; see Section 1.6), but the overall approach can be a useful
strategy to assess people’s reactions
to various situations. In fact, some of
the scenarios on the show are based
on classic studies in social psychol-
ogy, such as the well-documented
phenomenon that people are reluc-
tant to take responsibility for helping
in emergencies.

Alternatively, naturalistic studies can
involve simply recording ongoing
behavior without any attempt by the
researchers to intervene or influence
the situation. In these cases, the goal
is to observe and record behavior
in a completely natural setting. For
example, you might station your-
self at a liquor store and observe the
numbers of men and women who
buy beer versus wine. Or, you might
observe the numbers of people who
give money to the Salvation Army
bell ringers during the holiday season. You can use this approach to make comparisons of
different conditions, provided the differences occur naturally. That is, you could observe
whether people donate more money to the Salvation Army on sunny or snowy days or
compare donation rates when the bell ringers are of a different gender or race. Do people
give more money when the bell ringer is an attractive female? Or do they give more to
someone who looks needier? These are all research questions that could be addressed
using a well-designed naturalistic observation study.

Participant Observation

Participant observation involves having the researcher(s) conduct observations while
engaging in the same activities as the participants. The goal is to interact with these par-
ticipants in order to gain better access and insight into their behaviors. In one famous
example, the psychologist David Rosenhan (1973) was interested in the experience of peo-
ple hospitalized for mental illness. To study these experiences, he had eight perfectly sane
people gain admission to different mental hospitals. These fake patients were instructed
to give accurate life histories to a doctor except for lying about one diagnostic symptom;
they all supposedly heard voices occasionally, a symptom of schizophrenia.

Once admitted, these “patients” behaved in a normal and cooperative manner, with
instructions to convince hospital staff that they were healthy enough to be released. In
the meantime, they observed life in the hospital and took notes on their experiences—a
behavior that many doctors interpreted as “paranoid note taking.” The main finding of
this study was that hospital staff tended to see all patient behaviors through the lens of
their initial diagnoses. Despite immediately acting “normally,” these fake patients were
hospitalized an average of 19 days (with a range from 7 to 52!) before being released. And
all but one was given a diagnosis of “schizophrenia in remission” upon release. The other
striking finding was that treatment was generally depersonalized, with staff spending
little time with individual patients.

In another great example of participant observation, Festinger, Riecken, and Schachter
(1956) decided to join a doomsday cult to test their new theory of cognitive dissonance.
Briefly, this theory argues that people are motivated to maintain a sense of consistency
among their various thoughts and behaviors. So, for example, if you find yourself smok-
ing a cigarette despite being aware of the health risks, you might rationalize your smoking
by convincing yourself that lung cancer risk is really just genetic. In this case, Festinger
and colleagues stumbled upon the case of a woman named Mrs. Keech, who was predict-
ing the end of the world, via alien invasion, at 11 p.m. on a specific date 6 months in the
future. What would happen, they wondered, when this prophecy failed to come true?

To answer this question, the researchers pretended to be new converts and joined the
cult, living among the members and observing them as they made their preparations for
doomsday. Sure enough, the day came, and 11 p.m. came and went without the world
ending. Mrs. Keech first declared that she had forgotten to account for the time zone dif-
ference, but as sunrise started to approach, the group members became restless. Finally,
after a short absence to communicate with the aliens, Mrs. Keech returned with some good
news: The aliens were so impressed with the devotion of the group that they decided to
postpone their invasion! The group members rejoiced, rallying around this brilliant piece
of rationalizing, and quickly began a new campaign to recruit new members.

As you can see from these examples, participant observation can provide access to amaz-
ing and one-of-a-kind data, including insights into group members’ thoughts and feel-
ings. This form of investigation also provides access to groups that might be reluctant
to allow in outside observers. However, this approach has two clear disadvantages over
other types of observation. The first problem is ethical; data are collected from individuals
who do not have the opportunity to give informed consent. Indeed, the whole point of the
technique is to observe people without their knowledge. In order for an IRB to approve
this kind of study, there has to be an extremely compelling reason to ignore informed con-
sent, as well as extremely rigorous measures to protect identities. The second problem is
methodological; there is ample opportunity for the objectivity of observations to be com-
promised by the close contact between researcher and participant. Because the researcher
is a part of the group, he or she can change the dynamics in subtle ways, possibly leading
the group to confirm his or her hypothesis. In addition, the group can shape the research-
er’s interpretations in subtle ways, leading him or her to miss important details.

Steps in Observational Research
One of the major strengths of observational research is that it has a high degree of
ecological validity; that is, the research can be conducted in situations that closely resemble
the real world. Think of our examples so far—married couples observed in a living room–
like laboratory; doomsday cults observed from within; bullying behaviors on the school
playground seen by hidden observers. In every case, people’s behaviors are observed in
the natural environment or something very close to it. But this ecological validity comes
at a price; the real world is a jumble of information—some relevant, some not so much.
The challenge for the researcher, then, is to decide on a system for sorting out the signal
from the noise that provides the best test of the hypothesis. In this section, we discuss a
three-step process for conducting observational research. The key thing you should note
right away is that most of this process involves making decisions ahead of time so that the
process of data collection is smooth, simple, and systematic.

Step 1—Develop a Hypothesis

For research to be systematic, it is important to impose structure by having a clear research
question and hypothesis. We have covered hypotheses in detail in other chapters, but the
main points bear repeating: Your hypothesis must be testable and falsifiable, meaning that
it must be framed in such a way that it can be addressed through empirical data and might
be disconfirmed by these data. In our example involving Salvation Army donations, we
predicted that people might donate more money to an attractive bell ringer. This could
easily be tested empirically and could just as easily be disconfirmed by the right set of
data—say, if attractive bell ringers brought in the fewest donations.

This particular example also highlights an additional important feature of observational
hypotheses; namely, they have to be observable. Because observational studies are based
on observations of behaviors, our hypotheses have to be centered on behavioral measures.
That is, we can safely make predictions about the amount of money people will donate
because this can be directly observed. But we are unable to make predictions in this con-
text about the reasons for donations. There would be no way to observe, say, that people
donate more to attractive bell ringers because they were trying to impress them. In sum,
one limitation of observing behavior in the real world is that we are unable to delve into
the cognitive and motivational reasons behind the behaviors, as we would in phenomeno-
logical research, for example.

Step 2—Decide What and How to Sample

Once you have developed a hypothesis that is testable, falsifiable, and observable, the
next step is to decide what kind of information to gather from the environment to test
this hypothesis. The simple fact is that the world
is too complex to sample everything in it. Imag-
ine that you wanted to observe the dinner rush
at a restaurant. There is a nearly infinite list of
events to observe: What time does the restaurant
get crowded? How many times do people send
their food back to the kitchen? What are the most
popular dishes? How often do people get into
arguments with the waitstaff? To simplify the pro-
cess of observing behavior, you will need to take
samples, which are small snippets of the environ-
ment that are relevant to your hypothesis. That is,
rather than observing “dinner at the restaurant,”
the goal is to narrow your focus to something like
“the number of people waiting in line for a table
at 6 p.m. versus 9 p.m.”

The choice of what and how to sample will ulti-
mately depend on the best fit for your hypothesis.
In the context of observational research, there
are three strategies for sampling behaviors and
events. The first strategy, time sampling, involves
comparing behaviors during different time inter-
vals. For example, to test the hypothesis that foot-
ball teams make more mistakes when they start
to get tired, the researcher could count the number of penalties in the first 5 and the last
5 minutes of the game. These data would allow one to compare mistakes at one time inter-
val with mistakes at another time interval. In the case of Festinger’s study of a doomsday
cult, time sampling was used to compare how the group members behaved before and
after their prophecy failed to come true.

The second strategy, individual sampling, involves collecting data by observing one per-
son at a time in order to test hypotheses about individual behaviors. Many of the exam-
ples we have already discussed involve individual sampling. For instance, Ainsworth and
colleagues tested their hypotheses about attachment behaviors by observing individual
infants, while Gottman tests his hypotheses about romantic relationships by observing
one married couple at a time. These types of data allow us to examine behavior at the
individual level and test hypotheses about the kinds of things people do—from the way
they argue with their spouses to whether they wear team colors to a football game.

The third strategy, event sampling, involves observing and recording behaviors that
occur throughout an event. For example, you could track the number of fights that break
out during an event such as a football game or the number of times people leave the res-
taurant without paying the check. This strategy allows for testing hypotheses about the
types of behaviors that occur in a particular environment or setting. For example, you
might compare the number of fights that break out in a professional football versus a
professional hockey game. Or, the next time you host a party, you could count the number
of wine bottles versus beer bottles that end up in your recycling bin. The distinguishing
feature of this strategy is that you focus on the occurrence of behaviors more than on the
individuals performing these behaviors.
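
To make these strategies concrete, here is a minimal sketch in Python (all values are
invented for illustration and do not come from any study) of how an observer might
structure tallies under each of the three sampling strategies:

from collections import Counter

# Time sampling: compare tallies from two predefined time intervals.
penalties = {"first_5_min": 0, "last_5_min": 0}
penalties["first_5_min"] += 1   # hypothetical: one penalty observed early
penalties["last_5_min"] += 3    # hypothetical: three penalties observed late

# Individual sampling: one record per person observed.
drivers = [
    {"driver_id": 1, "on_phone": True},
    {"driver_id": 2, "on_phone": False},
]

# Event sampling: count each occurrence of a target behavior during an event.
incidents = Counter()
incidents["fight"] += 1   # hypothetical: a fight breaks out at the game

print(penalties, drivers, incidents)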

Step 3—Record and Code Behavior

Now that you have formulated a hypothesis and decided on the best sampling strategy,
there is one final and critical step to take before you begin data collection. Namely, you
have to develop good operational definitions of your variables by translating the underly-
ing concepts into measurable variables. Gottman’s research turns the concept of marital
interactions into a range of measurable variables like the number of dismissive comments
and passive-aggressive sighing—all things that can be observed and counted objectively.
Rosenhan’s study involving fake schizophrenic patients turned the concept of how staff
treat patients into measurable variables such as the amount of time staff members spent
with each patient—again, something very straightforward to observe.

It is vital to decide up front what kinds and categories of behavior you will be observing
and recording. In the previous section, we narrowed down our observation of dinner at
the restaurant to the number of people in line at 6 p.m. versus the number of people in line
at 9 p.m. But how can we be sure we get an accurate count? What if two people are wait-
ing by the door while the other two members of the group are sitting at the bar? Are those
at the bar waiting for a table or simply having drinks? One possibility might be to count
the number of individuals who walk through the door in different time periods, although
our count could be inflated by those who give up on waiting or who only enter to ask for
directions to another place.

In short, observing behavior in the real world can be messy. The best way to deal with
this mess is to develop a clear, consistent categorization scheme and stick with it. That
is, in testing your hypothesis about the most crowded time at the restaurant, you would
choose one method of counting people and use it for the duration of the study. In part,
this choice is a judgment call, but your judgment should be informed by three criteria.
First, you should consider practical issues, such as whether your categories can be directly
observed. You can observe the number of people who leave the restaurant, but you cannot
observe whether they became impatient. Second, you should consider theoretical issues,
such as how well your categories represent the underlying theory. Why did you decide to
study the most crowded time at the restaurant? Perhaps this particular restaurant is in a
new, up-and-coming neighborhood and you expect the restaurant to get crowded over the
course of the evening. It would also lead you to include people sitting both at tables and at
the bar—because this crowd may come to the restaurant with the sole intention of staying
at the bar. Third, you should consider previous research when choosing your categories.
Have other researchers studied dining patterns in restaurants? What kinds of behaviors
did they observe? If these categories make sense for your project, feel free to reuse them.

Last, but not least, you should take a step back and evaluate both the validity and the reli-
ability of your coding system. (See Section 2.2 for a review of these terms.) Validity in this
case means making sure the categories we observe do a good job of capturing the under-
lying variables in our hypothesis (i.e., construct validity; see Section 2.2). For example, in
Gottman’s studies of marital interactions, some of the most important variables are the
emotions expressed by both partners. One way to observe emotions would be to count the
number of times a person smiles. However, we would have to think carefully about the
validity of this measure because smiling could indicate either genuine happiness, superfi-
ciality, condescension, or even awkward embarrassment. As a general rule, the better our
operational definitions, the more valid our measures will be (Chapter 2).

Reliability in the context of observation means making sure data are collected in a consis-
tent way. If research involves more than one observer using the same system, their data
should look roughly the same (i.e., have interrater reliability). This is accomplished in part
by making the task simple and straightforward—for example, by having trained assis-
tants use a checklist to record behaviors rather than depending on open-ended notes. The
other key to improving reliability is through careful training of the observers, giving them
detailed instructions and ample opportunities to practice the rating system.
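
One common starting point for checking interrater reliability is simple percent
agreement between two observers' checklists. The Python sketch below uses invented
ratings; published studies often report more robust indices, such as Cohen's kappa,
that correct for chance agreement.

# Hypothetical checklist codes from two trained observers rating the
# same 10 behavior samples (1 = behavior present, 0 = behavior absent).
observer_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
observer_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

agreements = sum(a == b for a, b in zip(observer_a, observer_b))
percent_agreement = agreements / len(observer_a) * 100
print(f"Interrater agreement: {percent_agreement:.0f}%")  # prints 80%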

Observational Research Examples
To give you a sense of how all of this comes together, let’s walk through a pair of exam-
ples, from forming the research question to collecting the data.

Example 1—Theater Restroom Usage

First, imagine for the sake of this example that you are interested in whether people are
more likely to use the restroom before or after watching a movie. This research question
could provide valuable information for theater owners in planning employee schedules
(i.e., when are bathrooms most likely to need cleaning). Thus, by studying patterns of
human behavior, we could gain useful applied knowledge.

The first step is to develop a specific, testable, and observable hypothesis. In this case, we
might predict that people are more likely to use the restroom after the movie, as a result
of consuming those 64-ounce sodas during the movie. And, just for fun, let’s also compare
the restroom usage of men and women. Perhaps men are more likely to wait until after
the movie, whereas women are as likely to go before as after. This pattern of data might
look something like the percentages in Table 3.1. That is, men make 80% of their restroom
visits after the movie and 20% before the movie, while women make about 50% of their
restroom visits at each time.

Table 3.1: Hypothesized data from observation exercise

                 Men     Women
Before movie     20%     50%
After movie      80%     50%
Total            100%    100%

The next step is to decide on the best sampling strategy to test this hypothesis. Of the three
sampling strategies we discussed—individual, event, and time—which one seems most
relevant here? The best option would probably be time sampling because our hypothesis
involves comparing the number of restroom visitors in two time periods (before versus
after the movie). So, in this case, we would need to define a time interval for collect-
ing data. One option would be to limit our observations to the 10 minutes before the
previews begin and the 10 minutes after the credits end. The potential problem here, of
course, is that some people might use either the previews or the end credits as a chance
to use the restroom. Another complication arises in trying to determine which movie
people are watching; in a giant multiplex theater, movies start just as others are finishing.
One possible solution, then, would be to narrow our sample to movie theaters that show
only one movie at a time and to define the sampling times based on the actual movie
start and end times.

Once we decide on a sampling strategy, the next step is to decide on the types of behav-
iors we want to record. This particular hypothesis poses a challenge because it deals with
a rather private behavior. In order to faithfully record people “using the restroom,” we
would need to station researchers in both men’s and women’s restrooms to verify that
people actually, well, “use” the restroom while they are in there, as opposed to primp-
ing or putting on makeup. However, this strategy comes with the potential downside
that your presence (standing in the corner of the restroom) will affect people’s behavior.
Another, less intrusive option would be to stand outside the restroom and simply count
“the number of people who enter.” The downside here, of course, is that we don’t tech-
nically know why people are going into the restroom. But sometimes research involves
making these sorts of compromises—in this case, we chose to sacrifice a bit of precision in
favor of a less intrusive measurement.

So, in sum, we started with the hypothesis that men are more likely to use the restroom
after a movie, while women use the restroom equally before and after. We then decided
that the best sampling strategy would be to identify a movie theater showing only one
movie and to sample from the 10-minute periods before and after the actual movie’s run-
ning time. Finally, we decided that the best strategy for recording behavior would be to
station observers outside the restrooms and count the number of people who enter. Now,
let’s say we conduct these observations every evening for one week and collect the data
in Table 3.2.

Table 3.2: Findings from observation exercise

                 Men           Women
Before movie     75 (25%)      300 (60%)
After movie      225 (75%)     200 (40%)
Total            300 (100%)    500 (100%)

You can see that we recorded more restroom entries by women (n = 500) than by men
(n = 300) during our week of sampling. But the real test of our hypothesis comes from
examining the percentages within gender groups. That is, of the 300 restroom visits made
by men, what percentage occurred before the movie and what percentage occurred after?
In this dataset, women used the restroom with relatively equal frequency before (60%)
and after (40%) the movie. Men, in contrast, were three times as likely to use the
restroom after (75%) than before (25%) the movie. In other words, our hypothesis
appears to be confirmed by examining these percentages.
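
The within-group percentages reported above follow from a simple calculation; here is a
minimal Python sketch using the counts from Table 3.2 (the same arithmetic applies to
any two-by-two tally of observations):

# Observed restroom entries from Table 3.2.
counts = {
    "men":   {"before": 75,  "after": 225},
    "women": {"before": 300, "after": 200},
}

for group, times in counts.items():
    total = sum(times.values())          # 300 entries by men, 500 by women
    for time, n in times.items():
        print(f"{group} {time}: {n / total:.0%}")
# men before: 25%, men after: 75%, women before: 60%, women after: 40%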


Example 2—Cell Phone Usage While Driving

Imagine for this example that you are interested in patterns of cell-phone usage among
drivers. Several recent studies have reported that drivers using cell phones are as impaired
as drunk drivers, making this an important public safety hazard. Thus, if we could under-
stand the contexts in which people are most likely to use cell phones, this would provide
valuable information for developing guidelines for safe and legal use of these devices. So
in this study we might count the number of drivers using cell phones in two settings: in
rush-hour traffic and moving on the freeway.

The first step is to develop a specific, testable, and observable hypothesis. In this case, we
might predict that people are more likely to use cell phones when they are bored in the
car. So we hypothesize that we will see more drivers using cell phones while stuck in rush-
hour traffic than while moving on the freeway.

The next step is to decide on the best sampling strategy to test this hypothesis. Of the
three sampling strategies we discussed—individual, event, and time—which one seems
most relevant here? The best option would probably be individual sampling because we
are interested in the cell phone usage of individual drivers. That is, for each individual
car we see during the observation period, we want to know whether the driver is using
a cell phone. One strategy for collecting these observations would be to station observers
along a fast-moving stretch of freeway, as well as along a stretch of road that is typically
clogged during rush hour. These observers would keep a record of each passing car, not-
ing whether the driver was on the phone.

Once we decide on a sampling strategy, our next step is to decide on the types of behaviors
we want to record. One challenge in this study is in deciding how broadly to define the
category of cell-phone usage. Would we include both talking and text messaging? Given
our interest in distraction and public safety, we probably would want to include text mes-
saging. In response to tragic accidents, several states have recently banned text messaging
while driving. Because we will be observing moving vehicles, the most reliable approach
might be to simply note whether each driver had a cell phone in his or her hand. As with
our restroom study, we are sacrificing a little bit of precision (i.e., we don’t know what the
cell phone is being used for) to capture behaviors that are easier to record.

So, in sum, we started with the hypothesis that drivers would be more likely to use cell
phones when stuck in traffic. We then decided that the best sampling strategy would be
to station observers along two stretches of road and that they should note whether drivers
were using cell phones. Finally, we decided that the best compromise for observing cell-
phone usage would be to note whether each driver was holding a cell phone. Now, let’s
say we conducted these observations over a 24-hour period and collected the data shown
in Table 3.3.

Table 3.3: Findings from observation exercise #2

                 Rush Hour    Moving
Cell phone       30 (30%)     200 (67%)
No cell phone    70 (70%)     100 (33%)
Total            100          300


You can see that more cars passed by during the non–rush-hour stretch (n = 300) than
during the rush-hour stretch (n = 100). But the real test of our hypothesis comes from
examining the percentages within each stretch. That is, of the 100 people observed during
rush hour and the 300 observed not during rush hour, what percentage were using cell
phones? In this dataset, 30% of those in rush hour were using cell phones, compared with
67% of those not during rush hour. In other words, our hypothesis was not confirmed by
the data. Drivers in rush hour were less than half as likely to be using cell phones. The
next step in our research program would be to speculate on the reasons why the data con-
tradicted our hypothesis.

Qualitative Versus Quantitative Approaches
The general method of observation lends itself equally well to both qualitative and quan-
titative approaches, although some types of observation fit one approach better than the
other. For example, structured observation tends to be focused on testing hypotheses and
quantifying responses. In Mary Ainsworth’s “strange situation” research (described ear-
lier), the primary goal was to expose children to a predetermined script of events and to
test hypotheses about how children with secure and insecure attachments would respond
to these events. In contrast, naturalistic observation—and, to a greater extent, participant
observation—tends to focus on learning from events as they unfold naturally. In Leon
Festinger’s “doomsday cult” study, the researchers joined the group in order to observe
the ways members reacted when their prophecy failed to come true.

Research: Thinking Critically

The Irritable Heart

By K. Kris Hirst

Using open-source data from a federal project digitizing medical records of veterans of the American
Civil War (1860–1865) called the Early Indicators of Later Work Levels, Disease, and Death Project,
researchers identified an increased risk of postwar illness among Civil War veterans, including car-
diac, gastrointestinal, and mental diseases throughout their lives. In a project partly funded by the
National Institutes of Aging, military service files from a total of 15,027 servicemen from 303 compa-
nies of the Union Army stored at the United States National Archives were matched to pension files
and surgeon’s reports of multiple health examinations. A total of 43% of the men had mental health
problems throughout their lives, some of which are today recognized as related to posttraumatic
stress disorder (PTSD). Most particularly affected were men who enlisted at ages under 17. Roxane
Cohen Silver and colleagues at the University of California, Irvine, published their results in the Feb-
ruary 2006 issue of Archives of General Psychiatry.

Studies of PTSD to date have connected war experiences to the recurrence of mental health prob-
lems and physical health problems such as cardiovascular disease, hypertension, and gastrointestinal
disorders. These studies have not had access to long-term health impacts, since they have focused
on veterans of recent conflicts. Researchers studying the impact of modern conflict participation
report that the factors increasing risk of later health issues include age at enlistment, intimate expo-
sure to violence, prisoner-of-war status, and having been wounded.

The Trauma of the American Civil War

The Civil War was a particularly traumatic conflict for American soldiers. Army soldiers commonly
enlisted at quite young ages; between 15% and 20% of the Union army soldiers enlisted between the
ages of 9 and 17. Each of the Union companies was made up of 100 men assembled from regional
neighborhoods and thus often included family members and friends. Large company losses—75%
of companies in this sample lost between 5% and 30% of their personnel—nearly always meant the
loss of family or friends. The men readily identified with the enemy, who in some cases represented
family members or acquaintances. Finally, close-quarter conflict, including hand-to-hand combat
without trenches or other barriers, was a common field tactic during the Civil War.

To quantify trauma experienced by Civil War soldiers, researchers used a variable derived from per-
centage of company lost to represent relative exposure to trauma. Researchers found that in military
companies with a larger percentage of soldiers killed, the veterans were 51% more likely to have
cardiac, gastrointestinal, and nervous disease.

The Youngest Soldiers Were Hardest Hit

The study found that the youngest soldiers (aged 9 to 17 years at enlistment) were 93% more likely
than the oldest (aged 31 and older) to experience both mental and physical disease. The younger
soldiers were also more likely to show signs of cardiovascular disease alone and in conjunction with
gastrointestinal conditions, and they were more likely to die early. Former POWs had an increased
risk of combined mental and physical problems as well as early death.

One problem the researchers grappled with was comparing diseases as they were recorded during
the latter half of the 19th century with today’s recognized diseases and psychiatric disorders. For
one, PTSD was not recognized by doctors—although they did recognize that veterans exhibited an
extreme level of “nervous disease” that they labeled “irritable heart” syndrome.

Children and Adolescents in Combat

Harvard psychologist Roger Pitman, in an editorial in the same publication, writes that the impact
on younger soldiers should be of immediate concern, since “their immature nervous systems and
diminished capacity to regulate emotion give even greater reason to shudder at the thought of chil-
dren and adolescents serving in combat.” Although disease identification is not one-to-one, said
senior researcher Roxane Cohen Silver, “I’ve been studying how people cope with traumatic life expe-
riences of all kinds for 20 years and these findings are quite consistent with an increasing body of
literature on the physical and mental health consequences of traumatic experiences.”

Boston University psychologist Terence M. Keane, Director of the National Center for PTSD, com-
mented that this “remarkably creative study is timely and extremely valuable to our understanding
of the long-term effects of combat experiences.” Joseph Boscarino, senior investigator at Geisinger
Health System, added, “There are a few detractors that say that PTSD does not exist or has been
exaggerated. Studies such as these are making it difficult to ignore the long-term effects of war-
related psychological trauma.”

The Irritable Heart: Increased Risk of Physical and Psychological Effects of Trauma in Civil War Vets by K. Kris Hirst. © 2011 K.
Kris Hirst (http://anthropology.about.com). Used with permission of About Inc., which can be found online at www.about.com.
All rights reserved.

Think about it:

1. What hypotheses were the researchers testing in this study?

2. How did the researchers quantify trauma experienced by Civil War soldiers? Do you think this
is a valid way to operationalize trauma? Explain why or why not.

3. Would this research be best described as case studies, archival research, or natural observa-
tion? Are there elements of more than one type? Explain.


3.2 Qualitative Research Interviews

As pointed out in our discussion of phenomenological research and case studies, interviews are an essential component of many types of projects and can yield a great deal of information on subjects’ experiences of particular events. Interviews
can be utilized at any stage during the research process to identify areas of further explo-
ration, function as the main source of data collection, or provide additional information on
data interpretation (Breakwell, Hammond, & Fife-Schaw, 2000). Research interviews can
be used for many purposes and in a wide variety of contexts. For example, a researcher
may utilize an interview to uncover the subject’s beliefs, feelings, and perspectives on an
experience; investigate the motives behind certain behaviors; or obtain factual data about
an event or person.

Interviews yield similar data to observational research but also provide information on
what participants say and how they say it. In addition, data are collected on interviewees’
nonverbal behaviors such as eye movements, facial expressions, voice tones, and body
posture. All these details can provide additional information about the interviewees’
comfort with a topic, intensity of feeling, or thoughts about an experience. However,
the main focus of interviews should be on the verbal content the interviewees provide.

Conducting research interviews requires researchers to take a systematic approach to
data collection. This gives researchers the ability “to maximize the chances of maintaining
objectivity and achieving valid and reliable results” (Breakwell et al., 2000, p. 239). Inter-
views require a great deal of skill and sensitivity; they are also often time consuming,
because the interviewer must gather enough data to answer the research question(s). And, as with most
interpersonal situations, the outcome of the interview is influenced largely by the person-
ality and communication style of the interviewer and interviewee. If an interviewer is not
very personable and engaging, this could influence the quality and quantity of the infor-
mation that the interviewee provides and ultimately limit the interpretations that can be
made from it. All interviews should include a dynamic, two-way interchange between the
interviewer and the interviewee. Whereas quantitative research generally uses structured
interviews, which include a predetermined and fixed set of questions, qualitative
interviews are generally unstructured or semistructured and take on a more informal,
conversational approach, with the interviewee doing most of the speaking.

Interview Characteristics and Techniques

The following five sections lay out important characteristics of the interviewer, the inter-
view setting, and the types of interviews utilized in qualitative research. As with any
type of interview, the interviewer’s personality, appearance, and communication style can
significantly influence how the interviewee responds. In addition, where the interview
occurs, or in what setting, can have influence on the quality and quantity of data collected.

Personal Characteristics of the Interviewer
Although it might seem that anyone can conduct an interview, an interviewer needs to
display certain characteristics in order to ensure a successful interview. Aiken and Groth-
Marnat (2006) list the following characteristics of a professional interviewer:


• maintains a friendly but neutral tone and demeanor;
• shows interest in the interview but does not pry or show intense reactions to the

interviewee;
• keeps a warm and open approach;
• does not show approval or disapproval toward the interviewee;
• times questions appropriately so that the conversation flows smoothly from topic

to topic;
• allows appropriate silences or pauses so that the interviewee can collect his or

her thoughts;
• allows the interviewee to complete a discussion without interrupting;
• pays close attention to nonverbal behaviors, such as facial expressions, body

posture, and voice fluctuations;
• displays patience throughout the interview;
• checks with the interviewee to ensure that there are no misunderstandings; and
• maintains eye contact.

All these characteristics can determine whether rapport is developed, how comfortable
the interviewee feels about sharing information, the type of information the interviewee
discusses, and the length of the interview. In qualitative research, interviews are designed
to gather in-depth information, so if the interviewee feels uncomfortable, the researcher
may obtain only limited information and the interview process may be cut short. There-
fore, it is imperative that interviewers maintain a friendly and welcoming tone, as well as
provide an open environment to ensure that the interviewee feels comfortable.

The appearance of the interviewer is another important characteristic. Interviewers who
are appropriately dressed and well-groomed will be more positively received than those
who look disheveled or unclean. Likewise, an interviewer dressed in a suit with a brief-
case (or a lab coat) might be more intimidating than one dressed in a business-casual man-
ner. Additional characteristics such as age, gender, and ethnicity may also influence how

comfortable the interviewee feels
and how the interview progresses
(Breakwell et al., 2000).

The Interview Setting
Although interviews can take place
anywhere, it is best to conduct them
in a quiet, well-lit room that is free of
distractions. Conducting interviews
in noisy environments (e.g., the nearest
Starbucks) or in rooms that have a
lot of distracting décor items (e.g.,
posters, paintings, bookshelves) can
negatively impact the quality and
quantity of the conversation. For
example, conducting an interview
at a coffee house would not be as
productive as conducting the inter-
view in a conference room or at the kitchen table. Additionally, rooms that are not well-lit or are too cold or too warm will
also influence how the interview progresses. For example, if a room is dimly lit and hot,
the interviewee may become fatigued and less focused on the interviewer ’s questions.
Another important feature is the comfort of the chairs. Because qualitative inter-
views are fairly lengthy, setting up comfortable chairs that face each other is imperative.

Unstructured Interviews
Unlike structured interviews in quantitative research, which make use of fixed questions
asked in a particular order, unstructured interviews do not rely on questions developed
prior to the interview. Unstructured interviews are commonly used in qualitative research and
utilize a looser approach. Unstructured interviews are probing in nature and are used
primarily to explore new topic areas or to investigate topics that are not well understood.
Although the researcher has a number of topics in mind that he or she would like to
cover, no specific questions have been developed and the process of the interview is not
outlined or planned. Unstructured interviews involve only open-ended questions, which
allow the interviewee to respond as much or as little as he or she may want to. Although
interviewers typically have checklists of topics that should be covered and will guide the
interviewee with probing questions to remain on topic, interview questions are generally
guided by information the interviewee provides. The purpose of unstructured interviews
is to allow the researcher to obtain an in-depth understanding of the interviewee’s experi-
ences from his or her own perspective.

Unstructured interviews are similar to informal conversations with a purpose (Hennink,
Hutter, & Bailey, 2011). As previously mentioned, unstructured interviews let interview-
ees share in-depth information in their own words and from their own perspective. Vari-
ous types of information can be collected from unstructured interviews, including life
narratives, the person’s identity and background characteristics, and the context in which
the interviewee lives (Hennink et al., 2011). Unstructured interviews can be used in many
situations, such as examining personal life stories or exploring people’s feelings, thoughts,
and perceptions on a chosen topic.

Steps in Unstructured Interviews

Although the researcher does not formulate interview questions before the interview,
researchers conducting unstructured interviews may want to develop an interview guide
of topics they would like to cover. Interview guides for unstructured interviews may
include reminders about what to tell the interviewee at the beginning of the interview
(e.g., explain the purpose of the research, discuss ethical issues), a statement about the
population sample being researched, a list of key topic areas that need to be addressed,
and a few probing questions. For example, if researching the potential causes or influ-
ences of heroin addiction, topic areas might include family history, childhood experiences,
relationships with family members, peer relationships, first exposure to heroin, experi-
ences with heroin, and so forth.

When meeting an interviewee for the first time, it is important not to jump right into ques-
tions or research topics but rather to spend some time establishing a rapport. Establishing
a rapport is extremely important in in-depth interviews because it nurtures a sense of
trust. Making small talk about the weather or other current events is a great way to ease
the interviewee into the process. It also allows the interviewer and interviewee to feel
more comfortable together. Additionally, in order to maintain the flow of the conversa-
tion, note taking is discouraged but tape recording is encouraged.

Unstructured interview questions should be open-ended and nonleading. An open-ended
question at the beginning of an interview might take the following form: Tell me about
your experiences with heroin. This type of guided question begins the interview process
and allows the interviewee to respond in any manner. Closed-ended questions (used in
quantitative research) should be avoided at all costs because they are not engaging. For
example, asking Has heroin abuse been hard on you? elicits a yes or no response and may not
further the discussion. In addition, leading questions should be avoided so that interview-
ees can provide the information on their own terms and voice their own viewpoint. For
example, instead of asking, How did heroin negatively impact your dental hygiene?, a nonlead-
ing question would be, How did heroin affect your overall health? The latter question allows
the interviewee to approach the topic in any way desired and does not just focus on the
negative aspects of dental hygiene.

Data Analysis in Unstructured Interviews

Analyzing data from unstructured interviews can be complicated and tedious. Because
each interview is different (based on the direction the interviewee took), the informa-
tion obtained may not be consistent across interviewees. Therefore, to analyze the data,
the researcher will need to first transcribe all the recorded interviews onto paper. Then,
he or she will need to read through all the information and reflect on its overall mean-
ing. Content analysis is often used in analyzing interview data. As discussed previously
in Archival Research, the process of systematically extracting and analyzing a collection
of information is known as content analysis. Because interview data can quickly become
overwhelming, it is a good idea to begin developing coding categories early in the data
collection process. Coding categories are symbols or words applied to a group of words
in order to categorize the information. Once the information is organized into coding cat-
egories, the researcher can begin to group the categories according to their patterns or
themes within the data. In addition to common patterns and themes, the researcher may
also want to include quotations from the interview to support his or her conclusions.
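
As a toy illustration of coding categories, the Python sketch below tags transcript
segments using invented keyword lists. A real codebook would be developed from the data
itself, and human coders would catch meanings that simple keyword matching misses.

from collections import Counter

# Hypothetical codebook: coding category -> keywords that signal it.
codebook = {
    "family": ["mother", "father", "brother", "home"],
    "peers": ["friend", "friends", "classmate"],
    "first_exposure": ["first time", "tried it", "introduced"],
}

# Hypothetical transcript segments from an unstructured interview.
segments = [
    "My brother was the one who introduced me to it.",
    "All my friends were doing it after school.",
]

theme_counts = Counter()
for segment in segments:
    text = segment.lower()
    for category, keywords in codebook.items():
        if any(keyword in text for keyword in keywords):
            theme_counts[category] += 1

print(theme_counts)
# Counter({'family': 1, 'first_exposure': 1, 'peers': 1})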

Pros and Cons of Unstructured Interviews

Although the flexibility of unstructured interviews provides a number of advantages,
there are a few challenges associated with this type of data collection. First, because this
approach is so in-depth, unstructured interviews can be very time consuming, especially with larger
sample sizes. Establishing a rapport and covering the topic areas thoroughly can take
some time. Additionally, because these interviews are unstructured, the length of the
interview will vary depending on the individual and the direction the interview takes.
Second, because the interviewer has little control over the interview process, unstruc-
tured interviews can veer off topic easily, and it can be difficult to know how to guide the
interviewee back to the main topic without the risk of losing continuity, naturalness, and
comfort in the discussion. And third, because data collection will vary by interviewee, it
might be difficult for the researcher to make comparisons across interviewees.


Semistructured Interviews
In contrast to unstructured interviews, semistructured interviews give the interviewer
more control over the process. Instead of a checklist of topic areas, the interviewer devel-
ops key questions for specific topic areas before the interview and creates a more detailed
interview guide. The researcher standardizes the questions across all interviewees, so all
of them will be asked the same key questions. However, the key questions do not have to
be asked in the same order, and the interviewer has the option to pose probing questions
to explore an area further. Thus, the semistructured interview often moves between struc-
tured and unstructured questions throughout the session.

The interview guide for a semistructured interview is more comprehensive than those
used for unstructured interviews. It contains a written script about why the research is
being undertaken and how to conduct the interview as well as sets of standardized open-
ing questions, key questions, and closing questions. Specific probing questions are also
usually provided. The semistructured interview is more standardized than its unstructured
counterpart: the scripted format and common questions allow different interviewers to
administer essentially the same interview.

Semistructured interviews are advantageous in situations where unstructured interviews
have already been conducted on the topic or when the researcher wants to obtain a larger
sample size. Standardized questions make the data easier to analyze and interpret. Even
so, data analysis for semistructured interviews follows the same protocols as for unstruc-
tured interviews.

Focus Groups
Focus groups are generally con-
ducted when the researcher wants
to collect data on several individuals
simultaneously, in the same room.
Focus groups are critical for organi-
zationally based research, especially
action research approaches, and are a
popular method for studying political
trends. A focus group can be thought
of as a group interview, where par-
ticipants share their thoughts, beliefs,
experiences, and perspectives on a
particular issue. Focus groups usu-
ally contain between 10 and 12 indi-
viduals and meet together for about
one to two hours to discuss a particu-
lar topic. There is always a moderator
present, who may or may not be the
researcher. He or she introduces the
topics to be discussed, ensures that the discussion remains on topic, and sees to it that no
one participant dominates the discussion (Leedy & Ormrod, 2010).


Similar to the interview guides used in unstructured and semistructured interviews, focus
groups incorporate discussion guides to keep the conversation on track and the group focused
(Hennink et al., 2011). The discussion guide serves primarily as a reminder of what top-
ics and questions should be covered. The structure of the discussion guide is extremely
important to the success of a focus group discussion because it helps the moderator intro-
duce the topic, establish a rapport among group participants, focus on key topic areas,
and bring the discussion to a close (Hennink et al., 2011). Successful discussion guides
contain an introduction that explains the nature of the study, any ethical issues, and how
the discussion will proceed, as well as standardized introductory questions, transition
questions, key questions, and closing questions. As with other qualitative interview tech-
niques, questions posed in focus groups should be open-ended.

There are several ways to analyze the data collected during a focus group interview. How-
ever, summarizing the transcript along with any field notes is the most common and effi-
cient approach. Because data from some focus groups are needed fairly quickly to address a timely
topic, summaries are the most convenient way of communicating the results. In addition,
because the data are fairly straightforward, a summary of the conclusions can easily be
produced. Data from focus groups can also be interpreted through content analysis, as
discussed previously in this chapter, and through techniques that are beyond the scope of
this book.

Pros and Cons of Focus Groups

As with other qualitative methods, focus groups are used for exploratory and explana-
tory research. As Hennink et al. (2011) describe, focus groups are very useful for exploring
new topics, obtaining a range of views about a topic, understanding typical behaviors or
norms, understanding group processes, and pairing with quantitative or other qualita-
tive methods. Focus groups are especially beneficial when exploring difficult or traumatic
experiences because the group environment tends to be more supportive. For example, if
the research topic was about losing a spouse from a traumatic accident, participants might
find solace and comfort from others who have experienced the same tragedy.

It is important to determine whether a study will benefit more from a focus group or an
unstructured or semistructured interview. For example, if the researcher wants to obtain
detailed information about participants’ experiences, a focus group may not be the best
method to use. Since focus groups include interactions among various participants, the
data collected may not fully represent the individual perspectives of each participant as a
one-to-one interview would. In addition, focus groups provide the researcher with very
limited control over the discussions that occur, so the information obtained may not be
consistent with, or fully address, the research questions proposed. And, unlike unstruc-
tured and semistructured interviews, in focus groups the researcher cannot ensure confi-
dentiality and anonymity of the participants because participants may share that informa-
tion outside of the group.

Quantitative Structured Interviews
Unlike the flexible interview techniques used in qualitative research, quantitative
research involves more structured and standardized data collection methods. Structured
interviews include a fixed set of either open-ended or closed-ended questions that are
administered in a fixed order. Thus, researchers develop questions prior to the interview
and administer the same questions in exactly the same order to every participant. Because
of the standardized format, structured interviews are easily replicated and can be admin-
istered to large sample groups by various interviewers. In addition, the data collected
from structured interviews is much easier to analyze than data from unstructured and
semistructured interviews because the standardized question format enables easy coding
and interpretation.

Pros and Cons of Structured Interviews

Unlike qualitative interviewing techniques, structured interviews provide a highly reliable
form of data collection. However, this method is not without limitations. One signifi-
cant weakness involves the limited amount of information that can be obtained during
the interview process. Structured interviews are not intended to explore complex issues
or opinions and provide no flexibility in the questions asked. If the questions are poorly
written, the interviewer cannot modify the questions or present additional probing ques-
tions to obtain further information. Another limitation involves the quality of the data that
are collected. Because the questions in structured interviews are standardized and more
directive, they do not lend themselves to eliciting in-depth responses. Thus, participants end
up providing only very limited responses to the questions. Interview and survey
techniques for quantitative research are discussed further in Chapter 4.

Reliability and Validity of Interviews

Interviews are an important data collection method but, like observational methods, they
too present problems with reliability and validity. Because reliability requires consis-
tency, it is often difficult to generate high levels of reliability using unstructured and semi-
structured interviews. Because unstructured and semistructured interviews differ among
interviewees in their approach and process, there might be little consistency across inter-
views. For example, if unstructured interviews were administered to 10 participants, it is
very unlikely that any 2 interviews will follow the same process, cover the same material,
or generate the same data. The interviewer’s experiences with the topic being discussed
may also influence how the data are analyzed and interpreted, and in larger studies, there
are bound to be several interviewers. Additionally, as with any self-report measure, the
interviewee’s responses may be distorted or inaccurate based on what he or she believes
the interviewer wants to hear. It is also possible that an interviewee may feel uncom-
fortable with the interviewer or the topic being discussed and may withhold important
details. And sometimes, interviewees are simply unable to remember all the details of an
experience, so the information they provide is not complete.

The validity of interviews is highly variable and increases with the use of more structured
methods. Thus, structured and semistructured interviews will generally have higher
levels of validity than unstructured interviews owing to the more reliable data that the
structured format can generate. Interviews that focus on specific topics and are analyzed
by two or more evaluators also tend to have higher validity levels. Having more than
one researcher agreeing on the findings increases the likelihood that the results are valid.
Additionally, utilizing other types of measurement methods, such as observations or
experiments, to supplement the interview increases the validity of interview data.


When conducting any type of interview, the interviewer is considered the assessment or
measurement tool (Aiken & Groth-Marnat, 2006). Thus, the interviewer is the key instru-
ment used to collect data in the study. As a result, most reliability problems associated
with interviews are the result of the characteristics and behavior of the interviewer. For
example, such interviewer characteristics as appearance, age, gender, demeanor, and per-
sonality can influence how engaged the interviewee becomes during the interview and
the type of information he or she will disclose. The interviewer’s personal biases can also
influence the direction of the interview and the type of data that is gathered. Since the
interviewer is in charge of the interview, the length of the interview and the data collected
depend on the interviewer’s questions. Ignoring specific question areas, asking the wrong
questions, or asking questions that elicit short responses will affect the type of information
the interviewee provides.

Although some interviewer effects cannot be eliminated, Breakwell et al. (2000) discuss
a few ways to control for them. One way to do this involves having the same person
conduct all interviews so that the effects of the interviewer are constant. While this tech-
nique ensures that the interviewer’s appearance and personal biases will likely be con-
stants, it does not ensure that variation among the interviewees will be eliminated. Some
interviewees may feel more comfortable with a male interviewer, for example, or be more willing to
open up to someone who is middle-aged. Another way to control for interviewer effects
is to have several interviewers randomly assigned to interviewees. Instead of holding the
interviewer constant across all interviews, utilizing different interviewers can minimize
any strong effects that might be experienced with one particular interviewer. Finally, some
researchers might find it helpful to match interviewers to interviewees based on age or
gender. This is especially useful if it is known that particular interviewees feel more com-
fortable with certain types of people.
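
As a minimal sketch of the random-assignment approach (the interviewer and participant
labels here are hypothetical), in Python:

import random

interviewers = ["interviewer_A", "interviewer_B", "interviewer_C"]
interviewees = [f"participant_{i}" for i in range(1, 10)]

random.seed(42)  # fixed seed only so the illustration is reproducible
assignments = {person: random.choice(interviewers) for person in interviewees}
print(assignments)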

Ethical Guidelines

When conducting interviews, several important ethical issues need to be addressed. First,
it is imperative that the interviewer provide an explanation of the purpose of the research,
the intent of the interview, and any potential risks and benefits of participating (this is also
known as informed consent). Thus, all interviewees must be notified of all the features of
the study so they can make intelligent decisions about their willingness to participate. If
the interview includes sensitive or emotional topics, interviewees need to be informed
that they will be asked to discuss some painful or embarrassing experiences. In addition,
interviewees need to be told how any identifying information will be kept confidential as
well as how the researchers will protect any information shared and collected.

As with all types of research, the participant (or in this case, the interviewee) must be pro-
tected from harm. Interviewers must take reasonable steps to ensure that interviewees do
not experience any harm during the interview or research process. In practice, this means
that the risk of harm for the interviewee is not greater than the harm that he or she would
experience in everyday life and that the risk is outweighed by the benefits of the study.
(See APA and Other Ethical Guidelines in Chapter 1 for further discussion.)


3.3 Critiquing a Qualitative Study

How does one assess the overall worth and credibility of a qualitative research study or proposal? What characteristics constitute strong research studies, and what characteristics constitute weak ones? The methods and guidelines for evalu-
ating research studies are fairly detailed and tedious. Not all studies are worthy or rig-
orous, and it is important that you, as a consumer of psychological research, are able to
identify potential problems. Just because a study claims to have valid and reliable results
does not necessarily mean that it does. For example, some studies may utilize incorrect
statistical or data analysis procedures, so the results generated may not be correct or com-
plete. Additionally, some studies may use inappropriate sampling techniques for the type
of research design being conducted. As professionals in the field, you will need to be able
to identify which parts of a study are valid, which parts can be considered acceptable with
caution, and which parts have significant limitations or are downright misleading.

Leedy and Ormrod (2010) reviewed standards from experienced qualitative researchers
and compiled a list of general criteria to apply when evaluating a qualitative study:

1. Purposefulness: Does the research question(s) drive the research process and the
methods to collect and analyze the data?

2. Explicitness of Assumptions and Biases: Does the researcher describe any
assumptions, expectations, or biases that might influence how the data are
collected, analyzed, and interpreted?

3. Rigor: Does the researcher use rigorous, precise, and thorough methods to
collect, record, and analyze the data? Does the researcher take steps to remain
objective throughout the study?

4. Open-mindedness: Is the researcher willing to modify interpretations when
newly collected data do not support previously collected data?

5. Completeness: Does the researcher describe the phenomenon in all its complexity?
Does the researcher spend sufficient time in the field examining the phenomenon,
detail all aspects of the phenomenon (e.g., setting, behaviors, perceptions), and
provide a holistic picture of the phenomenon?

6. Coherence: Do the data show consistent findings with the measurement used
and across multiple measurement methods used?

7. Persuasiveness: Does the researcher provide logical arguments, and does the
evidence support one interpretation of the data?

8. Consensus: Do other studies and researchers in the field agree with the interpre-
tations and explanations?

9. Usefulness: Does the study provide useful implications for future research, a
more thorough understanding of the phenomenon, or lead to interventions that
could enhance the quality of life? (p. 187)

In addition to these criteria, there are several factors to consider when evaluating the vari-
ous sections of a research study or proposal. The next seven sections will discuss how to
critically evaluate the literature review, the purpose statement, the sampling methods, the
procedures, the instruments, the results, and the discussion section of a qualitative study.


Evaluating the Literature Review Section

As mentioned in Chapter 1, the purpose of the literature review is to support the need for
research to be conducted. Literature reviews should be thorough and comprehensive and
contain all relevant research on the specific topic being studied. Literature reviews should
also be objective, showing no biases toward the selection of articles being reviewed,
and include previous research that relates to the current study. The following questions,
adapted from Houser (2009), will assist in the evaluation of the literature review:

• Do the researchers present an adequate rationale for conducting the study?
• What is the significance of the study? What difference will it make to the field?
• Is the literature review thorough and comprehensive?
• Do the researchers demonstrate any potential biases in the literature review?
• Are all important concepts clearly defined by the researchers?
• Do the researchers clearly describe previous methods that are relevant to under-

standing the purpose for conducting this study?

Evaluating the Purpose Statement

The purpose statement provides the aim or intent of the study. It is generally found in the
Introduction section, as the last paragraph before the Literature Review section. Purpose
statements can be written as a declarative statement or in the form of a question or ques-
tions. They should include the type of research methods and design used and describe
the variables and population studied. When evaluating the purpose statement, it is also
important to examine whether the purpose and research problem are in fact researchable.
Purpose statements and research problems are researchable only when the variables of
interest can be operationalized—that is, defined in a measurable and objective way. Con-
sidering these requirements, the following questions, adapted from Houser (2009), can
assist in the evaluation of the purpose statement:

• Does the article clearly present the purpose statement?
• Is the purpose statement clearly based on the argument developed in the litera-

ture review?
• Are the variables of interest (i.e., independent and dependent) clearly identified

in the purpose statement?

Evaluating the Methods Section—Sampling

The Sampling section includes thorough and detailed information on the sample used and
the techniques or methods used to select the sample. Descriptions of the sample should
include all relevant demographic characteristics (e.g., age, ethnicity, sex) as well as size.
Unlike quantitative approaches, which usually require large samples, qualitative tech-
niques do not have any restrictions on sample size. Thus, sample size depends on what
the researcher wants to know, the purpose of the inquiry, what it will be useful for, how
credible it will be, and what can be done with available time and resources. Qualitative
research can be very costly and time-consuming, so choosing information-rich cases will
be most valuable. As noted by Patton (2002), “The validity, meaningfulness, and insights
generated from qualitative inquiry have more to do with the information-richness of the
cases selected and the observational/analytical capabilities of the researcher than with
sample size” (p. 245).

The sampling techniques employed should also be discussed, including detailed informa-
tion about how the sample was selected and what sampling methods were used (e.g., pur-
posive sampling, snowball sampling). In contrast to quantitative research, which strives
for generalizable representative sampling, qualitative research typically focuses on rela-
tively smaller samples, and even sometimes only single cases, that are selected purpose-
fully. “Purposeful sampling refers to selecting information-rich cases for study in depth”
(Patton, 2002, p. 230). When evaluating the sampling methods section, it is important to
examine whether appropriate sampling techniques were used for the type of research
design that was employed and the research questions proposed. The following list cov-
ers a few of the most common sampling procedures used in qualitative studies (these are
discussed in more detail in Chapter 4):

• Purposive sampling (or judgment sampling): The researcher selects a sample that
will yield the most information to answer the research questions.

• Quota sampling (a type of purposive sampling): The researcher determines the
number of participants and what characteristics will be needed, and then selects
a sample based on those.

• Theoretical sampling: The researcher selects a sample that will assist him or her
in developing a theory.

• Convenience sampling: The researcher selects anyone who shows up for the
study, regardless of individual demographics.

• Snowball sampling (a type of purposive sampling): The researcher collects data
on a few participants that he or she has access to and then asks those participants
for referrals to other individuals who are within the same population.

All these approaches serve a somewhat different purpose. However, the underlying prin-
ciple common to all of these techniques is selecting information-rich cases.

The following questions, adapted from Houser (2009), are provided to assist in the evalu-
ation of the sampling methods section:

• What type of sampling method is used?
• Are the sampling procedures consistent with the purpose and research questions?
• Are relevant demographic characteristics of the sample clearly identified?
• Do the methods of sample selection provide a good representative sample, based on the population?
• Are there any apparent biases in the selection of the sample?
• Is the sample size large enough for the study proposed?


Evaluating the Methods Section—Procedures

The procedures section provides a detailed description of everything that was conducted
in the study. For qualitative studies, this involves primarily the type of research design
that was employed. When evaluating the procedures section, it is important to examine
whether the research design is appropriate for the study, as well as whether it is consistent
with the purpose and research questions. The following questions, adapted from Houser
(2009), are provided to assist in the evaluation of the procedures section:

• What type of research design is used (e.g., case study, phenomenological study)?
• Is the research design consistent with the purpose and research questions?
• Did the researcher provide a detailed description of what was conducted?
• Did the researcher introduce any bias in the procedures used?

Evaluating the Methods Section—Instruments

The instruments section provides a detailed description regarding the types of instru-
ments and measures that were used to collect the data. In qualitative research, common
instrumentation includes interviews, observations, and journals. When evaluating the
instruments section, it is important to consider whether the instruments were appropri-
ate for the study and the sample and whether there were any limitations in the types of
instruments utilized. The following questions, adapted from Houser (2009), are provided
to assist in the evaluation of the instruments section:

• Is there a clear and adequate description of the instruments (e.g., data collection
measures) used?

• What types of measures were used in the study (observations, interviews, etc.)?
• What are some potential problems or limitations of the types of measures used?
• Does the instrument appear to be appropriate for the sample?

Evaluating the Results Section

The results section describes findings from the study. Unlike quantitative studies, which
focus on statistical analyses and results, qualitative studies include descriptions about the
findings. When evaluating the Results section, it is important to examine how the data
were analyzed (using themes, patterns, codes, etc.), whether concrete examples supported
the themes or concepts, and how adequate the descriptions were to the findings. The fol-
lowing questions, adapted from Houser (2009), are provided to assist in the evaluation of
the results section:

• What strategies were used for coding and interpreting the data? Were they
clearly described?

• Are concrete examples provided that link to identified themes or concepts? Are
the examples adequate?


Evaluating the Discussion Section

The discussion section summarizes the purpose of the research and what the findings
imply for future research and actual practice. Additionally, the discussion section includes
alternative explanations and potential limitations of the findings. The following ques-
tions, adapted from Houser (2009), are provided to assist in the evaluation of the discus-
sion section:

• Do the researchers clearly restate the purpose and research questions?
• Do the researchers clearly discuss the implications of the findings and how they relate to theories, other findings, and actual practice?
• Do the researchers provide alternative explanations of the findings?
• Do the researchers identify potential limitations of the study and the results?
• Do the researchers identify possible directions for future research?

3.4 Writing the Qualitative Research Proposal

Although the format of a qualitative research proposal looks similar to the proposal provided in Chapter 1, Section 1.6, the content of some of the sections will differ from that of a quantitative study. Like quantitative studies, the qualitative proposal
includes a title page, abstract page, and an introduction that discusses the research prob-
lem, statement of the problem, research questions, and importance of the study. (Please
refer to Chapter 1, Section 1.3, Research Problem and Questions, for further guidance.)
However, the literature review and methods sections will differ with respect to focus and
content. The following sections discuss the writing requirements for the qualitative litera-
ture review and methods sections.

The Literature Review Section

As discussed in Chapter 1, the primary purpose of the literature review is to cover theoreti-
cal perspectives and previous research findings on the research problem you have selected
(Leedy & Ormrod, 2010). The literature review should demonstrate how your study will
clarify or provide further information on shortcomings found in previous research as well
as how your study will add to the existing literature. The purpose of the literature review
for qualitative studies is slightly different from that of quantitative studies and will vary
depending on the type of research design you are using. Table 3.4 summarizes the pur-
poses of the literature review with respect to research design.


Table 3.4: Purposes of the literature review in qualitative research

Type of Qualitative Research     Purpose of the Literature Review

Ethnographic research            Review the literature to provide a background for conducting the study

Phenomenological research        Compare and combine findings from the study with the literature to determine current knowledge of a phenomenon

Grounded theory research         Use the literature to explain, support, and extend the theory generated in the study

Case study research              Review the literature to provide a background for the study, as well as to explain and support the study

Archival/historical research     Review the literature to develop research questions and provide a source of data

Observational research           Review the literature to provide a background for conducting the study

Adapted from Burns & Grove, 2005, p. 95.

Part of your literature review should include information about the research design you
have selected. So you will want to include information from books and articles, as well as
other research studies that have employed the same design.

The Method Section

As discussed in Chapter 1, the Method section includes a detailed description of the
method of inquiry (quantitative, qualitative, or mixed design approach), research method
used, the sample, data collection procedures, and data analysis techniques. The key pur-
pose of the Method section is to discuss your design and the specific steps and procedures
you plan to follow in order to complete your study.

Similar to quantitative proposals, qualitative proposals include the research method that
will be used. However, qualitative proposals require explanations on why other meth-
ods (such as quantitative or mixed designs) would have been less effective for the study.
Another key difference in qualitative proposals is that they feature a discussion of the
researcher’s role during data collection procedures. Since the researcher is considered the
assessment or instrument tool in most qualitative studies, the impact of researcher bias and
researcher effects needs to be discussed in detail. Additionally, because qualitative study
samples are generally smaller than quantitative ones, the researchers should justify the
appropriateness of the sample size in relation to the research design and questions.

The Method section for qualitative proposals is generally lengthier than for quantitative
proposals because more thoroughness is required when describing the procedures and
data collection methods used. For example, if conducting ethnographic research, you will
need to describe the site that was selected, how the site was selected, how you will enter
the site, how you will gain rapport with the subjects, and the various types of data collec-
tion procedures you will use. You will also want to discuss the length of the data collection
period and how you plan to exit the site. Additionally, data collection can be more cum-
bersome since qualitative studies tend to employ different methods for it. And, because
the data collected are found in detailed narratives, you will want to describe the process
you will use to analyze the narratives as well as the various data analysis procedures you
will use.


3.5 Describing Data in Descriptive Research

To cap off our discussion of descriptive research designs, this section will cover the process of presenting descriptive data in both graphical and numeric form. No matter how you present your data, a good description should be accurate, concise, and
easy to understand. In other words, you have to represent the data accurately and in the
most efficient way possible so that your audience can understand it. Another, more elo-
quent way to think of these principles is to take the advice of Edward Tufte, a statistician
and expert in the display of visual information. Tufte suggests that when people view
your visual displays, they should spend time on “content-reasoning” rather than “design-
decoding” (Tufte, 2001). The sole purpose of designing visual presentations is to com-
municate your information. So the audience should spend time thinking about what you
have to say, not trying to puzzle through the display itself. The following sections cover
guidelines for accomplishing this goal in both numeric and visual form.

Table 3.5 presents hypothetical data from a sample of 20 participants. In this example, we
have asked people to report their gender and ethnicity, as well as answer questions about
their overall life satisfaction and level of daily stress. Each row in this table represents
one participant in the study, and each column represents one of the variables for which
data were collected. In the following sections, we will explore different options for sum-
marizing these sample data, first in numeric form and then using a series of graphs. In
this chapter, the focus is on ways to describe the sample characteristics. In later chapters,
we will return to these principles when discussing graphs that display the relationship
between two or more variables.

Table 3.5: Raw data from a sample of 20 individuals

Subject ID   Gender   Ethnicity            Life Satisfaction   Daily Stress
 1           Male     European American           40                10
 2           Male     European American           47                 9
 3           Female   Asian                       29                 8
 4           Male     European American           32                 9
 5           Female   Hispanic                    25                 3
 6           Female   Hispanic                    35                 3
 7           Female   European American           28                 8
 8           Male     Hispanic                    40                 9
 9           Male     Asian                       37                10
10           Female   African American            30                10
11           Male     European American           43                 8
12           Male     Asian                       40                 4
13           Male     European American           48                 7
14           Female   African American            30                 4
15           Female   European American           37                 7
16           Male     Hispanic                    40                 1
17           Female   European American           36                 1
18           Male     African American            45                 8
19           Female   European American           42                 8
20           Female   African American            38                 7

Numerical Descriptions

Raw data, as shown in Table 3.5, illustrate the actual characteristics or scores for every
participant in the sample. To better understand raw datasets, researchers often use numer-
ical descriptions, or descriptive statistics, to summarize a set of scores or a distribution of
numbers. For example, utilizing the raw data shown previously, a researcher may want to
be able to communicate the total number of females and males that were included in the
sample, as well as the average Life Satisfaction score across all participants. In order to cal-
culate these, you would use descriptive statistics such as frequencies (how many females
and males were included in the sample); measures of central tendency (i.e., the mean,
median, and mode of the Life Satisfaction scores); and measures of variability or distri-
bution (i.e., the variance and standard deviation of the Life Satisfaction scores). Descrip-
tive statistics are generally used first to describe the sample (e.g., how many females and
males are included in the sample) and then to describe the scores. The following section
will discuss common procedures used to summarize sets of data.

Frequency Tables
Often, a good first step in approaching your dataset is to get a sense of the frequencies for
your demographic variables—gender and ethnicity in this example. The frequency tables
shown in Table 3.6 are designed to present the number and percentage of the sample that
fall into each of a set of categories. As you can see in this pair of tables, our sample con-
sisted of an equal number of men and women (i.e., 50% for each gender). The majority
of our participants were European American (45%), with the remainder divided almost
equally between African American (20%), Asian (15%), and Hispanic (20%) ethnicities.


Table 3.6: Frequency table summarizing ethnicity and sex distribution

Gender               Frequency   Percentage   Valid Percentage   Cumulative Percentage
Female                   10          50.0           50.0                  50.0
Male                     10          50.0           50.0                 100.0
Total                    20         100.0          100.0

Ethnicity            Frequency   Percentage   Valid Percentage   Cumulative Percentage
African American          4          20.0           20.0                  20.0
Asian                     3          15.0           15.0                  35.0
Hispanic                  4          20.0           20.0                  55.0
European American         9          45.0           45.0                 100.0
Total                    20         100.0          100.0
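For readers who like to verify such tables, the frequencies and percentages can be reproduced with a few lines of code. The following Python sketch is our own illustration (the variable names are not from the text); it tallies the gender column of Table 3.5 using only the standard library, and the ethnicity column could be handled the same way:

```python
from collections import Counter

# Gender values for the 20 participants in Table 3.5
gender = ["Male", "Male", "Female", "Male", "Female", "Female", "Female",
          "Male", "Male", "Female", "Male", "Male", "Male", "Female",
          "Female", "Male", "Female", "Male", "Female", "Female"]

n = len(gender)                      # total sample size (20)
cumulative = 0.0
for category, freq in Counter(gender).most_common():
    pct = 100.0 * freq / n           # percentage of the sample
    cumulative += pct                # running (cumulative) percentage
    print(f"{category:<8} {freq:>3} {pct:>6.1f} {cumulative:>6.1f}")
```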

We can gain a lot of information from numerical summaries of data. In fact, numeric
descriptors form the starting point for doing inferential statistics and testing our hypoth-
eses. We will cover these statistics in later chapters, but for now it is important to under-
stand that two numeric descriptors can provide a wealth of information about our dataset:
measures of central tendency and measures of dispersion.

Measures of Central Tendency
The first number we need to describe our data is a measure of central tendency, which
represents the most typical case in our dataset. There are three indices for representing
central tendency:

The mean is the mathematical average of our dataset, calculated using the following
formula:

$M = \frac{\sum X}{N}$

The capital letter M is used to indicate the mean; the X refers to individual scores, and the capital letter N refers to the total number of data points in the sample. Finally, the Greek letter sigma, or $\Sigma$, is a common symbol used to indicate the sum of a set of values.

So, in calculating the mean, we add up all the scores in our dataset ($\sum X$) and then divide this total by the number of scores in the dataset (N). Because we are adding and dividing our scores, the mean can be calculated only using interval or ratio data (see Chapter 2, Section 2.3, for a review of the four scales of measurement). In our sample dataset, we could calculate the mean for both life satisfaction and daily stress. To calculate the mean value for life satisfaction scores, we would first add the 20 individual scores (i.e., 40 + 47 + 29 + 32 + ... + 38), and then divide this total by the number of people in the sample (i.e., 20).

$M = \frac{\sum X}{N} = \frac{742}{20} = 37.1$

In other words, the mean, or most typical satisfaction rating in this sample, is 37.1.
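The same arithmetic is easy to script. A minimal Python sketch of our own, applying the formula to the life satisfaction scores from Table 3.5:

```python
scores = [40, 47, 29, 32, 25, 35, 28, 40, 37, 30,
          43, 40, 48, 30, 37, 40, 36, 45, 42, 38]

mean = sum(scores) / len(scores)  # M = (sum of X) / N = 742 / 20
print(mean)                       # 37.1
```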

The median is another measure of central tendency, representing the number in the middle of our dataset, with 50% of scores both above and below it. The location of the median is found by placing the list of values in ascending numeric order, then using the following formula:

$Mdn = \frac{N + 1}{2}$

For example, if you have 9 scores, the median will be the fifth one:

$Mdn = \frac{N + 1}{2} = \frac{9 + 1}{2} = \frac{10}{2} = 5$

If you have an even number of scores, say 8, the median will fall between two scores:

$Mdn = \frac{8 + 1}{2} = \frac{9}{2} = 4.5$, or the average of the fourth and fifth scores.

This measure of central tendency can be used for ordinal, interval, or ratio data because it does not require mathematical manipulation to obtain. So in our sample dataset, we could calculate the median for either life satisfaction or daily stress scores. To find the median score for life satisfaction, we would first sort the data in order of increasing satisfaction scores. Next, we find the position of the median using the formula $Mdn = \frac{N + 1}{2}$. Because we have an N of 20 scores:

$Mdn = \frac{N + 1}{2} = \frac{20 + 1}{2} = \frac{21}{2} = 10.5$

In other words, the median will be the average of the 10th and 11th scores. The 10th par-
ticipant scored a 37, and the 11th participant scored a 38, for a median of 37.5. The median
is another way to represent the most typical score on life satisfaction, so it is no accident
that it is so similar to the mean (i.e., 37.1).
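A short Python sketch (our own, written for an even N as in this example) makes the sort-then-average logic explicit:

```python
scores = [40, 47, 29, 32, 25, 35, 28, 40, 37, 30,
          43, 40, 48, 30, 37, 40, 36, 45, 42, 38]

ordered = sorted(scores)   # place the values in ascending order
n = len(ordered)           # N = 20, so the median position is (20 + 1) / 2 = 10.5

# With an even N, average the two middle scores
# (the 10th and 11th ordered values, at indices 9 and 10).
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
print(median)              # 37.5
```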

The final measure of central tendency, the mode, represents the most frequent score in our
dataset, obtained either by visual inspection of the values or by consulting a frequency
table like Table 3.6. Because the mode represents a simple frequency count, it can be used
with any of the four scales of measurement. In addition, it is the only measure of central
tendency that is valid for use with nominal data (consisting of group labels) since the
numbers assigned to these data are arbitrary.

So in our sample data we could calculate the mode for any variable in the table. To find
the mode of life satisfaction scores, we would simply scan the table for the most common
score, which turns out to be 40. Thus, we have one more way to represent the most typi-
cal score on life satisfaction. Note that the mode is slightly higher than our mean (37.1) or
our median (37.5). We will return to this issue shortly and discuss the process of choosing
the most representative measure. Since we have been ignoring the nominal variables so
far, let’s also find the mode for ethnicity. This is accomplished by tallying up the number
of people in each category—or, better yet, by letting a computer program do the tallying


for you. As we saw earlier, the majority of our participants were European American
(45%), with the remainder divided almost equally among African American (20%), Asian
(15%), and Hispanic (20%) ethnicities. So the modal (most typical) value of ethnicity in
this sample was European American.
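Python's standard library exposes this frequency count directly; a brief sketch of our own:

```python
from statistics import mode   # returns the most frequent value

ethnicity = ["European American"] * 9 + ["African American"] * 4 + \
            ["Hispanic"] * 4 + ["Asian"] * 3
print(mode(ethnicity))        # European American

satisfaction = [40, 47, 29, 32, 25, 35, 28, 40, 37, 30,
                43, 40, 48, 30, 37, 40, 36, 45, 42, 38]
print(mode(satisfaction))     # 40 (occurs four times)
```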

One important take-home point is that your scale of measurement largely dictates the choice between measures of central tendency—nominal scales can use only the mode, and the mean can be used only with interval or ratio scales. The other piece of the puzzle is to consider which measure best represents the data. Remember that the central tendency is a way to represent the "typical" case with a single number, so the goal is to settle on the most representative number. This process is illustrated by the examples in Table 3.7.

Table 3.7: Comparing the mean, median, and mode

Data: 1, 2, 3, 4, 5, 11, 11
Mean: 5.29   Median: 4   Mode: 11
Analysis:
• Both the mean and the median seem to represent the data fairly well.
• The mean is a slightly better choice because it hints at the higher scores.
• The mode is not representative; two people seem to have higher scores than everyone else.

Data: 1, 1, 1, 5, 10, 10, 100
Mean: 18.29   Median: 5   Mode: 1
Analysis:
• The mean is inflated by the atypical score of 100 and therefore does not represent the data accurately.
• The mode is also not representative because it ignores the higher values.
• In this case, the median is the most representative value to describe this dataset.

Let’s look at one more example, using the “daily stress” variable from our sample data in
Table 3.5. The daily stress values of our 20 participants were as follows: 1, 1, 3, 3, 4, 4, 7, 7,
7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, and 10.

• To calculate the mean of these values, we add up all the values and divide by our sample size of 20: $M = \frac{\sum X}{N} = \frac{134}{20} = 6.70$

• To calculate the median of these values, we use the formula $Mdn = \frac{N + 1}{2} = \frac{21}{2} = 10.5$ to find the middle position. This tells us that our median is the average of our 10th and 11th scores, or 8.

• To obtain the mode of these values, we can inspect the data and determine that 8 is the most common number because it occurs five times.


In analyzing these three measures of central tendency, we see that they all appear to rep-
resent the data accurately. The mean is a slightly better choice than the other two because
it represents the lower values as well as the higher ones.
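All three measures can be computed in one pass with the standard library; a sketch of our own, using the daily stress values:

```python
from statistics import mean, median, mode

stress = [1, 1, 3, 3, 4, 4, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10]

print(mean(stress))    # 6.7  (134 divided by 20)
print(median(stress))  # 8    (average of the 10th and 11th ordered scores)
print(mode(stress))    # 8    (occurs five times)
```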

Measures of Dispersion
The second measure used to describe our dataset is a measure of dispersion, or the spread
of scores around the central tendency. Measures of dispersion tell us just how typical the
typical score is. If the dispersion is low, then scores are clustered tightly around the cen-
tral tendency; if dispersion is higher, then the scores stretch out farther from the central
tendency. Figure 3.2 presents a conceptual illustration of dispersion. The graph on the left
has a low amount of dispersion because the scores (yellow curve) cluster tightly around
the average value (red dotted line). The graph on the right shows a high amount of dis-
persion because the scores (yellow curve) spread out widely from the average value (red
dotted line).

Figure 3.2: Two distributions with a low versus high amount of dispersion

One of the most straightforward measures of dispersion is the range, which is the differ-
ence between the highest and lowest scores. In the case of our daily stress data, the range
would be found by simply subtracting the lowest value (1) from the highest value (10)
to get a range of 9. The range is useful for getting a general idea of the spread of scores,
although it does not tell us much about how tightly these scores cluster around the mean.

The most common measures of dispersion are the variance and standard deviation, both of which represent the average difference between the mean and each individual score. The variance (abbreviated $S^2$) is calculated by subtracting the mean from each score to get a deviation score, squaring and summing these individual deviation scores, and then dividing by the sample size. The more scores are spread out around the mean, the higher the sum of our squared deviation scores will be, and therefore the higher our variance will be. The deviation scores are squared because otherwise their sum would always equal zero; that is, $\sum (X - M) = 0$. Finally, the standard deviation, abbreviated SD, is calculated by taking the square root of the variance. This four-step process is illustrated in Table 3.8, using a hypothetical dataset of 10 participants.

Once you know the central tendency and the dispersion of your variables, you have a
good sense of what the sample looks like. These numbers are also a valuable piece for
calculating the inferential statistics that we ultimately use to test our hypotheses.


Table 3.8: Steps to calculate the variance and standard deviation

Values   1. Subtract the mean (5.4)   2. Square the deviation score
  1          1 - 5.4 = -4.4               (-4.4)² = 19.36
  2          2 - 5.4 = -3.4               (-3.4)² = 11.56
  2          2 - 5.4 = -3.4               (-3.4)² = 11.56
  4          4 - 5.4 = -1.4               (-1.4)² = 1.96
  5          5 - 5.4 = -0.4               (-0.4)² = 0.16
  7          7 - 5.4 = 1.6                  1.6² = 2.56
  7          7 - 5.4 = 1.6                  1.6² = 2.56
  8          8 - 5.4 = 2.6                  2.6² = 6.76
  9          9 - 5.4 = 3.6                  3.6² = 12.96
  9          9 - 5.4 = 3.6                  3.6² = 12.96

Mean = 5.40   Sum of deviations = 0.00   Sum of squared deviations = 82.40

3. Calculate the variance: $S^2 = \frac{\sum (X - M)^2}{N} = \frac{82.4}{10} = 8.24$

4. Calculate the standard deviation: $SD = \sqrt{S^2} = \sqrt{8.24} = 2.87$

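The four steps in Table 3.8 translate directly into code. A minimal Python sketch of our own (note that, like the text, it divides by N, i.e., it computes the population variance):

```python
from math import sqrt

values = [1, 2, 2, 4, 5, 7, 7, 8, 9, 9]   # hypothetical dataset from Table 3.8
m = sum(values) / len(values)              # mean = 5.4

# Steps 1-2: deviation scores (X - M), squared and summed (total: 82.40)
sum_sq = sum((x - m) ** 2 for x in values)

variance = sum_sq / len(values)            # Step 3: S^2 = 82.4 / 10 = 8.24
sd = sqrt(variance)                        # Step 4: SD = sqrt(8.24) = 2.87
print(round(variance, 2), round(sd, 2))
```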
Standard Scores
So far, we have been discussing ways to describe one particular sample in numeric terms.
But what do we do when we want to compare results from different samples or from stud-
ies using different scales? Let’s say you want to compare the anxiety levels of two people;
unfortunately, in this example, the people were measured using different anxiety scales:

Joe scored 25 on the ABC Anxiety Scale, which has a mean of 15 and a stan-
dard deviation of 2.

Deb scored 40 on the XYZ Anxiety Scale, which has a mean of 30 and a
standard deviation of 10.

At first glance, Deb’s anxiety score appears higher, but note that the scales have different
properties: The ABC scale has an average score of 15, while the XYZ scale has an average
score of 30. The dispersion of these scales is also different; scores on the ABC scale cluster
more tightly around the mean (i.e., SD = 2 compared with SD = 10).

The solution for comparing these scores is to convert both of them to standard scores (or z-scores), which represent the distance of each score from the sample mean, expressed in standard deviation units. The formula for a z-score is:

$z = \frac{X - M}{SD}$

This formula subtracts the mean from the individual score and then divides the difference by the standard deviation of the sample. In order to compare Joe's score with Deb's score, we simply plug in the appropriate numbers, using the mean and standard deviation from


the scale that each one completed. This lets us put scores from very different distributions
on the same scale. So, in this case:

Joe: $z = \frac{X - M}{SD} = \frac{25 - 15}{2} = \frac{10}{2} = 5$

Deb: $z = \frac{X - M}{SD} = \frac{40 - 30}{10} = \frac{10}{10} = 1$

The resulting scores represent each person’s score in standard deviation terms: Joe is 5
standard deviations above the mean of the ABC scale, while Deb is only 1 standard devia-
tion above the mean of the XYZ scale. Or, in plain English, Joe is actually considerably
more anxious than Deb.
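Because the conversion is a single formula, it is easy to express as a reusable function. A sketch of our own (the function name is hypothetical):

```python
def z_score(x, m, sd):
    """Distance of a raw score x from the mean m, in standard deviation units."""
    return (x - m) / sd

joe = z_score(25, m=15, sd=2)    # ABC Anxiety Scale: (25 - 15) / 2
deb = z_score(40, m=30, sd=10)   # XYZ Anxiety Scale: (40 - 30) / 10
print(joe, deb)                  # 5.0 1.0
```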

In order to understand just how anxious Joe is, it is helpful to know a bit about why this
technique works. If you have taken a statistics class, you will have encountered the con-
cept of the normal distribution (or “bell curve”), a symmetric distribution with an equal
number of scores on either side of the mean, as illustrated in Figure 3.3.

Figure 3.3: Standard deviations and the normal distribution

It turns out that lots of variables in the social and behavioral sciences fit this normal dis-
tribution, provided the sample sizes are large enough. The useful thing about a normal
distribution is that it has a consistent set of properties, such as having the same value for
mean, median, and mode. In addition, if the distribution is normal, each standard devia-
tion cuts off a known percentage of the curve, as illustrated in Figure 3.3. That is, 68% of scores will fall within ± one standard deviation of the mean; 95% of scores will fall within ± two standard deviations; and 99.7% of scores will fall within ± three standard deviations.

These percentages allow us to understand our individual data points in even more use-
ful ways, because we can easily move back and forth between z-scores, percentages, and
standard deviations. Take our example of Joe and Deb’s anxiety scores: Deb has a z-score


of 1, which means her anxiety is 1 standard deviation above the mean. And, as we can see
by consulting the normal distribution, her anxiety level is higher than 84% of the popula-
tion. Poor Joe has a z-score of 5, which means his anxiety is 5 standard deviations above
the mean. This also means that his anxiety is higher than 99.999% of the population. (See
http://www.measuringusability.com/pcalcz.php for a handy online calculator that con-
verts between z-scores and percentages.)

This relationship between z-scores and percentiles is also commonly used in discussions
of intelligence test scores. Tests that purport to measure IQ are converted to a scale that
has a mean of 100 and a standard deviation of 15. Because IQ is normally distributed, we
are able to move easily back and forth between z-scores and percentages. For example,
someone who has an IQ test score of 130 falls 2 standard deviations above the mean and
falls in the upper 2.5% of the population. A person with an IQ test score of 70 is 2 standard
deviations below the mean and thus falls in the bottom 2.5% of the population.
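These percentages come from the cumulative distribution function of the standard normal curve, which Python's standard library exposes (Python 3.8 and later). A sketch of our own:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

print(std_normal.cdf(1))   # ~0.841: Deb's score exceeds about 84% of the population
print(std_normal.cdf(2))   # ~0.977: an IQ of 130 is in roughly the top 2.5%
print(std_normal.cdf(5))   # ~0.9999997: Joe's z of 5 is in the extreme upper tail
```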

Ultimately, the use of standard scores allows us to take data that have been collected
on different scales—perhaps in different laboratories and different countries—and place
them on the same metric (standard of measurement) for comparison. As we have dis-
cussed in several contexts, science is all about the accumulation of knowledge one study
at a time. The best support for an idea comes when it is supported by data from different
researchers, using different measures to capture the same concept. The ability to convert
these different measures back to the same metric is an invaluable tool for researchers who
want to compare research results.

Visual Descriptions

Displaying your data in visual form is often one of the most effective ways to communi-
cate your findings—hence the cliché, a picture is worth a thousand words. But what sort
of visual aids should you use? Your choice of graphs should be guided by two criteria: the
scale of measurement and the best fit for the results.

Displaying Frequencies
One common type of graph is the bar graph, which also summarizes the frequency of data
by category. Figure 3.4a presents a bar graph, showing our four categories of ethnicity
along the horizontal axis and the number of people falling into each category indicated
by the height of the bars. So, for example, this sample contains 9 European American par-
ticipants and 4 Hispanic participants. You’ll notice that these bar graphs contain exactly
the same information as the frequency table in Table 3.6. When reporting your results in a
paper, you would, of course, use only one of these methods; more often than not, graphi-
cal displays are the most effective way to communicate information.

Figure 3.4b shows another variation on the bar graph, the clustered bar graph, which
summarizes frequency by two categories at one time. In this case, our bar graph displays
information about both gender and ethnicity. As in the previous graph, our categories of
ethnicity are displayed along the horizontal axis. But this time, we have divided the total
number of each ethnicity by the gender of respondents—indicated using different colored
bars. For example, you can see that our 9 European American participants are divided into
5 males and 4 females; similarly, our 4 African American participants are divided into
1 male and 3 females.


Figure 3.4: Bar graph displaying (a) frequency by ethnicity and
(b) clustered bar graph displaying frequency by ethnicity and gender

The important rule to keep in mind with bar graphs is that they are used for qualitative,
or nominal, categories—that is, those that do not have a numerical value. We could just as
easily have listed European American participants second, third, or fourth along the axis
because ethnicity is measured on a nominal scale.

When we want to present quantitative data—that is, those values measured on an ordi-
nal, interval, or ratio scale—we use a different kind of graph called a histogram. As seen
in Figure 3.5a, histograms are drawn with the bars touching one another to indicate that
the categories are quantitative and on a continuous scale. In this figure, we have broken
down the “life satisfaction” values into three categories (less than 31, 31–40, and 41–50)


and displayed the frequencies for each category in numerical order. For example, you can
see that six people had life satisfaction scores falling between 31 and 40.

Finally, all our bar graphs and histograms so far have displayed data that have been split
into categories. But, as seen in Figure 3.5b, histograms can also present data on a continu-
ous scale. Figure 3.5b has an additional new feature—a curved line overlaid on the graph.
This curve is a representation of a normal distribution and allows us to gauge visually
how close our sample data are to being normally distributed.

Figure 3.5: Histograms showing (a) frequencies by life satisfaction (quanti-
tative) categories and (b) life satisfaction scores on a continuous scale

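For readers who want to produce such displays themselves, the sketch below (our own, assuming the third-party matplotlib library is installed) draws a bar graph for the nominal ethnicity counts and a histogram for the quantitative life satisfaction scores. The bin edges are our own choice, not the text's exact categories:

```python
import matplotlib.pyplot as plt

satisfaction = [40, 47, 29, 32, 25, 35, 28, 40, 37, 30,
                43, 40, 48, 30, 37, 40, 36, 45, 42, 38]

fig, (left, right) = plt.subplots(1, 2, figsize=(9, 4))

# Bar graph: nominal categories (their order along the axis is arbitrary)
left.bar(["Eur. Am.", "Afr. Am.", "Hispanic", "Asian"], [9, 4, 4, 3])
left.set_xlabel("Ethnicity")
left.set_ylabel("Frequency")

# Histogram: quantitative scores on a continuous scale (bars touch)
right.hist(satisfaction, bins=[20, 30, 40, 50], edgecolor="black")
right.set_xlabel("Life Satisfaction")
right.set_ylabel("Frequency")

plt.tight_layout()
plt.show()
```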


Displaying Central Tendency
Another common use of graphs is to display numeric descriptors in an easy-to-understand
visual format. That is, we can apply the same principles for displaying information about
our sample frequencies to displaying the typical scores in the sample. If we refer back
to our sample data in Table 3.5, we have information about ethnicity and gender but
also about reports of daily stress and life satisfaction. Thus, a natural question to ask is
whether there are gender or ethnic differences in these two variables. Figure 3.6 displays a
clustered bar graph, displaying the mean level of life satisfaction in each group of partici-
pants. One thing that jumps out is that males appear to report more life satisfaction than
females, as seen by the fact that the red bars are always higher than the gold bars. We can
also see some variation in satisfaction levels by ethnicity: African-American males (45)
seem to report slightly more satisfaction than European American males (42).

Figure 3.6: Clustered bar graph displaying life satisfaction scores by gender
and ethnicity

These particular data are fictional, of course; but even if our graph were displaying real
data, we would want to be cautious in our interpretations. One reason for caution is that
this represents a descriptive study. We might be able to state which demographic groups
report more life satisfaction, but we would be unable to determine the reasons for the
difference. Another, more important reason for caution is that visual presentations can be
misleading, and we would need to conduct statistical analyses to discover the real pat-
terns of differences.

The best way to appreciate this latter point is to see what happens when we tweak the
graph a little bit. Our original graph in Figure 3.6 is a fair representation of the data: The
scale starts at zero, and the y-axis on the left side increases by reasonable intervals. But
if we were trying to win an argument about gender differences in happiness, we could
always alter the scale, as shown in Figure 3.7. These bars represent the same set of means,
but we have compacted the y-axis to show only a small part of the range of the scale.


That is, rather than ranging from 0 to 50, this misleading graph ranges from 28 to 45, in
increments of 1. To the uncritical eye, this appears to show an enormous gender difference
in life satisfaction; to the trained eye, this shows an obvious attempt to make the findings
seem more dramatic. Any time you encounter a bar graph that is used to support a par-
ticular argument, always pay close attention to the scale of the results: Does it represent
the actual range of the data, or is it compacted to exaggerate the differences? Likewise,
any time you create a graph to display results, it is your responsibility as a researcher to
ensure that the graph accurately represents the data.

Figure 3.7: Clustered bar graph altered to exaggerate the differences
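The effect of compacting the y-axis is easy to demonstrate. In the matplotlib sketch below (our own; apart from the two means quoted in the text, the group means are illustrative placeholders), the same bars are drawn twice, once with the full 0-50 range and once with a truncated axis:

```python
import matplotlib.pyplot as plt

groups = ["Afr. Am.", "Asian", "Hispanic", "Eur. Am."]
male_means = [45, 39, 40, 42]     # 45 and 42 from the text; others illustrative
female_means = [32, 29, 30, 36]   # illustrative values only

x = range(len(groups))
fig, (fair, misleading) = plt.subplots(1, 2, figsize=(9, 4))

for ax in (fair, misleading):
    ax.bar([i - 0.2 for i in x], male_means, width=0.4, label="Male")
    ax.bar([i + 0.2 for i in x], female_means, width=0.4, label="Female")
    ax.set_xticks(list(x))
    ax.set_xticklabels(groups)

fair.set_ylim(0, 50)          # full range: differences look modest
misleading.set_ylim(28, 45)   # compacted axis: the same data look dramatic
fair.legend()
plt.tight_layout()
plt.show()
```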

Summary

In this chapter, we have focused on qualitative and descriptive research designs, the latter being the first of three specific designs covered in the continuum of control. As discussed, qualitative methods differ from descriptive methods, with the latter allow-
ing for the utilization of either qualitative, quantitative, or mixed-method approaches.
Qualitative methods have minimal researcher control and are used to thoroughly explain
or understand an event, situation, or phenomenon in great detail. On the other hand, the
primary goal of descriptive designs is to describe attitudes and behavior, without any pre-
tense of making causal claims. One common feature of both qualitative and descriptive
designs is that they are able to assess behaviors that occur in their natural environment,
or at least in something very close to it. Thus, this chapter first covered three qualita-
tive designs and then three types of descriptive research: ethnographic, phenomenologi-
cal, and grounded theory studies, and case studies, archival research, and observational
research, respectively. Because each of the descriptive methods discussed has the goal of
describing attitudes, feelings, and behaviors, each one can be used from either a quantita-
tive or a qualitative perspective; thus, they were separated from qualitative designs that
utilize only qualitative techniques.


As mentioned previously, all the qualitative designs discussed are used to explore, explain,
and understand events or situations in great detail. Thus, the goal of qualitative inquiry is
to understand and explain the personal experiences of the participants from their perspec-
tive and in their own environment. In qualitative research, only qualitative methods are
applied.

In descriptive designs, such as a case study, the researcher studies an individual unit (such
as a single person, group, or event) in great detail over a period of time. This approach
is often used to study special populations and to gather detailed information about rare
phenomena. Unlike qualitative designs, case studies can include either qualitative, quan-
titative, or mixed-method approaches to data collection. On the one hand, case studies
represent one of the lowest points on our continuum of control, owing to the lack of a
comparison group and the difficulty of generalizing from a single case. On the other
hand, case studies are a valuable tool for beginning to study a phenomenon in depth. We
discussed the example of Phineas Gage, who suffered severe brain damage and showed
drastic changes in his personality and cognitive skills. Although it is difficult to generalize
from the specifics of Gage’s experience, this case helped to inspire more than a century’s
worth of research into the connections among mind, brain, and behavior.

Archival research involves drawing new conclusions by analyzing existing sources of
data. This approach is often used to track changes over time or to study things that would
be impossible to measure in a laboratory setting. For example, we discussed Phillips’s
study of copycat suicides, which he conducted by matching newspaper coverage of sui-
cides to subsequent spikes in fatality rates. There would be no practical or ethical way to
study these connections other than by examining the patterns as they occurred naturally.
Archival studies are still relatively low on our continuum of control, primarily because
the researcher does not have much control over how the data are collected. In many cases,
analyzing archives involves a process known as content analysis—that is, developing a cod-
ing strategy to extract relevant information from a broader collection of content. Content
analysis involves a three-step process: identifying the most relevant archives, sampling
from these archives, and finally, coding and recording behaviors. For example, Weigel and
colleagues studied race relations on television by sampling a week’s worth of prime-time
programming and recording the screen time dedicated to portraying different races.

Observational research involves directly observing behavior and recording observations
in a systematic way. This approach is well suited to a wide variety of research questions,
provided that the variables can be directly observed. That is, one can observe what people
do but not why they do it. In exchange for giving up access to internal processes, the
researcher gains access to unfiltered behavioral responses—especially when finding ways
to observe people unobtrusively. We discussed three main types of observational research.
Structured observation involves creating a standardized situation, often in a laboratory set-
ting, and tracking people’s responses. Naturalistic observation involves observing behav-
ior as it occurs naturally, often in its real-world context. Participant observation involves
having the researcher take part in the same activities as the participants in order to gain
greater insight into their private behaviors. All three variations go through a similar three-
step process as archival research: Choose a hypothesis, choose a sampling strategy, and
then code and record behaviors.


This chapter next covered principles for describing data in both visual and numeric form.
To move toward conducting statistical analyses, it is also useful to summarize data in
numeric form. We discussed two categories of numeric summaries: central tendency and
dispersion. Measures of central tendency (i.e., mean, median, and mode) provide infor-
mation about the “typical” score in a dataset, whereas measures of dispersion (i.e., range,
variance, and standard deviation) provide information about the distribution of scores
around the central tendency—that is, they tell us how typical the typical score is. We
then covered the process of translating scores into standard scores (aka, z-scores), which
express individual scores in terms of standard deviations. This technique is useful for
comparing results from different studies that used different measures.

Finally, we discussed guidelines for visual presentation. If you remember one thing from
this section, it should be that the sole purpose of visual information is to communicate
your findings to an audience. Thus, your descriptions should always be accurate, concise,
and easy to understand. The most common visual displays for summarizing data are bar
graphs (for nominal data) and histograms (for quantitative data). Regardless of the visual
display you choose, it should represent your data accurately; it is especially important to
make sure that the y-axis accurately represents the range of your data.

Key Terms

archival research A descriptive design
that involves drawing conclusions by ana-
lyzing existing sources of data, including
both public and private records.

axial coding Used in grounded the-
ory, involves finding connections or
relationships between categories and
subcategories.

bar graph A visual display that summa-
rizes the frequency of data by category;
used to display nominal data.

case study A descriptive design that pro-
vides a detailed, in-depth analysis of one
person over a period of time.

central tendency A numeric descriptor
that represents the most typical case in a
dataset.

clustered bar graph A visual display that
summarizes frequency data by two catego-
ries at one time; used to display nominal
data.

coding categories Symbols or words
applied to a group of words in order to
categorize the information prior to data
analysis.

content analysis The process of system-
atically extracting and sifting through the
contents of a collection of information.

deviation score The difference between
an individual score and the sample mean,
obtained by subtracting the mean from
each score.

dispersion A numeric descriptor that
represents the spread of scores around the
central tendency.

ecological validity The extent to which
the research setting resembles conditions
in the real world.

ethnography A qualitative method that
focuses on an entire cultural group or a
group that shares a common culture.


event sampling In observational research,
a technique that involves observing and
recording behaviors that occur throughout
an entire event.

focus groups Group interviews that are
generally conducted when the researcher
wants to collect data on several individu-
als simultaneously and in the same room.

frequency tables Summary tables that
present the number and percentage of
the sample that fall into each of a set of
categories.

gatekeeper In ethnographic research, a
person who facilitates the researcher's
access into the site.

grounded theory A method of research
that builds theories from preexisting
“grounded” data that have been systemati-
cally analyzed and reanalyzed.

histogram A variation of a bar graph used
to display ordinal, interval, or ratio data;
histograms are drawn with the bars touch-
ing one another to indicate that the catego-
ries are quantitative.

individual sampling In observational
research, a technique that involves collect-
ing data by observing one person at a time
in order to test hypotheses about individ-
ual behaviors.

interrater reliability In research that
involves more than one observer using
the same coding system, the degree to
which the observers' data look roughly
the same.

mean A measure of central tendency that
represents the mathematical average of a
dataset; calculated by adding all the scores
together and then dividing by the number
of scores.

median A measure of central tendency
that represents the number in the middle
of a dataset, with 50% of scores both above
and below it.

mode A measure of central tendency
that represents the most frequent score
in a dataset, obtained either by visually
inspecting the values or by consulting a
frequency table.

naturalistic observation A type of obser-
vational study that involves observing and
systematically recording behavior in the
real world; can be done with or without
intervention by the researcher.

noise Amount of unexplained variation in
a sample.

nominal data Categories that have no
numerical meaning, such as gender,
religious affiliation, or state; values that
cannot be added, subtracted, or sorted in a
logical fashion.

normal distribution (or “bell curve”) A
symmetric distribution with an equal num-
ber of scores on either side of the mean;
has the same value for mean, median, and
mode.

observational research A descriptive
design that involves directly observing
behavior and recording these observations
in an objective and systematic way.

open coding Used in grounded theory,
involves the researcher labeling and cate-
gorizing the data into categories or themes
and smaller subcategories that describe the
phenomenon being investigated.


participant-expectancy bias or participant-
observer effect The tendency of people
to behave differently when they are aware
of being observed or to alter their normal
behaviors to be consistent with what they
think the researcher is expecting from them.

participant observation A type of obser-
vational study that involves having the
researcher(s) conduct observations while
engaging in the same activities as the
participants; the goal is to interact with
participants to gain access and insight into
their behaviors.

participant reactivity The tendency of
people to behave differently when they are
aware of being observed.

phenomenological study An investiga-
tion that attempts to understand the inner
experiences of an event, such as a person’s
perceptions of, perspective on, and under-
standing of a particular experience.

range A measure of dispersion that rep-
resents the difference between the highest
and lowest scores.

researcher bias The tendency of a
researcher to influence the results in order
to portray or bring about a certain outcome.

samples Small snippets of the environ-
ment that are relevant to the hypothesis;
also selected groups of study participants,
chosen purposefully or randomly.

saturation Occurs when no additional
supporting or disconfirming data are being
found to develop a category.

selective coding Used in grounded theory
research; involves the researcher combining
categories and their interrelationships into
theoretical constructs (i.e., a story line).

semistructured interview A type of
researcher inquiry that utilizes both preset
(structured) and spontaneous (unstruc-
tured) questions.

standard scores (or z-scores) Scores that
represent the distance of each score from
the sample mean, expressed in standard
deviation units; calculated by subtracting
the mean from a score and then dividing
by the standard deviation.

structured interview A type of research
inquiry that includes asking a fixed set of
either open-ended or closed-ended ques-
tions that are administered in a specific
order.

structured observation A type of observa-
tional study that involves creating a stan-
dard situation in a controlled setting and
then observing participants’ responses.

thick description Detailed representations
that provide depth, breadth, and context to
a particular issue.

time sampling In observational research,
a technique that involves comparing
behaviors during different time intervals.

unstructured interview A type of inter-
view that is commonly used in qualitative
research and utilizes a more open-ended,
unstructured approach, allowing the inter-
viewee to lead the conversation.

variance A measure of dispersion that rep-
resents the average squared difference between
the mean and each individual score; calculated
by subtracting the mean from each score to
get a deviation score, squaring and sum-
ming these individual deviation scores,
and dividing by the sample size.


Apply Your Knowledge

1. Compare and contrast the sets of the following terms. Your answers should
demonstrate that you understand each term.
a. individual sampling versus event sampling
b. participant observation versus naturalistic observation
c. mean versus median versus mode
d. variance versus standard deviation
e. bar graph versus histogram

2. Place each of the descriptive research methods we have discussed in this chapter
(listed below) on the continuum of control.

archival research
case study
naturalistic observation

3. List one advantage and one disadvantage associated with each of the following
research methods.
a. archival research

advantage:
disadvantage:

b. case studies
advantage:
disadvantage:

c. ethnography studies
advantage:
disadvantage:

d. grounded theory research
advantage:
disadvantage:

e. phenomenological studies
advantage:
disadvantage:

f. observation studies
advantage:
disadvantage:

4. For each of the following datasets, compute the mean, median, mode, and
standard deviation. Once you have all three measures of central tendency,
decide which one is the best representation of the data.
a. 2, 2, 4, 5
b. 10, 13, 15, 100

5. Mike scores an 80 on a math test that has a mean of 100 and a standard deviation
of 20. Convert Mike’s test score into a z-score. (A worked check for problems 4
and 5 appears after this problem set.)


6. For each of the following relationships, state the best way to present it graphically
(bar graph, clustered bar graph, or histogram).
a. average income by years of school completed (ratio scale)
b. average income based on category of school completed (high school, some
college, college degree, master’s degree, and doctoral degree)
c. average income based on gender and category of school completed

7. For each of the following questions, state how you would test them using an
observational design.
a. Are people who own red cars more likely to drive like maniacs?

(1) What would your hypothesis be?
(2) Where would you get your sample, and how (i.e., which type)?
(3) What categories of behavior would you record? How would you define them?

b. Are men more likely than women to “lose control” at a party?

(1) What would your hypothesis be?
(2) Where would you get your sample, and how (i.e., which type)?
(3) What categories of behavior would you record? How would you define them?

c. How many fights break out in an average NHL (hockey) game?

(1) What would your hypothesis be?
(2) Where would you get your sample, and how (i.e., which type)?
(3) What categories of behavior would you record? How would you define them?
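
One way to check the arithmetic for problems 4 and 5 is with a short script. The sketch below is illustrative only; it uses Python’s statistics module and assumes the population formulas from this chapter (dividing by the sample size N):

    from statistics import mean, median, mode, pstdev

    a = [2, 2, 4, 5]
    print(mean(a), median(a), mode(a), round(pstdev(a), 2))
    # 3.25 3.0 2 1.3 -- the three measures of central tendency are close here

    b = [10, 13, 15, 100]
    print(mean(b), median(b), round(pstdev(b), 2))
    # 34.5 14.0 37.86 -- no value repeats, so there is no mode; the outlier (100)
    # pulls the mean upward, so the median best represents these data

    # Problem 5: z = (score - mean) / standard deviation
    print((80 - 100) / 20)  # -1.0: Mike scored one standard deviation below the mean
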
Critical Thinking & Discussion Questions

1. Explain the tradeoffs involved in taking a qualitative versus a quantitative
approach to your research question. What are the pros and cons of each one?

2. What are the advantages and disadvantages of conducting participant
observation?

3. What are the similarities and differences between qualitative research designs
and descriptive research designs?

4. Provide examples of a phenomenological study and an ethnographic study.
What are the similarities and differences between the two designs?

5. Which types of qualitative and descriptive research designs described in this
chapter would be considered top-down approaches, and which would be
considered bottom-up?


chapter 2

Research Design, Measurement, and Testing Hypotheses

Chapter Contents

• Overview of Research Designs
• Reliability and Validity
• Scales and Types of Measurement
• Hypothesis Testing


In the early 1950s, Canadian physician Hans Selye introduced the term stress into both the medical and popular lexicons. By that time, it had been accepted that humans have a well-evolved fight-or-flight response, which prepares us to either fight back or flee
from danger, largely by releasing adrenaline and mobilizing the body’s resources more
efficiently. While working at McGill University, Selye began to wonder about the health
consequences of this adrenaline and designed an experiment to test his ideas using rats.
Selye injected rats with doses of adrenaline over a period of several days and then eutha-
nized the rats in order to examine the physical effects of the injections. As expected, the
rats that were exposed to adrenaline had developed ill effects, such as ulcers, increased
arterial plaques, and decreases in the size of reproductive glands—all now understood
to be consequences of long-term stress exposure. But there was just one problem. When
Selye took a second group of rats and injected them with a placebo, they also developed
ulcers, plaques, and shrunken reproductive glands!

Fortunately, Selye was able to solve this scientific mystery with a little self-reflection.
Despite all his methodological savvy, he turned out to be rather clumsy when it came to
handling rats, occasionally dropping one when he removed it from its cage for an injec-
tion. In essence, the experience for these rats was one that we would now call stressful,
and it is no surprise that they developed physical ailments in response to it. Rather than
testing the effects of adrenaline injections, Selye was inadvertently testing the effects of
being handled by a clumsy scientist. It is important to note that if Selye ran this study in
the present day, ethical guidelines would dictate much more stringent oversight of his
study procedures in order to protect the welfare of the animals.

This story illustrates two key points about the scientific process. First, as we discussed in
Chapter 1, it is always good to be attentive to your apparent mistakes because they can
lead to valuable insights. Second, it is absolutely vital to measure what you think you
are measuring. In this chapter, we get more concrete about what it means to do research,
beginning with a broad look at the four types of research design. Our goal at this stage
is to get a general sense of what these designs refer to, when they are used, and the main
differences among them. (Chapters 3, 4, and 5 are each dedicated to different types of
research design and elaborate further on each one.) Following our overview of designs,
this chapter covers a set of basic principles that are common to all quantitative research
designs. Regardless of the particulars of your design, all quantitative research studies
involve making sure our measurements are accurate and consistent and that they are cap-
tured using the appropriate type of scale. Finally, we will discuss the general process of
hypothesis testing, from laying out predictions to drawing conclusions.

2.1 Overview of Research Designs

As you learned in Chapter 1, scientists can have a wide range of goals going into a research project, from describing a phenomenon to attempting to change people’s behavior. It turns out that these goals lend themselves to different approaches to
answering a research question. That is, you will approach the problem differently when
you want to describe voting patterns than when you want to explain them or predict
them. These approaches are called research designs, or the specific methods that are used
to collect, analyze, and interpret data. The choice of a design is not one to be made lightly;
the way you collect data trickles down to the kinds of conclusions that you can draw


about them. This section provides a brief introduction to the four main types of design:
qualitative, descriptive, correlational, and experimental.

Qualitative Research

You will recall from Chapter 1 that qualitative research is used to gain a deep and thor-
ough understanding of particular cases and contexts. It is often used when the researcher
wants to obtain more detailed and rich data about personal experiences, events, and
behaviors in their natural environment. If your research question seeks to obtain insight
into and to thoroughly understand people’s attitudes, behaviors, value systems, concerns,
motivations, aspirations, culture, or lifestyles from their perspective, then your research
design will fall under the category of qualitative research. Qualitative research can be very
time-consuming because it delves into great detail about the phenomena of interest, such
as people’s reactions to a particular situation, how a group interacts over time, or how a
person behaves in certain environments and circumstances. The following are examples
of qualitative research questions:

• How do women in a psychology doctoral program describe their decision to
attend an online program versus a campus-based program?

• What is it like to live with a family member who has Alzheimer’s disease?
• What are the familial experiences of teenagers who join gangs?
• How do women who have lost their spouse from a tragic accident experience grief?
• What is the nature of the culture of people living on the island of Niihau?

What these five questions have in common is that they use the words What and How in an
attempt to discover, understand, explore, and describe experiences. They are not trying to
explain the causes of a phenomenon or to predict cause and effect.

Unlike the other designs that will be discussed in this chapter, qualitative research pro-
duces data in the form of words, transcripts, pictures, and stories and generally cannot
(or at least not easily) be converted into numerical data. Thus, qualitative research focuses
on building holistic and largely narrative descriptions to provide an understanding of a
social or cultural phenomenon.

As we will review in Chapter 3, qualitative research is conducted in a natural setting
and involves building a complex and holistic picture of the phenomenon of interest. The
researcher immerses him- or herself into the study and interacts with participants to
obtain a better understanding of their experiences. The goal of qualitative research is not
to test hypotheses but rather to uncover patterns that help explain a phenomenon of inter-
est. Thus, qualitative research begins with research questions and may offer hypotheses
after the study has been conducted. Because of these traits, qualitative research is often
conducted on topics that have not been well researched or on topics that are fairly new.

Descriptive Research

Recall from Chapter 1 that one of the basic goals of research is to describe a phenom-
enon. If your research question centers around description, then your research design
falls under the category of descriptive research, in which the primary goal is to describe
thoughts, feelings, or behaviors. Descriptive research provides a static picture of what


people are thinking, feeling, and doing at a given moment in time, as seen in the following
examples of research questions:

• What percentage of doctors prefer Xanax for the treatment of anxiety? (thoughts)
• What percentage of registered Republicans vote for independent candidates?
(behaviors)
• What percentage of Americans blame the president for the economic crisis?
(thoughts)
• What percentage of college students experience clinical depression? (feelings)
• What is the difference in crime rates between Beverly Hills and Detroit?
(behaviors)

What these five questions have in common is that they attempt to describe a phenomenon
without trying to delve into its causes.

The crime rate example highlights the main advantages
and disadvantages of descriptive designs. On the plus
side, descriptive research is a good way to get a broad
overview of a phenomenon and can inspire future
research. It is also a good way to study things that are
difficult to translate into a controlled experimental set-
ting. For example, crime rates can affect every aspect
of people’s lives, and this importance would likely be
lost in an experiment that manipulated income in a
laboratory. On the downside, descriptive research pro-
vides a static overview of a phenomenon and cannot
dig into the reasons for it. A descriptive design might
tell us that Beverly Hills residents are half as likely as
Detroit residents to be assault victims, but it would not
reveal the reasons for this discrepancy. (If we wanted
to understand why this was true, we would use one of
the other designs.)

Descriptive research can be either qualitative or quan-
titative. Descriptions are quantitative when they
include hypotheses and attempt to make compari-
sons and/or to present a random sampling of people’s
opinions. The majority of our sample questions above
would fall into this group because they quantify opin-
ions from samples of households, or cities, or college
students. Good examples of quantitative description

appear in the “snapshot” feature on the front page of USA Today. The graphics represent
poll results from various sources; the snapshot for August 3, 2011, reveals that only 61% of
Americans turn off the water while they brush their teeth (i.e., behavior).

Descriptive designs are qualitative when they include research questions and attempt
to provide a rich description of a particular set of circumstances. A great example of this
approach can be found in the work of neurologist Oliver Sacks. Sacks has written several
books exploring the ways that people with neurological damage or deficits are able to
navigate the world around them. In one selection from The Man Who Mistook His Wife


for a Hat (1998), Sacks relates the story of a man he calls William Thompson. As a result
of chronic alcohol abuse, Thompson developed Korsakov’s syndrome, a brain disease
marked by profound memory loss. The memory loss was so severe that Thompson had
effectively “erased” himself and could remember only scattered fragments of his past.

Whenever Thompson encountered people, he would frantically try to determine who he
was. He would develop hypotheses and test them, as in this excerpt from one of Sacks’s
visits:

I am a grocer, and you’re my customer, right? Well, will that be paper
or plastic? No, wait, why are you wearing that white coat? You must be
Hymie, the kosher butcher. Yep. That’s it. But why are there no bloodstains
on your coat? (Sacks, 1998, p. 112)

Sacks concludes that Thompson is “continually creating a world and self, to replace what
was continually being forgotten and lost” (p. 113). In telling this story, Sacks helps us to
understand Thompson’s experience and to be grateful for our ability to form and retain
memories. This story also illustrates the trade-off in these sorts of descriptive case studies:
Despite all its richness, we cannot generalize these details to other cases of brain damage;
we would need to study and describe each patient individually.

Correlational Research

The second goal of research that we discussed in Chapter 1 was to predict a phenomenon.
If your research question centers around prediction, then your research design falls under
the category of correlational research, in which the primary goal is to understand the
relationships among various thoughts, feelings, and behaviors. Examples of correlational
research questions include:

• Are people more aggressive on hot days?
• Are people more likely to smoke when they are drinking?
• Is income level associated with happiness?
• What is the best predictor of success in college?
• Does television viewing relate to hours of exercise?

What each of these questions has in common is that the goal is to predict one variable
based on another. If you know the temperature, can you predict aggression? If you know
a person’s income, can you predict her level of happiness? If you know a student’s SAT
scores, can you predict his college GPA?

These predictive relationships can turn out in one of three ways (more detail on each one
when we get to Chapter 4): A positive correlation means that higher values of one vari-
able predict higher values of the other variable. As in, more money predicts higher levels
of happiness, and less money predicts lower levels of happiness. The key is that these
variables move up and down together, as shown in the first row of Table 2.1. A negative
correlation means that higher values of one variable predict lower values of the other
variable. As in, more television viewing predicts fewer hours of exercise, and fewer hours
of television predict more hours of exercise. The key is that one variable increases while
the other decreases, as seen in the second row of Table 2.1. Finally, it is worth noting a


third possibility, which is to have no correlation between two variables, meaning that you
cannot predict one variable based on another. The key is that changes in one variable are
not associated with changes in the other, as seen in the third row of Table 2.1.

Table 2.1: Three possibilities for correlational research

• Positive correlation: The variables go up and down together. For example, taller
people have bigger feet, and shorter people have smaller feet.
• Negative correlation: One variable goes up and the other goes down. For example,
as the number of beers consumed goes up, reaction speed goes down.
• No correlation: The variables have nothing to do with one another. For example,
shoe size and number of siblings are completely unrelated.

(The table’s Visual column showed a scatterplot illustrating each pattern.)
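
To make the three patterns concrete, here is a minimal sketch that computes the Pearson correlation coefficient r (which ranges from -1 to +1) for three invented datasets; the numbers are illustrative, not real measurements:

    from statistics import mean

    def pearson_r(x, y):
        # r = covariance of x and y, scaled by the spread of each variable
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    height = [60, 64, 68, 72, 76]       # inches
    shoe = [6, 7.5, 9, 10.5, 12]        # rises with height
    beers = [0, 2, 4, 6, 8]
    react = [0.9, 0.8, 0.6, 0.5, 0.3]   # falls as beers rise
    sibs = [1, 3, 0, 3, 1]              # unrelated to shoe size

    print(round(pearson_r(height, shoe), 2))  # 1.0   positive correlation
    print(round(pearson_r(beers, react), 2))  # -0.99 negative correlation
    print(round(pearson_r(shoe, sibs), 2))    # 0.0   no correlation
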

Correlational designs are about prediction, and
we are still unable to make causal, explanatory
statements (that comes next. . .). A common
mantra in the field of psychology is that corre-
lation does not equal causation. In other words,
just because variable A predicts variable B does
not mean that A causes B. This is true for two
reasons, which we refer to as the directionality
problem and the third variable problem (See
Figure 2.1).

First, we do not know the direction of the rela-
tionship; A could cause B or B could cause A. For
example, money could cause people to be hap-
pier, or happiness could give people the confi-
dence to find higher-paying jobs. Second, there
could be a third variable that causes both of our
variables to change. For example, increases in
temperature could lead to increases in both homi-
cide rates and ice cream sales, making it seem
like these variables are related to one another.

Figure 2.1: Correlation is not causation. The directionality problem: variable A
(income) could cause variable B (happiness), or B could cause A. The third variable
problem: a third variable, such as temperature, could drive both A (ice cream sales)
and B (homicides).


First, when we measure two variables at the same time, we have no way of knowing the
direction of the relationship. Take the relationship between money and happiness: It could
be true that money makes people happier because they can afford nice things and fancy
vacations. It could also be true that happy people have the confidence and charm to obtain
higher-paying jobs, resulting in more money. In a correlational study, we are unable to dis-
tinguish between these possibilities. Or, take the relationship between television viewing
and obesity: It could be that people who watch more television get heavier because sed-
entary TV watching leads to their snacking more and exercising less. It could also be that
people who are overweight don’t have the energy to move around and end up watching
more television as a consequence. Once again, we cannot identify a cause–effect relation-
ship in a correlational study.

Second, when we measure two variables as they naturally occur, there is always the pos-
sibility of a third variable that actually causes both of them. For example, imagine we find
a correlation between the number of churches and the number of liquor stores in a city. Do
people build more churches to offset the threat of vice encouraged by liquor stores? Or do
people build more liquor stores to rebel against the moral code of churches? Most likely,
the link involves the third variable of population: The more people there are living in a
city, the more churches and liquor stores they can support.

Or, consider this example from analyses of posts on the recommendation website
Hunch.com. One of the cofounders of the website conducted extensive analyses of people’s activ-
ity and brand preferences and found a positive correlation between how much people
liked to dance and how likely they were to prefer Apple computers (Fake, 2009). Does this
mean that owning a Mac makes you want to dance? Does dancing make you think highly
of Macs? Most likely, the link here involves a third variable of personality: People who are
more unconventional may be more likely to prefer both Apple computers and dancing.

Experimental Research

Finally, recall that the most powerful goal of research is to attempt to explain and make
cause-and-effect statements about a phenomenon. When your research goal involves
explanation, then your research design falls under the category of experimental research,
in which the primary goal is to explain thoughts, feelings, and behaviors and to make
causal statements. Examples of experimental research questions include:

• Does smoking cause cancer?
• Does alcohol make people more aggressive?
• Does loneliness cause alcoholism?
• Does stress cause heart disease?
• Can meditation make people healthier?

What these five questions have in common is a focus on understanding why something
happens. Experiments move beyond asking, for example, whether alcoholics are more
aggressive to asking whether alcohol causes an increase in aggression.


Experimental designs are able to address the shortcomings of correlational designs
because the researcher has more control over the environment. We will cover this in great
detail in Chapter 5, but for now, experiments are a relatively simple process: A researcher
has to control the environment as much as possible so that all participants in the study
have the same experience. He or she will then manipulate, or change, one key variable
and then measure outcomes in another key variable. The variable that gets manipulated
by the experimenter is called the independent variable. The outcome variable that is mea-
sured by the experimenter is called the dependent variable. The combination of controlling
the setting and changing one aspect of this setting at a time allows the researcher to state
with some certainty that the changes caused something to happen.

Let’s make this a little more concrete.
Imagine that you wanted to test the
hypothesis that meditation causes
improvements in health. In this case,
meditation would be the indepen-
dent variable and health would be
the dependent variable. One way to
test this hypothesis would be to take
a group of people and have half of
them meditate 20 minutes per day for
several days while the other half did
something else for the same amount
of time. The group that meditates
would be the experimental group
because it provides the test of our
hypothesis. The group that does not
meditate would be the control group
because it provides a basis of com-
parison for the experimental group.
You would want to make sure that
these groups spent the 20 minutes in similar conditions so that the only difference would
be the presence or absence of meditation. One way to accomplish this would be to have all
participants sit quietly for the 20 minutes but give the experimental group specific instruc-
tions on how to meditate. Then, to test whether meditation led to increased health and
happiness, you would give both groups a set of outcome measures—perhaps a combina-
tion of survey measures and a doctor’s examination. If you found differences between
these groups on the dependent measures, you could be fairly confident that meditation
caused them to happen. For example, you might find lower blood pressure in the experi-
mental group; this would suggest that meditation causes a drop in blood pressure.
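
As a sketch of how such an experiment maps onto data, the toy simulation below (with invented numbers, not results from any real study) assigns 30 people to each group and compares mean systolic blood pressure:

    import random

    random.seed(1)  # for a reproducible illustration
    control = [random.gauss(128, 8) for _ in range(30)]       # sat quietly
    experimental = [random.gauss(122, 8) for _ in range(30)]  # meditated 20 minutes/day

    mean_control = sum(control) / len(control)
    mean_experimental = sum(experimental) / len(experimental)
    print(round(mean_control, 1), round(mean_experimental, 1))
    # Because the setting was held constant and only meditation varied, a reliable
    # difference between these means (in practice, evaluated with an
    # independent-samples t-test) would support the causal claim.
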


Research: Making an Impact

Helping Behaviors

The 1964 murder of Kitty Genovese in plain sight of her neighbors, none of whom helped, drove
numerous researchers to investigate why people may not help others in need. Are people selfish and
bad, or is there a group dynamic at work that leads to inaction? Is there something wrong with our
culture, or are situations more powerful than we think?

Among the body of research conducted in the late 1960s and 1970s was one pivotal study that
revealed why people may not help others in emergencies. Darley and Latané (1968) conducted an
experiment with various individuals in different rooms, communicating via intercom. In reality, it was
one participant and a number of confederates, one of whom pretended to have a seizure. Among par-
ticipants who thought they were the only other person listening over the intercom, more than 80%
helped, and they did so in less than 1 minute. However, among participants who thought they were
one of a group of people listening over the intercom, less than 40% helped, and even then only after
more than 2.5 minutes. This phenomenon, that the more people who witness an emergency, the less
likely any of them is to help, has been dubbed the “bystander effect.” One of the main reasons that
this occurs is that responsibility for helping gets “diffused” among all of the people present, so that
each one feels less personal responsibility for taking action.

This research can be seen in action and has influenced safety measures in today’s society. For exam-
ple, when witnessing an emergency, no longer does it suffice to simply yell to the group, “Call 9-1-1!”
Because of the bystander effect, we know that most people will believe someone else will do it, and
the call will not be made. Instead, it is necessary to point to a specific person to designate them as
the person to make the call. In fact, part of modern-day CPR training involves making individuals
aware of the bystander effect and best practices for getting people to help and be accountable.

Although this phenomenon may be the rule, there are always exceptions. For example, on Septem-
ber 11, 2001, the fourth hijacked airplane was overtaken by a courageous group of passengers. Most
people on the plane had heard about the twin tower crashes, and recognized that their plane was
heading for Washington, D.C. Despite being among dozens of other people, a few passengers chose
to act to protect the intended targets in D.C. Risking their own safety, these heroic people chose to help
so as to prevent death and suffering to others. So, while we see events every day that remind us of the
reality of the bystander effect, we also see moments where people are willing to help, no matter the
number of people that surround them.

Choosing a Research Design

The choice of a research design is guided first and foremost by your research topic and
research question, and then adjusted depending on practical and ethical concerns. At this
point, there may be a nagging question in the back of your mind: If experiments are the
most powerful type of design, why not use them every time? Why would you ever give up
the chance to make causal statements? One reason is that we are often interested in vari-
ables that cannot be manipulated, for ethical or practical reasons, and that therefore have to
be studied as they occur naturally. In one example, Matthias Mehl and Jamie Pennebaker
happened to start a weeklong study of college students’ social lives on September 10, 2001.
Following the terrorist attacks on the morning of September 11, Mehl and Pennebaker
were able to track changes in people’s social connections and use this to understand how
groups respond to traumatic events (Mehl & Pennebaker, 2003). Of course, it would have
been unthinkable to experimentally manipulate a terrorist attack for this study, but since
it occurred naturally, the researchers were able to conduct a correlational study of coping.


Another reason to use qualitative, descriptive, and correlational designs is that these are
useful in the early stages of research. For example, before you start to think about the
causes of binge drinking among college students, it is important to understand the expe-
riences of binge drinkers and how common this phenomenon is. Before you design a
time- and cost-intensive experiment on the effects of meditation, it is a good idea to con-
duct a correlational study to test whether meditation even predicts health. In fact, this
example comes from a series of real research studies conducted by psychiatrist Sara Lazar
and her colleagues at Massachusetts General Hospital. This research team first discov-
ered that experienced practitioners of mindfulness meditation had more development in
brain areas associated with attention and emotion. But this study was correlational at best;
perhaps meditation causes changes in brain structure or perhaps people who are better
at integrating emotions are drawn to meditation. In a follow-up study, they randomly
assigned people to either meditate or complete stretching exercises for 2 months. These
experimental findings confirmed that mindfulness meditation actually caused structural
changes to the brain (Hölzel et al., 2011). In addition, this is a fantastic example of how
research can progress from correlational to experimental designs. Table 2.2 summarizes
the main advantages and disadvantages of our four types of designs.

Table 2.2: Summary of research designs

Qualitative
Goal: Obtain insight into and detailed descriptions of people’s attitudes, behaviors,
value systems, concerns, motivations, aspirations, culture, or lifestyles.
Advantages: Does not require a strict design plan before the study begins; uncovers
in-depth and rich information about people’s experiences in a natural setting;
focuses on people’s individual experiences.
Disadvantages: Does not assess relationships; difficult to make comparisons;
difficult to make assumptions beyond the sample being studied; very
time-consuming; high level of researcher involvement could skew results.

Descriptive
Goal: Describe characteristics of an existing phenomenon.
Advantages: Provides a complete picture of what is occurring at a given time.
Disadvantages: Does not assess relationships; offers no explanation for the
phenomenon.

Correlational
Goal: Predict behavior; assess the strength of the relationship between variables.
Advantages: Allows testing of expected relationships; enables predictions.
Disadvantages: Cannot draw inferences about causal relationships.

Experimental
Goal: Explain behavior; assess the impact of an independent variable on a
dependent variable.
Advantages: Allows conclusions to be drawn about causal relationships.
Disadvantages: Many important variables cannot be manipulated.


Designs on the Continuum of Control

Before we leave our design overview behind, a few words on how these designs relate to
one another. The best way to think about the differences between the designs is in terms of
the amount of control you have as a researcher. That is, experimental designs are the most
powerful because the researcher controls everything from the hypothesis to the environ-
ment in which the data are collected. Correlational designs are less powerful because the
researcher is restricted to measuring variables as they occur naturally. However, with cor-
relational designs, the researcher does maintain control over several aspects of data collec-
tion, including the setting and the choice of measures. Descriptive designs and qualitative
designs are the least powerful because it is difficult to control outside influences on data
collection. For example, when people answer opinion polls over the phone, they might be
sitting quietly and pondering the questions or they might be watching television, eating
dinner, and dealing with a fussy toddler. In the case of unstructured, qualitative inter-
views, the researcher generally exerts little control over the direction of the interview and
might obtain different information from various participants, making it difficult to make
comparisons across the data. (We will discuss qualitative methods and interviews fur-
ther in Chapter 3.) As a result, a researcher is more limited in the conclusions he or she
can draw from these data. Figure 2.2 shows an overview of research designs in order of
increasing control, from qualitative and descriptive, to predictive, and to experimental.
As we progress through Chapters 3, 4, and 5, we will cover variations on these designs in
more detail.

Figure 2.2: Research designs on the continuum of control. From least to most
control: qualitative and descriptive methods (ethnographic study, phenomenological
study, grounded theory study, case study, archival research, observational research),
predictive methods (survey research), and experimental methods (pre-experiments,
quasi-experiments, and “true” experiments).

2.2 Reliability and Validity

Before beginning this section and the rest of this chapter, it should be noted that qualitative research and qualitative descriptive designs do not test hypotheses. Rather, they seek to answer research questions in order to understand and describe behav-
iors, experiences, or phenomena and to potentially form hypotheses after the study has
been conducted. In addition, reliability and validity are thought about quite differently
in qualitative research designs and utilize different concepts, such as credibility, transfer-
ability, confirmability, and dependability. As a result, qualitative designs are not discussed
in the following sections of this chapter but will be discussed further in Chapters 3 and 5.



Each of the three quantitative designs described in this chapter (descriptive-quantitative, correlational, and experimental) has the same basic goal: to take a hypothesis about some
correlational, and experimental) have the same basic goal: to take a hypothesis about some
phenomenon and translate it into measurable and testable terms. That is, whether we use
a descriptive, correlational, or experimental design to test our predictions about income
and happiness, we still need to translate (or operationalize) the concepts of income and
happiness into measures that will be useful for the study. The sad truth is that our mea-
surements will always be influenced by factors other than the conceptual variable of inter-
est. Answers to any set of questions about happiness will depend both on actual levels of
happiness and the ways people interpret the questions. Our meditation experiment may
have different effects depending on people’s experience with meditation. Even describing
the percentage of Republicans voting for independent candidates will vary depending on
characteristics of a particular candidate.

These additional sources of influence can be grouped into two categories: random and
systematic errors. Random error involves chance fluctuations in measurements, such as
when a few people misunderstand the question or the experimenter enters the wrong
values into a statistical spreadsheet. Although random errors can influence measurement,
they generally cancel out over the span of an entire sample. That is, some people may
overreact to a question while others underreact. The experimenter may accidentally type
a 6 instead of a 5 but then later type a 5 instead of a 6 when entering the data. While both
of these examples would add error to our dataset, they would cancel each other out in a
sufficiently large sample.

Systematic errors, in contrast, are those that systematically increase or decrease along
with values on our measured variable. For example, people who have more experience
with meditation may show consistently more improvement in our meditation experiment
than those with less experience. Or, people with higher self-esteem may score higher on
our measure of happiness than those with lower self-esteem. In this case, our happiness
scale will end up assessing a combination of happiness and self-esteem. These types of
errors can cause more serious trouble for our hypothesis tests because they interfere with
our attempts to understand the link between two variables.

In sum, the measured values of our variable reflect a combination of the true score, ran-
dom error, and systematic error, as shown in the following conceptual equation:

Measured Score = True Score + (Random Error + Systematic Error)

For example:

Happiness Score = Level of Happiness + (Misreading Question + Self-Esteem)
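
A small simulation can make this distinction concrete. The sketch below assumes an invented “true” happiness level of 5 and shows that random error washes out over a large sample, while a systematic error shifts every score in the same direction:

    import random

    random.seed(42)
    true_score = 5.0
    n = 10_000

    random_only = [true_score + random.gauss(0, 1) for _ in range(n)]
    systematic = [true_score + random.gauss(0, 1) + 0.8 for _ in range(n)]  # +0.8 bias

    print(round(sum(random_only) / n, 2))  # about 5.0 -- random error cancels out
    print(round(sum(systematic) / n, 2))   # about 5.8 -- systematic error does not
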

So, if our measurements are also affected by outside influences, how do we know whether
our measures are meaningful? Occasionally, the answer to this question is straightfor-
ward; if we ask people to report their weight or their income level, these values can be
verified using objective sources. However, many of our research questions within psy-
chology involve more ambiguity. How do we know that our happiness scale is the best
one? The problem in answering this question is that we have no way to objectively verify
happiness. What we need, then, are ways to assess how close we are to measuring happi-
ness in a meaningful way. This assessment involves two related concepts: reliability, or
the consistency of a measure; and validity, or the accuracy of a measure. In this section,
we will examine both of these concepts in detail.


Reliability

The consistency of time measurement by watches, cell phones, and clocks reflects a high
degree of reliability. We think of a watch as reliable when it keeps track of the time consis-
tently. Likewise, our scale is reliable when it gives the same value for our weight in back-
to-back measurements.

Reliability is technically defined as the extent to which a measured variable is free from
random errors. As we discussed above, our measures are never perfect, and reliability is
threatened by five main sources of random error:

• Transient states, or temporary fluctuations in participants’ cognitive or mental state;
for example, some participants may complete your study after an exhausting
midterm or in a bad mood after a fight with their significant others.

• Stable individual differences among participants; for example, some participants
are habitually more motivated, or happier, than other participants.

• Situational factors in the administration of the study; for example, running your
experiment in the early morning may make everyone tired or grumpy.

• Bad measures that add ambiguity or confusion to the measurement; for example, par-
ticipants may respond differently to a question about “the kinds of drugs you are
taking.” Some may take this to mean illegal drugs, whereas others interpret it as
prescription or over-the-counter drugs.

• Mistakes in coding responses during data entry; for example, a handwritten 7
could be mistaken for a 4.

We naturally want to minimize the influence of all of these sources of error, and we will
touch on techniques for doing so throughout the book. However, researchers are also
resigned to the fact that all of our measurements contain a degree of error. The goal, then,
is to develop an estimate of how reliable our measures are. Researchers generally estimate
reliability in three ways.

Test–retest reliability refers to the consistency of our measure over time—much like our
examples of a reliable watch and a reliable scale. A fair number of research questions in
the social and behavioral sciences involve measuring stable qualities. For example, if you
were to design a measure of intelligence or personality, both of these characteristics should
be relatively stable over time. Your score on an intelligence test today should be roughly
the same as your score when you take it again in 5 years. Your level of extraversion today
should correlate highly with your level of extraversion in 20 years. The test–retest reli-
ability of these measures is quantified by simply correlating measures at two time points.
The higher these correlations are, the higher the reliability will be. This makes conceptual
sense as well; if our measured scores reflect the true score more than they reflect random
error, then this will result in increased stability of the measurements.
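
In code, test–retest reliability is simply the correlation between two administrations of the same measure. The scores below are invented for illustration; statistics.correlation requires Python 3.10 or later:

    from statistics import correlation  # Python 3.10+

    time1 = [12, 18, 25, 30, 34, 40, 45]  # extraversion scores today
    time2 = [14, 17, 27, 29, 36, 38, 47]  # the same people, retested later

    print(round(correlation(time1, time2), 3))  # close to 1.0 = high test-retest reliability
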

Interitem reliability refers to the internal consistency among different items on our mea-
sure. If you think back to the last time you completed a survey, you may have noticed that
it seemed to ask the same questions more than once (more on this technique in Chapter
4, section 4.1). This is done because a single item is more likely to contain measurement error
than is the average of several items—remember that small random errors tend to cancel
out. Consider the following items from Sheldon Cohen’s Perceived Stress Scale (Cohen,
Kamarck, & Mermelstein, 1983):


1. In the last month, how often have you felt that you were unable to control the
important things in your life?
2. In the last month, how often have you felt confident about your ability to handle
your personal problems?
3. In the last month, how often have you felt that things were going your way?
4. In the last month, how often have you felt difficulties were piling up so high that
you could not overcome them?

Each of these items taps into the concept of “stressed out,” or overwhelmed by the
demands of one’s life. One standard way to evaluate a measure like this is by computing
the average correlation between each pair of items, a statistic referred to as Cronbach’s
alpha. The more these items tap into a central, consistent construct, the higher the value
of this statistic is. Conceptually, a higher alpha means that variation in responses to the
different items reflects variation in the “true” variable being assessed by the scale items.
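
As a sketch (with invented responses from five people on the four stress items above), the “standardized” form of Cronbach’s alpha can be computed from the average inter-item correlation and the number of items k:

    from itertools import combinations
    from statistics import correlation  # Python 3.10+

    # Rows are respondents, columns are the four stress items
    # (assume any reverse-scored items have already been flipped).
    responses = [
        [4, 3, 4, 4],
        [2, 2, 1, 2],
        [5, 4, 5, 5],
        [1, 2, 1, 1],
        [3, 3, 4, 3],
    ]
    items = list(zip(*responses))  # one tuple of scores per item
    k = len(items)
    pairs = list(combinations(items, 2))
    r_bar = sum(correlation(a, b) for a, b in pairs) / len(pairs)
    alpha = (k * r_bar) / (1 + (k - 1) * r_bar)
    print(round(alpha, 2))  # closer to 1.0 = more internally consistent

This standardized alpha is one common variant; statistical packages often report a covariance-based version instead, but both rise as the items hang together more tightly.
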

Interrater reliability refers to the consistency among judges observing participants’
behavior. The previous two forms of reliability were relevant in dealing with self-report
scales; interrater reliability is more applicable when research involves behavioral mea-
sures. Imagine you are studying the effects of alcohol consumption on aggressive behav-
ior. You would most likely want a group of judges to observe participants in order to make
ratings of their levels of aggression. In the same way that using multiple scale items helps
to cancel out the small errors of individual items, using multiple judges cancels out the
variations in each individual’s ratings. In this case, people could have different ideas and
thresholds for what constitutes aggression. Much like the process of evaluating multiple
scale items, we can evaluate the judges’ ratings by calculating the average correlation
among the ratings. The higher our alpha values, the more the judges agree in their ratings
of aggressive behavior. Conceptually, a higher alpha value means that variation in the
judges’ ratings reflects real variation in levels of aggression.

Validity

Let’s return to our watch and scale examples. Perhaps you are the type of person who
sets your watch 10 minutes ahead to avoid being late. Or perhaps you have adjusted
your scale by 5 pounds to boost your motivation or your self-esteem. In these cases, your
watch and your scale may produce consistent measurements, but the measurements are
not accurate. It turns out that the reliability of a measure is a necessary but not sufficient
basis for evaluating it. Put bluntly, our measures can be (and have to be) consistent but
might still be garbage. The additional piece of the puzzle is the validity of our measures, or
the extent to which they accurately measure what they are designed to measure.

Whereas reliability is threatened more by random error, validity is threatened more by sys-
tematic error. If the measured scores on our happiness scale reflect, say, self-esteem more
than they reflect happiness, this would threaten the validity of our scale. We discussed in
the previous section that a test designed to measure intelligence ought to be consistent
over time. And in fact, these tests do show very high degrees of reliability. However, sev-
eral researchers have cast serious doubts on the validity of intelligence testing, arguing
that even scores on an official IQ test are influenced by a person’s cultural background,
socioeconomic status (SES), and experience with the process of test taking (for discussion
of these critiques, see Daniels et al., 1997; Gould, 1996). For example, children growing up


in higher SES households tend to have more books in the home, spend more time interact-
ing with one or both parents, and attend schools that have more time and resources avail-
able—all of which are correlated with scores on IQ tests. Thus, all of these factors amount
to systematic error in the measure of intelligence and, therefore, threaten the validity of a
measured score on an intelligence test.

Researchers have two main ways to discuss and evaluate the validity, or accuracy, of mea-
sures: construct validity and criterion validity.

Construct validity is evaluated based on how well the measures capture the underly-
ing conceptual ideas (i.e., the constructs) in a study. These constructs are equivalent
to the “true score” discussed in the previous section. That is, how accurately does our
bathroom scale measure the concept of weight? How accurately does our IQ test mea-
sure the construct of intelligence relative to other things? There are a couple of ways
to assess the validity of our measures. On the subjective end of the continuum, we can
assess the face validity of the measure, or the extent to which it simply seems like a
good measure of the construct. The items from the Perceived Stress Scale have high face
validity because the items match what we intuitively mean by “stress” (e.g., “how often
have you felt difficulties were piling up so high that you could not overcome them?”).
However, if we were to measure your speed at eating hot dogs and then tell you it was
a stress measure, you might be dubious because this would lack face validity as a mea-
sure of stress.

Although face validity is nice to have, it can sometimes (ironically) reduce the validity of
the measures. Imagine seeing the following two measures on a survey of your attitudes:

1. Do you dislike people whose skin color is different from yours?
2. Do you ever beat your children?

On the one hand, these are extremely face-valid measures of attitudes about prejudice and
corporal punishment—they very much capture our intuitive ideas about these concepts.
On the other hand, even people who do support these attitudes may be unlikely to answer
honestly because they can recognize that neither attitude is popular. In cases like this, a
measure low in face validity might end up being the more accurate approach. We will
discuss ways to strike this balance in Chapter 4.

On the less subjective end, we can assess the validity of our constructs by examining
their empirical connections to both related and unrelated measures. Imagine that you
wanted to develop a new measure of narcissism, usually defined as an intense desire
to be liked and admired by other people. Narcissists tend to be self-absorbed but also
very attuned to the feedback they receive from other people—at least as it pertains to the
extent to which people admire them. Narcissism is somewhat similar to self-esteem but
different enough; it is perhaps best viewed as high and unstable self-esteem. So, given
these facts, we might assess the discriminant validity of our measure by making sure it
did not overlap too closely with measures of self-esteem or self-confidence. This would
establish that our measure stands apart from these different constructs. We might then
assess the convergent validity of our measure by making sure that it did correlate with
things like sensitivity to rejection and need for approval. These correlations would place
our measure into a broader theoretical context and help to establish it as a valid measure
of the construct of narcissism.


Criterion validity is evaluated based on the association between measures and relevant
behavioral outcomes. The criterion in this case refers to a measure that can be used to make
decisions. For example, if you developed a personality test to assess management style,
the most relevant metric of its validity would be whether it predicted a person’s behavior
as a manager. That is, you might expect people scoring high on this scale to be able to
increase the productivity of their employees and to maintain a comfortable work environ-
ment. Likewise, if you developed a measure that predicted the best careers for graduating
seniors based on their skills and personalities, then criterion validity would be assessed
through people’s actual success in these various careers. Whereas construct validity is
more concerned with the underlying theory behind the constructs, criterion validity is
more concerned with the practical application of measures. As you might expect, this
approach is more likely to be used in applied settings.

That said, criterion validity is also a useful way to supplement validation of a new
questionnaire. For example, a questionnaire about generosity should be able to pre-
dict people’s annual giving to charities, and a questionnaire about hostility ought to
predict hostile behaviors. To supplement the construct validity of our narcissism mea-
sure, we might examine its ability to predict the ways people respond to rejection and
approval. Based on the definition of our construct, we might hypothesize that narcis-
sists would become hostile following rejection and perhaps become eager to please fol-
lowing approval. If these predictions were supported, we would end up with further
validation that our measure was capturing the concept of narcissism.

Criterion validity falls into one of two categories, depending on whether the researcher is
interested in the present or the future. Predictive validity involves attempting to predict a
future behavioral outcome based on the measure, as in our examples of the management
style and career placement measures. Predictive validity is also at work when researchers
(and colleges) try to predict likelihood of school success based on SAT or GRE scores. The
goal here is to validate our construct via its ability to predict the future.

In contrast, concurrent validity involves attempting to link a self-report measure with a
behavioral measure collected at the same time, as in our examples of the generosity and
hostility questionnaires. The phrase “at the same time” is used vaguely here; our self-
report and behavioral measures may be separated by a short time span. In fact, concurrent
validity sometimes involves trying to predict behaviors that occurred before completion
of the scale, such as trying to predict students’ past drinking behaviors from an “attitudes
toward alcohol” scale. The goal in this case is to validate our construct via its association
with similar measures.

Summary: Comparing Reliability and Validity

As we have seen in this section, both reliability (consistency) and validity (accuracy) are
ways to evaluate measured variables and to assess how well these measurements capture
the underlying conceptual variable. In establishing estimates of both of these metrics, we
essentially examine a set of correlations with our measured variables. But while reliability
involves correlating our variables with themselves (e.g., happiness scores at week 1 and
week 4), validity involves correlating our variables with other variables (e.g., our happi-
ness scale with the number of times a person smiles). Figure 2.3 displays the relationships
among types of reliability and validity.


Figure 2.3: Types of reliability and validity. Reliability (consistency) comprises
test–retest, interitem, and interrater reliability; validity (accuracy) comprises
construct validity (convergent and discriminant) and criterion validity (predictive
and concurrent).

We learned earlier that reliability is necessary but not sufficient to evaluate measured
variables. That is, reliability has to come first and is an essential requirement for any vari-
able—you would not trust a watch that was sometimes 5 minutes fast and other times
10 minutes slow. If we cannot establish that a measure is reliable, then there is really no
chance of establishing its construct validity because every measurement might be a reflec-
tion of random error. However, just because a measure is consistent does not make it
accurate. Your watch might consistently be 10 minutes fast; your scale might always be 5
pounds under your actual weight. For that matter, your test of intelligence might result in
consistent scores but actually be capturing respondents’ cultural background. Reliability
tells us the extent to which a measure is free from random error. Validity takes the second
step of telling us the extent to which the measure is also free from systematic error.

Finally, it is worth pointing out that establishing validity for a new measure is hard work.
Reliability can be tested in a single step by correlating scores from multiple measures, mul-
tiple items, or multiple judges within a study. But testing the construct validity of a new
measure involves demonstrating both convergent and discriminant validity. In developing
our narcissism scale, we would need to show that it correlated with things like fear of rejec-
tion (convergent) but was reasonably different from things like self-esteem (discriminant).
The latter criterion is particularly difficult to establish because it takes time and effort—and
multiple studies—to demonstrate that one scale is distinct from another. There is, however,
an easy way to avoid these challenges: Use existing measures whenever possible. Before
creating a brand new happiness scale, or narcissism scale, or self-esteem scale, check to see
if one exists that has already gone through the ordeal of being validated.

2.3 Scales and Types of Measurement

As you may remember from prior statistics classes, not all measures are created equal. One of the easiest ways to decrease error variance, and thereby increase our reliability and validity, is to make smart choices when we design and
select our measures. Throughout this book, we will discuss guidelines for each type of
research design and ways to ensure that our measures are as accurate and unbiased as


possible. In this section, we examine some basic rules that apply across all three types
of design. We first review the four scales of measurement and discuss the proper use of
each one; we then turn our attention to three types of measurement used in psychologi-
cal research studies.

Scales of Measurement

Whenever we go through the process of translating our conceptual variables into measur-
able variables (i.e., operationalization; see Chapter 1, section 1.2), it is important to ensure
that our measurements accurately represent the underlying concepts. We have covered
this process already; in our discussion of validity, you learned that this accuracy is a criti-
cal piece of hypothesis testing. For example, if we develop a scale to measure job satisfac-
tion, then we need to verify that this is actually what the scale is measuring. But there is an
additional, subtler dimension to measurement accuracy: We also need to be sure that our
chosen measurement accurately reflects the underlying mathematical properties of the
concept. In many cases in the natural sciences, this process is automatically precise. When
we measure the speed of a falling object or the temperature of a boiling object, the underly-
ing concepts (speed and temperature) translate directly into scaled measurements. But in
the social and behavioral sciences, this process is trickier; we have to decide carefully how
best to represent abstract concepts such as happiness, aggression, and political attitudes.
As we take the step of scaling our variables, or specifying the relationship between our
conceptual variable and numbers on a quantitative measure, we have four different scales
to choose from, presented below in order of increasing statistical power and flexibility.

Nominal Scales
Nominal scales are used to label or identify a particular group or characteristic. For exam-
ple, we can label a person’s gender male or female, and we could label a person’s religion
Catholic, Buddhist, Jewish, Muslim, or some other religion. In experimental designs, we
can also use nominal scales to label the condition to which a person has been assigned
(e.g., experimental or control groups). The assumption in using these labels is that mem-
bers of the group have some common value or characteristic, as defined by the label. For
example, everyone in the Catholic group should have similar religious beliefs, and every-
one in the female group should be of the same gender.

It is common practice in research studies to represent these labels with numeric codes,
such as using a 1 to indicate females and a 2 to indicate males. However, these numbers
are completely arbitrary and meaningless—that is, males do not have more gender than
females. We could just as easily replace the 1 and the 2 with another pair of numbers or
with a pair of letters or names. Thus, the primary characteristic of nominal scales is that
the scaling itself is arbitrary. This prevents us from using these values in mathematical
calculations. One helpful way to appreciate the difference between this scale and the other
three is to think of nominal scales as qualitative, because they label and identify, and to
think of the other scales as quantitative, because they indicate the extent to which some-
one possesses a quality or characteristic. Let’s turn our attention to these quantitative
scales in more detail.


Ordinal Scales
Ordinal scales are used to represent ranked orders of conceptual variables. For example,
beauty contestants, horses, and Olympic athletes are all ranked by the order in which
they finish—first, second, third, and so on. Likewise, movies, restaurants, and consumer
goods are often rated using a system of stars (i.e., 1 star is not good; 5 stars is excellent)
to represent their quality. In these examples, we can draw conclusions about the relative
speed, beauty, or deliciousness of the rating target. But the numbers used to label these
rankings do not necessarily map directly onto differences in the conceptual variable.
The fourth-place finisher in a race is rarely twice as slow as the second-place finisher;
the beauty contest winner is not three times as attractive as the third-place finisher;
and the boost in quality between a four-star and a five-star restaurant is not the same
as the boost between a two-star and three-star restaurant. Ordinal scales represent rank
orders, but the numbers do not have any absolute value of their own. Thus, this type of

scale is more powerful than a nomi-
nal scale but still limited in that we
cannot perform mathematical oper-
ations. For example, if an Olympic
athlete finished first in the 800-meter
dash, third in the 400-meter hurdles,
and second in the 400-meter relay,
you might be tempted to calculate
her average finish as being in second
place. Unfortunately, the properties
of ordinal scales prevent us from
doing this sort of calculation because
the distance between first, second,
and third place would be different
in each case. In order to perform any
mathematical manipulation of our
variables, we need one of the next
two types of scale.

Interval Scales
Interval scales represent cases where the numbers on a measured variable correspond to
equal distances on a conceptual variable. For example, temperature increases on the Fahren-
heit scale represent equal intervals—warming from 40 to 47 degrees is the same increase
as warming from 90 to 97 degrees. Interval scales share the key feature of ordinal scales—
higher numbers indicate higher relative levels of the variable—but interval scales go an
important step further. Because these numbers represent equal intervals, we are able to
add, subtract, and compute averages. That is, whereas we could not calculate our athlete’s
average finish, we can calculate the average temperature in San Francisco or the average
age of our participants.

Ratio Scales
Ratio scales go one final step further, representing interval scales that also have a true zero
point, that is, the potential for a complete absence of the conceptual variable. Ratio scales
can be used in the case of physical measurements, such as length, weight, and time since

Photo: Olympic athletes are ranked using an ordinal scale. (EMPICS Sport/Associated Press)


it is possible to have a complete absence of any of these. Ratio scales can also be used in
measurement of behaviors since it is possible to have zero drinks per day, zero presses of
a reward button, or zero symptoms of the flu. Temperature in degrees Kelvin is measured
on a ratio scale because 0 Kelvin indicates an absence of molecular motion. (In contrast,
0 degrees Fahrenheit is merely an arbitrary point on the temperature scale, not an
absence of temperature.) Contrast these mea-
surements with many of the conceptual variables featured in psychology research—there
is no such thing as zero happiness or zero self-esteem. The big advantage of having a true
zero point is that it allows us to add, subtract, multiply, and divide scale values. When
we measure weight, for example, it makes sense to say that a 300-pound adult weighs
twice as much as a 150-pound adult. And, it makes sense to say that having two drinks
per day is only ¼ as many as having eight drinks per day.

Summary—Choosing and Using Scales of Measurement
The take-home point from our discussion of these four scales of measurement is two-
fold. First, you should always use the most powerful and flexible scale possible for your
conceptual variables. In many cases, there is no choice; time is measured on a ratio scale
and gender is measured on a nominal scale. But in some cases, you have a bit of freedom
in designing your study. For example, if you were interested in correlating weight with
happiness, you could capture weight in a few different ways. One option would be to
ask people their satisfaction with their current weight on a seven-point scale. However,
the resulting data would be on an ordinal or interval scale (see discussion below), and
the degree to which you could manipulate the scale values would be limited. Another,
more powerful option would be to measure people’s weight on a scale, resulting in ratio
scale data. Thus, whenever possible, it is preferable to incorporate physical or behavioral
measures. But the primary goal is also to represent your data accurately. Most variables
in the social and behavioral sciences do not have a true zero point and must therefore be
measured on nominal, ordinal, or interval scales.

Second, you should always be aware of the limitations of your measurement scale. As dis-
cussed above, these scales lend themselves to different amounts of mathematical manipu-
lation. It is not possible to calculate statistical averages with anything less than an interval
scale, and not possible to multiply or divide with anything less than a ratio scale. What does
this mean for you? If you have collected ordinal data, you are limited to discussing the
rank ordering of the values (e.g., the critics liked Restaurant A better than Restaurant B). If
you have collected nominal data, you are limited to describing the different groups (e.g.,
numbers of Catholics and Protestants).

One conspicuous gray area for both of these points is the use of attitude scales in the social
and behavioral sciences. If you were to ask people to rate their attitudes about the death
penalty on a seven-point rating scale, would this be an ordinal scale or an interval scale?
This turns out to be a contentious issue in the field. The conservative point of view is
that these attitude ratings constitute only ordinal scales. We know that a 7 indicates more
endorsement than a 3 but cannot say that moving from a 3 to a 4 is equivalent to mov-
ing from a 6 to a 7 in people’s minds. The more liberal point of view is that these attitude
ratings can be viewed as interval scales. This perspective is generally guided by practi-
cal concerns—treating these as equal intervals allows us to compute totals and averages
for our variables. A good guideline is to assume that these individual attitude questions
represent ordinal scales by default. We will return to this issue again in Chapter 4 in our
discussion of creating questionnaire items.

Types of Measurement

Each of the four scales of measurement can be used across a wide variety of research
designs. In this section, we shift gears slightly and discuss measurement at a more concep-
tual level. The types of dependent measures that are used in psychological research stud-
ies can be grouped into three broad categories: behavioral, physiological, and self-report.

Behavioral Measurement
Behavioral measures are those that involve direct and systematic recording of observable
behaviors. If your research question involves the ways that married couples deal with
conflict, you could include a behavioral measure by observing the way participants inter-
act during an argument. Do they cut one another off? Listen attentively? Express hostil-
ity? Behaviors can be measured and quantified in one of four primary ways, as illustrated
using the scenario of observing married couples during conflict situations:

• Frequency measurements involve counting the number of times a behavior
occurs. For example, you could count the number of times each member of the
couple rolled his or her eyes, as a measure of dismissive behavior.

• Duration measurements involve measuring the length of time a behavior lasts.
For example, you could quantify the length of time the couple spends discussing
positive versus negative topics as a measure of emotional tone.

• Intensity measurements involve measuring the strength or potency of a behav-
ior. For example, you could quantify the intensity of anger or happiness in each
minute of the conflict using ratings by trained judges.

• Latency measures involve measuring the delay before onset of a behavior. For
example, you could measure the time between one person’s provocative state-
ment and the other person’s response.

John Gottman, a psychologist at the University of Washington, has been conducting
research along these lines for several decades (Gottman & Levenson, 1992), observing
body language and interaction styles among married couples as they discuss an unre-
solved issue in their relationship (you can read more about this research and its implica-
tions for therapy on Dr. Gottman’s website, http://www.gottman.com/). What all of these
behavioral measures provide is a nonreactive way to measure the health of a relationship.
That is, the major strength of behavioral responses is that they are typically more honest
and unfiltered than responses to questionnaires. As we will discuss in Chapter 4 (4.1),
people are sometimes dishonest on questionnaires in order to convey a more positive (or
less negative) impression.

This is a particular plus if you are interested in unpopular attitudes, such as prejudice
and discrimination. If you were to ask people the extent to which they disliked members
of other ethnic groups, they might not admit to these prejudices. Alternatively, you could
adopt the approach used by Yale psychologist Jack Dovidio and colleagues and mea-
sure how close people sat to people of different ethnic and racial groups, using this dis-
tance as a subtle and effective behavioral measure of prejudice (see http://www.yale.edu/intergroup/
for more information). But you may have spotted the primary downside
to using behavioral measures: We end up having to infer the reasons that people behave
as they do. Let’s say European-American participants, on average, sit farther away from


African-Americans than from other European-Americans. This could—and usually
does—indicate prejudice; but, for the sake of argument, the farthest seat from the minor-
ity group member might also be the one closest to the window. In order to understand
the reasons for behaviors, researchers have to supplement the behavioral measures with
either physiological or self-report measurements.

Physiological Measurement
Physiological measures are those that involve quantifying bodily processes, including
heart rate, brain activity, and facial muscle movements. If you were interested in the expe-
rience of test anxiety, you could measure heart rate as people completed a difficult math
test. If you wanted to study emotional reactions to political speeches, you could measure
heart rate, facial muscles, and brain activity as people viewed video clips. The big advan-
tage of these types of measures is that they are the least subjective and controllable. It is
incredibly difficult to control your heart rate or brain activity consciously, making these a
great tool for assessing emotional reactions. However, as with behavioral measures, we
always need some way to contextualize our physiological data.

The best example of this shortcoming is the use of the polygraph, or lie detector, to detect
deception. The lie detector test involves connecting a variety of sensors to the body to
measure heart rate, blood pressure, breathing rate, and sweating. All of these are physi-
ological markers of the body’s fight-or-flight stress response; so the goal is to observe
whether you show signs of stress while being questioned. But here’s the problem: It is
also stressful to worry about being falsely accused. A trained polygraph examiner must
place all of your physiological responses in the proper context. Are you stressed through-
out the exam or only stressed when asked whether you pilfered money from the cash
box? Are you stressed when asked about your relationship with your spouse because you
killed him or because you were having an affair? The examiner has to be extremely care-
ful to avoid false accusations based
on misinterpretations of physiologi-
cal responses.

Self-Report Measurement
Self-report measures are those that
involve asking people to report on
their own thoughts, feelings, and
behaviors. If you were interested in
the relationship between income and
happiness, you could simply ask peo-
ple to report their income and their
level of happiness. If you wanted to
know whether people were satisfied
in their romantic relationships, you
could simply ask them to rate their
degree of satisfaction. The big advan-
tage of these measures is that they
provide access to internal processes.
That is, if you want insight into
why people voted for their favorite

Photo: A self-report measure might be used to determine why people voted for a particular political candidate. (Andy Sacks/Getty Images)


Research: Thinking Critically

Neuroscience and Addictive Behaviors

By Christian Nordqvist

Some people really are addicted to foods in a similar way others might be dependent on certain
substances, like addictive illegal or prescription drugs, or alcohol, researchers from Yale University
revealed in Archives of General Psychiatry (Gearhardt et al., 2011). Those with an addiction-like
behavior seem to have more neural activity in specific parts of the brain in the same way substance-
dependent people appear to have, the authors explained.

It’s a bit like saying that if you dangle a tasty chocolate milkshake in front of a pathological eater,
what goes on in that person’s brain is similar to what would happen if you placed a bottle of scotch
in front of an alcoholic.

The researchers wrote:

One-third of American adults are now obese and obesity-
related disease is the second leading cause of preventable
death. Unfortunately, most obesity treatments do not result
in lasting weight loss because most patients regain their lost
weight within five years. Based on numerous parallels in
neural functioning associated with substance dependence
and obesity, theorists have proposed that addictive processes
may be involved in the etiology of obesity. Food and drug use
both result in dopamine release in mesolimbic regions and the
degree of release correlates with subjective reward from both
food and drug use.

The authors believe that no studies had so far looked into the neural correlates of addictive-like eating
behavior. They explained that some studies had demonstrated that photos of nice food can get the brain’s
reward centers to become more active in much the same way that photos of alcoholic drinks might do
for alcoholics. However, this latest study is the first to distinguish food addicts from mere overeaters.

(continued)

political candidate, you could simply ask them. However, as we have suggested already,
people may not necessarily be honest and forthright in their answers, especially when deal-
ing with politically incorrect or unpopular attitudes. We will return to this balance again in
Chapter 4 and discuss ways to increase the likelihood of honest self-reported answers.

There are two broad categories of self-report measures. One of the most common
approaches is to ask for people’s responses using a fixed-format scale, which asks them
to indicate their opinion on a preexisting scale. For example, you might ask people, “How
likely are you to vote for the Republican candidate for president?” on a scale from 1
(not likely) to 7 (very likely). The other broad approach is to ask for responses using a
free-response format, which asks people to express their opinion in an open-ended for-
mat. For example, you might ask people to explain, “What are the factors you consider in
choosing a political candidate?” The trade-off between these two categories is essentially a
choice between data that are easy to code and analyze and data that are rich and complex.
In general, fixed-format scales are used more in quantitative research while free-response
formats are used more in qualitative research.


Research: Thinking Critically (continued)

Ashley N. Gearhardt, M.S., M.Phil., and team looked at the relation between the symptoms of food
addiction and neural activation. Food addiction was assessed by the Yale Food Addiction Scale, while
neural activation was gauged via functional MRI (magnetic resonance imaging). Forty-eight study
participants responded to cues that signaled the imminent arrival of very tasty food, such as a choco-
late milkshake, compared to a control solution (something with no taste). The researchers also compared
brain activity while participants consumed the milkshake versus the tasteless solution.

The Yale Food Addiction Scale questionnaire identified 15 women with high scores for addiction-like
eating behaviors. All the 48 study participants were young women, ranging in body mass index (BMI)
from lean to obese. They were recruited from a healthy weight maintenance study.

The scientists discovered a correlation between food addiction and greater activity in the amygdala,
the medial orbitofrontal cortex (OFC), and the anterior cingulate cortex (ACC) when tasty food deliv-
ery was known to arrive soon.

Those with high food addiction, the 15 women, showed greater activity in the dorsolateral prefrontal
cortex compared to those with low addiction to foods. They also had reduced activity in the lateral
orbitofrontal cortex while they were eating their nice food.

The authors explained:

As predicted, elevated FA (food addiction) scores were
associated with greater activation of regions that play a role in
encoding the motivational value of stimuli in response to food
cues. The ACC and medial OFC have both been implicated in
motivation to feed and to consume drugs among individuals
with substance dependence.

In sum, these findings support the theory that compulsive food
consumption may be driven in part by an enhanced anticipation
of the rewarding properties of food. Similarly, addicted
individuals are more likely to be physiologically, psychologically,
and behaviorally reactive to substance-related cues.

They concluded:

To our knowledge, this is the first study to link indicators of
addictive eating behavior with a specific pattern of neural
activation. The current study also provides evidence that
objectively measured biological differences are related to
variations in YFAS (Yale Food Addiction Scale) scores, thus
providing further support for the validity of the scale. Further,
if certain foods are addictive, this may partially explain the
difficulty people experience in achieving sustainable weight
loss. If food cues take on enhanced motivational properties
in a manner analogous to drug cues, efforts to change the
current food environment may be critical to successful weight
loss and prevention efforts. Ubiquitous food advertising and
the availability of inexpensive palatable foods may make it
extremely difficult to adhere to healthier food choices because
the omnipresent food cues trigger the reward system. Finally,


if palatable food consumption is accompanied by disinhibition
[loss of inhibition], the current emphasis on personal
responsibility as the antidote to increasing obesity rates may
have minimal effectiveness.

Nordqvist, C. (2011, April 5). Food addiction and substance dependence, similar brain activity going on. Medical News Today.
Retrieved from http://www.medicalnewstoday.com/articles/221233.php

Think about it:

1. Is the study described here descriptive, correlational, or experimental? Explain.

2. Can one conclude from this study that food addiction causes brain abnormalities? Why or why not?

3. The authors of the study concluded: “The current study also provides evidence that objectively measured biological differences are related to variations in YFAS (Yale Food Addiction Scale) scores, thus providing further support for the validity of the scale.” What type(s) of validity are they referring to? Explain.

4. What types of measures are included in this study (e.g., behavioral, self-report)? What are the strengths and limitations of these measures in this study?

Choosing a Measurement Type
As you can see from these descriptions, each type of measurement has its strengths and
flaws. So, how do you decide which one to use? This question has to be answered for
every case, and the answer depends on three factors. First, and most obviously, the mea-
sure depends on the research question. If you are interested in effects of public speaking
on stress levels, then the best measures will be physiological. If you are interested in atti-
tudes toward capital punishment, these are better measured using self-reports. Second,
the choice of measures is guided by previous research on the topic. If studies have assessed
prejudice by using self-reports, then you could feel comfortable doing the same. If studies
have measured fear responses using facial expressions, then let that be a starting point
for your research. Finally, a mix of availability and convenience often guides the choice
of measures. Measures of brain activity are a fantastic addition to any research program,
but these measures also require a specialized piece of equipment that can run upwards
of $2 million. As a result, many researchers interested in physiological measures opt for
something less expensive like a measure of heart rate or movement of facial muscles, both
of which can be measured using carefully placed sensors (i.e., on the chest or face).

In an ideal world, a program of research will use a wide variety of measures and designs.
The term for this is converging operations, or the use of multiple research methods to
solve a single problem. In essence, over the course of several studies—perhaps spanning
several years—you would address your research question using different designs, differ-
ent measures, and different levels of analysis. One good example of converging opera-
tions comes from the research of psychologist James Gross and his colleagues at Stanford
University. Gross studies the ways that people regulate their emotional responses and has
conducted this work using everything from questionnaires to brain scans (see http://spl
.stanford.edu/projects.html).


One branch of Gross’s research has examined the consequences of trying to either sup-
press emotions (pretend they’re not happening) or reappraise them (think of them in a
different light) (Gross, 1998; Butler et al., 2003). Suppression is studied by asking people
to hold in their emotional reactions while watching a graphic medical video. Reappraisal
is studied by asking people to watch the same video while trying to view it as a medical
student, thus changing the meaning of what they see. When people try to suppress emo-
tional responses, they experience a paradoxical increase in physiological and self-reported
emotional responses, as well as deficits in cognitive and social functioning. Reappraising
emotions, in contrast, actually works quite well. In another branch of the research, Gross
and colleagues have examined the neural processes at work when people change their
perspective on an emotional event (Goldin, McRae, Ramel, & Gross, 2008). In yet another
branch of the research, they have examined individual differences in emotional responses,
with the goal of understanding why some people are more capable of managing their
emotions than others. Taken together, these studies all converge into a more comprehen-
sive picture of the process of emotion regulation than would be possible from any single
study or method.

2.4 Hypothesis Testing

Regardless of the details of a particular study, be it correlational, experimental, or descriptive, all quantitative research follows the same process of testing a hypothesis. This section provides an overview of this process, including a discussion of the
statistical logic, the five steps of the process, and the two ways we can make mistakes dur-
ing our hypothesis test. Some of this may be a review from previous statistics classes, but
it forms the basis of our scientific decision-making process and thus warrants repeating.

The Logic of Hypothesis Testing

In Chapter 1 (Section 1.3, Research Problem and Questions), we discussed several criteria
for identifying a “good” theory, one of which is that our theories have to be falsifiable. In
other words, our research questions should have the ability to be proven wrong under the
right set of conditions. Why is this so important? This will sound counterintuitive at first,
but by the standards of logic, it is more meaningful when data run counter to our theory
than when data support the theory.

Let’s say you predict that growing up in a low-income family puts children at higher risk
for depression. If your data fit this pattern, your prediction might very well be correct.
But it’s also possible that these results are due to a third variable—perhaps children in low-income
families grow up in more stressful neighborhoods, and stress turns out to increase one’s
depression risk. Or, perhaps your sample accidentally contained an abnormal number of
depressed people. This is why we are always cautious in interpreting positive results from
a single study. But now, imagine that you test the same hypothesis and find that those
who grew up in low-income families show a lower rate of depression. This is still a single
study, but it suggests that our hypothesis may have been off base.


Another way to think about this is from a statistical perspective. As we discussed earlier
in this chapter, all measurements contain some amount of random error, which means
that any pattern of data could be caused by random chance. This is the primary reason
that research is never able to “prove” a theory. You’ll also remember from your statistics
class that at the end of any hypothesis test, we will calculate a p value, representing the
probability that our results are due to random chance. Conceptually, this means we are
calculating the probability that we’re wrong rather than the probability that we’re right in
our predictions. And the bigger our effect, the smaller this probability will generally be.
So, as strange as this seems, the ideal result of hypothesis testing is to have a small prob-
ability of being wrong.

This focus on falsifiability carries over to the way we test our hypotheses in that our
goal is to reject the possibility of our results being due to chance. The starting point of a
hypothesis test is to state a null hypothesis, or the assumption that there is no real effect
of our variables in the overall population. This is another way of saying that our observed
patterns of data are due to random chance. In essence, we propose this null in hopes of
showing that it is very unlikely to be true. Then, as a counterpoint to the null hypothesis, we
propose an alternative hypothesis that represents our predicted pattern of results. In sta-
tistical jargon, the alternative hypothesis represents our predicted deviation from the null.
These alternative hypotheses can be directional, meaning that we specify the direction of
the effect, or nondirectional, meaning that we simply predict an effect.

Let’s say you want to test the hypothesis that people like cats better than dogs. You would
start with the null hypothesis, that people like cats and dogs the same amount (i.e., there’s
no difference). The next step is to state your alternative hypothesis, which in this case is
that people will prefer cats. Because you are predicting a direction (cats more than dogs),
this is a directional hypothesis. The other option would be a nondirectional hypoth-
esis, or simply stating that people’s cat preferences differ from their dog preferences.
(Note that we’ve avoided predicting which one people like better; this is what makes it
nondirectional.)

Finally, these three hypotheses can also be expressed using logical notation, as shown
below. The letter H is used as an abbreviation for “hypothesis,” and the Greek letter μ (mu)
is the standard symbol for the mean, or average.

Conceptual Hypothesis: People like cats better than dogs.

Null Hypothesis: H0: μcat = μdog
(the “cat” mean is equal to the “dog” mean; people like cats and dogs the same)

Nondirectional Alternative Hypothesis: H1: μcat ≠ μdog
(the “cat” mean is not equal to the “dog” mean; people like cats and dogs different amounts)

Directional Alternative Hypothesis: H1: μcat > μdog
(the “cat” mean is greater than the “dog” mean; people like cats more than dogs)

Why do we need to distinguish between directional and nondirectional hypotheses? As
you’ll see when we get to the statistical calculations, this decision has implications for our
level of statistical significance. Because we always want to minimize the risk of coming to
the wrong conclusion based on chance findings, we have to be more conservative with a
nondirectional test. This idea is illustrated in Figure 2.4.

Figure 2.4: One-tailed vs. two-tailed hypothesis tests

These graphs represent the probability of obtaining a particular difference between our
groups. The graph on the left represents a simple directional hypothesis—we will be com-
fortable rejecting the null hypothesis if our mean difference is above the alpha cutoff,
usually set at 5%. The graph on the right, however, represents a nondirectional hypoth-
esis, which simply predicts that one group is higher or lower than the other. Because we
are being less specific, we have to be more conservative. With a directional hypothesis
(also called one-tailed), we predict that the group difference will fall on one extreme of
the curve; with a nondirectional hypothesis (also called two-tailed), we predict that the
group difference will fall on either extreme of the curve. The implication of a two-tailed
hypothesis is that our 5% cutoff could become a 10% cutoff, with 5% on each side. Rather
than double our chance of an error, we follow standard practice and use a 2.5% cutoff on
each side of the curve.

Translation: We need bigger group differences to support our two-tailed, nondirectional
hypotheses. In the cats-versus-dogs example, it would take a bigger difference in ratings
to support the claim that people like cats and dogs different amounts than it would to
support the claim that people like cats more than dogs. The goal of all this statistical and
logical jargon is to place our hypothesis testing in the proper frame. The most important
thing to remember is that hypothesis testing is designed to reject the null hypothesis, and
our statistical tests tell us how confident to be in this rejection.



Five Steps to Hypothesis Testing

Now that you understand how to frame your hypothesis, what do you do with this infor-
mation? The good news is that you’ve now mastered the first step of a five-step process of
hypothesis testing. In this section, we walk through an example of hypothesis testing from
start to finish, that is, from an initial hypothesis to a conclusion about the hypothesis. In
this fictitious study, we will test the prediction that married couples without children are
happier than those with children in the home. This example is inspired by an actual study
by Harvard social psychologist Dan Gilbert and his colleagues, described in a news arti-
cle at http://www.telegraph.co.uk/news/1941195/Marriage-without-children-the-key-to-bliss.html.
Our hypothesis may seem counterintuitive, but Gilbert’s research suggests that
people tend to both overestimate the extent to which children will make them happy and
underestimate the added stress and financial demands of having children in the house.

Step 1—State the Hypothesis
The first step in testing this hypothesis is to spell it out in logical terms. Remember that
we want to start with the null hypothesis that there is no effect. So, in this case, the null
hypothesis would be that couples are equally happy with and without children. Or, in
logical notation, H0: μchildren = μno children (i.e., the mean happiness rating for couples with
children equals the mean happiness rating for couples without children). From there,
we can spell out our alternative hypothesis; in this case, we predict that having chil-
dren will make couples less happy. Because this is a directional hypothesis, it is written
H1: μchildren < μno children (i.e., the mean happiness rating for couples with children is lower
than the mean happiness rating for couples without children).

Step 2—Collect Data
The next step is to design and conduct a study that will test our hypothesis. We will elabo-
rate on this process in great detail over the next three chapters, but the general idea is the
same regardless of the design. In this case, the most appropriate design would be cor-
relational because we want to predict happiness based on whether people have children.
It would be impractical and unethical to randomly assign people to have children, so an
experimental design is not possible in this case. One way to conduct our study would be
to survey married couples about whether they had children and ask them to rate their cur-
rent level of happiness with the marriage. Let’s say we conduct this experiment and end
up with the data in Table 2.3.

As you can see, we get an average happiness rating of 5.7 for couples without children,
compared to an average happiness rating of 2.0 for couples with children. These groups
certainly look different—and encouraging for our hypothesis—but we need to be sure
that the difference is big enough that we can reject the null hypothesis.


Table 2.3: Sample data for the “children and happiness” study

No Children    Children
7              2
5              3
7              1
5              2
4              4
5              3
6              2
7              1
6              1
5              1
mean = 5.7     mean = 2.0
S = 1.06       S = 1.05
SE = .33       SE = .33

Step 3—Calculate Statistics
The next step in our hypothesis test is to calculate statistical tests to decide how confident
we can be that our results are meaningful. As a researcher, you have a wide variety of
statistical tools at your disposal and different ways to analyze all manner of data. These
tools can be broadly grouped into descriptive statistics, which describe the patterns and
distributions of measured variables, and inferential statistics, which attempt to draw
inferences about the population from which the sample was drawn. These inferential sta-
tistics are used to make decisions about the significance of the data. Statistics classes will
cover many of these in detail, and we will cover a few examples throughout this book. All
of these different techniques share a common principle: They attempt to make inference
by comparing the relationship among variables to the random variability of the data. As
we discussed earlier in this chapter, people’s measured levels of everything from happi-
ness to heart rate can be influenced by a wide range of variables. The hope in testing our
hypotheses is that differences in our measurements will primarily reflect differences in the
variables we’re studying. In the current example, we would want to see that differences
in happiness ratings of the married couples were influenced more by the presence of chil-
dren than by random fluctuations in happiness.

One of the most straightforward statistical tests to understand is Student’s t-test, which
is widely used to compare differences in the means of two groups. Because of its simplic-
ity, it is also a great way to demonstrate the hypothesis-testing process. Conceptually, the
t-test compares the difference between two group means with the overall variability in
the data set. The end result is a test of whether our groups differ by a meaningful amount.
Imagine you found a 10-point difference in intelligence test scores between Republicans


and Democrats. Before concluding that your favorite party was smarter, you would need
to know how much scores varied on average. If your intelligence test were on a 100-point
scale, with a standard deviation of 5, then your 10-point difference would be interesting
and meaningful. But if you measured intelligence on a 1,000-point scale, with a standard
deviation of 100, then 10 points probably wouldn’t reflect a real difference.

So, conceptually, the t-test is a ratio of the mean difference to the average variability. Math-
ematically, the t-test is calculated like so:

t = (x̄1 − x̄2) / SEpooled

Let’s look at the pieces of this formula individually. First, the x̄s on top of the line are a
common symbol for referring to the mean, or average, in our sample. Thus the terms x̄1
and x̄2 refer to the means for groups 1 and 2 in our sample, or the mean happiness for
couples with children and no children. The term below the line, SEpooled, represents our
estimate of variability in the sample. You may remember this term from your statistics
class, but let’s walk through a quick review. One common estimate of variability is the
standard deviation, which represents the average difference between individual scores
and the mean of the group. It is calculated by subtracting each score from the mean, squar-
ing the deviation, adding up these squared deviations, dividing by the sample size (less
one, when estimating from a sample), and taking the square root of the result.

One problem with the standard deviation is that it generally underestimates the variabil-
ity of the population, especially in small samples, because small samples are less likely to
include the full range of population values. So, we need a way to correct our variability
estimate in a small sample. Enter the standard error, which is computed by dividing the
standard deviation by the square root of the sample size. (To save time, these values are
already calculated and presented in Table 2.3.) The “pooled” standard error represents a
combination of the standard errors from our two groups:

SEpooled = √(SE1² + SE2²) = √((.33)² + (.33)²) = √.218 = .47

Our final step is to plug the appropriate numbers from our “children and happiness” data
set into the t-test formula.

t = (x̄1 − x̄2) / SEpooled = (5.7 − 2.0) / .47 = 3.7 / .47 = 7.87

If this all seems overwhelming, stop and think about what we’ve done in conceptual
terms. The goal of our statistical test—the t-test—is to determine whether our groups
differ by a meaningful and significant amount. The best way to do that is to examine the
group difference as a ratio, relative to the overall variability in the sample. When we cal-
culate this ratio, we get a value of 7.87, which certainly seems impressive, but there’s one
more step we need to take to interpret this number.

Step 4—Compare With a Critical Value
What does a 7.87 mean for our hypothesis test? To answer this question, we need to gather
two more pieces of information and then look up our t-test value (i.e., 7.87) in a table. The
first piece of information is the alpha level, representing the probability cutoff for our


hypothesis test. The standard alpha level to use is .05, meaning that we want to have less
than a 5% chance of the result being due to chance. In some cases, you might elect to use
an alpha level of .01, meaning that you would only be comfortable with a less than 1%
chance of your results being due to chance.

The second piece of information we need is the degrees of freedom in the data set; this
number represents the sample size and is calculated for a t-test via the formula n − 2, the
number of couples in our sample minus 2. Think of it as a mathematical correction for
the fact that we are estimating values in a sample rather than from the entire population.
Another helpful way to think of degrees of freedom is as the number of values that are
“free to vary.” In our sample experiment, the no-children group has a mean of 5.7 while
the children group has a mean of 2.0. Theoretically, the values for 9 of the couples in each
group can be almost anything, but the 10th couple has to have a happiness score that will
yield the correct overall group mean. Thus, of the 20 happiness scores in our experiment,
18 are free to vary, giving us 18 degrees of freedom (i.e., n − 2).

Armed with these two numbers—18 degrees of freedom and an alpha level of .05—we
turn to a critical value table, which contains cutoff scores for our statistical tests. (You can
find these values for a t-test at http://www.statstodo.com/TTest_Tab.php). The numbers
in a critical value table represent the minimum value needed for the statistical test to be
significant. In this case, with 18 degrees of freedom and an alpha level of .05, we would
need a t-test value of 1.73 for a one-tailed (directional) hypothesis test and a t-test value
of 2.10 for a two-tailed (nondirectional) hypothesis test. (Remember, we have to be more
conservative for a nondirectional test.) In our children and happiness study, we had a
clear directional/one-tailed hypothesis that children would make couples less happy, so
we can legitimately use the one-tailed cutoff score of 1.73. Because our t-test value of 7.87
is unquestionably higher than 1.73, our statistical test is significant. In other words, there
is less than a 5% chance that the difference in happiness ratings is due to chance.

Step 5—Make a Decision
Finally, we are able to draw a conclusion about our experiment. Based on the outcome of
our statistical test (i.e., steps 3 and 4), we will make one of two decisions about our null
hypothesis:

Reject null: decide that the probability of the null being correct is suffi-
ciently small; that is, results are due to differences in groups

or

Fail to reject null: decide that the probability of the null being correct is too
big; that is, results are due to chance

Because our t-test value was quite a bit higher than the required cutoff value, we can be
confident in rejecting the null hypothesis. And, at long last, we can express our findings in
plain English: Couples with children are less happy than couples without children!

Now that we have walked through this five-step process, it’s time to let you in on a little
secret. When it comes to analyzing your own data, to test your own hypotheses, you will
actually rely on a computer program for part of this process—Steps 3 and 4 in particular.


In these modern times, it is rare to compute even a t-test by hand. Software programs such
as SPSS (IBM), SAS/STAT (SAS), and Microsoft Excel can take a table of data, compute the
mean difference, compare it to the variability, and calculate the probability that the results
are due to chance. However, because these calculations happen behind the scenes, it is
very important to understand the process. To draw conclusions about your hypotheses,
you have to understand what a p value and a t-test value mean. By understanding how
the software operates, you can reach informed conclusions about your research questions.
Otherwise, you risk making one of two possible errors in your hypothesis test, discussed
in the next section.

Errors in Hypothesis Testing

In the children and happiness study, we concluded with a reasonable amount of con-
fidence that our hypothesis was supported. But what if we make the wrong decision?
Because our conclusions are based on interpreting probability, there is always a chance
that we will draw the wrong conclusion. In interpreting our hypothesis tests, there are two
potential errors to be made, referred to as Type I and Type II errors.

Type I errors occur when the results are due to chance, but the researcher mistakenly con-
cludes that the effect is significant. In other words, no effect of the variables exists in the
population, but some quirk of the sample makes the effect appear significant. This error
can be viewed as a false positive—you get excited over results that are not actually mean-
ingful. In our children and happiness study, a Type I error would occur if children had no
effect on happiness in the real world, but some quirk of chance made our “no children”
group happier than the “children” group. For example, our sample of childless couples
might accidentally contain a greater proportion of people with happy personalities or
greater job stability or simply more marital satisfaction to start with.

Fortunately, although this error sounds scary, we can generally compute the probability of
making it. Our alpha level sets the bar for how extreme our data must be in order to reject
the null hypothesis. At the end of the statistical calculation, we get a p value that tells us
how extreme the data actually are. When we set an alpha level of, say, .05, we are attempt-
ing to avoid a Type I error; our results will only be statistically significant if the effect
outweighs the random variability by a big-enough amount. If our p value falls below our
predetermined alpha level, we decide that the risk of a Type I error is sufficiently small
and can therefore reject the null hypothesis. If, however, our p value is greater than (or
even equal to) our alpha cutoff, we decide that the risk of Type I error is too high to ignore
and will therefore fail to reject the null hypothesis.

Type II errors occur when a real effect exists, but the researcher mistakenly concludes
that the results are due to chance. In other words, there actually is an effect of the variables in
the population, but some quirk of the sample makes the effect appear nonsignificant. This
error can be viewed as a false negative—you miss results that actually could have been
meaningful. In our children/happiness experiment, a Type II error would occur if couples
without children really were happier than couples with children but some flaw in the
experiment kept us from detecting the difference. For example, if our measures of happi-
ness were poorly designed, people could interpret the items in a variety of ways, making
it difficult to spot an overall difference between the groups.


Fortunately, although this error sounds disappointing, there are some fairly easy ways to
avoid or minimize it. The key factor in reducing Type II error is to maximize the power
of the statistical test, or the probability of detecting a real difference. In fact, power is
inversely related to the probability of a Type II error—the higher the power, the lower the
chance of Type II error. Power is analogous to the sensitivity, or accuracy, of the hypoth-
esis test; it is under the researcher’s control in three main ways. First, as we discussed in
the section “Reliability and Validity,” it is important to make sure that your measures are
capturing what you think they are. If your happiness scale actually captures something
like narcissism, then this will cause problems for your hypothesis about the predictors of
happiness. Second, it is important to be careful throughout the process of coding and ana-
lyzing your data. Small mistakes can occur at every step, from entering data, to calculat-
ing scale totals, to choosing an inappropriate analysis. And third, statistical tests generally
have more power when there is a larger sample. We will discuss each of these factors in
more detail as we move through the course.

Summary of Correct and Incorrect Decisions
In the real world, at the level of the entire population, our null hypothesis is either true or
false. That is, if we could test our hypothesis using every married couple in the world, we
could say with 100% certainty whether or not the hypothesis was true. However, in each
individual study, at the level of our sample, we have to either reject the null or fail to reject
it. Table 2.4 summarizes the four possible outcomes of a decision about a hypothesis test.
In the top left and bottom right cells, we make the right decision—either rejecting a null
hypothesis that is false or failing to reject one that is true in the population. In the bottom
left cell of the table, we make a Type I error, rejecting a null hypothesis that is actually true
and mistakenly thinking our hypothesis is supported (i.e., a false positive). In the top right
cell of the table, we make a Type II error, failing to reject a null hypothesis that is actually
false and mistakenly thinking our hypothesis should be rejected (i.e., a false negative).

Table 2.4: Errors and correct decisions in hypothesis testing

                      Researcher’s Decision
                      Reject Null         Fail to Reject Null
Null is FALSE         Correct Decision    Type II Error
Null is TRUE          Type I Error        Correct Decision

In Chapter 1 (Section 1.3), we covered the process of drawing conclusions about “proof”
and “disproof,” suggesting that neither one is ever possible in a single study. Now that we
have covered the hypothesis testing process, you should have a better grasp of the reason-
ing behind our rules regarding proof and disproof. The reality is that Type I and Type II
errors are possible in every research study. Rejecting the null hypothesis in one study does
not automatically mean that it is false, only that the null hypothesis could not explain the
pattern of data in the study. And failing to reject the null in one study does not automati-
cally mean that it is true, only that the pattern of data in the study does not support reject-
ing it. Science accumulates knowledge over the course of several related studies. It is only
when these studies start to suggest the same conclusion that we can feel more confident in
our decisions about the status of the null hypothesis.


Effect Size

So far, our discussion about hypothesis testing has been focused on statistical significance,
and we have been concerned with the probability that our results might be due to random
chance. But there’s an additional piece to the puzzle of interpreting results. Imagine that
you have been placed in charge of testing a new drug that might help cure depression. You
might start by collecting a large sample of depressed patients, giving half of them the new
drug and half of them a placebo. Now imagine that the new drug reduced symptoms by
20%, compared to a 10% reduction with the placebo. Is this effect big enough to get excited
about? If the new drug costs twice as much as existing ones, is it worth recommending?
These questions revolve around the issue of effect size, a statistic used to represent the
size, or magnitude, of an effect.

There are several ways to calculate effect size, but in general, bigger values mean a stron-
ger effect. One of these statistics, Cohen’s d, is calculated as the difference between two
means divided by their pooled standard deviation. The resulting values can therefore be
expressed in terms of standard deviations; a d of 1 indicates that the means are one stan-
dard deviation apart. How big should we expect our effects to be? Based on his analyses
of typical effect sizes in the social sciences, Cohen suggests the following benchmarks:
d = .20 is a small effect; d = .40 is a moderate effect; and d = .60 is a large effect. In other
words, a large effect in social and behavioral sciences accounts for a little over half of a
standard deviation. For comparison purposes, the effect of the polio vaccine on reducing
polio symptoms was a d = 2.72 (almost three standard deviations; Oshinsky, 2006). In our
children and happiness study, we get a d = 3.82, but fake data are always more impressive
than real data.

Effect size is useful in two primary ways. First, at the end of an experiment, we can cal-
culate the exact size of the effect in our particular sample. This is a useful supplement to
our test of statistical significance because it is more independent of sample size. If we fail
to reject the null hypothesis in a small sample, the effect size might tell us whether the
effect is big enough to test again with a larger sample. And, if we support our research
hypothesis, the effect size provides valuable information about the usefulness of our find-
ings. Imagine you test two different diabetes drugs in two different studies. Let’s say both
show a statistically significant reduction in symptoms, but Drug A has an effect size of
d 5 .50, and Drug B has an effect size of d 5 2.5. This tells us that Drug B has a larger effect
and could therefore have a bigger benefit for diabetes patients.

The second use for effect size is in deciding on our sample size before the study begins.
We learned earlier that our statistical tests generally have more power in a larger sample
size. So why not run 10,000 participants in every single research study? The problem is
that participants take time, money, and other resources, and not every study needs 10,000
people to detect an effect. Rather than striving for perfect power in every study, research-
ers usually compromise and hope for 80% power, which equates to only a 20% chance
of Type II error. It turns out that we also have more power when the underlying effect is
larger. Thus, we can take our estimates of effect size and determine the number of people
we need to achieve at least 80% power.

The best way to perform these calculations is by using any of the power calculators avail-
able over the Internet, such as the one found here: http://www.stat.ubc.ca/~rollin/stats/ssize/n2.html.
Try entering the values from our children and happiness study, plus the
pooled standard deviation of 1.25. This should result in the previously mentioned d of
3.82. According to this calculator, we would only need two people per group to detect this
effect in a future study—much cheaper and easier than 10,000!
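
If you prefer scripting to a web calculator, the same calculation can be done with Python's statsmodels library (assumed installed here); this is a sketch of the idea rather than a prescribed tool.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Sample size per group needed to detect a given effect at alpha = .05, 80% power
print(analysis.solve_power(effect_size=3.82, alpha=0.05, power=0.80))  # only a handful per group
print(analysis.solve_power(effect_size=0.40, alpha=0.05, power=0.80))  # far larger n

Note how dramatically the required sample size grows as the expected effect shrinks toward the more typical d = .40.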

Summary

In this chapter, we have covered several basic principles of research design and emphasized the importance of ensuring that our study uses the best and most accurate measures available. We first examined the four main types of research design: qualitative,
descriptive, correlational, and experimental. These designs offer the researcher progressively
more control. Qualitative designs can lead to in-depth interpretations, verifica-
tions, and evaluations of phenomena, such as personal experiences, events, and behav-
iors. However, their drawbacks include difficulty making comparisons, difficulty general-
izing beyond the sample being studied, and being very time-consuming. Descrip-
tive designs can provide rich descriptions of various phenomena, from brain tumors to
voting preferences, but are unable to delve into why these things happen. Correlational
designs allow us to predict variables from other variables but are still unable to identify a
causal relationship. This limitation in correlational designs occurs for two reasons: We do
not know the direction of the relationship; and it is always possible that a third variable is
causing both of them. Finally, experimental designs allow us to state with some certainty
that one variable causes another because these designs involve systematically testing the
impact of variables while controlling the environment. The downside of experimental
designs is that they often have to sacrifice some realism in order to establish control.

We focused on the importance of the accuracy and consistency of measures. In every
research study, you start with an abstract variable and operationalize it into a measured
variable. “Happiness” becomes a seven-point happiness scale; “time” becomes the read-
ing on a stopwatch; and so on. As a researcher, your job is to evaluate the extent to which
these measured variables capture the underlying concepts. One metric for evaluating this
is the reliability, or consistency of the measures. Measures are more reliable when they
are free from random error; we can assess this by comparing multiple measures within
the study. A second metric is the validity, or accuracy of the measures. Measures are more
valid when they are free from systematic error, meaning that they measure what they
claim to measure. Validity is generally assessed by examining correlations with other
measures, either to test the theoretical construct or to predict a behavioral criterion.
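
To make one of these reliability checks concrete, here is a short Python sketch of Cronbach's alpha (listed in the Key Terms below), computed with the standard formula; the questionnaire scores are invented.

import numpy as np

def cronbach_alpha(item_scores):
    # Rows are respondents, columns are questionnaire items
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

scores = [[6, 7, 6], [4, 4, 5], [2, 3, 2], [5, 5, 6], [7, 6, 7]]
print(round(cronbach_alpha(scores), 2))  # values near 1 indicate high interitem reliability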

We next discussed the different options for scaling and measuring variables. In addition
to ensuring the accuracy and consistency of measures, it is critical to use a scaling method
that matches the mathematical properties of the variable. Nominal scales represent arbi-
trary labels for categories; ordinal scales represent rank ordering of values; interval scales
represent scales with equal intervals; and ratio scales represent variables with true zero
points. As a researcher, you should use the most powerful scale available—for example,
by using behavioral counts rather than labels when possible. But you also have to be
aware of the limitations of the scale that you choose. While ratio scale values can be added,
subtracted, divided, and multiplied, ordinal scale values cannot meaningfully be added or averaged. We also
discussed three primary types of measurement. Behavioral measures involve observa-
tion and systematic recording of behavior; self-report measures involve asking people to
report their own thoughts; and physiological measures involve measurements of bodily
processes. Because each approach has advantages and disadvantages, many researchers
use converging operations over the course of a research program, making use of all three
in order to address a broad question.
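
To make the scale distinctions concrete, the short Python sketch below (using the pandas library; all data invented) shows how each scale type constrains what you can sensibly do with the numbers.

import pandas as pd

# Nominal: arbitrary category labels; only counting membership makes sense
majors = pd.Series(["psych", "bio", "psych", "math"])
print(majors.value_counts())

# Ordinal: ranked categories; ordering is meaningful, arithmetic is not
severity = pd.Categorical(["mild", "severe", "moderate"],
                          categories=["mild", "moderate", "severe"],
                          ordered=True)
print(severity.min(), severity.max())

# Ratio: true zero point, so all arithmetic applies
drinks_per_day = pd.Series([0, 2, 4])
print(drinks_per_day.mean())  # and 4 drinks really is twice as many as 2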

Finally, this chapter discussed the process of hypothesis testing. Regardless of the question
asked, the design used, and the way data are measured, all quantitative studies involve
the same process of testing hypotheses using statistical results. We covered this process
in five steps: (1) Lay out the null and alternative hypotheses; (2) collect data; (3) calculate
the appropriate statistics; (4) compare statistical results to a critical value; and (5) make
a decision about the original hypothesis. Despite our best efforts, a hypothesis test occa-
sionally leads to incorrect conclusions. A Type I error occurs when the researcher rejects
the null but shouldn’t have; a Type II error occurs when the researcher fails to reject the
null but could have under better conditions. As we will discuss in later chapters, you can
reduce the odds of both errors through careful research design and analysis. In the next
three chapters, we will cover the specifics of the three types of research design: descriptive
(Chapter 3), correlational (Chapter 4), and experimental (Chapter 5).
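
The five steps map almost directly onto analysis code. The Python sketch below walks through them with simulated data (numpy and scipy assumed installed); the group values are invented purely to illustrate the mechanics.

import numpy as np
from scipy import stats

alpha = 0.05  # Step 1: null = no difference; alternative = the groups differ

# Step 2: collect data (simulated here)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.8, scale=1.0, size=30)

# Step 3: calculate the appropriate statistic (independent-samples t-test)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Steps 4 and 5: compare to the alpha level and decide about the null
if p_value < alpha:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: reject the null hypothesis")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: fail to reject the null hypothesis")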

Key Terms

alpha level Predetermined probability cutoff for a hypothesis test; usually set as p < .05.

alternative hypothesis The predicted pattern of results or predicted deviation from the null.

behavioral measure A type of measure that involves direct and systematic recording of observable behaviors.

Cohen's d An effect size measure calculated as the difference between two means divided by their pooled standard deviation; the resulting values are expressed in terms of standard deviations.

concurrent validity The extent to which a self-report measure is able to predict a behavioral measure collected at the same time.

construct validity An assessment of how well the measures capture the underlying conceptual ideas (i.e., constructs) in a study.

continuum of control A framework for organizing and discussing research designs in terms of the amount of control the researcher has over the design.

control group Group that provides a basis of comparison for the experimental group.

convergent validity The extent to which a measure overlaps conceptually similar measures.

converging operations The use of multiple research methods to solve a single problem.

correlational research Research designed to predict thoughts, feelings, or behaviors.

criterion validity An assessment of validity based on the association between measures and relevant behavioral outcomes.

Cronbach's alpha The average correlation between each pair of items on the measure; used to calculate an estimate of interitem reliability.

critical value table A table containing
cutoff scores for statistical tests.

degrees of freedom A number represent-
ing sample size; calculated for a two-group
t-test as n – 2 (the total number of people
in the sample minus one for each group
mean estimated); the number of values
that are “free to vary.”

descriptive research Research designed to
describe thoughts, feelings, or behaviors.

descriptive statistics Statistics that
describe the patterns and distributions of
measured variables.

directional hypothesis Alternative
hypothesis that specifies the direction of
the effect; also called a one-tailed test.

directionality problem Limitation of
correlational research; when we mea-
sure two variables at the same time, we
have no way of knowing the cause of the
relationship.

discriminant validity The extent to
which a measure diverges from unrelated
measures.

duration The length of time a behavior
lasts.

effect size A statistic that represents the
size, or magnitude, of an effect.

experimental group Group that receives
the treatment of interest that provides the
test of the hypothesis.

experimental research Research designed
to explain thoughts, feelings, and behav-
iors and to make causal statements.

face validity The extent to which a mea-
sure seems like a good measure of the
construct.

fixed-format A response format for self-
report measures that asks people to indi-
cate their opinions on a preexisting scale.

free-response A response format for
self-report measures that asks people to
express their opinions in an open-ended
format.

frequency The number of times a behav-
ior occurs.

inferential statistics Statistics that attempt
to draw inferences about the population
from which a sample was drawn.

intensity The strength or potency of a
behavior.

interitem reliability The internal con-
sistency among different questions on a
questionnaire measure.

interrater reliability The consistency
among judges’ observations of partici-
pants’ behavior.

interval scale A scaling method used to
represent cases where the numbers on a
measured variable correspond to equal
distances on a conceptual variable.

latency The length of delay before onset of
a behavior.

negative correlation Relationship
between two variables such that higher
values of one variable predict lower values
of the other variable.

nominal scale A scaling method used
to label or identify a particular group or
characteristic.

nondirectional hypothesis Alternative
hypothesis that predicts only an effect,
without specifying its direction; also called
a two-tailed test.


null hypothesis The assumption that
there is no real effect of variables in the
overall population.

ordinal scale A scaling method used
to represent ranked order of conceptual
variables.

physiological measure A type of measure
that quantifies bodily processes, including
heart rate, brain activity, and facial muscle
movements.

positive correlation Relationship between
two variables such that higher values of
one variable predict higher values of the
other variable.

power The probability of detecting a real
difference; inversely related to the prob-
ability of a Type II error.

predictive validity The extent to which
a self-report measure is able to predict a
behavioral outcome.

p value A statistic representing the prob-
ability that results are due to random
chance.

random error Chance fluctuations in the
measurements.

ratio scale A scaling method used to rep-
resent interval scales that also have a true
zero point, that is, a complete absence of
the conceptual variable.

reliability (quantitative) Consistency of
measurement; the extent to which a mea-
sured variable is free from random errors.

research design The specific method used
to collect, analyze, and interpret data.

scaling The process of specifying the
relationship between a conceptual variable
and numbers on a quantitative measure.

self-report measure A type of measure
that involves asking people to report their
thoughts, feelings, and behaviors.

standard deviation An estimate of
variability that represents the average
deviation from the mean of the group;
calculated by squaring each score’s devia-
tion from the mean, averaging those
squared deviations, and taking the square
root of the result.

standard error An estimate of variability
that is computed by dividing the standard
deviation by the square root of the sample
size.

Student’s t-test An inferential statistic
used to compare differences in the means
of two groups; calculated as a ratio of
mean difference to average variability.

systematic errors Errors that systemati-
cally increase or decrease values on the
measured variable.

test–retest reliability The consistency of
the measure at different time points.

third variable problem Limitation of
correlational research; when we measure
two variables as they naturally occur,
there is the possibility of a third variable
that causes both of them.

Type I error A hypothesis testing error
that occurs when results are due to chance
but the conclusion mistakenly states that
the effect is significant; a false positive.

Type II error A hypothesis testing error
that occurs when a real effect exists but
the conclusion mistakenly states that the
results are due to chance; a false negative.

validity The accuracy of measurements;
the extent to which they accurately mea-
sure what they are designed to measure
and are free from systematic error.

Apply Your Knowledge

1. For each of the following research questions, tell whether the most appropriate
strategy involves descriptive, correlational, or experimental research.
a. Are students more likely to cheat on exams in their first or last year of college?
b. Does writing about a traumatic experience result in better health?
c. What personality variables predict success in school?

2. Dr. Blutarsky is interested in whether poor parenting predicts
teen alcohol abuse. To investigate this, he has parents fill out questionnaires
about their parenting styles and then waits to see how likely their children are
to abuse alcohol.
a. Identify the independent and dependent variables in this study.

independent:
dependent:

b. What type of research design is Dr. Blutarsky using?
c. Give an operational definition of “poor parenting” and “alcohol abuse”:

poor parenting:
alcohol abuse:

3. For each of the following, identify the scale of measurement:
a. placing children in gifted and special needs programs based on ability
b. an “attitudes toward the president” scale, measured from 1 to 7
c. height measured in inches
d. the number of drinks consumed per day

4. For each of the following abstract concepts, suggest a way to measure it using a
behavioral and self-report measure:

Concept                 Behavioral          Self-Report

Conformity
Enjoyment of Reading
Leadership Ability
Paranoia
Independence

Critical Thinking & Discussion Questions

1. Can a measure be reliable but not valid? Explain why or why not.
2. Explain the trade-off between Type I and Type II errors. Why might attempts to
minimize one of these inflate the other?

new85743_02_c02_063-102.indd 102 6/18/13 12:17 PM

chapter 1

Psychology as a Science

Chapter Contents

• Research Areas in Psychology
• Scientific Thinking and Paths to Knowledge
• Research Problem and Questions
• Hypotheses and Theories
• Searching the Literature
• Writing a Research Proposal
• Ethics in Research

Introduction

In an article in Wired magazine, journalist Amy Wallace described her visit to the annual conference sponsored by Autism One, a nonprofit group organized around the belief that autism is caused by mandatory childhood vaccines:
I flashed more than once on Carl Sagan’s idea of the power of an “unsatis-
fied medical need.” Because a massive research effort has yet to reveal the
precise causes of autism, pseudoscience has stepped into the void. In the
hallways of the Westin O’Hare hotel, helpful salespeople strove to catch
my eye . . . pitching everything from vitamins and supplements to gluten-
free cookies . . . hyperbaric chambers, and neuro-feedback machines.

(Wallace, 2009, p. 134)

The “pseudoscience” to which Wallace refers is the claim that vaccines generally do more
harm than good and specifically cause children to develop autism. In fact, an extensive
statistical review of epidemiological studies, including tens of thousands of vaccinated
children, found no evidence of a link between vaccines and autism. But something about
this phrasing doesn’t sit right with many people; “no evidence” rings of scientific mumbo
jumbo, and a “statistical review” pales in comparison with tearful testimonials from par-
ents that their child developed autistic symptoms shortly after being vaccinated. The
reality is this: Research tells us that vaccines bear no relation to autism, but people still
believe that they do. Because of these beliefs, increasing numbers of parents are forgoing
vaccinations, and many communities are seeing a loss of herd immunity and a resurgence
of rare diseases including measles and mumps.

So what does it mean to say that “research” has reached a conclusion? Why should we
trust this conclusion over a parent’s personal experience? One of the biggest challenges
in starting a course on research methods is learning how to think like a scientist—that is,
to frame questions in testable ways and to make decisions by weighing the evidence. The
more personal these questions become, and the bigger their consequences, the harder it
is to put feelings aside. But, as we will see throughout this course, it is precisely in these
cases that listening to the evidence becomes most important.

There are several reasons to understand the importance of scientific thinking, even if you
never take another psychology course. First, at a practical level, critical thinking is an
invaluable skill to have in a wide variety of careers and in all areas of life. Employers of
all types appreciate the ability to reason through the decision-making process. Second,
understanding the scientific approach tends to make you a more skeptical consumer of
news reports. If you read in Newsweek that the planet is warming, or cooling, or staying
the same temperature, you will be able to decipher and evaluate how the author reached
this conclusion and possibly reach a different one on your own. Third, understanding sci-
ence makes you a more informed participant in debates about public policy. If we want to
know whether the planet is truly getting warmer, this conclusion should come from care-
fully weighing the scientific evidence rather than trusting the loudest pundit on a cable
news network.

Research: Making an Impact

The Vaccines and Autism Controversy

In a 1998 paper published in the well-respected medical journal The Lancet, British physician Andrew
Wakefield and his colleagues studied the link between autism symptoms and the measles, mumps,
and rubella (MMR) vaccine in a sample of twelve children (Wakefield et al., 1998). Based on a review
of these cases, the authors reported that all twelve experienced adverse effects of the vaccine,
including both intestinal and behavioral problems. The finding that grabbed the headlines was the
authors’ report that nine of the twelve children showed an onset of autism symptoms shortly after
they received the MMR vaccine.

Immediately after the publication of this paper, the scientific community criticized the study for its
small sample and its lack of a comparison group (i.e., children in the general population). Unfor-
tunately, it turned out these issues were only the tip of the iceberg (Godlee, Smith, & Marcovitch,
2011). The British journalist Brian Deer conducted an in-depth investigation of Wakefield’s study and
discovered some startling information (Deer, 2004). First, the study had been funded by a law firm
that was in the process of suing the manufacturers of the MMR vaccine, resulting in a real threat
to the researchers’ objectivity. Second, there was clear evidence of scientific misconduct; the data
had been falsified and altered to fit Wakefield’s hypothesis—many of the children had shown autism
symptoms before receiving the vaccine. In his report, Deer stated that every one of the twelve cases
showed evidence of alteration and misrepresentation.

Ultimately, the Lancet withdrew the article in 2010, effectively removing it from the scientific record
and declaring the findings no longer trustworthy. But in many respects, the damage was already
done. Vaccination rates in Britain dropped to 80% following publication of Wakefield’s article, and
these rates remain below the 95% level recommended by the World Health Organization
(Godlee et al., 2011). That is, even though the article was a fraud, it made parents afraid
to vaccinate their children. Vaccinations work optimally when most members of a
community get the vaccines because this minimizes the opportunity for an outbreak.
When even a small portion of parents refuse to vaccinate their children, the entire
community is at risk of infection (National Institute of Allergy and Infectious Diseases,
n.d.). Thus, it should be no surprise that many communities are seeing a resurgence of
measles, mumps, and rubella: In 2008, England and Wales declared measles to be a
prevalent problem for the first time in 14 years (Godlee et al., 2011).

This scenario highlights the importance of conducting science honestly. While disease
outbreaks are the most obvious impact of Wakefield’s fraud, they are not the only one. In
a 2011 editorial in the British Medical Journal condemning Wakefield’s actions, British
doctor Fiona Godlee and colleagues captured this rather eloquently:

But perhaps as important as the scare’s effect on infectious
disease is the energy, emotion, and money that have been
diverted away from efforts to understand the real causes of
autism and how to help children and families who live with it.
(p. 7452)

Where does psychology fit into this picture? Objectivity can be a particular challenge in
studying our own behavior and mental processes because we are intimately familiar with
the processes we are trying to understand. The psychologist William C. Corning captured
this sentiment over 40 years ago: “In the study of brain functions we rely upon a biased,
poorly understood, and frequently unpredictable organ in order to study the properties of
another such organ; we have to use a brain to study a brain” (Corning, 1968, p. 6). (Or, in
the words of comedian Emo Philips, “I used to think that the brain was the most wonder-
ful organ in my body. Then I realized who was telling me this.”) The trick, then, is learning
to take a step back and apply scientific thinking to issues you encounter and experience
every day.

This textbook provides an introduction to the research methods used in the study of psy-
chology. It will introduce you to the full spectrum of research designs, from observing
behavior to manipulating conditions in a laboratory. We will cover the key issues and
important steps for each type of design, both qualitative approaches and those that observe, predict,
and explain behavior, as well as the analysis strategies most appropriate for each type. In
this chapter, we begin with an overview of the different areas of psychological science.
We then introduce the research process by discussing the key features of the scientific
approach and then cover the process of forming testable research questions, developing
hypotheses and theories, and searching the literature. In the final two sections, we cover
writing a research proposal and discuss the importance of adhering to ethical principles
at all stages of the research.

1.1 Research Areas in Psychology

Psychology is a diverse discipline, encompassing a wide range of approaches to asking questions about why people do the things that they do. The common thread among all of these approaches is the scientific study of human behavior. So, while
psychology might not be the only field to speculate on the causes of human behav-
ior—philosophers have been doing this for millennia—psychology is distinguished by
its reliance on the scientific method to draw conclusions. We will examine the meaning
and implications of this scientific perspective later in the chapter. In this section, we dis-
cuss the major content areas within the field of psychology, along with samples of the
types of research questions asked by each one. For further reading about these areas, the
American Psychological Association (APA) has an excellent collection of web resources:
http://www.apa.org/topics/index.aspx.

Biopsychology

Biopsychology, as the name implies, combines research questions and techniques from
both biology and psychology. It is typically defined as the study of connections between
biological systems (including the brain, hormones, and neurotransmitters) and our
thoughts, feelings, and behaviors. As a result, the research conducted by biopsycholo-
gists often overlaps research in other areas—but with a focus on biological processes.
Biopsychologists are often interested in the way interactions between biological systems
and thoughts, feelings, and behaviors impact the ability to treat disease, as seen in the
following questions: What brain systems are involved in the formation of memories?
Can Alzheimer’s be cured or prevented through early intervention? How does long-term
exposure to toxins such as lead impact our thoughts, feelings, and behaviors? How easily
can the brain recover after a stroke?

In one example of this approach, Kim
and colleagues (2010) investigated
changes in brain anatomy among new
mothers for the first 3 months fol-
lowing delivery. These authors were
intrigued by the numerous changes
new mothers undergo in attention,
memory, and motivation; they specu-
lated that these changes might be asso-
ciated with changes in brain structure.
As expected, new mothers showed
increases in gray matter (i.e., increased
complexity) in several brain areas
associated with maternal motivation
and behavior. And, the more these
brain areas developed, the more posi-
tively these women felt toward their
newborn children. Thus, this study
sheds light on the potential biologi-
cal processes involved in the mother–
infant bond.

Cognitive Psychology

Whereas biopsychology focuses on studying the brain, cognitive psychology studies the
mind. It is typically defined as the study of internal mental processes, including the ways
that people think, learn, remember, speak, perceive, and so on. Cognitive psychologists
are interested primarily in the ways that people navigate and make sense of the world,
including questions such as the following: How do our minds translate input from the
five senses into a meaningful picture of the world? How do we form memories of emo-
tional versus mundane experiences? What draws our attention in a complex environ-
ment? What is the best way to teach children to read?

In one example of this approach, Foulsham, Cheng, Tracy, Henrich, & Kingstone (2010)
were interested in what kinds of things people pay attention to in a complex social scene.
The world around us is chock-full of information, but we can pay attention only to a rela-
tively thin slice of it. Foulsham and colleagues were particularly interested in where our
attention is directed when we observe groups of people. They answered this question by
asking people to watch videos of a group discussion and using tools to track eye move-
ments. It turned out that people in this study spent most of their time looking at the most
dominant member of the group, suggesting that we are wired to pay attention to those in
positions of power. Thus, this study sheds light on one of the ways that we make sense
out of the world.

Developmental Psychology

Developmental psychology is defined as the systematic study of physical, social, and
cognitive changes over the human life span. Although this field initially focused on
childhood development, many researchers now study changes and key stages over the
entire life span. Developmental psychologists study a wide range of phenomena related
to physical, social, and cognitive change, including, How do children bond with their
primary caregiver(s)? What are our primary needs and goals at each stage of life? Why
do some cognitive skills decline in old age? At what ages do infants develop basic motor
skills?

In one example of this approach, Hill and Tyson (2009) explored the connection between
children’s school achievement and their parents’ involvement with the school. In other
words, Do children perform better when their parents are actively involved in school
activities? The authors addressed this question by combining results from several studies
into one data set. Across 50 studies, the answer to this question was yes—children do bet-
ter in school if their parents are involved. Thus, this study sheds light on a key predictor
of academic achievement during an important developmental period.

Social Psychology

Social psychology attempts to study behavior
in a broader social context. It is typically defined
as the study of the ways our thoughts, feelings,
and behaviors are shaped by other people. As
you might imagine, this broad perspective allows
social psychologists to tackle a wide range of
research questions, including the following: What
kinds of things do we look for in selecting roman-
tic partners? Why do people stay in bad relation-
ships? How do other people shape our sense of
who we are? When and why do people help in
emergencies?

Norman Triplett conducted the first published
social psychology study at the end of the 19th
century (Triplett, 1898). Triplett had noticed that
professional cyclists tended to ride faster when
racing against other cyclists than when compet-
ing in solo time trials. He tested this observation
in a controlled laboratory setting, asking people
to do a number of tasks either alone or next to
another person. His results (and countless other
studies since) revealed that people worked faster
in groups, suggesting that other people can have
definite and concrete influences on our behavior.

Clinical Psychology

Finally, the area of clinical psychology is an applied field focused on understanding
the best ways to treat psychological disorders. It is typically defined as the study of best
practices for understanding, treating, and preventing distress and dysfunction. Clinical
psychologists engage in both the assessment and the treatment of psychological disor-
ders, as seen in the following research questions: What is the most effective treatment for
depression? How can we help people overcome posttraumatic stress disorder following a
traumatic event? Should anxiety disorders be treated with drugs, therapy, or a combina-
tion? What is the most reliable way to diagnose schizophrenia?

One example of this approach is found in a study by Kleim and Ehlers (2008), which
attempted to understand the risk factors for posttraumatic stress disorder, a prolonged
reaction to a severe traumatic experience. Kleim and Ehlers found that assault victims
who tend to form less specific memories about life in general might be more likely to
develop a disorder in response to trauma than victims who tend to form detailed memo-
ries. People who tend to form vague memories may have fewer resources to draw on in
trying to reconnect with their daily life after a traumatic event. Thus, this study sheds light
on a possible pathway contributing to the development of a psychological disorder.

1.2 Scientific Thinking and Paths to Knowledge

One of the easiest ways to understand the scientific approach is to contrast it with other ways of understanding the world. While science offers us the most objective and rigorous approach to decision making, it is by no means the only approach.
Some of the following paths to knowledge have been popular and acceptable during dif-
ferent historical periods. Other approaches are currently in use by different academic dis-
ciplines. To showcase the distinctions among them, the following examples illustrate how
each perspective might approach the link between vaccines and autism.

Authority

In a number of contexts, people understand the world based on what authority figures tell
them. Parents dictate curfews to children; cities assign speed limits within their borders;
and churches interpret the meaning of holy texts. In each case, the rules and knowledge
are accepted because there is trust in the source of the knowledge. In the debate over
vaccines and autism, this perspective would be evident in those who trust their doctor’s
advice to vaccinate their children. It would also be evident in those who trust celebrity
spokesperson Jenny McCarthy’s testimony that vaccines gave her son autism.

Phenomenology

Many academic disciplines take a phenomenological approach to studying the world
around us. This approach focuses on each individual’s intuition and subjective experience
and treats truth as a subjective concept. In other words, if you believe that your alcoholism
stems from a bad relationship with your father, there is some “truth” to this belief (regard-
less of the objective truth). In the debate over vaccines and autism, this perspective would
be evident in those who are swayed by a parent’s testimony, despite all evidence to the
contrary. If Jenny McCarthy believes vaccines gave her child autism, then there must be
some “truth” to her belief.

new85743_01_c01_001-062.indd 7 6/18/13 11:55 AM

8

CHAPTER 1Section 1.2 Scientific Thinking and Paths to Knowledge

Rationalism

For several centuries, scientific inquiry was guided by a rationalist approach, and this
approach is still dominant in many of the humanities disciplines. Rationalism involves
making decisions based on logical arguments; if something “makes sense,” it must be the
right answer. In the debate over vaccines and autism, this perspective would be evident
in the narrowly constructed argument that because autism symptoms appear shortly after
vaccination, vaccines must be the cause. (This reasoning ignores the rules about the kinds
of evidence needed to make statements about causation, which we will cover in later
chapters.)

Empiricism

The scientific approach, which is our
focus in this book, makes decisions
based on evidence. This approach,
also called empiricism, focuses on
the role of observation and sensory
experience over the role of reason
and logic alone. It is all well and
good to come up with a creative idea
about how the world works, but this
idea does not carry scientific weight
until it has been supported through
carefully collected observations of
the world around us. These obser-
vations form the basis of science,
which set it apart from the other
paths to knowledge. In the debate
over vaccines and autism, scien-
tific evidence leads to the unam-
biguous conclusion: There is no link
between vaccines and autism. But if
the opposite picture were true, scientists would gladly change their minds. One of the key advantages of science is that it
is not bound to a particular ideology (e.g., a political point of view or prejudice) but is
dedicated to the belief in the superiority of observable evidence. Although the experi-
menter ’s values are certain to enter the picture, they can be a powerful motivating force
to uncover the truth rather than a source of bias.

In summary, scientific inquiry offers us one of many ways to understand the world. In
theory, these perspectives are not incompatible, although in practice, differing perspec-
tives can lead to drastically different conclusions. (The writer Stephen Jay Gould famously
made this argument about science and religion, arguing that they are essentially suited to
answering different types of questions. You can read an essay by Gould at the following
website: http://www.stephenjaygould.org/library/gould_noma.html.) And, on a particu-
larly practical note, the scientific approach is the one that we will adopt throughout this
class. So when you are asked to evaluate research results on your exams, your interpreta-
tion will need to be based on weighing the evidence; it is not acceptable to claim that a
finding “just makes sense.”


The Research Process

So what does it mean to draw conclusions based on science? Scientists across all disciplines
use the same process of forming and testing their ideas. The overall goal of this research
process—also known as the scientific method—is to draw conclusions based on empirical
observations and experiments (e.g., random assignment and manipulation) designed to test
causal theories. In this section, we cover the four steps of the research process—hypothesize,
operationalize, measure, and explain, abbreviated with the acronym HOME.

Step 1—Hypothesize
The first step in the research process is to develop a testable prediction, or hypothesis. A
hypothesis is a specific and falsifiable statement about the relationship between two or
more variables (more on that “falsifiable” bit in a minute . . .). For example, if we study the
link between smoking and cancer, our hypothesis might be that smoking causes lung can-
cer. Or, if we are studying a new drug for treating depression, we might hypothesize that
drug X will lead to a reduction in depression symptoms. We will cover hypotheses in more
detail in the next section, but for now it is important to understand that the way we frame
our hypothesis guides every other step of the research process. Even the most promising
theories will not be testable if you do not clearly define the variables, or if many contradic-
tory outcomes are possible (e.g., depression can lead to weight gain or weight loss).

Step 2—Operationalize
Once we have developed a hypothesis, the next step is to decide how to test it. The pro-
cess of operationalization involves choosing measurable variables to represent the com-
ponents of our hypothesis. In the preceding depression drug example, we would need
to decide how to measure both cause and effect; in this case, we define the cause as the
drug and the effect as reduced symptoms of depression. That is, what doses of the drug
should we investigate? How many different doses should we compare? Also, how will we
measure depression symptoms? Will it work to have people complete a questionnaire? Or
do we want to have a clinician interview participants before and after they take the drug?
An additional complication for psychology studies is that many of our research questions
deal with abstract concepts. There is an art to turning these concepts into measurable
variables. For example, the concept of “happiness” could be operationalized as a person’s
score on a happiness scale, or as the number of times a person smiles in a 5-minute period,
or perhaps even as a person’s subjective experience of happiness during an interview. We
will cover this process in more detail in Chapter 2 (Section 2.2), where we discuss guide-
lines for making these important decisions about the study.

Step 3—Measure
Now that we have developed both our research question and our operational defini-
tions, it is time to collect some data. We will cover this process in great detail; Chapters
3 through 5 are dedicated to the three primary approaches to data collection: descriptive
designs (including qualitative approaches, although quantitative studies can be descrip-
tive as well), survey designs, and experimental designs. The goal of the data collection
stage is to gather empirical observations that will help address our hypothesis. As we dis-
cuss in Chapter 2, these observations can range from questionnaire responses to measures
of brain activity, and they can be collected in ways ranging from online questionnaires to
carefully controlled experiments.

Step 4—Explain
After the data have been collected, the final step is to analyze and interpret the results. The
goal of this step is to return full circle to our research question and determine whether the
results support our hypothesis. Let’s go back to our hypothesis that drug X should reduce
depression symptoms. If we find at the end of the study that people who took drug X
showed a 70% decrease in symptoms, this would be consistent with the hypothesis. But
the explanation stage also involves thinking about alternative explanations and planning
for future studies. What if depression symptoms dropped simply due to the passage of
time? How could we address this concern in a future study? As it turns out, there is a
fairly easy way to fix this problem, which we’ll cover in Chapter 5.

In summary, the research process involves four stages: forming a hypothesis, deciding
how to test it, collecting data, and interpreting the results. This process is used regardless
of whether our research questions involve depression drugs, reading speed, or the speed
of light in a vacuum.

Examples of the Research Process

To make these steps a bit more concrete, let’s walk through two examples of how they
could be applied to specific research topics.

Example 1—Depression and Heart Disease
Depression affects approximately 20 million Americans, and 16% of the population will
experience it at some time in their lives (National Institute of Mental Health [NIMH],
2007). Depression is associated with a range of emotional and physical symptoms, includ-
ing feelings of hopelessness and guilt, loss of appetite, sleep disturbance, and suicidal
thoughts. This list has expanded even further to include an increased risk of heart disease.
Individuals who are otherwise healthy but suffering from depression are more likely to
develop and to die from cardiovascular disease than those without depression. Accord-
ing to one study, patients who experience depression following a heart attack experience
a fourfold increase in 5-year mortality rates (research reviewed in Glassman et al., 2011).

One intriguing idea that comes from these findings is that it might make sense to treat
heart attack patients with antidepressant drugs. The goal of the HOME method is to take
this idea, turn it into a testable question, and conduct a study that will test it.

Step 1 is to form a testable hypothesis from this research question. In this case, we might
predict that people who have had heart attacks and take prescribed antidepressants are
more likely to survive in the years following the heart attack than those who do not take
antidepressants. What we’ve done here is to take a general idea about the benefits of a
drug and state it in a way that can be directly tested in a research study.

Step 2 is to decide how we want to operationalize the concepts in our study. In this case, we
would first decide who qualified as a heart attack patient: Would we include only those hos-
pitalized with severe heart attacks or include anyone with abnormal cardiac symptoms? As
we will discuss in later chapters, this decision will have implications for how we interpret
the results. We would also need to decide on the doses of antidepressant drugs to use and
the time period to measure survival rates. How long would we follow patients?

Step 3 is to measure these key concepts based on the decisions we made in step 2. This
step involves collecting data from participants and then conducting statistical analyses to
test our hypothesis. We will cover the specifics of research designs beginning in Chapter 2
(Section 2.1), but essentially we would want to give antidepressants to half of our sample
and compare their survival rates with the half not given these drugs.

Step 4 is to explain the results and tie the statistical analyses back into our hypothesis.
In this case, we would want to know whether antidepressant drugs did indeed benefit
heart attack patients and increase their odds of survival for 5 years. If so, our hypoth-
esis is supported. If not, we would go back to the drawing board and try to determine
whether something went wrong with the study or antidepressant drugs really don’t have
any benefit for this population. As we’ll discuss, answering these kinds of questions usu-
ally involves conducting additional studies. Either way, the goal of this final step is to
return full circle to our research question and discuss the implications of antidepressant
drug treatment for heart attack patients.

Example 2—Language and Deception
In 1994, Susan Smith appeared on television claiming that her two young children had
been kidnapped at gunpoint. Eventually, authorities discovered she had drowned her
children in a lake and fabricated the kidnapping story to cover her actions. Before Smith
was a suspect in the children’s deaths, she told reporters, “My children wanted me. They
needed me. And now I can’t help them” (Lee and Vobejda, 1994). Normally, relatives speak
of a missing person in the present tense. The fact that Smith used the past tense in this con-
text suggested to trained FBI agents that she already viewed them as dead (Adams, 1996).

One intriguing idea that comes from this story is that people may communicate in differ-
ent ways when they are lying than when they are telling the truth. The goal of the HOME
method is to take this idea, turn it into a testable question, and conduct a study that will
test it.

Step 1 is to form a testable hypothesis from this research question. This example is some-
what more challenging because “communicating differently” can be defined in many
ways. Thus, we need a hypothesis that will narrow the focus of our study. One hypothesis,
based on research literature, might be that liars show more negative emotion (e.g., anger,
fear) in the way that they communicate than truth-tellers do (e.g., Newman, Pennebaker,
Berry, & Richards, 2003). What we’ve done here is to take a general idea and state it in a
way that can be directly tested in a research study.

Step 2 is to decide how we want to operationalize the concepts in our study. In this case,
we would need to decide what counts as “showing negative emotion.” We might take the
approach used in a previous study (Newman et al., 2003) and scan the words people use,
looking for those reflecting emotions such as anger, anxiety, and fear. The logic here is that
the words people use reflect something about their underlying thought processes and that
people who are trying to lie will be more anxious and fearful as a result of the lie.
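
As a toy illustration of this operational definition, the Python sketch below counts negative-emotion words in a statement; the word list and statement are invented, whereas published studies rely on validated dictionaries (such as the one used by Newman et al., 2003).

negative_words = {"angry", "afraid", "anxious", "scared", "worried", "hate"}

def negative_emotion_rate(text):
    # Proportion of words in the text that appear in the negative-emotion list
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in negative_words for w in words) / len(words)

statement = "I was scared and worried when I saw that the car was gone"
print(round(negative_emotion_rate(statement), 2))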

Step 3 is to measure these key concepts based on the decisions we made in step 2. This step
involves collecting data from participants and then conducting statistical analyses to test
our hypothesis. In this example, the challenge comes in determining whether and when
people are lying. In Susan Smith’s case, the truth was ultimately discovered, so we can

say with some certainty that her language was deceptive. One way to do this in a research
study is to tell people to lie, tell others to be truthful, and compare differences in the way
they use language.

Step 4 is to explain the results and tie the statistical analyses back into our hypothesis.
In this case, we want to know whether people who were instructed to lie did indeed use
more words suggestive of negative emotion. If so, this supports our hypothesis. If not,
we would go back to the drawing board and try to determine whether something went
wrong with the study or people really don’t use more negative emotion when they lie.
Either way, the goal of this final step is to return full circle to our research question and
discuss the implications for understanding indicators of deception.

Goals of Science

In addition to sharing an overall approach, all forms of scientific inquiry tend to adopt one
of four overall goals. This section provides an overview of these goals, with a focus on their
application to psychological research. We will encounter the first three goals throughout
the course and use them to organize our discussion of different research methods.

Description
One of the most basic research
goals is to describe a phenomenon,
including descriptions of behav-
ior, attitudes, and emotions. Such
descriptive research is the foundation
on which all subsequent research is laid
and therefore should be built sol-
idly. You are probably very familiar
with this type of research because
it tends to crop up in everything
from the nightly news to your
favorite magazine. For example, if
CNN reports that 60% of Americans
approve of the president, they are
describing a trend in public opinion.
Descriptive research should always
be the starting point when studying
a new phenomenon. That is, before
we start trying to explain why col-
lege students binge drink, we need
to know how common the phenom-
enon really is. So we might start
with a simple survey that asked college students about their drinking behavior, and
we might find that 29% of them show signs of dangerous binge drinking. Now that we
have described the phenomenon, we are in a better position to conduct more sophisti-
cated research. (See Chapter 3 for more detail on descriptive research.)

Prediction

A second goal of research is to attempt to predict a phenomenon. This goal takes us from
describing the occurrence of binge drinking among college students to attempting to
understand when and why they do it. Do students give in to peer pressure? Is drinking a
way to deal with the stress of school? These questions could be addressed through a more
detailed survey that asked people to elaborate on the reasons that they drink. The goal
of this approach is to understand the factors that make something more likely to occur.
(See Chapter 4 for more detail on the process of designing surveys and conducting predic-
tive research.)

Explanation
A third, and much more powerful, goal of research is to attempt to explain a phenomenon.
This goal takes us from predicting relationships to testing possible causal links. Whereas
predictive research attempts to find associations between two phenomena (e.g., college
student drinking is more likely when students are under stress), explanatory research
attempts to make causal statements about the phenomenon of interest (e.g., stress causes
college students to drink more). This distinction may seem subtle at this point, but it is an
important one and is closely related to the way that we design our studies. (See Chapter 5
for more detail on explanatory research.)

Change
The fourth and final goal of research is generally limited to psychology and other social
science fields: When we are dealing with questions about behaviors, attitudes, and emo-
tions, we can conduct research to try to change the phenomenon of interest. Researchers
who attempt to change behaviors, attitudes, or emotions are essentially applying research
findings with the goal of solving real-world problems. In the 1970s, Elliot Aronson, a
social psychologist at the University of Texas at Austin, was interested in ways to reduce
prejudice in the classroom. Research conducted at the time was discovering that prejudice
is often triggered by feelings of competition; in the classroom, students competed for the
teacher’s attention. Aronson and his colleagues decided to change the classroom structure
in a way that required students to cooperate in order to finish an assignment. Essentially,
students worked in small groups, and each person mastered a piece of the material. (You
can read the details on this website: http://www.jigsaw.org/). Aronson found that using
this technique, known as the “jigsaw classroom,” both enhanced learning and decreased
prejudice among the students (e.g., Aronson, 1978).

Aronson’s work also illustrates the distinction between two categories of research. The first
three goals we have discussed fall mainly under the category of basic research, in which
the primary goal is to acquire knowledge, with less focus on how to apply the knowledge.
Scientists conducting basic research might spend their time trying to describe and under-
stand the causes of binge drinking but stop short of designing interventions to stop binge
drinking. This fourth goal of research is more often seen in applied research, in which the
primary goal is to solve a problem, with less focus on why the solution works. Scientists
conducting applied research might spend their time trying to stop binge drinking but
not get caught up in the details of why these interventions are effective. But Aronson’s
research is a great example of how these two categories should work together. The basic
research on sources of prejudice informed his applied research on ways to reduce preju-
dice, which in turn informed further basic research on why this technique is so effective.

One final note on changing behavior: Any time you set out with the goal of changing
what people do, your values enter the picture. Inherent in Aronson’s research was the
assumption that prejudice was a bad thing that needed to be changed. Although few peo-
ple would disagree with him, the risk is that he might have trouble remaining objective
throughout the research project. As we suggested earlier, the more emotionally involved
you are in the research question, the more you have to be aware of the potential for bias,
and the more you have to force yourself to pay attention to the data.

Quantitative Versus Qualitative Research

Imagine for a moment that you are a city planner interested in studying traffic patterns at
different times of the day. You might approach this research question in one of two ways.
You could fly over the city in a helicopter, take snapshots of a random set of busy intersec-
tions, and conduct statistical analyses on cars moving in different directions at different
times. This would give you a broad understanding of traffic patterns in the city. Alterna-
tively, you could spend your resources studying the busiest intersection in the middle
of downtown, trying to understand everything from driver behaviors to the effects of
weather conditions. This would give you a very deep understanding of traffic in the mid-
dle of your city.

These two approaches illustrate the differences between quantitative research and qualita-
tive research, respectively. Quantitative research is a systematic and empirical approach
that attempts to generalize results to other contexts. By taking “samples” of different inter-
sections and by conducting inferential statistics, our hypothetical city planner could learn
a little bit about traffic in general. Qualitative research, in contrast, is a more descriptive
approach that attempts to gain a deep understanding of particular cases and contexts.
By studying the busiest intersection in detail, our hypothetical city planner could learn a
great deal about the traffic patterns at that intersection.

The two approaches have traditionally been popular with different social science fields.
For example, much of the current research in psychology is quantitative because the goal
is to gain generalizable knowledge about behavior and mental processes. In contrast,
much of the current research in sociology and political science tends to be qualitative
because the goal is to gain a richer understanding of a particular context. If you want to
understand why college students around the country suffer from increased depression,
quantitative methods are the better choice. If you want to understand why the citizens of
Egypt revolted against their government, then qualitative methods are more appropriate.
Overall, qualitative research is especially useful when behavior has multiple causes that
researchers may not anticipate or when researchers have only a limited understanding of
the subjects’ cultural point of view.


Table 1.1 presents a comparison of quantitative and qualitative methods, their descrip-
tions, purposes and approaches, and the researcher’s roles. (See also the Centers for
Disease Control and Prevention [CDC] website for further comparison: http://www.cdc.gov/healthcommunication/CDCynergy/Appendix.html#H)

Table 1.1: Comparing quantitative and qualitative research methodologies

Description

Quantitative:
• Aim is to classify features, count them, and construct statistical models in an attempt to explain what is observed
• Researcher knows clearly in advance what he/she is looking for
• All aspects of the study are carefully designed before data are collected
• Researcher uses tools, such as closed-ended questionnaires, rating scales, tests, etc., or equipment to collect numerical data
• Data take the form of numbers and statistics and are measurable
• Focus is on objective assessment: seeking precise measurement and analysis of target concepts, e.g., uses closed-ended surveys, questionnaires, etc.
• Data are more efficient, able to test hypotheses, and can be generalized
• Researcher is objectively separated from the subject matter

Qualitative:
• Aim is a complete, detailed description
• Researcher may only know roughly in advance what he/she is looking for
• The design emerges as the study unfolds
• Researcher uses observations, interviews (open-ended questions), and written documents (historical records, official publications, other articles, photographs, etc.)
• Data take the form of words, pictures, or objects and are not as easy to measure
• Focus is on subjective assessment: individuals’ interpretation of events is important, e.g., uses participant observation, in-depth interviews, etc.
• Data are more detailed, time-consuming, and less able to be generalized
• Researcher is immersed in the subject matter

Purpose

Quantitative: Generalizability; Prediction; Causal explanations
Qualitative: Contextualization; Interpretation; Understanding actors’ perspectives

Approach

Quantitative: Begins with hypotheses and theories; Manipulation and control of variables; Uses formal instruments of measurement; Experimentation; Deductive reasoning
Qualitative: Ends with hypotheses or grounded theory; Little control over variables; Researcher as instrument; Naturalistic observation; Inductive reasoning

Researcher’s Role

Quantitative: Detachment and impartiality; Objective portrayal
Qualitative: Personal involvement and partiality; Empathic understanding


In an ideal world, a true understanding of any phenomenon requires the use of both
methods. That is, we can best understand depression if we both study statistical trends
and conduct in-depth interviews with depressed people. We can best understand binge
drinking by conducting both surveys and focus groups. And we can best understand the
experience of being bullied in school by both talking to the victims and collecting school-
wide statistics. Thus, researchers do not have to choose one method over another but
can combine elements of both quantitative and qualitative approaches to produce mixed
methods designs. Mixed methods designs are often used when one method does not
provide a complete picture of the phenomenon being investigated. In this text, the focus
is primarily on quantitative methods, reflecting current trends in the field of psychology.
We will primarily cover qualitative methods in Chapter 3 (on descriptive research) and
quantitative methods in Chapters 4 (predictive research) and 5 (experimental research).
Mixed methods designs will be discussed more thoroughly in Chapter 5.
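
As a rough illustration, a mixed methods dataset might pair survey scores with coded interview themes. The sketch below (Python; the scores, theme labels, and coding scheme are all invented) shows how the two strands can sit side by side in one analysis.

```python
# A minimal sketch of mixed methods data: numeric survey scores alongside
# a tally of themes coded from interview transcripts (all values invented).
from collections import Counter
from statistics import mean

depression_scores = [12, 19, 7, 22, 15, 9]  # quantitative strand: survey data
interview_themes = ["isolation", "coursework stress", "isolation",
                    "finances", "coursework stress", "isolation"]  # qualitative codes

print("Mean depression score:", mean(depression_scores))
print("Most common themes:", Counter(interview_themes).most_common(2))
```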

1.3 Research Problem and Questions

Before conducting research, whether through qualitative or quantitative methods, a
researcher must first identify a problem to investigate and then develop a research
question or questions to ask about that particular problem. Theory and hypothesis play
a crucial role, as do research, observation, and top-down and bottom-up thinking, all
informed by a thorough literature search.

While we often think we understand problems, we really do not. For example, a teacher
might notice that a student is easily distracted and inattentive in the classroom, leading
the teacher to believe, initially, that the student has an attention problem or attention-
deficit/hyperactivity disorder (ADHD). Upon further examination and possibly after
testing has occurred, the results might instead show that the student has a learning
disability in reading, writing, or math, and is being inattentive because he or she does not
understand the material or have the necessary skills to complete the assignment.

In another example, a teacher observes that a student is sleeping excessively during the
first two periods of school. The teacher may assume that the student stays up late play-
ing on the computer or texting with his or her friends. After speaking to the parents, the
teacher learns that the parents have recently gone through a divorce and that the student
is working a part-time job in the evenings to help out with the finances. Thus, the student
has been staying up late at night to complete his or her homework for the next school day.
As we can see, in some cases our initial beliefs or thoughts about a problem may not be
correct and may lead to inaccurate recommendations and treatments. Therefore, it is abso-
lutely crucial that we accurately identify the problem that we want to study.

Research Problems

A research problem is the topic or phenomenon that we want to address, investigate,
and research, either through quantitative, qualitative, or mixed methods. It is the heart
of any research project and is crucial to the success of the overall research effort (Leedy &
Ormrod, 2010). Problems needing more research are everywhere; however, finding a
research problem that interests you may take some work.


Sources of Research Problems
There are several different methods for identifying a good research problem. These
include reviewing theories about a topic, reviewing current professional literature on a
topic, attending professional conferences, and having discussions with colleagues on the
issue. A selective reading of the literature is probably the most advantageous method, as it
can provide a theoretical base to generate research questions and hypotheses and assist in
the selection of research methodologies and measurement methods. Later on, it can help
you to interpret the results in comparison with other literature in the field. Attending pro-
fessional conferences also provides advantages because there, researchers can explore the
most popular topics in their field as well as meet with experts who have been researching
a given problem.

Charles (1995; as cited in Houser, 2009) provides several helpful suggestions for research-
ers when identifying research problems. These include (1) having personal interest in the
topic, (2) selecting an important topic that will answer the “So what?” question we ask
when evaluating others’ research, (3) selecting a topic that is feasible and can be com-
pleted in a reasonable amount of time, and (4) selecting a topic that can be completed with
the amount of money allotted to studying it. Thus, it is important that we select something
that we are interested in and have some knowledge about, as we may not want to see the
study through if the topic has no interest to us or relevance in our lives.

Stating the Research Problem
Once a research problem is identi-
fied, the next step is to narrow the
topic so that it is measurable and
can be presented in a clear problem
statement. For example, having a
research problem of “Lack of stu-
dent success in online classrooms”
is extremely broad and could take
many directions. For instance, would
the research include students’ expe-
riences with online learning, num-
ber and quality of student-to-teacher
interactions, quality of student-to-
student interactions, or another area?
Developing a problem statement, or
aim of the study, will help to clearly
describe the intent of the study.

Problem statements should be clearly
and specifically stated and should
describe the main goal of the total
research project. For example, using the preceding example, “Lack of student success
in online classrooms” lacks clarity and does not provide an understanding of what the
researcher plans to do. Developing this into a complete sentence that describes a research-
able problem would entail the following: “To determine the relationship between instruc-
tor involvement and student success during students’ first online course in college.”

Digital Vision/Thinkstock

Investigating lack of student success in online classrooms
requires a researcher to develop a clear and focused
problem statement.


This latter statement is clear regarding the intent of the study and the population that
will be included. Clearly defining a problem is key to the design and implementation of
a research study. Without a clear and specific problem statement, the researcher may find
him- or herself going on a “wild goose chase” and wasting unnecessary time trying to
investigate a vague problem or phenomenon.

The following guidelines, adapted from Leedy and Ormrod (2010), will assist you in
formulating a clear, precise, and accurate problem statement:

• Is the problem stated in complete and grammatically correct sentences?
• Is it clear what the study will focus on?
• Is it clear that the results could go either way? That is, does the statement suggest an open mind to the research findings, or does it show a particular expected outcome?
• Does the answer to the problem provide important and useful information regarding the topic?
• Is the problem statement focused enough for the research to be completed in a reasonable amount of time and within budget?

Dividing the Research Problem Into Subproblems
If your research problem covers more than one concept, you will want to break down your
research problem into subparts or subproblems, each of which represents only one con-
cept. For example, if we were to reword our problem statement as “To evaluate the influ-
ences that instructor involvement and student-to-student interactions have on students’
success during their first online course in college,” there would be two concepts being
evaluated: instructor involvement and quality of student-to-student interactions. To break
this problem statement into two subproblems, it would look like the following:

Problem Statement: To evaluate the influences that instructor involvement and
quality of student-to-student interactions have on students’ success during their
first online course in college.

Subproblem 1: Evaluate the influences of instructor involvement on students’
success during their first online course in college.

Subproblem 2: Evaluate the influence that quality of student-to-student interac-
tions has on students’ success during their first online course in college.

Thus, your problem statement should comprise all of the subproblems, while the sub-
problems should not introduce any new ideas or concepts that are not covered in the
problem statement.

When developing subproblems, you will want to adhere to these guidelines: (1) Each
subproblem should be a problem that can be researched on its own; (2) each subproblem
should be set forth as a statement and not as a question; and (3) the total number of sub-
problems should be between two and six (Leedy & Ormrod, 2010). Viewing a problem
statement through its subproblems will give you a better idea of how to approach the
overall research project (Leedy & Ormrod, 2010).


The Purpose Statement
The purpose statement, similar to the problem statement, takes the goal of the study one
step further. It not only includes the intent of the study but identifies what population
will be studied, what type of research will be conducted (e.g., a comparison between vari-
ables), and what the dependent and independent variables will be. Using our research
problem, “Lack of student success in online classrooms,” a purpose statement might
look like the following: “The present study was conducted to determine the relationship
between instructor involvement and student success during students’ first online course
in college.”

In quantitative research such as this, problem statements are often replaced with hypoth-
eses, which will be discussed later in the chapter. In contrast, qualitative research meth-
ods generally employ either problem statements or research questions. With any research
method, however, the purpose statement should show that the purpose and problem are
researchable.

Researchers utilizing quantitative methods generally include in their purpose statement
whether the study involved a comparison among groups or a relationship between two
or more variables, or a descriptive examination of one or more variables. Including this
information not only guides the researchers in selecting the appropriate data analyses
but also provides information on the type of study being conducted (Houser, 2009). For
instance, Kerrigan (2011) provides an example of a purpose statement that includes a com-
parison study:

The purpose of this comparative quasi-experimental study was to compare
the effect of coaching on comfort levels, as measured by an adapted ques-
tionnaire, and blood sugars levels, as recorded on individuals’ glucome-
ters, between two groups of individuals with diabetes who had attended a
formal diabetic education program (p. 7).

Researchers examining the types of relationships between two or more variables are inter-
ested in how well the variables correlate. For example, Cerit and Dinc (2013) conducted
a study that focused on a relationship between variables. They discussed their purpose
statement as follows: “The aim of this study was to investigate the correlation between
nurses’ professional behaviours and their ethical decision-making in a different cultural
context by adapting the Nursing Dilemma Test (NDT) into Turkish” (p. 202). Both exam-
ples provide the reader with information regarding the type of study utilized (i.e., com-
parison or correlation) as well as what the dependent and independent variables were.

On the other hand, when examining a phenomenon, characteristic, or event in great detail,
some researchers may choose to use qualitative or descriptive methods rather than quan-
titative methods. In these cases, the purpose statement will focus more on describing and
clarifying the phenomena or event than on comparing groups or identifying relationships
between variables (Houser, 2009). Here is an example of a qualitative purpose statement,
provided by Bradshaw, Sudhinaraset, Mmari, and Blum (2010):

The primary goals of the current study were to (a) describe the transition-
related stressors experienced by mobile military students; (b) describe
the efforts employed to help these students cope with their stress; and
(c) identify strategies that schools can use to ease the transition process for


mobile military students. To address these three goals, we conducted sepa-
rate focus groups with adolescents in military families, military parents,
and school staff in military-affected schools at select U.S. military bases.
(pp. 86–87)

The most important term in this purpose statement is the word describe, as it indicates that
the study is employing qualitative or descriptive methods rather than quantitative ones.
Regardless of whether a researcher is utilizing quantitative or qualitative methods, the
purpose statement is generally included at the end of the Introduction, usually in the last
paragraph before the Literature Review section.

Research Questions

As we have learned, it is important to narrow down one’s topic or ideas into a research-
able problem. Examining existing literature will provide information about what is
unclear in the field of study and whether any gaps exist. Doing so will also help to further
clarify the research focus or aim of the study, as well as assist in the development of
research questions.

Identifying a research problem, stating the problem, and providing a purpose statement
are all steps toward describing the aim or goal of the overall study. Research questions
are then developed to guide researchers toward their objectives. In quantitative studies,
research questions generally take the form of hypotheses, which are specific predictions or
educated guesses about the outcome of the study. However, some quantitative research-
ers choose to include hypotheses and research questions that are related to the research
problem. Generally, quantitative research questions focus on the Who, What, and When of
specific variables and are closed-ended questions that provide cause-and-effect answers.

In qualitative studies, research questions guide data collection and interpretation but do
not include speculations or predictions about the outcome. Qualitative research questions
tend to focus on the Why and How of a phenomenon or event, providing more descriptive
and open-ended answers.

Both hypotheses and research questions provide the researcher with a starting point to
explore a problem, and they help the researcher “stay on topic” and answer the questions
he or she initially wanted to address.

Developing Research Questions
How you conduct a research study depends largely on the research questions you develop.
Let us look back on our previous research problem statement involving online learning:
“To determine the relationship between instructor involvement and student success dur-
ing students’ first online course in college.” Some researchable questions might include
the following:

1. Are there relationships between instructor involvement and students’ success
with respect to students’ participation in the online classroom and students’
quality of work completed?

2. Does the amount of instructor involvement have an influence on student
involvement?


Notice how these questions provide specific information about what will be examined.
For example, the first research question identifies how student success will be defined by
measuring the amount of participation in the classroom and the quality of work submit-
ted. Operationally defining, or clearly identifying, how student success is going to be
measured (i.e., through number of weekly participations and graded work) ensures that
all researchers and reviewers have a clear understanding of what “student success” means
in this study. Operational definitions, such as this one, establish the meaning of a concept
or variable in relation to a particular study. Without operationally defining student suc-
cess for this study, it would be unclear how that variable would be assessed or measured.
The second research question tells us that the researcher is going to measure the level of
instructor involvement and see how it relates to student involvement in the course. Thus,
we also need to operationally define how we are going to identify and measure “level of
instructor involvement.” Will it be measured by number of times an instructor responds
to a student each week, by the length and quality of the responses, or both? Both research
questions not only inform how the research will be conducted but also serve as guides
throughout the research endeavor.
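
One way to see what an operational definition buys you is to write it down as a computation. The sketch below (Python) is purely hypothetical: the two-posts-per-week participation threshold, the 40/60 weighting, and the function name are invented for illustration, not taken from any published study.

```python
# Hypothetical operationalization of "student success": weekly participation
# plus average grade on submitted work, combined into one 0-100 score.
def student_success(weekly_posts, graded_scores,
                    participation_weight=0.4, grade_weight=0.6):
    """Combine participation and work quality into a single success score."""
    # Full participation credit at 2 posts per week (an invented threshold)
    participation = min(sum(weekly_posts) / (2 * len(weekly_posts)), 1.0) * 100
    quality = sum(graded_scores) / len(graded_scores)
    return participation_weight * participation + grade_weight * quality

# A student who posts [2, 3, 1, 2] times over four weeks, with three graded works
print(student_success([2, 3, 1, 2], [88, 92, 75]))  # -> 91.0
```

Once success is pinned down this way, any two researchers scoring the same student must arrive at the same number, which is exactly what an operational definition is for.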

It is important to mention that, although research questions should be developed at the
beginning of a project, they can change as you design your study. Designing your study
involves making several careful decisions about your research questions in order to pre-
vent your study from foundering. Ask yourself, What types of data will be collected, and
what methods will be used to collect the data? Where and for how long will the research
be conducted, and what participants or groups will be included? Are the data collection
procedures consistent with the research questions? Once the project has started, if you
find that your research questions were not appropriate for the research problem or that
the data collection and analysis methods were not consistent with the research questions,
your study results may be unusable, forcing you to start the project over again.

1.4 Hypotheses and Theories

The use of hypotheses is one of the key distinguishing features of scientific inquiry. Rather than making things up as they go along, scientists develop a hypothesis ahead of time and design a study to test this hypothesis. In this section, we cover
the process of turning rough ideas about the world into testable hypotheses. We cover the
primary sources of hypotheses as well as several criteria for evaluating hypotheses.

Sources of Hypotheses

Hypotheses can be generated from the bottom up or the top down. From the bottom up,
hypotheses are built on real-world observations, using inductive reasoning. From the top
down, hypotheses begin with big ideas, or theories, which are then tested through deduc-
tive reasoning.


Bottom-Up: From Observation to Hypothesis

Research hypotheses are based on observations about the world around us. For example,
you may have noticed the following tendencies as you observe the people around you:

• Teenagers do a lot of reckless things when their friends do them.
• Close friends and couples tend to dress alike.
• Everyone faces the front of the elevator.
• Church attendees sit down and stand up at the same time.

Based on these observations, you might develop a general hypothesis about human
behavior—that people will conform to, or go along with, what the group is doing. This
process of developing a general statement out of a set of specific observations is called
induction and is perhaps best understood as a “bottom-up” approach. In this case, we
have developed our hypothesis about conformity from the ground up, based on observ-
ing behavioral tendencies.

The process of induction is a very common and very useful way to generate hypotheses.
Most notably, this process is a great source of ideas that are based in real-world phenom-
ena. Induction also helps us to think about the limits of an observed phenomenon. For
example, we might observe the same set of conforming behaviors and speculate whether
people will also conform in dangerous situations. What if smoke started pouring into a
room and no one else reacted? Would people act on their survival instinct or conform to
the group and stay put (Latané & Darley, 1969)? Your prediction about how this experi-
ment might turn out forms your hypothesis for the experiment.

The process of qualitative research is an excellent example of induction, in that the
researcher builds abstractions, concepts, hypotheses, and theories from details and obser-
vations in the world. Hypotheses are not established a priori but may emerge from
the research data and findings. Thus, qualitative approaches often lead to hypothesis-
generating research, which can lay the groundwork for future quantitative studies.

Top-Down: From Theory to Hypothesis
The other approach to developing research hypotheses is to work down from a bigger
idea. The term for these big ideas is a theory, which refers to a collection of ideas used
to explain the connections among variables and phenomena. For example, the theory of
evolution organizes our knowledge about how species have developed and changed over
time. One piece of this theory is that life originated in Africa and then spread to other parts
of the planet. However, this idea in and of itself is too big to test in a single study. Instead,
we move from the “top down” and develop a specific hypothesis out of a more general
theory; this process is known as deduction.

When we develop hypotheses using a process of deduction, the biggest advantage is that
it is easier to place the study—and our results—in the larger context of related research.
Because our hypotheses represent a specific test of a general theory, our results can be
combined with other research that tested the theory in different ways. For example, in
the evolution example, you might hypothesize that older fossils would be found in Africa
than would be found in other parts of the world. If this hypothesis were supported, it


would be consistent with the overall theory about life originating in Africa. And, as more
and more researchers develop and test their own hypotheses about the origins of life, our
cumulative knowledge about evolution continues to grow.

Most research involves studying constructs that have been investigated extensively. In
such situations, particular theories will guide decisions about the research. Some of these
theories may be new and will have had only limited studies conducted on them. Others
will be more mature, having hundreds of research studies validating their predictions. In
some cases, a study may provide validation for more than one theory. To illustrate this
concept, consider a study on the causes of childhood obesity. The following are only some
of the many theoretical ideas that could contribute to such a study:

• Parents do not provide healthy eating choices at home.
• Children from low-income neighborhoods do not have access to healthier food choices or cannot afford them.
• Busy families do not have time to cook and rely on fast food.
• Obesity is genetic. Thus, children with obese parents are 80% more likely to be obese themselves.
• Media encourages the consumption of fast food.
• Cultural and ethnic differences exist regarding what is considered a healthy or an unhealthy weight.
• Children are spending more time watching TV and playing video games, and consuming junk food while doing so.
• Schools are not providing healthy food options.
• Children are not exercising enough at school or at home.

This example only scratches the surface of the role of theory in a study such as this.
Possible hypotheses that could be formulated from these theories include the following:
Children exposed to a school-based intervention to reduce time spent watching televi-
sion and playing video games will have significantly reduced body mass index (BMI); or,
Exposure to fast food, soft drink, and cereal advertising on television increases children’s
food consumption behaviors and, in turn, their BMI.

Table 1.2 compares the two sources of research hypotheses, showcasing their relative
advantages and disadvantages.

Table 1.2: Comparing sources of hypotheses

Deduction:
• “Top-down,” from theory to hypothesis
• Easy to interpret our findings
• Helps science build and grow
• Might miss out on new perspectives

Induction:
• “Bottom-up,” from observation to hypothesis
• Can be hard to interpret without prior research
• Helps our understanding of the real world
• Great way to get new ideas


Research: Thinking Critically

Controversy Grows Over Study Claiming Liberals and Atheists Are Smarter

By Daniela Perdomo

There’s a lot of buzz over a controversial study released in the journal Social Psychology Quarterly,
titled “Why Liberals and Atheists Are More Intelligent,” that compares IQ levels among liberals and
conservatives, atheists and religious believers. The widely circulated study claims that “more intel-
ligent individuals may be more likely to acquire and espouse evolutionarily novel values and prefer-
ences (such as liberalism and atheism . . .) than less intelligent individuals.” The study was written
by Satoshi Kanazawa (2010), a social scientist at the London School of Economics who employs evo-
lutionary psychology to analyze the social sciences, such as economics and politics, and who has a
history of attracting ire over his studies and opinions.

But before drawing any conclusions about Kanazawa’s latest study, it’s worth expanding on the data
he bases his claims on. First of all, quantifying intelligence on a societal level—and even from person
to person—is incredibly tricky, if not impossible. As an evolutionary psychologist, Kanazawa likely
recognizes this, and that may be why he decided to limit his intelligence measures to IQ points, a
convenient and notoriously narrow way of assessing cognitive abilities.

(continued)

Evaluating Theories

While experiments are designed to test one hypothesis at a time, the overall progress in a
field is measured by the strength and success of its theories. If we think of hypotheses as
being like individual combat missions on the battlefield, then our theories are the overall
battle plan. So how do we know whether our theories are any good? In this section, we
cover four criteria that are useful in evaluating theories.

Explains the Past; Predicts the Future
One of the most important requirements for a theory is that it either supports, refutes,
or provides additional perspectives on existing knowledge. If a physicist theorized that
everything on earth should float off into space, this would conflict with millennia’s worth
of evidence showing that gravity exists. And if a psychologist argued that people learn bet-
ter through punishment than through rewards, this would conflict with several decades
of research on learning and reinforcement. A theory should offer a new perspective and
a new way of thinking about familiar concepts, but it cannot be so creative that it clashes
with what we already know. Related to this, a theory also has to lead to accurate predic-
tions about the future, meaning that it has to stand up to empirical tests. There are usually
multiple ways to explain existing knowledge, but not all of them will be supported as we
test their assumptions in new circumstances. At the end of the day, the best theory is the
one that best explains both past and future data.


Research: Thinking Critically (continued)
The first problem in the study comes with Kanazawa’s use of IQ as an accurate measure of intel-
ligence. P. Z. Myers, a leader in the field of evolutionary developmental biology (and an avowed
atheist and progressive), is not surprised. He calls Kanazawa the “great idiot of social science” and
points to a 2006 paper in which Kanazawa took the mean IQ of various countries and used those to
draw conclusions on their dedication to health care.

For example: Ethiopia has a mean IQ of 63. This low IQ explains why Ethiopia’s health care system
is awful, according to Kanazawa. Talk about simplistic. Not only does this ignore the fact that IQ
might better measure cognitive capabilities in the developed world, where it was designed, but it
completely tunes out the fact that Ethiopia has been embroiled in wars for many years, which would
appear to be a better explanation for why the health care system there hasn’t developed to West-
ern levels yet. “Intelligence is such a complex phenomenon—there are multiple parameters,” Myers
says. “And IQ is extremely sensitive to social conditions. Kanazawa wants to reverse it and say that IQ
is causing problematic social conditions.”

In this more recent study, not only does Kanazawa gloss over structural inequalities that may lead
to varying IQ levels in American society, but even the disparities he finds in this imperfect measure
of intelligence are relatively minuscule. For the most part, he is not speaking of a difference of more
than six IQ points between liberals and conservatives, atheists and believers—a negligible difference
one would never notice in real person-to-person interactions.

Kanazawa isn’t the first to study the intelligence–religiosity nexus. Other studies have also found a
three- to six-point IQ difference between atheists and religious believers, in the atheists’ favor. But
those studies didn’t claim that atheists were more evolved, as Kanazawa presumes, but merely con-
cluded that they are more skeptical owing to a certain kind of schooling and cultural exposure (which
might also account for why some people perform well on IQ tests).

Then there’s the issue of Kanazawa’s definition of liberalism, which he writes is the “contemporary
American” denotation: “the genuine concern for the welfare of genetically unrelated others and
the willingness to contribute larger proportions of private resources for the welfare of such others.”
Practically speaking, this means Kanazawa’s “liberalism” is defined as a willingness to pay a higher
tax rate and donate money to charity.

This definition of liberalism, says Ilya Somin, a legal scholar whose expertise includes popular politi-
cal participation, does not actually distinguish it from, say, conservatism or libertarianism. Somin
writes:

[A] libertarian who believes that free market policies best
promote the welfare of genetically unrelated others and
contributes a great deal of his money to charities promoting
libertarian causes counts as a liberal under this definition. The
same goes for a Religious Right conservative who believes that
everyone will be better off under socially conservative policies
and contributes lots of money to church charities.

On this last point, it should be noted that recent research shows American political conservatives actu-
ally give more money to charity (and donate more blood) than their politically liberal counterparts.

The problem inherent in Kanazawa’s vague definition of liberalism is further compounded by the fact
that he gleans his data on intelligence and attitudes toward topics of religion, politics, and charity
from two massive national surveys—the National Longitudinal Study of Adolescent Health and the
General Social Survey.

These large-scale studies are greatly compromised by self-reporting. Most Americans don’t even
really know where they fall on the left–right political continuum. Polling shows, for example, that
more African Americans self-identify as conservative than liberal, but when it comes to actual votes,
data indicate that Blacks overwhelmingly vote for traditionally defined liberal causes and candidates.

And libertarians—estimated to be about 15% of the U.S. population—don’t neatly identify as liberals
or conservatives, or even centrists, depending on whether they more closely identify as economic
conservatives or social liberals. Even progressives shy away from identifying themselves as liberals, a
term that carries a negative connotation for many of them.

A particularly problematic idea presented by the study is how Kanazawa defines certain values and
preferences as “evolutionarily novel.” While he does not come out and say being atheist is a sign of
having evolved more than those who are religious, he does imply this, not only by referring to the
slightly higher mean IQ levels of American atheists but also by pointing out that atheism goes against
the grain of general human history. (Kanazawa doesn’t even touch upon the idea that beliefs are
more likely colored by one’s cultural background than one’s genetics.)

Personal values do play a positive role in motivating researchers to get to the bottom of situations
they care about. However, as we can see from this scholarship, there are dangers in narrowing one’s
cultural point of view and allowing one’s political bias to influence the interpretation of data. In
the end, Kanazawa’s study reinforces long-standing prejudices against conservatives and religious
believers. To think that conservatives or religious people “are dumber than you and me,” says Myers,
“fosters this tribalism that we’re out to replace people rather than to educate and inform them.” And
that’s not very smart.

Perdomo, D. (2010, March 5). Controversy grows over study claiming liberals and atheists are smarter. Alternet. Retrieved from
http://www.alternet.org/story/145903/controversy_grows_over_study_claiming_liberals_and_atheists_are_smarter

Think about it:

1. What general theory is Kanazawa trying to test? How does the theory differ from his specific
hypothesis?

2. How did Kanazawa operationalize liberalism and intelligence in his research? Are there prob-
lems with the way these constructs were operationalized? Explain.

3. What were Kanazawa’s main findings? Evaluate the strength of the evidence for and against his
hypothesis. How is the strength of this evidence influenced by his research methods?

4. Why do you think this research is controversial? If Kanazawa’s methodology were more rigorous,
would it still be controversial?


Testable and Falsifiable
Second, a theory needs to be stated in such a way
that it leads to testable predictions. More specifi-
cally, a theory should be subject to a standard of
falsifiability, meaning that the right set of condi-
tions could prove it wrong (Popper, 1959). Calling
something “falsifiable” does not mean it is false,
only that it would be possible to demonstrate its
falsehood if it were false. The Darwinian theory of
evolution offers a great example of this criterion.
One of the primary components of evolutionary
theory is the idea that species change and evolve
from common ancestors over time in response
to changing conditions. So far, all evidence from
the fossil record has supported this theory—older
variants of species always appear farther down in
a fossil layer. However, if conflicting evidence ever
did appear, it would deal a serious blow to the the-
ory. The biologist J. B. S. Haldane was once asked
what kind of evidence could possibly disprove the
theory of natural selection, to which he replied,
“fossil rabbits in the Pre-Cambrian era”—that is,
a modern version of a mammal in a much older
fossil layer (Ridley, 2003).

Parsimonious
Third, a theory should strive to be parsimonious, or as simple and concise as possible
without sacrificing completeness. (Or, as Einstein famously quipped during a lecture at
Oxford, “Everything should be made as simple as possible, but no simpler” [Einstein,
1934, p. 165]). One helpful way to think about this criterion is in terms of efficiency. Our
theories need to spell out the components in a way that represents everything important
but doesn’t add so much detail that it becomes hard to understand. This means that our
theories can lack parsimony either because they are too complicated, or because they are
too simple. At one end of this spectrum, Figure 1.1 presents a theoretical model of the
causes of malnutrition (Cheah, Zabidi-Hussin, & Wan Manan, 2010.). This theory does a
superb job of summarizing all of the predictors of child malnutrition across multiple lev-
els of analysis. However, the potential problem is that it becomes too complicated to test.
At the other end of the spectrum, Figure 1.2 presents the overall theoretical perspective
behind behaviorism. In the early part of the 20th century, the behaviorist school of psy-
chology argued that everything organisms do could be represented in behavioral terms,
without any need to invoke the concept of a “mind.” The overarching theory looked
something like Figure 1.2, with the “black box” in the middle representing mental pro-
cesses. However, the cognitive revolution of the 1960s eventually displaced this theory, as
it became clear that behaviorism was too simple. The ideal balance, then, is to lay out your
theory in a way that includes the necessary pieces and nothing unnecessary.
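
The fit-versus-simplicity trade-off can be made concrete with a model-comparison criterion such as AIC, which rewards fit but charges a penalty for every extra parameter. The sketch below (Python with NumPy; the data are simulated, not from any study in this chapter) compares a 2-parameter line with a 6-parameter polynomial on data that are truly linear.

```python
# Illustrative only: AIC penalizes complexity, so the simpler model that
# explains the data adequately should win. Data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 40)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, x.size)  # truly linear data plus noise

def aic(y, y_hat, k):
    """Gaussian AIC (up to a constant): n*log(RSS/n) + 2k for k parameters."""
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

linear = np.polyval(np.polyfit(x, y, 1), x)  # 2 fitted parameters
wiggly = np.polyval(np.polyfit(x, y, 5), x)  # 6 fitted parameters

print("AIC, linear model:  ", aic(y, linear, 2))  # typically lower (better)
print("AIC, degree-5 model:", aic(y, wiggly, 6))  # fits a bit tighter, pays for it
```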

iStockphoto/Thinkstock

The theory of evolution is falsifiable, meaning
that it could be disproved under the right
conditions—for example, if fossil evidence
that contradicted the theory was discovered.


Figure 1.1: Predictors of malnutrition

A complex theoretical model of the causes of malnutrition, linking biological aspects (gender, age, birth weight), environmental aspects (sanitation, location, infrastructure, health services, social and economic factors), and behavioral aspects (breastfeeding, birth interval, maternal factors, health practice, childcare practice, shared preference for food, feeding practice, diet, weaning) to malnutrition indicators (arm circumference, weight for age, height for age).

Adapted from Cheah, W. L., Zabidi-Hussin, Z., & Wan Manan, W. M. (2010). A structural equation model of the determinants of malnutrition among children in rural Kelantan, Malaysia. Rural and Remote Health, 10, 1248 (Online).

Figure 1.2: The behaviorist model

The overall theoretical perspective behind behaviorism: Stimulus → “black box” → Response (Behavior), where the “black box” in the middle represents mental processes.

Promotes Research
Finally, science is a cumulative field, which means that a theory is really only as good as
the research it generates. Or to state it more bluntly, the theory that you are so attached to
is useless if no one follows up on it. Thus, one of the best bases for evaluating a theory is
whether it encourages new hypotheses. Consider the following example, drawn from real
research in social psychology. Since the early 1980s, Bill Swann and his colleagues have
argued that we prefer consistent feedback to positive feedback, meaning that we would
rather hear things that confirm what we think of ourselves. One provocative hypothesis
that comes out of this theory is that people with low self-esteem are more comfortable
with a romantic partner who thinks less of them than anyone who might think well of
them. This hypothesis has been tested and supported many times in various contexts
and continues to draw people in because it is exciting. (For a review of this research, see
Swann, Rentfrow, & Guinn, 2005.)

The Cycle of Science

Let’s take a step back and look at the big picture. We have now covered the processes of
developing theories, developing hypotheses, and evaluating all of them. But of course,
none of these pieces occurs in isolation; science is an ongoing process of updating and
revising our views based on what the data show. This overall process works something
like the cycle depicted in Figure 1.3. We start with an overall theory about how concepts
relate to one another and use this to generate specific, testable, and falsifiable hypotheses.
These hypotheses then form the basis for research studies, which generate empirical data.
Based on these data, we may have reason to suspect that the overall theory needs to be
refined or revised. And so we develop a new hypothesis, collect some new data, and
either confirm or don’t confirm our suspicion. But it doesn’t end there: Other researchers
may see a new perspective on our theory and develop their own hypotheses, which lead
to their own data and possibly to a revision of the theory. If this is making your head spin,
you’re not alone. The scientific approach may be a slow and strange way to solve prob-
lems, but it is the most objective one available.

In the 1960s, social psychologists were beginning to study
the ways that people explain the behavior of others (e.g.,
when someone cuts you off in traffic, you tend to assume
he is a jerk). One early theory, called “correspondent infer-
ence theory,” argued that people would come up with
these explanations in a rational way. For example, if we
read a persuasive essay but then learned that the author
was assigned her position on the topic, we should refrain
from drawing any conclusions about her actual position.
However, research findings have demonstrated that people
make systematic errors in logical thinking. In a landmark
1967 study, participants actually ignored information about
whether authors had chosen their own position on the issue,
assuming instead that whatever they wrote reflected their
true opinions (Jones & Harris, 1967). In response to these data (and similar findings from
other studies), the theory was gradually revised to account for what was termed the “fun-
damental attribution error”—people tend to ignore situational influences and assume that
behavior reflects the person’s own disposition. These authors developed a theory, came
up with a specific hypothesis, and collected some empirical data to test it. But because the
data ran counter to the theory, the theory was ultimately revised to account for the empiri-
cal evidence. Theories of attribution continue to be refined to explain the way observers
make sense of people’s behavior.

Figure 1.3: The cycle of science

Theory → Hypothesis → Empirical Data → Revised Theory → (new hypotheses, and around again)


Proof and Disproof

While we are on the subject of adjusting our theories, let’s take a look at the notions of
“proof” and “disproof.” Because science is a cumulative field, decisions about the validity
of a theory are ultimately made based on results of several studies from several research
laboratories. This means that a single research study has rather limited implications for an
overall theory. This also means that you, as a researcher, have to use the concepts of proof
and disproof in the correct way. We will elaborate on this as we move through the course,
but for now we can rely on two very simple rules:

1. If the data from one study are consistent with our hypothesis, we support the
hypothesis rather than “proving” it. In fact, we almost never prove a theory, but
our statistical tests can at least tell us how confident to be in our support.

2. If the data from one study are not consistent with our hypothesis, we fail to support
the hypothesis. As we will discuss throughout the course, many factors can cause a
study to fail; however, these often result from flaws in the design rather than flaws
in the overall theory.
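
As a small illustration of rule 1, here is how that language maps onto a conventional significance test (Python with SciPy; the group scores are invented). Note that even a small p-value only supports the hypothesis; it never proves it.

```python
# A minimal sketch of "support, not prove": an independent-samples t test
# on invented reading scores for tutored vs. untutored students.
from scipy import stats

tutored = [78, 85, 90, 74, 88, 81]
untutored = [70, 72, 79, 68, 75, 71]

t_stat, p_value = stats.ttest_ind(tutored, untutored)

if p_value < 0.05:
    print(f"p = {p_value:.3f}: the data support the hypothesis (not 'prove' it)")
else:
    print(f"p = {p_value:.3f}: we fail to support the hypothesis in this study")
```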

Sources of Ideas

Where do all of these great ideas come from in the first place? Students are often nervous
about starting a career in research because they might not be able to come up with great
ideas to test. In reality, though, ideas are easy to come by, once you know where to look. In
this section, we offer a few tips and suggest handy sources for developing research ideas.

Real-World Problems
A great deal of research in psychology
and other social sciences is motivated
by a desire to understand—or even
solve—a problem in the world. This
process involves asking a big question
about some phenomenon and then
trying to think of answers based on
psychological mechanisms. For exam-
ple, according to the National Center
for Education Statistics, approximately
42 million Americans are unable to
read, and 20% of high school seniors
are unable to read when they graduate.
These statistics might lead you to think
about ways to improve reading instruc-
tion in the school system. And that
might lead you to the hypothesis that
individual tutoring will significantly
improve children’s reading skills.

Courtesy: CSU Archives/Everett Collection

Adolf Eichmann claimed he was just “following orders”
in his role as the Nazi lieutenant colonel who managed
Holocaust logistics during World War II.


In 1961, Adolf Eichmann was on trial in Jerusalem for his role in orchestrating the Holo-
caust. Eichmann’s repeated statements that he was only “following orders” caught the
attention of Stanley Milgram, a young social psychologist who had just earned a PhD
from Harvard University and who began to wonder about the limits of this phenomenon.
To understand the power of obedience, Milgram designed a well-known series of studies
that asked participants to help with a study of “punishment and learning.” The proto-
col required them to deliver “shocks” to another participant (actually an accomplice of
the researcher) every time he got an answer wrong. Milgram discovered that two thirds
of participants would obey the researcher’s commands to deliver dangerous levels of
shocks, even after the victim of these shocks appeared to lose consciousness. These results
revealed that all people have a frightening tendency to obey authority, even to the point
of violating their own conscience. We will return to this study in our later discussion of
ethics; you can read more about Milgram and his work on this website:
http://explorable.com/stanley-milgram-experiment.

To take one more example, you might notice that criminal-trial juries often seem to make
really poor decisions. This might lead you to wonder about the process of making deci-
sions in a group. And that might lead you to the hypothesis that juries are more interested
in getting along with the group than in finding the truth. The possibilities here are endless
but, as we discussed earlier, you must always be cautious when you design a research
project to solve a problem. Sometimes your desire to make a difference can bias your
interpretation of the data.

Reconciliation and Synthesis
New ideas can also spring from resolving conflicts between existing ideas. The process of
resolving an apparent conflict involves both reconciliation, or finding common ground
among the ideas, and synthesis, or merging all the pieces into a new explanation. In the
late 1980s, psychologists Jennifer Crocker and Brenda Major noticed an apparent conflict
in the prejudice literature. Based on everything then known about the development of
self-esteem, members of racial and ethnic minority groups would be expected to have
lower-than-average self-esteem because of the prejudice they faced. However, study after
study demonstrated that, in particular, African American college students had equivalent
or higher self-esteem than European American students. Crocker and Major offered a new
theory to resolve this conflict, suggesting that the existence of prejudice may sometimes
grant access to a number of “self-protective strategies.” For example, minority group
members can blame prejudice when they receive negative feedback, making the feedback
much less personal and therefore less damaging to self-esteem. The results of this synthe-
sis were published in a 1989 review paper that launched a vibrant new research area on
the targets of prejudice (Crocker & Major, 1989).

Learning From Failure
Kevin Dunbar, a professor at Dartmouth College, has spent much of his career study-
ing the research process. That is, he interviews scientists and sits in on lab meetings in
order to document how people actually do research in the trenches. In a 2010 interview
with the journalist Jonah Lehrer, Dunbar reported the shocking statistic that approximately
50% to 75% of research results are unexpected (some of these could have been null results
due to lack of statistical power). Even though scientists plan their experiments carefully
and use established techniques, the data are often surprising. But even more surprising was
the tendency of most researchers to discard the data if they did not fit their hypothesis.
“These weren’t sloppy people,” Dunbar says. “They were working in some of the finest labs
in the world. But experiments rarely tell us what we think they’re going to tell us. That’s
the dirty secret of science.” The trick, then, is knowing what to do with data that make a
particular study seem like a failure (Lehrer, 2010).

The secret to turning failure into opportunity is twofold: First, question your assumptions
about why the study feels like a failure in the first place. Perhaps the data contradict your
hypothesis but can be explained by a new one. Or perhaps the data suggest a dramatic shift
in perspective. Second, seek new and diverse perspectives to help in interpreting your
results. Perhaps a cognitive psychologist can shed light on reactions to prejudice. Or
perhaps an anthropologist knows what to make of the surprising results of your aggression
study. Some of the best and most fruitful research ideas have sprung from combining
perspectives from different disciplines. Sometimes, all that your strange dataset needs is a
fresh set of eyes.

Research: Thinking Critically

Does 9 Just Sound Cheap?

By William Poundstone

We have all heard of calculating prodigies, those rare souls able to perform astounding feats with
numbers. For many of these individuals, numbers have colors, flavors, sounds, or other qualities
alien to the rest of us. Mental calculator Salo Finkelstein detested the number zero and adored 226.
The Russian mnemonist S. V. Shereshevskii associated the number 87 with a visual image of a fat
woman and a man twirling his mustache. This is known as synesthesia, the association of sensory
qualities with seemingly inappropriate objects. A recent study suggests that most people may have
a bit of number synesthesia. It might help explain the mysterious appeal of “charm” prices ending in
the digit 9—beloved by discounters everywhere.

At least since the 19th century, retailers have been using prices like 99 cents (rather than an even
$1.00) or $295 (rather than $300). There’s evidence that these prices induce shoppers to buy more
than the corresponding round prices do. There’s been a lot of debate among marketers, psycholo-
gists, and even cognitive scientists about why these prices trick people into buying something they
wouldn’t have bought at a round price that is hardly much higher. In fact, in some experiments, more
bought at a 9-ending price than at a price that was lower.

New research by Keith Coulter and Robin Coulter, published in The Journal of Consumer Research,
implies that certain numbers just sound bigger than others. This in turn can affect the perception of
discounts.

Coulter and Coulter begin by citing decades of research claiming that sounds pronounced with the
front of the mouth (long a, e, and i; fricatives like f, s, and z) trigger associations with smallness.
(Think of words like tiny and wee.) The vowels pronounced at the back of the mouth, like the “oo”
in foot or goose, are linked to largeness. (Think huge or crowds oohing and ahhing something really
big.) Crazy? Well consider how it applied to discounts in the study. Subjects were given “regular”
and “sale” prices and asked to estimate the percentage discount. The guesstimated discounts were
skewed by the sound effect. For instance, people estimated that a $3 product marked down to $2.33
was about a 28% discount. But when the product was marked down to $2.22, the estimated saving
was only 24%. It was a bigger discount, really, but it didn’t seem that way. One explanation: Three,
with a long e, sounds small, and two, with a back-of-the-mouth vowel, sounds large.

That doesn’t prove the sounds were responsible. In one of the crucial experiments, Coulter and Coul-
ter tested perceptions of the prices $7.01 and $7.88 with English and Chinese speakers. In English
one is pronounced with the back of the mouth, and eight with the front. In Chinese, this is reversed.
So were the perceptions of how big or small discounts were. The researchers use this to argue that it
is indeed “phonetic symbolism” at work.

“Nine” has a long i, so it’s one of the small-sounding digits. Assuming the hypothesis is right, prices
ending in 9 would seem a little smaller than they would otherwise, enhancing the quick, largely
unconscious perception of a good deal. But 9 isn’t unique: It would seem that all the digits from 3
on up have a vowel or consonant sound supposedly associated with smallness. (Ironically, the truly
bigger digits sound small. Zero is a problematic case: The fricative z might put it in the small category,
but most people say “o” when reciting a phone number, and zeros at the end of a price aren’t pro-
nounced at all: $70 is “seventy dollars,” not “seven-zero dollars.”)

Obviously, retailers would want to charge the largest “small-sounding” price (the sound they care
about is ka-ching). From that perspective, the use of 9 makes sense.

This study adds more fuel to the debate about how 9-ending prices “work.” Coulter and Coulter
believe that shoppers must “rehearse” prices—say them to themselves, at least silently—for the
sounds to affect them. In the experiments, participants were told to repeat the sale prices to them-
selves. It’s not clear whether this would apply to silent reading of a fast-food menu. Still, the experi-
ment hints at what unexpected layers of meaning we may attach to simple numbers—including the
ones with dollar signs.

Poundstone, W. (2010, January 26). Does 9 just sounds cheap? The poetry of prices might trump the math. Psychology Today.
Retrieved from http://www.psychologytoday.com/node/375

53

Think about it:

1. What hypothesis are Coulter and Coulter trying to test? Try to state this as succinctly as possible.

2. How was “perception of discounts” operationalized in their studies?

3. How were the key variables measured?

4. How do Coulter and Coulter explain their findings? Are there other possible alternative
explanations?

5. Are these studies primarily aimed at description, explanation, prediction, or change? Explain.

1.5 Searching the Literature

Regardless of how you develop your hypothesis, an important step in the process is to connect it with what has been done before. Scientific knowledge accumulates one study at a time, so the best studies will build on earlier studies—by extending,
correcting, or contradicting them. And, on a practical note, it would be a waste of your
time to struggle over the best way to measure something when another researcher figured


it out 20 years ago. So, rather than reinvent the proverbial wheel, one of the first steps in
a research project is to consult published relevant articles. In this section, we will cover
the process of finding these articles, followed by an overview of how to read these articles
effectively.

Searching for Articles

Beginning a search for relevant research articles can seem like a daunting task, largely due
to the sheer number of available sources. Should you ask a librarian? Search Wikipedia?
Browse the Web? Fortunately, you can use a few tricks to make sure that your reference
sources are both objective and scholarly. First, it is important to understand the difference
between primary and secondary sources.

Primary sources contain full reports of a research study, including information on the
participants, the data collected, and the statistical analyses of these data. These types of
sources appear in professional academic journals and are evaluated by a set of experts
in the field before they are published—a process known as peer review. Thus, primary
sources are a reliable way to determine what has been done in a particular field.

Secondary sources, in contrast, consist only of summaries of primary sources. These types
of sources include textbooks, some academic books, and review articles in journals such as
Psychological Bulletin. As an analogy, think of the difference between telling your friends
about your adventurous weekend (primary source) and one of your friends repeating
the story to her roommate (secondary source). While some secondary sources undergo a
process of review and evaluation (academic books), others do not (e.g., websites, friends
retelling stories).

In this day and age, people are becoming more and more comfortable searching for infor-
mation via the Internet. Thus, it is particularly important to point out that websites are
often not objective in their summaries of research. The vaccine/autism scare discussed
at the beginning of the chapter is a great example of this point. If you search in Google
for the terms vaccine and autism, you will get more than 4 million hits, sorted in order of
popularity. As of summer 2011, the top hit was a summary by the Centers for Disease
Control, arguing in favor of vaccines. At another time, the top hit might have been Jenny
McCarthy’s website arguing that vaccines gave her child autism. In January 2013, it was
reported that the federal Vaccine Injury Compensation Program had awarded millions of
dollars to compensate children who developed autism after vaccination, confusing the
matter even more (Kirby, 2013). And this came after news of a recent study that found no
link between currently recommended vaccines and autism. The bottom line is that search
results in Google (or other Internet search engines) are not peer reviewed and are not listed
in order of reliability; instead, they are customized to your browsing history, which tends to
confirm your existing biases.
As a result, a Google search is a poor choice when it comes to finding trustworthy infor-
mation about academic research.

Another popular but untrustworthy source of information is Wikipedia. Wikipedia is a
tempting resource, given its marketing as a “free online encyclopedia.” But unlike more
authoritative or printed encyclopedias, Wikipedia can be edited by anyone with access to
the Internet. On the upside, this means that errors can be identified and corrected at any
time. On the downside, this means that errors can be made—either accidentally or


deliberately—at any time. The upshot is that there is no way to be sure that you are draw-
ing information from a page at a time when it sticks to the facts, and the content is always
evolving.

So what’s a researcher to do? Fortunately, there
are two reliable ways to access primary sources
(research articles), which allow you to draw your
own conclusions based on the patterns of data.
First, Google Scholar (http://scholar.google.com)
is a free resource that is managed by Google and
that works exactly like Google but is limited to
peer-reviewed academic articles. Thus, Google
Scholar provides one pipeline to access primary
sources. Second, many university libraries have
access to centralized databases of peer-reviewed
articles. The best-known database for psychol-
ogy articles is PsycINFO; this database contains
abstracts and citations for articles in psychology
and related fields, maintained by the American
Psychological Association. PsycINFO is updated
monthly and covers approximately 2,500 differ-
ent primary-source academic journals.

Searching in PsycINFO (or Google Scholar) is
as easy as typing key terms into a text box—
sometimes labeled “Find” or “Keywords.” That
said, choosing the best keywords for your particular
search can be a complex task. If your search terms are too general,
the search might yield too many hits to be useful.

If your search terms are too specific, the search might yield only one or two articles and
fail to fully represent prior studies.

As an example, the following list of numbers represents different combinations of search
terms related to the topic of self-esteem:

“self-esteem” (in all fields) 35,847 hits

“self-esteem” (title only; peer reviewed) 4,977 hits

It’s clear we need to narrow the field a bit—you have better things to do than review
almost 5,000 abstracts! What aspect of self-esteem do we find most interesting? Perhaps
we want to learn more about self-esteem and sexual behavior?

“self-esteem” and “condom use” 2 hits

It seems we may have overdone the limits—two articles may not be very helpful in giving
you a sense of previous research. So let’s try one more combination, using a more general
search term:

“self-esteem” and “sexual behavior” 133 hits


This number is a bit more manageable; we could tinker a bit more, but it no longer seems
overwhelming to skim through the search results and find the most useful articles. No
two searches will be the same, so the real take-home point is to try several combinations
of search terms in order to strike a balance in your number of results.
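If you like to tinker, the broad-to-narrow strategy can even be written down in a few lines of code. The sketch below (Python; the quoted-phrase-plus-AND syntax is typical of library databases, but it is an assumption here—check your own database’s help page for its exact operators) simply builds the candidate query strings from the example above:

    # Build candidate search queries, from broad to narrow.
    def build_query(*phrases):
        """Join quoted phrases with AND to require that all terms appear."""
        return " AND ".join(f'"{p}"' for p in phrases)

    candidates = [
        build_query("self-esteem"),                     # too broad: thousands of hits
        build_query("self-esteem", "condom use"),       # too narrow: a handful of hits
        build_query("self-esteem", "sexual behavior"),  # a manageable middle ground
    ]
    for query in candidates:
        print(query)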

Reading Research Articles

Now that you have assembled a collection of research articles relevant to your hypothesis,
the next step is to read them. This may sound painfully obvious, but psychological journal
articles are written in a very formulaic way, which can be confusing at first glance. How-
ever, once you know what to look for, the format ultimately makes these articles easy to
read (and easy to write). As a matter of fact, the format of a journal article is designed to
follow the steps of the scientific method, with a section devoted to each of the four steps—
hypothesize, operationalize, measure, and explain. In this section, we examine each part
of a journal article to give you a sense of what to expect of each one. This overview is
based on a fantastic article by Jordan and Zanna (1999); the goal of both is to let you appre-
ciate the stories without getting bogged down in the details.

The Title and the Abstract

At the top of every journal article (as well as in the search results in PsycINFO), you will
see both the title and an abstract, or a short summary of the article. While neither of these
is a section per se, both provide you with a valuable first impression of the contents of
the article. If your search query results in a large number of hits, you can usually scan the
titles to determine which ones are most likely to be useful. For example, if your research
question concerns the links between depression and alcohol consumption among college
students, you might search a database for the terms “alcohol” and “depression.” Most of
the results are likely to be relevant and useful, but you could most likely skip ones with a
title like “Fetal Alcohol Syndrome and Postpartum Depression,” since it is likely to focus
on a different population.

Once you narrow the list to the most useful titles, the abstract provides additional
information about the content of the article. A journal article abstract follows a stan-
dard formula of stating the objectives of the study, followed by information on the
methodology, results, and conclusions. Generally, an abstract has to fit all of this infor-
mation in about 150 words; as a result, it provides a nice concise summary that is worth
reading carefully.

The Introduction
The first main section of a journal article is the introduction, corresponding to the first step
(i.e., hypothesize) of our four-step research process. As the name implies, the goal of this
section is to introduce the research question, review background research, and state the
hypothesis that was investigated. When you are diving into a new research area for the
first time, it is a good idea to read the entire introduction carefully. This section provides
the context for the rest of the paper, as well as a valuable introduction to previous work
in the area.


The Method Section
The second main section of a journal article is the method section, corresponding to the
second step (i.e., operationalize) of our four-step research process. The goal of this section
is to explain how the hypothesis was translated into a set of specific measurable variables
and how the researchers gathered data to test their hypothesis. An additional—perhaps
even more important—goal of this section is to provide enough detail about the study that
someone could read the article and repeat the study.

The method section is typically divided into three parts: The participants section describes
the people who provided data for the study, including information about their age, gen-
der, and other relevant information. For example, in a study on treating depression, the
authors would specify whether the participants were “normal” college students or patients
who had been hospitalized for treatment of severe, clinical levels of depression. The mate-
rials section describes any questionnaires or equipment used in the study, including both
standardized measures and ones that the researchers created. The third and related sec-
tion, procedure, provides all of the details regarding the execution of the experiment. What
did participants experience, and in what order? If specific instructions were given before
a task, what were they?

The materials and procedure sections are crucial for two reasons. First, they provide the
necessary detail for someone else to recreate the study. In reading these sections, you
should focus on understanding the key variables and how they were defined. Second,
they allow readers to envision the study from the perspective of the participants and to
decide whether the authors’ interpretation of the results is the only one. For example,
the authors might claim that participants were placed under stress and that the results
showed a drop in concentration because of the stress. But, in reading over the procedure
section, the “stress” part of the study might seem more likely to invoke boredom. This
would give you an idea for a follow-up study: Perhaps people actually lose concentration
when they are bored. . . .

The Results Section
The third main section of a journal article is the results section, corresponding to the third
step (i.e., measure) of our four-step research process. The goal of this section is to describe
how the data were analyzed and to report the results of these analyses. The results section
consists primarily of statistical analyses and, as Jordan and Zanna put it, “statistics can be
intimidating” (1999, p. 356). When you first start to read journal articles, the statistics can
indeed seem overwhelming, but there are two reasons not to get discouraged. First, statis-
tical results are always followed by a translation into plain English and almost always by
tables and graphs of the data. As we move through this course, you will have the opportu-
nity to practice interpreting results in both statistical and graphical form. And this brings
us to the second reason: You will be surprised to learn how quickly the statistics stop being
intimidating. The more you read journal articles and place them in the context of your
own ideas, the more you become comfortable with interpreting statistical analyses. In fact,
as you become savvier with interpreting statistics, you may be surprised by how often
authors make mistakes in either their analyses or their interpretations of them!


The Discussion Section
The fourth and final section of a journal article is the discussion section, corresponding to
the fourth (i.e., explain) step of our four-step research process. The goal of this section is to
summarize the main findings and provide an evaluation of the hypothesis. Thus, the first
few paragraphs of the discussion are often a great summary of the entire article. Authors
state whether their predictions were confirmed and speculate on the meaning of the find-
ings. If some of the predictions were not confirmed, authors suggest explanations for this
and either acknowledge or defend potential flaws in the study. In addition, to encourage
others to follow up on the study, authors tie their findings into those of previous literature
and make suggestions for future research.

Evaluating Articles
So, in sum, a journal article will follow a predictable structure: Authors first describe the
problem and state their hypothesis (introduction), then explain their approach to test-
ing the hypothesis (method), then report the findings
of this test (results), and finally discuss the meaning of
these findings relative to the hypothesis (discussion).
These four sections are often described as following an
hourglass structure—that is, the paper starts broadly
in the introduction, narrows to the specific details of
the study, and ends broadly in the discussion by tying
everything back into the overall problem (e.g., Bem,
1987). This structure is shown in Figure 1.4.

Before we move on, let’s review some general guide-
lines for evaluating journal articles. After reading the
paper in its entirety, the following five questions can
be helpful in forming an overall evaluation of what
you’ve read.

1. What am I being asked to believe? What is the
author’s main argument? Before critiquing in
detail, make sure you have mastered the argu-
ment and can summarize it in a few sentences.

2. What evidence supports this claim? How does the author support the main
argument? If it is an empirical paper, look to the data; if it is a theoretical paper,
look at the literature the author summarizes.

3. Are there alternative explanations? Be creative here. Based on your reading of
the article, what else seems plausible? But, to make your critique a good one, you
should be able to test it.

4. What additional evidence would help us test alternatives? This question is one
of the keys to doing good science. Once you identify something wrong with the
original study, how can you test your alternative?

5. What conclusions are reasonable? Return to step 1 with your critiques in mind.
What should the author reasonably conclude, given the problems with the study?

Figure 1.4: Structure of journal articles. The Introduction, Method, Results, and Discussion sections form an hourglass: broad at the top and bottom, narrow in the middle.


1.6 Writing a Research Proposal

After reviewing the literature and putting considerable thought into planning a study, the next step is to prepare a research proposal. The goal of any research proposal is to present a detailed description about the research problem and the
methods with which you think that the research should be conducted. Research proposals
are extremely important because they are key to unlocking the research project (Leedy &
Ormrod, 2010). They may determine whether you receive approval or funding, so they
need to clearly articulate the purpose of the research and persuade the audience it is
worthwhile. If research proposals do not clearly and specifically define the research prob-
lem and methods, the project might not be accepted. Therefore, it is imperative that the
research proposal include “a clearly conceived goal and thorough, objective evaluation of
all aspects of the research endeavor” (Leedy & Ormrod, 2010, p. 117).

Research proposals can range from three pages for some grant applications to more than
30 pages (e.g., for a dissertation or federal grant). They may or may not require an abstract
and will have a different format for institutional review board (IRB) approval (see Section
1.7, Ethics in Research). For our purposes, research proposals generally follow a standard
format. The following is an example outline you might use:

1. Title/Cover Page
2. Abstract
3. Introduction or Statement of the Problem

a. The research problem
b. The statement of the problem and possible subproblems
c. The purpose statement
d. Hypotheses and/or research questions
e. Independent and dependent variables
f. The assumptions
g. The importance of the study

4. Review of the Literature
5. Method

a. Research methodology
b. Participants and participant selection
c. Data collection procedures
d. Data analysis techniques

6. Discussion
a. Strengths and limitations
b. Ethical considerations

7. References
8. Appendixes

Research proposals are written like research articles in APA style, which is favored in
academia. The language must be clear and precise, in paragraph format, and written in a


professional, academic manner. Unlike stories or memoirs, proposals are not intended to
be creative literary works; rather, they should set down certain facts. Organized with head-
ings and subheadings, the proposal should clearly and specifically explain the research
problem, who the participants will be and how they will be selected, what data collec-
tion methods will be used, and how the data will be analyzed and interpreted. Research
proposals are required for all theses and dissertations. If you are currently working on a
master’s thesis or doctoral dissertation, your university or committee chair may have a
specific format for you to follow that may differ slightly from the format presented in this
book. An example of an APA formatted proposal is provided in Appendix A.

Formatting the Research Proposal

As mentioned previously, research proposals are written in APA style and follow an orga-
nized format. Although there are different ways to format a proposal, most follow a simi-
lar format to the one that is discussed in this book. The following sections will discuss the
specifics of formatting of your proposal as well as the content that should be included
within each section.

Headings and Subheadings
Writing a proposal in APA style may seem complicated at first; however, the format is
similar to a research paper or any academic paper that is required to be written in APA
style. APA style uses a unique heading and subheading system that separates and classi-
fies sections of research papers. The Publication Manual of the American Psychological Asso-
ciation Sixth Edition (2010) utilizes five heading levels; although all heading levels may not
be used, it is important to follow them in sequential order:

• Level 1: Centered, Boldface, Uppercase and Lowercase Heading
• Level 2: Left-aligned, Boldface, Uppercase and Lowercase Heading
• Level 3: Indented five spaces, boldface, lowercase heading with a period. For Level 3 headings, the body text begins after the period.
• Level 4: Indented five spaces, boldface, italicized, lowercase heading with a period. For Level 4 headings, the body text begins after the period.
• Level 5: Indented five spaces, italicized, lowercase heading with a period. For Level 5 headings, the body text begins after the period.

Section headings such as Review of the Literature, Methods, and so forth, are Level 1
headings. Subsection headings such as Participants, Data Collection, and so on, that fol-
low under the section heading Methods, for example, are Level 2 headings. Subsections
of subsection headings are Level 3 through Level 5. The following is an example of the
various heading levels you might use in your research proposal:


Introduction (Level 1)

The Research Problem (Level 2)
Purpose of the Study (Level 2)
Hypotheses and/or Research Questions (Level 2)
Independent and Dependent Variables (Level 2)
Assumptions (Level 2)
Importance of the Study (Level 2)

Review of the Literature (Level 1)

The Cognitive Profile of Learning Disabilities in Reading (Level 2)
The Cognitive Profile of Attention Deficit/Hyperactivity Disorder (Level 2)

Method (Level 1)

Research Methodology (Level 2)
Participants (Level 2)
Data Collection (Level 2)

Instrumentation. (Level 3)
WISC-IV. (Level 4)
WISC-IV PI. (Level 4)

Data Analysis (Level 2)

Discussion (Level 1)

Strengths and Limitations (Level 2)
Ethical Considerations (Level 2)

References (Level 1)

Appendix (Level 1)

An important guideline to remember is that you should be consistent in your use of head-
ing levels throughout the research proposal. Thus, all headings with equal importance
should follow the same heading level.

The Title Page
A title page is required for all research proposals as its first page. In general, title pages
include a running head with the page number, as well as the abbreviated title of the paper,
the student’s name, and the university or institution name. Although some universities
may have specific requirements regarding how the title page is formatted, the following is
formatted according to APA style:

Running head: PREMORBID COGNITIVE ABILITIES 1

Estimation of Premorbid Cognitive Abilities in

Children with Traumatic Brain Injury

Graduate Student

Research University


The running head is a shortened version of the full title and is included in the top margin
of the page. The running head is set flush left with the abbreviated title in all capital let-
ters. On the same line of the running head, the page number is set flush right. The title of
the paper, the student’s name, and the university affiliation are centered approximately
in the middle of the page and formatted in uppercase and lowercase letters. It is recom-
mended that titles include no more than 12 words.

The Abstract Page
The abstract page is page two of your paper. An abstract is a summary of your proposal
and should include the research problem, the participants, data collection methods, and
any hypotheses or research questions. Abstracts for research proposals are generally
between 150 and 250 words in length.

The abstract should contain your running head title from the title page as well as the page
number. As shown in the example, the first word of the abstract is not indented. Thus,
the entire abstract is set flush left. Please keep in mind that the title “running head” is
dropped after page one and only the abbreviated title and page number are included, as
shown below:

PREMORBID COGNITIVE ABILITIES 2

Abstract

The present study will review currently available methods for estimat-
ing premorbid intellectual abilities in children. It examines the poten-
tial of the Wechsler Intelligence Scale for Children–Fourth Edition (WISC–IV;
Wechsler, 2003) as an estimate of premorbid IQ in children with traumatic
brain injury (TBI). Archival data will be obtained from a sample of 2,200
children aged 6:0–16:11 who participated in the standardization phase of
the WISC–IV and 43 children aged 6:0–16:11 with a history of moderate
or severe TBI who participated in a WISC–IV special group study. First,
demographic variables including sex, ethnicity, parent education level, and
geographic region will be entered into a regression analysis to determine
a demographic-based premorbid prediction equation for the WISC–IV Full
Scale Intelligence Quotient (FSIQ). Second, a logistic regression analysis
will be used to investigate which WISC–IV subtest–scaled scores improve
the differential diagnosis of TBI versus a matched control group. Third,
analysis of variance (ANOVA) will be used to examine which subtests
yielded the lowest mean scores for the TBI group. It is expected that paren-
tal education will be the strongest predictor of premorbid IQ and that indi-
viduals with TBI will have lower scores on Processing Speed and Working
Memory indices.
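To make the plan in this sample abstract concrete, here is a minimal sketch of the three proposed analyses in Python, using the statsmodels and scipy libraries. The column names and the synthetic data are hypothetical placeholders, not the actual WISC–IV data:

    # Sketch of the three planned analyses on synthetic, hypothetical data.
    # Column names (fsiq, parent_education, sex, region, tbi) are illustrative only.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 300
    df = pd.DataFrame({
        "fsiq": rng.normal(100, 15, n),              # Full Scale IQ scores
        "parent_education": rng.integers(8, 21, n),  # years of schooling
        "sex": rng.choice(["M", "F"], n),
        "region": rng.choice(["NE", "S", "MW", "W"], n),
        "tbi": rng.integers(0, 2, n),                # 1 = TBI group, 0 = matched control
    })

    # 1. Multiple regression: a demographic-based prediction equation for FSIQ.
    ols_fit = smf.ols("fsiq ~ parent_education + C(sex) + C(region)", data=df).fit()
    print(ols_fit.summary())

    # 2. Logistic regression: which predictors help classify TBI vs. control?
    logit_fit = smf.logit("tbi ~ fsiq + parent_education", data=df).fit()
    print(logit_fit.summary())

    # 3. ANOVA: do mean scores differ between the TBI and control groups?
    f_stat, p_val = stats.f_oneway(df.loc[df["tbi"] == 1, "fsiq"],
                                   df.loc[df["tbi"] == 0, "fsiq"])
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.3f}")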

The Introduction Section
The Introduction section begins on page three of your proposal. The primary purpose of the
Introduction section is to introduce the reader to the nature of the study by including nec-
essary background that describes and supports your research problem. The introduction


generally includes a statement of the research problem, any potential subproblems, the
purpose statement, hypotheses and/or research questions, identification of the variables,
assumptions of the study, and importance of the study. The introduction typically begins
with a statement of the research problem area and is followed by a justification for your
proposed study. Only research needed to explain the purpose of or need for your study
should be included in this section.

As discussed previously, the purpose statement should include the focus, population,
and methodology of the study. Depending upon whether your research is quantitative or
qualitative, you will want to include your hypotheses and/or research questions next and
discuss how your hypotheses and/or research questions relate to your research problem
and purpose statement. You should next review the key independent and dependent vari-
ables, followed by a discussion of the assumptions you will make about the research and
how the research will be expected to contribute to the field.

The length of the introduction can vary based on your university, committee chair, or
instructor’s requirements. In general, the introduction section can range anywhere from 3 to
5 pages up to 15 to 25 pages. The more detailed information you include in your proposal, the
closer you will be to completing your thesis or dissertation.

The Literature Review Section
The primary purpose of the literature review is to provide theoretical perspectives and
previous research findings on the research problem you have selected (Leedy & Ormrod,
2010). As a researcher, you should investigate your topic extremely well so that you have
a thorough understanding about the research problem area. Thus, your literature review
should contain both breadth and depth, and clarity and rigor, in order to support the need
for your research to be conducted. Any reader of your literature review should be able
to comprehend the importance of your research problem and the difference the research
will make to the field. Keep in mind that a literature review is not simply a collection
of summaries, abstracts, or annotated bibliographies but rather a thorough analysis and
synthesized review of the research and how each piece of research builds upon the other.

According to Levy and Ellis (2006), a literature review should go through the following
steps: (a) methodologically analyze and synthesize quality literature, (b) provide a firm
foundation to a research topic, (c) provide a firm foundation to the selection of research
methodology, and (d) demonstrate that the proposed research contributes something new
to the overall body of knowledge or advances the research field’s knowledge base (p. 182).
Remember: Your literature review should provide a theoretical foundation and justifica-
tion for your proposed study.

A good literature review does not simply report the literature but evaluates, organizes, and
synthesizes it (Leedy & Ormrod, 2010). When reading and reviewing existing literature,
it is important to critically evaluate what has already been done and what the findings
showed. Do not just take what the authors say at face value; instead, evaluate whether
the findings support the methods that were used and the analyses that were conducted.

In addition to evaluating the literature, you must organize it. This means grouping the
literature according to your subproblem areas, research questions, or variables being
assessed. For example, if conducting a study on the demographic predictors of special


education, you would want to group your literature based on the various demographic
variables and the influences that they may have on placement in special education. Finally
and most importantly, you must synthesize the diverse perspectives and research results
you’ve read into a cohesive whole (Leedy & Ormrod, 2010). Leedy and Ormrod (2010)
discuss several approaches to synthesizing information, including the following:

• comparing and contrasting the literature
• showing how the literature has changed over time
• identifying trends or similarities in research findings
• identifying discrepancies or contradictions in research findings
• locating similar themes across the literature

The following example shows a paragraph synthesizing the literature. Note that the
review does not include summaries of the articles but rather displays similarities found
in the research:

Several studies have examined the relationship between demographic vari-
ables and cognitive functioning. Research has shown that demographic
variables such as socioeconomic status and education level are closely
related to scores on cognitive tests and contribute significantly to variance
in IQ scores (Crawford, 1992; Kaufman, 1990). Utilizing this close relation-
ship, Wilson et al. (1978) developed the first regression equation to pre-
dict premorbid IQ using the WAIS standardization sample. The equation
included age, sex, race, education, and occupation and accounted for 53%
of the variance in the Verbal IQ, 42% of the variance in the Performance IQ,
and 54% of the variance in the Full Scale IQ. Cross-validation studies have
confirmed the Wilson et al. equation to be a useful predictor of premorbid
IQ. The equation has been used to predict outcome from closed head injury
(Williams, Gomes, Drudge, & Kessler, 1984), to estimate British WAIS
scores (Crawford, Stewart et al., 1989), and to estimate premorbid func-
tioning among healthy adults (Goldstein, Gary, and Levin, 1986). Although
the use and application of Wilson’s formula has tended to overpredict high
scores and underpredict low scores, the formula appears to provide ade-
quate predictions for those within the average range of functioning.

An example of a compare-and-contrast synthesized review would look like the following:

As with all regression-based methods, a number of limitations are present
in the use of demographic-based prediction models. As Karzmark, Heaton,
Grant, and Matthews (1985) found in their use of the Wilson et al. formula
to predict WAIS IQ scores, demographic equations tend to overestimate
and underestimate IQ scores for individuals who are one standard devia-
tion or more from the population mean. Research has shown strong cor-
relations between specific demographic variables and measured IQ scores,
but Bolter, Gouvier, Veneklasen, and Long (1982) found the Wilson et al.
equation to be limited in its ability to predict groups of head injured indi-
viduals and controls.


On the other hand, Wilson, Rosenbaum, and Brown (1979) compared the
hold method of the Deterioration Index developed by Wechsler in 1958
against Wilson’s 1978 demographic equation and found the Wilson et
al. formula to have a 73% accuracy of classification, while the Wechsler
method resulted in only 62% accuracy. Although the demographic-based
method may have mixed results at an individual level, cross-validation
studies have shown them to do an adequate job of predicting mean IQ
scores at the group level (Vanderploeg, 1994).

Remember that writing a literature review takes time and organization. It is important
that you thoroughly review the relevant literature you uncovered in your key term search.
This can be a painstaking endeavor, but the search should not conclude until you are rea-
sonably sure you have researched all the critical viewpoints of your research problem. It
is also helpful to develop an outline of topics you plan on addressing.

Finally, note that a good literature review is not plagiarized or copied and pasted from
other sources, as the Internet makes so tempting. When reviewing literature, be sure you
summarize the information in your own words and give credit where credit is due. It is
sometimes helpful to read the literature and then develop summaries of the articles in
your own words. You can then use these summaries to develop your literature review.
Keep in mind that your literature review is a working draft that will be modified and per-
fected throughout the research process.

The Method Section
The method section includes a detailed description of the method of inquiry (quantitative,
qualitative, or mixed design approach); research methodology used; the sample; data col-
lection procedures; and data analysis techniques. The key purpose of the method section
is to discuss your design and the specific steps and procedures you plan to follow in order
to complete your study. A detailed description of methods is essential in any research
proposal because it allows others to examine the efficacy of the study as well as replicate
it in the future.

Research Methodology

This section discusses whether quantitative, qualitative, or a mixed design approach was
used and the rationale for choosing this method of inquiry. It also includes specific infor-
mation on the selected research methodology. For example, will your study be utilizing
experimental methods, quasi-experimental methods, or observational methods? And
what is the purpose for selecting that method or methods? Remember that you should
be making an argument and justifying the type of research methodology you plan to use,
regardless of the type of inquiry.

Participants

The participant section describes the population of interest and the sample that will be
used. In quantitative studies, the sample is intended to represent the larger population
and tends to be larger in size than for qualitative studies. In qualitative studies, the sample
may be a small number of participants or even only one participant and is not intended
to represent the larger population. In both quantitative and qualitative studies, this sec-
tion should discuss the sample in detail: the population you want to learn about; where


participants will be recruited or studied; how the participants will be notified about the
study; how the participants will be selected (e.g., what type of sampling method will be
used, such as random sampling, snowball sampling, etc.); what criteria will be required
for inclusion in the study (e.g., age, level of education obtained, marital status, employ-
ment position); and the overall proposed size of the sample. For quantitative studies,
when discussing the sample, it is also important to include which demographic informa-
tion (e.g., age, gender, ethnicity, level of education, socioeconomic status) you will need to
create a representative sample of the entire population. A representative sample ensures
that the results can be generalized to the entire population as a whole.
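As an illustration of the sampling language above, the following sketch draws a stratified random sample in Python with pandas. The population frame and the strata (gender by age group) are hypothetical:

    # Draw a 10% stratified random sample from a hypothetical population frame,
    # so that each gender-by-age stratum is represented proportionally.
    import pandas as pd

    population = pd.DataFrame({
        "person_id": range(1000),
        "gender": ["F", "M"] * 500,
        "age_group": (["18-29", "30-49", "50+"] * 334)[:1000],
    })

    sample = (population
              .groupby(["gender", "age_group"], group_keys=False)
              .sample(frac=0.10, random_state=42))  # fixed seed for reproducibility
    print(sample["gender"].value_counts(normalize=True))  # mirrors population shares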

Data Collection Procedures

The data collection section describes how the data will be collected, step by step. This sec-
tion should detail how informed consent will be obtained from the participants, when the
data will be collected and for how long, and what methods or measures will be used to
collect the data. Remember: Providing detailed information is crucial to ensure that oth-
ers can follow your study and replicate it in the future. Thus, this section should include
a step-by-step description of each of the procedures you will follow to carry out the data
collection. Describe the data collection forms you will use, as well as any survey, research,
or testing instruments you may use or develop to collect the data, and the rationale for
utilizing such procedures. Copies of any forms or instruments used should be included in
the Appendix section of your research proposal.

Data Analysis

The data analysis section includes a brief step-by-step description of how the data will be
analyzed as well as what statistical methods or other methods of analysis and software
will be utilized. If you are doing quantitative method research, you will want to discuss
how the data will be entered into a statistical software program, how the data will be kept
confidential, and what statistical analyses will be run. If using qualitative methods, you
will want to discuss the type of qualitative method used, the interview type, interview
questions, sample type (e.g., random, convenience), how the data will be reviewed (e.g.,
how interviews or observations will be reviewed or transcribed), and how the data will
be coded.
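For the qualitative side, even a simple tally of codes can make the analysis plan concrete. The sketch below (Python; the theme codes and transcripts are hypothetical illustrations) counts how often each theme appears across coded interviews:

    # Tally theme codes across coded interview transcripts (hypothetical data).
    from collections import Counter

    coded_transcripts = {
        "participant_01": ["isolation", "coping", "support"],
        "participant_02": ["support", "support", "stigma"],
        "participant_03": ["coping", "stigma", "isolation"],
    }

    theme_counts = Counter(code
                           for codes in coded_transcripts.values()
                           for code in codes)
    for theme, count in theme_counts.most_common():
        print(f"{theme}: {count}")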

The Discussion Section
As emphasized throughout this chapter, one of the most important characteristics of a
research proposal is to make a strong case for or justify the need to study your research
problem. In doing so, you will want to discuss the strengths of your research study as well
as any limitations and ethical issues that will need to be considered. It should be noted that
some universities require this information to be included in the Method section. In those cases, you
would include strengths, limitations, and ethical considerations after the Data Analysis heading
in the Method section.

Strengths and Limitations

This section is fairly straightforward. It should discuss the implications for future research,
practice, and theory as well as any potential limitations that might impact the research
process or results. Some limitations may include difficulty in obtaining participants, dif-
ficulty in obtaining a representative sample, or time and financial constraints.


Ethical Considerations

This section should include any potential issues that might be considered ethical dilem-
mas. For example, if studying minors, how will you obtain consent and ensure con-
fidentiality? If studying certain employees, how will you keep information from their
supervisors? Or if your study may trigger emotional trauma, such as memories about
abuse, how will you reduce any stress or negative feelings that occur during the study?

The References Section
This section should include all references that were cited within your proposal in alpha-
betical order and using APA style. Every source cited in your proposal should appear on the
References page, and no entry should appear there that was not cited in your proposal.

It is important to list all references in correct APA format. The following examples show
how to correctly cite journal articles, websites, and books according to the APA Publication
Manual Sixth Edition:

Example of a journal article with the digital object identifier (DOI) included:

Brownlie, D. (2007). Toward effective poster presentations: An annotated bibliography.
European Journal of Marketing, 41, 1245–1283. doi:10.1108/03090560710821161

Example of a journal article with no DOI assigned to it:

Kenneth, I. A. (2000). A Buddhist response to the nature of human rights. Journal of Bud-
dhist Ethics, 8. Retrieved from http://www.cac.psu.edu/jbe/twocont.html

Example of a print (or hardcopy) journal article:

Harlow, H. F. (1983). Fundamentals for preparing psychology journal articles. Journal of
Comparative and Physiological Psychology, 55, 893–896.

Example of a textbook:

Calfee, R. C., & Valencia, R. R. (1991). APA guide to preparing manuscripts for journal publi-
cation. Washington, DC: American Psychological Association.

Example of a chapter in a textbook:

O’Neil, J. M., & Egan, J. (1992). Men’s and women’s gender role journeys: A metaphor
for healing, transition, and transformation. In B. R. Wainrib (Ed.), Gender issues
across the life cycle (pp. 107–123). New York, NY: Springer.

Example of a website:

Keys, J. P. (1997). Research design in occupational education. Retrieved from
http://www.okstate.edu


The Appendix Section
The Appendix section should include a copy of any forms that will be used during your
research. These include consent forms, instructions for participants, and any additional
tables or figures that might supplement study information but not provide additional
data (e.g., a table of subtests included within an instrument you plan to use).

1.7 Ethics in Research

In the summer of 1971, psychologist Philip Zimbardo conducted an experiment at Stanford University to test the power of social roles. Zimbardo hypothesized that people would take on the characteristics and behaviors of whatever role was assigned to
them, and he tested this hypothesis by creating a simulated prison in the basement of the
psychology building. A group of 24 psychologically healthy young men were selected
from the San Francisco Bay Area and randomly assigned to play the role of either “pris-
oner” or “guard.” Zimbardo appointed himself the role of “warden.” The researchers
gave each participant pieces of a uniform meant to reinforce their role—smocks for the
prisoners, khakis and mirrored sunglasses for the guards. Almost immediately, and with-
out instructions from the researchers, participants began to act out their roles. The guards
took it upon themselves to establish control and dominate the prisoners by withholding
privileges and devising clever ways to humiliate them. The prisoners, in turn, accepted all
of this without much protest since it was part of their prisoner role. The experiment was
scheduled to run for 14 days but was stopped after only 6 because things had gotten out
of hand—prisoners were going on hunger strikes and being locked in solitary confine-
ment, and one even suffered a serious mental breakdown (Zimbardo, 2013). This study is
known as the Stanford Prison Experiment; you can learn more about it and view video
clips on a website designed by Zimbardo and his colleagues: http://www.prisonexp.org/.

If this experiment reminds you of
the real-life prisoner abuse at Abu
Ghraib prison, you’re not alone.
Zimbardo was even called to testify
about the power of social roles dur-
ing the trial of one of the Abu Ghraib
guards. If this experiment strikes
you as ethically dubious, you are not
alone. When the research was pub-
lished, it raised serious questions
about the amount of distress that can
be inflicted in the name of research.
Although the proposal for this study
was approved under ethics stan-
dards of the time, it could not be run
under today’s more stringent stan-
dards. But how do we balance the
distress of the “prisoners” with the
valuable knowledge gained from the
study? Should the Stanford Prison


Experiment ever have been run? Does the knowledge outweigh the distress? Before we
move on to the nuts and bolts of research designs in the next four chapters, it is important
to spend some time on the ethics of conducting research.

At the most basic level, all deliberations about the ethics of a particular study come down
to the balance between avoiding all unnecessary discomfort for participants and creating
a realistic situation that will provide a valid test of the hypothesis. But in practice, achiev-
ing this balance can be complicated. In this section, we first examine an overview of some
of the potential threats to participants’ well-being and then discuss how avoidance of
these threats has been formalized into rules for researchers to follow. Finally, we evaluate
a set of ethical dilemmas that represents the types of issues likely to arise in psychological
studies.

Threats to Participants

To help you appreciate the need for ethical guidelines, this section introduces some of the
possible threats to participants’ welfare in the context of research studies.

Physical Harm
Let’s start with a form of extreme threat: Sometimes a research paradigm, or worldview,
can place participants at risk for physical harm. For the most part, these types of studies
are limited to the medical field. For example, if you are testing a new medication for heart
attack survivors, there is a risk that an unexpected side effect could hasten the death of
participants. Or, perhaps the participants could have benefited more from another, more
established medication, but they were not taking it because they were participating in
your study. Because of these risks, medical researchers are required to perform prelimi-
nary testing—often using cell cultures and then animals—before administering drugs to a
clinical population. Occasionally, psychological research can pose a physical threat to par-
ticipants, albeit a more minor one. For the past 25 years, Sheldon Cohen has been conduct-
ing studies in which he exposes participants to the common cold virus and measures their
cold symptoms for several days. This work is designed to explore the link between indi-
viduals’ social environment and their susceptibility to illness; you can read more about it
on Cohen’s website: http://www.psy.cmu.edu/~scohen/.

Extreme Stress
More common among psychological studies are those that introduce high levels of men-
tal or emotional stress for participants. As we will discuss later, the key to evaluating
whether a stressful research paradigm is ethical is to think about whether—and to what
extent—it exceeds the stress that participants encounter in everyday life. In the Stanford
Prison Experiment, it is easy to see how stress experienced by the “prisoners” would
exceed normal levels. In 1924, Carney Landis conducted the first studies of facial expres-
sion and emotion. His goal was to map specific emotional states to specific expressions—
work that is now associated with Paul Ekman (and popularized by the television show
Lie to Me). Landis photographed his participants as they reacted to a variety of stimuli,
such as smelling ammonia and viewing pornography. But the most shocking and contro-
versial task was the final one. To measure responses to “disgust,” Landis asked his partici-
pants to either decapitate a live rat (a task they lacked the training to perform humanely)


or watch Landis behead the rat. In this case, the discomfort could not even be balanced
by the knowledge gained from it; Landis found no support for his hypotheses regarding
common facial expressions. Of course, this study is beyond anything deemed ethically
acceptable by today’s standards.

In reality, most research (and particularly, psychological research) presents a much more
minor degree of stress to participants. For example, in some of my own research, I observe
college students’ reactions as they are asked to prepare and give a speech. Most people
become anxious at the thought of public speaking, but this anxiety is mild and very much
temporary. In fact, among studies that receive approval from institutional review boards
(IRBs), the effects of the research on overall well-being are likely to be mild and temporary.

Deception
Finally, at the low end of the threat spectrum, many psychological studies involve deceiv-
ing participants about the purpose of the research—at least until the study is finished. This
deception is a way to ensure people’s honest reactions to the experimental setting. For
example, if Milgram’s participants had known he was studying obedience, they would
have reacted very differently and there would have been no point to doing the study. As
we will discuss in later chapters, people tend to change their behavior when they figure
out your research question (as well as when they think they figure it out).

Deception is described here as a threat because of the potential for abuse. The history of sci-
ence is rife with examples of medical research conducted on unsuspecting (and unwilling)
participants. In one of the most infamous, researchers in Tuskegee, Alabama, conducted a
study of the natural progression of syphilis among poor African-American farmers. The
study began in 1932 under the supervision of the Public Health Service and continued
until 1972. Where’s the deception? Well, it turns out that penicillin was discovered to be
a reliable cure for syphilis—in 1947. The researchers not only lied about the purposes of
the study (participants were never told they had syphilis), but they deliberately withheld
treatment in order to continue the study. (You can read more about the study on this web-
site: http://www.cdc.gov/tuskegee/timeline.htm.)

On the one hand, these types of studies are vastly different from research that could be
approved today, much less the type of research conducted in psychology. On the other
hand, every researcher must be mindful at all times that he or she does not abuse the trust
of participants. We now return to the issue of deception in the discussion of evaluating a
set of research scenarios.

APA and Other Ethical Guidelines

In response to public outcry over the Tuskegee Syphilis Study, the U.S. Congress formed
a panel to develop guidelines that would ensure that all human subjects were treated
ethically. This committee published the Belmont Report in 1979, laying out a set of basic
ethical principles for the use of human subjects. (The full report is available at
http://www.hhs.gov/ohrp/humansubjects/guidance/belmont.html.) Essentially, the Belmont Report
guidelines argue for treating participants with respect, minimizing harm, and avoiding
exploitation. Starting in 1981, these principles were formalized into a set of federal laws
referred to as the Common Rule, a baseline standard of ethics for all federally funded
research.


One critical part of the Common Rule was the creation of review boards to evaluate the
ethics of every proposed research study. The Common Rule mandated that any institution
receiving federal money must have an institutional review board (IRB), which reviews
and monitors all research involving humans in order to protect the welfare of research
participants. The IRB is tasked with determining whether a study is consistent with ethi-
cal principles, and it has the authority to approve, reject, or require modification of any
research proposal. To put it another way, the IRB serves a gatekeeper role for research,
ensuring that something like the Tuskegee Syphilis Study or Landis’s “facial expression”
studies could not be run today.

An important piece of IRB review is to assess the degree of risk that a study poses for par-
ticipants. Based on these assessments, each proposed study undergoes one of three cat-
egories of review. The lowest-risk studies are subject to exempt review, in which an IRB
representative simply verifies the low risk and approves the study. In order to qualify for
exempt review, a study has to fit into one of six predefined categories, including research
done in educational settings (e.g., testing a new way to teach reading skills) and reanalysis
of existing data (e.g., looking for patterns in poll data) (Mayo Foundation, 2013). (The full
set of guidelines is available online at http://mayoresearch.mayo.edu/mayo/research/irb/policy_manual.cfm.)

Studies classified as medium risk—including the majority of psychological studies—are
subject to expedited review, in which an IRB representative conducts a full review of the
proposed study’s procedures, ensuring that participants’ welfare and identity are pro-
tected. Expedited review also requires that a study fit into one of seven predefined catego-
ries (US HHS, 1998). These categories encompass most of the research that psychologists
conduct, even when these studies include collection of personal information and biologi-
cal specimens. The key to meeting expedited review criteria is that the risk of harm and
distress and the release of information are kept to a minimum.

Finally, studies classified as high risk are subject to full-board review, in which all mem-
bers of the IRB review the proposed study’s procedures and then meet as a group to discuss
the degree of risk and protection. This category includes studies involving medical pro-
cedures, children, prisoners, or pregnant women. Any time there is potential for physical
harm, release of confidential information, or undue pressure on people to participate (e.g.,
prisoners), the IRB pays careful attention to the procedures for minimizing these risks.
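
To make this three-tier triage concrete, here is a small illustrative sketch in Python.
The checklist attributes and decision rules are invented for teaching purposes and are
far simpler than the criteria an actual IRB applies.

# Illustrative sketch of the three-tier IRB review logic described above.
# The attributes and rules are simplified teaching examples, not actual
# federal or institutional policy.

from dataclasses import dataclass

@dataclass
class StudyProposal:
    vulnerable_population: bool  # e.g., children, prisoners, pregnant women
    medical_procedures: bool     # any medical or physical intervention
    identifiable_data: bool      # personal information will be collected
    fits_exempt_category: bool   # e.g., normal educational practice,
                                 # reanalysis of existing public data

def review_category(study: StudyProposal) -> str:
    """Assign a proposal to exempt, expedited, or full-board review."""
    # High-risk features always go to the full board.
    if study.vulnerable_population or study.medical_procedures:
        return "full-board review"
    # Low-risk studies in a predefined category qualify as exempt.
    if study.fits_exempt_category and not study.identifiable_data:
        return "exempt review"
    # Most remaining psychology studies receive expedited review.
    return "expedited review"

# Example: an adult survey that collects identifiable attitude data.
survey = StudyProposal(vulnerable_population=False, medical_procedures=False,
                       identifiable_data=True, fits_exempt_category=False)
print(review_category(survey))  # -> expedited review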

The APA has its own version of an ethical code, written specifically for the kinds of dilem-
mas faced by psychologists in both research and therapeutic settings. The APA ethics code
lays out five specific rules for research that involves human participants (APA, 2013a).
These rules take their inspiration from the Belmont guidelines: Treat people with respect,
minimize harm, and avoid exploitation. (You can view the full APA ethics code here:
http://www.apa.org/ethics/code/index.aspx.)

1. Informed Consent
First and foremost, research participants must be “informed of all features of the study
that would reasonably affect their decision to participate,” known as informed consent.
Before people agree to take part in your study, they need to know whether it involves
anything painful or uncomfortable or might reveal sensitive or embarrassing informa-
tion. Participants need to be informed of the risks and benefits of participating. And they


need to know how you will protect the information that they provide. What if your study
involves deception? This is where the “reasonably affect their decision” phrase comes
in. If you are pretending to study perception but are actually studying conformity, you
are under no obligation to reveal this. However, if your study involves, say, running on
a treadmill or taking drugs, people need to know in order to make informed decisions
about their overall health.

2. Free Consent
Researchers are forbidden from placing “undue pressure” on people to either participate
in or remain in a study. One lesson from the Milgram studies is that people are willing to
obey seemingly strange commands from an experimenter wearing a lab coat. As research-
ers, we therefore have an obligation not to abuse this tendency to obey. You probably don’t
need me to tell you that it’s wrong to recruit participants at gunpoint, but there are quite
a few gray areas when it comes to free consent. For example, many psychology depart-
ments require students to participate in research studies or offer extra credit for doing so.
(There are always alternative ways to earn the credit.) Could students who are failing the
class feel more compelled to agree to participate in a research study? What about students
who wait until the last minute and have fewer options? Free consent also becomes an
issue when prisoners or soldiers serve as research subjects. Do these populations really
feel free to say no to a request to participate? The answer to all of these questions depends
on the context.

3. Protection From Harm
Participants cannot be exposed to physical or emotional risk “beyond what they would
encounter in real life.” But where should we draw the line regarding “real life” harm? Is
it acceptable to make people feel stupid or embarrassed? Is it okay to reject people from a
group in order to observe their reactions? The answer, once again, depends on the context
and, more specifically, on the balance of costs and benefits. If participants experience mild
rejection for the sake of understanding how to cope with it, that’s probably fine. But if
participants experience severe verbal abuse for the sake of learning whether people like
abuse, then that’s less acceptable. (If that one sounds made-up, check out this study of
stuttering from the 1930s: http://www.spring.org.uk/2007/06/monster-study.php.)

4. Confidentiality
It is critical that all personal information collected during the research study be protected
and prevented from being released to anyone not authorized to view it. If you were to ask
people about their history of drug use, this information could compromise their political
or job prospects. If you ask employees to report attitudes toward their employers, the
employers who saw that information could retaliate against unfavorable ratings. There
are two related options for protecting personal information.

Whenever possible, responses should be anonymous, meaning that you do not collect
identifying information from participants. There is no risk of retaliation or other back-
lash if your participants cannot be individually identified. But in some cases, anonym-
ity is not possible, such as when you need to track people for a period of time and then
link their data. In these situations, identifying information should be kept confidential,


meaning that the information is collected but kept secret. One common way to do this is
for researchers to maintain and closely guard a master list of participants matched to code
numbers and identifiers, which are used during the study instead of names.
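
As a rough illustration of this master-list scheme, the short Python sketch below keeps
identifying information in one closely guarded mapping while the working data set
carries only code numbers. The participant names and the code format are invented for
the example.

# Minimal sketch of the master-list approach described above: identifying
# information lives in one closely guarded mapping, while the working data
# set refers to participants only by code number.

master_list = {}  # code -> name; stored separately with restricted access
responses = []    # working data set: codes only, safe to share with analysts

def enroll(name):
    """Assign the next code number to a participant and record the link."""
    code = "P{:03d}".format(len(master_list) + 1)
    master_list[code] = name
    return code

for name, score in [("Ana Li", 42), ("Sam Roy", 37)]:
    responses.append({"participant": enroll(name), "score": score})

print(responses)
# -> [{'participant': 'P001', 'score': 42}, {'participant': 'P002', 'score': 37}]
# Re-linking a code to a name requires the guarded master_list, so the
# working data can circulate without exposing identities.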

5. Debriefing
Finally, as mentioned, many experiments cannot avoid using some degree of deception. In
its list of ethical rules, the APA suggests a compromise regarding deception. First, it should
be done only when necessary, meaning that you should never create an elaborate cover
story just for its own sake. Second, participants should always be debriefed, or informed
of the true purpose once the study is concluded. In Milgram’s obedience studies, partici-
pants went through a long debriefing that involved meeting the “victim” and understand-
ing that they had not done any actual harm to another human being. If participants have
been under the illusion that your conformity study focused on “perceptual processing,”
then tell them the truth at the end. If your study involved having participants be rejected
from the group at random, then tell them this decision was random. The goal of this dis-
closure is to remove possible negative effects of the study procedure and to explain why
the deception was necessary. If done well, a debriefing also educates the participants, who
may be psychology students themselves. It can also make them feel appreciated and give
them a chance to ask questions. In this way, researchers can learn from their reactions to
procedures and assess whether they have experienced any unexpected negative effects.

Ethical Dilemmas

To give you a feel for what these guidelines look like in everyday research studies, let’s
walk through a pair of experimental scenarios and evaluate whether they meet the APA
guidelines.

Scenario 1
A cognitive psychologist tells students she is interested in their reading comprehen-
sion when in reality she is recording the speed of their responses rather than their
comprehension.

Evaluation: There is no risk of physical harm or extreme stress, but participants have been
deceived about the purpose of the study. Rule 5 is most relevant, but any IRB is likely to
approve the study, provided that participants are given a full debriefing at the end of it.

Scenario 2
In a field experiment designed to test whether people would help more when they are
alone or with others, male subjects walking alone or in a group were exposed to a simu-
lated rape (Harari, Harari, & White, 1995). As subjects walked along, a male and female
confederate acted out the rape. The man grabbed the woman around the waist, put his
hand over her mouth, and dragged her into the bushes as she screamed for help. Observ-
ers stationed at various points recorded the number of subjects who offered help. Before
they could actually intervene, a researcher stopped them and told them the “rape” was
part of a study.


Evaluation: This study is likely to have induced extreme stress in participants and quite
likely presented emotional risks beyond what participants normally encounter (Rule 3).
In addition, participants did not give their consent to be in the study (Rule 1) until after
their data were collected. However, this study was approved by a modern-day IRB, which
means that at least one group of reviewers felt that these threats were outweighed by the
benefits of the study.

Ethics in Animal Research

Our discussion so far has focused on ethical issues in dealing with human participants.
However, a significant portion of psychological research involves nonhuman animals.
Studying the behavior of nonhuman animals provides an additional important avenue
for understanding basic principles of behavior and ultimately for improving the welfare
of both humans and animals.

Many people object to the use of animals in scientific research, arguing that animals should
have the same rights and protections as human subjects. However, the majority of scien-
tists reject this view, arguing that the benefits of animal research outweigh the costs. One
of the most salient examples involves testing the effectiveness of drugs to cure cancer,
depression, and other diseases. The first stage in testing these drugs is to examine chemi-
cal reactions in isolation, using test tubes and petri dishes. Before moving on to research
involving humans, however, researchers are required to conduct safety testing of these
drugs on nonhuman animals. Thus, any discomfort experienced by the animals is justified
by the fact that these drugs can save human lives. Most scientists are in favor of the con-
tinued use of this practice, provided that the nonhuman animals are treated humanely
(Plous, 1996).

To this end, the APA has also developed a set of guidelines to govern research with
nonhuman animals, overseen by the Committee for Animal Research and Ethics (CARE). (The
CARE guidelines are available at http://www.apa.org/science/leadership/care/guidelines.aspx.)
The upshot of these guidelines is to ensure that animals are treated humanely at all
stages of the study by well-trained personnel, and that there is a strong justification
for their use (APA, 2013b). And, just as research with human subjects is reviewed by an
IRB, all research with nonhuman animals is reviewed by the Institutional Animal Care
and Use Committee (IACUC) to ensure that the benefits of the research outweigh any
discomfort the animals experience.

Image caption: In certain fields of research, studying animal behaviors helps
researchers learn more about human behaviors. (iStockphoto/Thinkstock)


Scientific Misconduct

Before we leave the subject of ethical conduct, there is one more important topic to cover
that has less to do with protecting participants’ welfare and more to do with the overall
ethics of research. Because science is a cumulative discipline, every research study contrib-
utes to the body of knowledge in that discipline. Our understanding of the development
of aggression, the process of forming memories, and the mechanisms for coping with
trauma all come from knowledge gained one study at a time. And so, when researchers
do not accurately represent their data and publish dishonest results, this seriously threat-
ens the cumulative body of knowledge. These types of violations are captured under the
umbrella of scientific misconduct, defined as intentional or negligent distortion of the
research process. To give you a better sense of how this happens, this section describes
two real cases of scientific misconduct: one probably “negligent” and the other very much
intentional.

Negligent Misconduct: Race Differences in Skull Size
In the 19th century, physician Samuel Morton argued that he could measure the intel-
ligence of a racial group by measuring its average skull size—bigger skulls would mean
bigger brains and, therefore, more intelligence. (We now know that intelligence is much
more complicated than this, but the science was young in the 1830s.) Morton’s work is
often credited with kick-starting more than a century’s worth of racially tinged science by
a subgroup of researchers who attempted to show that some races were superior to others.
In his 1996 book, The Mismeasure of Man, Stephen Jay Gould dissects and discredits this
entire line of work, and it is now taken for granted that this work was terribly biased and
fundamentally flawed. (For a short audio program that explains the context of this work,
see http://www.uh.edu/engines/epi429.htm.)

Gould was able to obtain access to all of Samuel Morton's laboratory notes, which turn
out to be a fascinating example of negligent misconduct. Morton's method of quantifying
skull sizes was to pour lead shot into the hole in the base of each skull and then measure
the volume of lead shot that each skull held. But he was hardly consistent in his pouring: As
he held a known European skull in his hand, he might pack it full of lead shot to make
sure it was full. And as he held a known African skull, he might declare it full when there
was still space at the top. Morton also discarded data from skulls that didn’t seem to fit
the patterns and occasionally guessed at the race of a skull based on its size! The incred-
ible thing is that he did not try to hide any of this. Gould’s interpretation is that Morton
believed so strongly in his hypothesis that his so-called data collection was biased every
step of the way. While Morton's intentions may have been good, the danger of this type of
misconduct is that it can happen without the researcher even being aware of it.
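
To see how a small but consistent nudge in each measurement can manufacture a group
difference where none exists, consider the following hypothetical Python simulation.
Every number in it is invented and bears no relation to Morton's actual measurements.

# Hypothetical simulation of how a biased measurement procedure can
# manufacture a group difference between identical populations.

import random

random.seed(1)

def measure(true_volume, bias):
    """Return one measurement nudged by the experimenter's expectation."""
    return true_volume + bias + random.gauss(0, 2.0)

# Both groups have the same true mean volume (arbitrary units).
group_a = [measure(80.0, bias=+1.5) for _ in range(100)]  # packed fuller
group_b = [measure(80.0, bias=-1.5) for _ in range(100)]  # declared full early

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
print("Group A mean: {:.1f}  Group B mean: {:.1f}".format(mean_a, mean_b))
# A nudge of a unit or two per measurement, applied in opposite directions,
# yields an apparent difference of about three units where none exists.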

Intentional Misconduct: Reactions to Discrimination
In the late 1990s, social psychologist Karen Ruggiero was interested in the way people
responded to instances of discrimination and prejudice. Other researchers had docu-
mented a strange discrepancy among targets of prejudice: People perceive more discrimi-
nation directed at their group as a whole than at themselves as individuals (Taylor, Wright,
Moghaddam, & LaLonde, 1990). Ruggiero argued that this indicated a reluctance to admit


to personal discrimination because it would mean acknowledging a lack of control over
one’s own outcomes. That is: I haven’t personally seen any sexism because I’m in charge
of my own destiny, but it’s a big problem for other women.

In a compelling 1999 paper, Ruggiero reported that members of high-status groups were
more likely to blame a negative outcome on discrimination in a single situation because
doing so had fewer implications for their sense of long-term control. Fascinating,
right? But there's just one problem: These data were completely fabricated. Not one of the 240 supposed participants
actually existed; Ruggiero had written a piece of fiction and passed it off as a scientific
journal article. This was her most egregious offense, but others surfaced as well. She
fabricated partial data for another paper; she discarded participants that did not fit her
hypothesis; she used federal grant money to pretend to collect these data; and she used
these fake data to apply for future funding. Ruggiero was eventually caught and forced to
submit retractions to several scientific journals to correct the fabricated publications. She
was also forced to resign from her faculty position and barred from working on feder-
ally funded research for 5 years (National Institutes of Health, 2001). (You can read the
official report of the investigation here: http://grants.nih.gov/grants/guide/notice-files
/NOT-OD-02-020.html.)

Dr. Ruggiero had completed her PhD at McGill University and began a prestigious faculty
position at Harvard University before being wooed away to the University of Texas with
a $100,000 start-up package for setting up her laboratory. In short, she had given every
sign of being a rising star in the field. So why would she take such a big risk? One of her
fellow graduate students, interviewed for a 2002 article in The Chicago Tribune, suggested
that she was motivated by a sincere belief in the work she was doing: “She was invested
in proving people were denying discrimination. . . . She knew what the answer ought to
be.” Another possible motivation has to do with the way incentives work for academic
research. Science works one slow step at a time, but people are often rewarded for making
a big, counterintuitive splash. Ruggiero was certainly rewarded for her efforts, at least in
the short term, but it couldn’t last.

This case is fascinating because it sheds real light on the scientific process and its correc-
tive effect. The reason her deception was ultimately uncovered was that other people tried
to recreate her experiments. Again, this is how science works—one finding doesn’t really
mean much until other people can repeat it in their own laboratories. However, because
these data were fictional, the results could not be reproduced. So people started talking at
conferences, which eventually led to official questions, and the rest is history.

The silver lining to the Ruggiero story is that it illustrates the strength of the scientific
approach. Ultimately, this approach is self-correcting, and people who attempt to cheat
the system eventually will get caught. An interesting website that tracks retractions of
journal articles is http://retractionwatch.wordpress.com/. This blog highlights problem-
atic research, including faked experiments and plagiarized articles.


Summary

This chapter has provided an introduction to the scientific approach to problem solving.
We first discussed what it means to think “scientifically” and contrasted this approach
to other ways of making decisions, such as reliance on authority or individual
experiences. We then covered the four steps of the research process: forming a
research question, deciding how to test it, collecting data, and interpreting the results.
The key distinguishing feature of scientific thinking is that our decision-making process is
based on empirical evidence. If our data run counter to our initial predictions—especially
if this happens over and over again—then we have to conclude that our prediction was
wrong. Science means that we draw conclusions about even the most important questions
based on facts. Do vaccines cause autism? Is the planet getting warmer? What is the best
way to improve children’s reading skills? In every case, we would collect the appropri-
ate set of data and then decide, regardless of whether the answer fits our preconceived
notions or what we want to be true.

The first and most important step of the research process is to form a testable and falsifi-
able research hypothesis. We covered the process of developing hypotheses and of placing
them in the broader context of research in the field. Broadly speaking, hypotheses can be
developed in one of two ways. Induction is a bottom-up process that involves trying to
generalize from our observations about the world. Deduction is a top-down process that
involves trying to generate a specific prediction from a broader theoretical perspective.
One of the key points from this section is that science is a cumulative discipline, meaning
that our knowledge in a particular field grows and accumulates with each study. The the-
ory of evolution sprang not from a single fossil discovery but from the combined evidence
of thousands of fossils and ethological studies. Thus, it is particularly important that each
study be placed in the proper context of prior studies, and this requires the ability to find
and digest peer-reviewed journal articles that are relevant to your research question.

We subsequently covered how to do a thorough literature search, critique the existing
literature, and follow the step-by-step process for writing a research proposal in APA
style. The final section of this chapter emphasized the importance of ethics in conducting
research. Whenever research involves human or nonhuman animals, we have to protect
the rights of these participants. The history books are full of abuses of human participants,
such as deceiving people about the diseases they had, subjecting them to extreme stress,
and, in the case of Japanese and Nazi doctors during World War II, inflicting horrors on
prisoners. In response to these and countless other less egregious abuses, the federal govern-
ment has mandated that all research treat participants with respect, minimize harm, and
avoid exploitation. The APA has established its own guidelines governing psychological
research studies: Participants must give both informed and free consent; they must be
protected from undue harm; their personal information must be protected; and they must
be told the full purpose of the study at its conclusion. Finally, we covered the subject of
scientific misconduct, which includes negligent or intentional distortions of the research
process. The beauty of the scientific process is that those who attempt to commit fraud
don’t get away with it forever.


Key Terms

abstract A summary of a journal article, appearing both at the top of the article and in
search results.

analysis of variance (ANOVA) A statistical procedure that tests for differences by
comparing the variance explained by systematic factors to the variance explained by
error.

anonymous data Data collected without identifying information from participants.

applied research Research in which the primary goal is to solve a problem, with less
focus on why the solution works.

basic research Research in which the primary goal is to acquire knowledge, with less
focus on how to apply the knowledge.

biopsychology The study of connections between biological systems (including the brain,
hormones, and neurotransmitters) and our thoughts, feelings, and behaviors.

clinical psychology An applied field focused on understanding the best ways to treat
psychological disorders; the study of best practices for understanding, treating, and
preventing distress and dysfunction.

cognitive psychology The study of internal mental processes, including the ways that
people think, learn, remember, speak, and perceive.

Committee for Animal Research and Ethics (CARE) APA committee responsible for guidelines
governing animal research; the upshot of these guidelines is to ensure that animals are
treated humanely at all stages of the study by well-trained personnel and that there is
a strong justification for the animals' use.

Common Rule A set of federal laws, starting in 1981, that established the baseline
standard of ethics for all federally funded research.

confidential data Data collected in such a way that identifying information is protected
and kept secret.

debriefing A practice of disclosure that upholds the ethical principle stating that
participants should be informed of the study's true purpose when it is concluded.

deduction The process of developing a specific hypothesis out of a more general theory;
best understood as a “top-down” approach to reasoning.

dependent variable Outcome variable that is measured by the experimenter.

developmental psychology The systematic study of physical, social, and cognitive changes
over the human life span; initially focused on childhood development, though many
researchers now study changes and key stages over the entire life span.

empiricism A scientific approach to decision making that focuses on the role of
observation and sensory experience over the roles of reason and logic.

exempt review Category of IRB review reserved for low-risk studies falling into a set of
predefined categories; involves having an IRB representative simply verify the low risk
and approve the study.

expedited review Category of IRB review used for medium-risk studies falling into a set
of predefined categories; involves having an IRB representative conduct a full review of
the study procedures and ensure that participants' welfare and identity are protected.

falsifiability A concept applied to theories and hypotheses meaning that the right set
of conditions could prove it wrong; calling something falsifiable does not mean it is
false, only that it would be possible to demonstrate its falsehood if it were false.

free consent Ethical principle stating that those involved in studies must agree to do
so without coercion; thus, researchers are forbidden from placing undue pressure on
people to participate in or remain in a study.

full-board review Category of IRB review used for high-risk studies, which contain an
inflated risk to participants' welfare or the potential for release of confidential
information; involves having all members of the IRB review the study procedures and then
meet as a group to discuss the degree of risk and protection.

hypothesis A specific and falsifiable statement about the relationship between two or
more variables.

independent variable Variable in an experimental design that is manipulated by the
experimenter.

induction The process of developing a general hypothesis out of a set of specific
observations; best understood as a “bottom-up” approach to reasoning.

informed consent Ethical principle stating that research participants must be informed
of all features of the study that would reasonably affect their decision to participate.

Institutional Animal Care and Use Committee (IACUC) Review panel that monitors all
research involving nonhuman animals in order to protect the welfare of research
subjects; tasked with ensuring that the benefits of the research outweigh any discomfort
experienced by the animals.

institutional review board (IRB) Review panel that monitors all research involving
humans in order to protect the welfare of research participants; tasked with determining
whether a study is consistent with ethical principles and given the authority to
approve, reject, or require modification of each research proposal.

mixed methods design Research design that combines or associates both quantitative and
qualitative methods.

operational definitions Definitions that specify the meaning of a concept or variable in
relation to a study.

operationalization The process of choosing measurable variables to represent the
components of a hypothesis.

parsimonious Term applied to theories meaning that our concepts should be as simple as
possible without sacrificing completeness.

peer review A process that involves having experts in the field evaluate the merits of
research articles before they are published.

primary source Full firsthand report of a research study, including information on the
participants, the data collected, and the statistical analyses of these data; these
reports appear in professional academic journals.

problem statement Also called the aim of the study; provides a clear description of the
intent of the study.

purpose statement Similar to the problem statement, but in addition to the intent of the
study it identifies what population will be studied, what type of research will be
conducted (e.g., comparison between variables, examination of the relationships between
variables, a descriptive examination of one or more variables), and what the dependent
and independent variables will be.

qualitative research A descriptive approach that attempts to gain a deep understanding
of particular cases and contexts.

quantitative research A systematic and empirical approach that attempts to generalize
results to other contexts.

rationalism An approach to decision making that relies on making logical arguments.

reconciliation and synthesis The process of resolving an apparent conflict by finding
common ground among the ideas and then merging all the pieces into one new explanation.

research problem The topic or phenomenon to be addressed, investigated, and researched,
either through quantitative or qualitative methods.

research proposal Provides a detailed description of the research problem and the
planned research methods to be used in a study.

research questions Questions developed to make the research problem testable. Generally
take the form of hypotheses, which are specific predictions or educated guesses about
the outcome of the study; some researchers may choose to include hypotheses and research
questions that are related to the research problem.

scientific method A means of approaching problems and drawing conclusions based on
empirical observations. Consists of four steps: hypothesize, operationalize, measure,
and explain, abbreviated as HOME.

scientific misconduct Intentional or negligent distortion of the research process.

secondary source Secondhand summary of primary source articles; these include textbooks
and academic books, as well as less-than-trustworthy websites.

social psychology The study of the ways our thoughts, feelings, and behaviors are shaped
by other people.

theory A collection of ideas used to explain the connections between variables and
phenomena.

variable In the context of an experiment, a factor that is subject to change and that is
measured or studied.

Apply Your Knowledge

1. For each of the following broad theoretical statements, think of a specific research
hypothesis that would test the theory. There are many possibilities for each one, but
remember that your hypothesis needs to be both testable and falsifiable. The first one
is provided as an example.

Theory: Infants look cute and helpless so that adults will take care of them.
Hypothesis: Parents will be more attentive to cute infants than to less cute infants.

Theory: People are inherently social and value the approval of others.
Hypothesis:

Theory: People prefer to feel good about themselves.
Hypothesis:

2. a. Read the abstract of a published research study (Langer & Rodin, 1976) found
here, http://www.ncbi.nlm.nih.gov/pubmed/1011073, and identify
the four components of the research process.

Hypothesis:
Operationalization (how they defined variables):
Measure (how they conducted the study and collected the data):
Explain:

b. Read the abstract of a published research study (Swim & Hyers, 1999) found
here, http://www.ingentaconnect.com/content/ap/js/1999/00000035/00000001
/art01370, and identify the four components of the research process:

Hypothesis:
Operationalization (how they defined variables):
Measure (how they conducted the study and collected the data):
Explain:

3. Read the following description of a research study, and then evaluate whether it
meets the five APA ethical guidelines:

A researcher told students that their responses to an online survey on cheating
were anonymous. One question asked students for their email address to use in
a raffle drawing. Instead, the researcher used this to locate GPAs in school files
so he could correlate frequency of cheating and GPA.

Informed consent?
Free consent?
Protection from harm?
Confidentiality?
Debriefing?

Based on this evaluation, is the study likely to be approved by an institutional
review board? Why or why not?

Critical Thinking & Discussion Questions

1. You have been asked to help determine whether watching violent television
leads people to become more violent. Explain how you would approach this
task using the four steps of the research process (Hint: HOME).

2. Take a second to review the guidelines for evaluating theories. Using these
criteria, evaluate Freud’s theory of unconscious drives. Hint: The key to this
theory is that much of our behavior is driven by internal conflicts that exist
outside our awareness.
