Richard G. Lomax
The Ohio State University
Debbie L. Hahs-Vaughn
University of Central Florida
Routledge
Taylor & Francis Group
711 Third Avenue
New York, NY 10017
Routledge
Taylor & Francis Group
27 Church Road
Hove, East Sussex BN3 2FA
© 2012 by Taylor & Francis Group, LLC
Routledge is an imprint of Taylor & Francis Group, an Informa business
Printed in the United States of America on acid-free paper
Version Date: 20111003
International Standard Book Number: 978-0-415-88005-3 (Hardback)
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
Lomax, Richard G.
An introduction to statistical concepts / Richard G. Lomax, Debbie L. Hahs-Vaughn. — 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-415-88005-3
1. Statistics. 2. Mathematical statistics. I. Hahs-Vaughn, Debbie L. II. Title.
QA276.12.L67 2012
519.5–dc23 2011035052
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the Psychology Press Web site at
http://www.psypress.com
This book is dedicated to our families
and to all of our former students.
Contents
Preface ..... xiii
Acknowledgments ..... xvii
1. Introduction ..... 1
   1.1 What Is the Value of Statistics? ..... 3
   1.2 Brief Introduction to History of Statistics ..... 4
   1.3 General Statistical Definitions ..... 5
   1.4 Types of Variables ..... 7
   1.5 Scales of Measurement ..... 8
   1.6 Summary ..... 13
   Problems ..... 14
2. Data Representation ..... 17
   2.1 Tabular Display of Distributions ..... 18
   2.2 Graphical Display of Distributions ..... 23
   2.3 Percentiles ..... 29
   2.4 SPSS ..... 33
   2.5 Templates for Research Questions and APA-Style Paragraph ..... 41
   2.6 Summary ..... 42
   Problems ..... 43
3. Univariate Population Parameters and Sample Statistics ..... 49
   3.1 Summation Notation ..... 50
   3.2 Measures of Central Tendency ..... 51
   3.3 Measures of Dispersion ..... 56
   3.4 SPSS ..... 65
   3.5 Templates for Research Questions and APA-Style Paragraph ..... 69
   3.6 Summary ..... 70
   Problems ..... 71
4. Normal Distribution and Standard Scores ..... 77
   4.1 Normal Distribution ..... 78
   4.2 Standard Scores ..... 84
   4.3 Skewness and Kurtosis Statistics ..... 87
   4.4 SPSS ..... 91
   4.5 Templates for Research Questions and APA-Style Paragraph ..... 98
   4.6 Summary ..... 99
   Problems ..... 99
5. Introduction to Probability and Sample Statistics ..... 105
   5.1 Brief Introduction to Probability ..... 106
   5.2 Sampling and Estimation ..... 109
   5.3 Summary ..... 117
   Appendix: Probability That at Least Two Individuals Have the Same Birthday ..... 117
   Problems ..... 118
6. Introduction to Hypothesis Testing: Inferences About a Single Mean ..... 121
   6.1 Types of Hypotheses ..... 122
   6.2 Types of Decision Errors ..... 124
   6.3 Level of Significance (α) ..... 127
   6.4 Overview of Steps in Decision-Making Process ..... 129
   6.5 Inferences About μ When σ Is Known ..... 130
   6.6 Type II Error (β) and Power (1 − β) ..... 134
   6.7 Statistical Versus Practical Significance ..... 138
   6.8 Inferences About μ When σ Is Unknown ..... 139
   6.9 SPSS ..... 145
   6.10 G*Power ..... 149
   6.11 Template and APA-Style Write-Up ..... 155
   6.12 Summary ..... 156
   Problems ..... 157
7. Inferences About the Difference Between Two Means ..... 163
   7.1 New Concepts ..... 164
   7.2 Inferences About Two Independent Means ..... 166
   7.3 Inferences About Two Dependent Means ..... 176
   7.4 SPSS ..... 180
   7.5 G*Power ..... 192
   7.6 Template and APA-Style Write-Up ..... 195
   7.7 Summary ..... 198
   Problems ..... 198
8. Inferences About Proportions ..... 205
   8.1 Inferences About Proportions Involving Normal Distribution ..... 206
   8.2 Inferences About Proportions Involving Chi-Square Distribution ..... 217
   8.3 SPSS ..... 224
   8.4 G*Power ..... 231
   8.5 Template and APA-Style Write-Up ..... 234
   8.6 Summary ..... 236
   Problems ..... 237
9. Inferences About Variances ..... 241
   9.1 New Concepts ..... 242
   9.2 Inferences About Single Variance ..... 244
   9.3 Inferences About Two Dependent Variances ..... 246
   9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests) ..... 248
   9.5 SPSS ..... 252
   9.6 Template and APA-Style Write-Up ..... 253
   9.7 Summary ..... 253
   Problems ..... 254
10. Bivariate Measures of Association ..... 259
   10.1 Scatterplot ..... 260
   10.2 Covariance ..... 263
   10.3 Pearson Product–Moment Correlation Coefficient ..... 265
   10.4 Inferences About Pearson Product–Moment Correlation Coefficient ..... 266
   10.5 Assumptions and Issues Regarding Correlations ..... 269
   10.6 Other Measures of Association ..... 272
   10.7 SPSS ..... 276
   10.8 G*Power ..... 283
   10.9 Template and APA-Style Write-Up ..... 286
   10.10 Summary ..... 287
   Problems ..... 287
11. One-Factor Analysis of Variance: Fixed-Effects Model ..... 291
   11.1 Characteristics of One-Factor ANOVA Model ..... 292
   11.2 Layout of Data ..... 296
   11.3 ANOVA Theory ..... 296
   11.4 ANOVA Model ..... 302
   11.5 Assumptions and Violation of Assumptions ..... 309
   11.6 Unequal n's or Unbalanced Procedure ..... 312
   11.7 Alternative ANOVA Procedures ..... 312
   11.8 SPSS and G*Power ..... 313
   11.9 Template and APA-Style Write-Up ..... 334
   11.10 Summary ..... 336
   Problems ..... 336
12. Multiple Comparison Procedures ..... 341
   12.1 Concepts of Multiple Comparison Procedures ..... 342
   12.2 Selected Multiple Comparison Procedures ..... 348
   12.3 SPSS ..... 362
   12.4 Template and APA-Style Write-Up ..... 366
   12.5 Summary ..... 366
   Problems ..... 367
13. Factorial Analysis of Variance: Fixed-Effects Model ..... 371
   13.1 Two-Factor ANOVA Model ..... 372
   13.2 Three-Factor and Higher-Order ANOVA ..... 390
   13.3 Factorial ANOVA With Unequal n's ..... 393
   13.4 SPSS and G*Power ..... 395
   13.5 Template and APA-Style Write-Up ..... 417
   13.6 Summary ..... 419
   Problems ..... 420
14. Introduction to Analysis of Covariance: One-Factor Fixed-Effects Model With Single Covariate ..... 427
   14.1 Characteristics of the Model ..... 428
   14.2 Layout of Data ..... 431
   14.3 ANCOVA Model ..... 431
   14.4 ANCOVA Summary Table ..... 432
   14.5 Partitioning the Sums of Squares ..... 433
   14.6 Adjusted Means and Related Procedures ..... 434
   14.7 Assumptions and Violation of Assumptions ..... 436
   14.8 Example ..... 441
   14.9 ANCOVA Without Randomization ..... 443
   14.10 More Complex ANCOVA Models ..... 444
   14.11 Nonparametric ANCOVA Procedures ..... 444
   14.12 SPSS and G*Power ..... 445
   14.13 Template and APA-Style Paragraph ..... 469
   14.14 Summary ..... 471
   Problems ..... 471
15. Random- and Mixed-Effects Analysis of Variance Models ..... 477
   15.1 One-Factor Random-Effects Model ..... 478
   15.2 Two-Factor Random-Effects Model ..... 483
   15.3 Two-Factor Mixed-Effects Model ..... 488
   15.4 One-Factor Repeated Measures Design ..... 493
   15.5 Two-Factor Split-Plot or Mixed Design ..... 500
   15.6 SPSS and G*Power ..... 508
   15.7 Template and APA-Style Write-Up ..... 548
   15.8 Summary ..... 551
   Problems ..... 551
16. Hierarchical and Randomized Block Analysis of Variance Models ..... 557
   16.1 Two-Factor Hierarchical Model ..... 558
   16.2 Two-Factor Randomized Block Design for n = 1 ..... 566
   16.3 Two-Factor Randomized Block Design for n > 1 ..... 574
   16.4 Friedman Test ..... 574
   16.5 Comparison of Various ANOVA Models ..... 575
   16.6 SPSS ..... 576
   16.7 Template and APA-Style Write-Up ..... 603
   16.8 Summary ..... 605
   Problems ..... 605
17. Simple Linear Regression ..... 611
   17.1 Concepts of Simple Linear Regression ..... 612
   17.2 Population Simple Linear Regression Model ..... 614
   17.3 Sample Simple Linear Regression Model ..... 615
   17.4 SPSS ..... 634
   17.5 G*Power ..... 647
   17.6 Template and APA-Style Write-Up ..... 650
   17.7 Summary ..... 652
   Problems ..... 652
18. Multiple Regression ..... 657
   18.1 Partial and Semipartial Correlations ..... 658
   18.2 Multiple Linear Regression ..... 661
   18.3 Methods of Entering Predictors ..... 676
   18.4 Nonlinear Relationships ..... 679
   18.5 Interactions ..... 680
   18.6 Categorical Predictors ..... 680
   18.7 SPSS ..... 682
   18.8 G*Power ..... 698
   18.9 Template and APA-Style Write-Up ..... 701
   18.10 Summary ..... 703
   Problems ..... 704
19. Logistic Regression ..... 709
   19.1 How Logistic Regression Works ..... 710
   19.2 Logistic Regression Equation ..... 711
   19.3 Estimation and Model Fit ..... 715
   19.4 Significance Tests ..... 716
   19.5 Assumptions and Conditions ..... 721
   19.6 Effect Size ..... 725
   19.7 Methods of Predictor Entry ..... 726
   19.8 SPSS ..... 727
   19.9 G*Power ..... 746
   19.10 Template and APA-Style Write-Up ..... 749
   19.11 What Is Next? ..... 751
   19.12 Summary ..... 752
   Problems ..... 752
Appendix: Tables ..... 757
References ..... 783
Odd-Numbered Answers to Problems ..... 793
Author Index ..... 809
Subject Index ..... 813
Preface
Approach
We know, we know! We've heard it a million times before. When you hear someone at a party mention the word statistics or statistician, you probably say "I hate statistics" and turn the other cheek. In the many years that we have been in the field of statistics, it is extremely rare when someone did not have that reaction. Enough is enough. With the help of this text, we hope that "statistics hating" will become a distant figment of your imagination.

As the title suggests, this text is designed for a course in statistics for students in education and the behavioral sciences. We begin with the most basic introduction to statistics in the first chapter and proceed through intermediate statistics. The text is designed for you to become a better prepared researcher and a more intelligent consumer of research. We do not assume that you have extensive or recent training in mathematics. Many of you have only had algebra, perhaps some time ago. We also do not assume that you have ever had a statistics course. Rest assured; you will do fine.

We believe that a text should serve as an effective instructional tool. You should find this text to be more than a reference book; you might actually use it to learn statistics. (What an oxymoron that a statistics book can actually teach you something!) This text is not a theoretical statistics book, nor is it a cookbook on computing statistics or a statistical software manual. Recipes have to be memorized; consequently, you tend not to understand how or why you obtain the desired product. As well, knowing how to run a statistics package without understanding the concepts or the output is not particularly useful. Thus, concepts drive the field of statistics.
Goals and Content Coverage
Our goals for this text are lofty, but the effort and its effects will be worthwhile. First, the text provides a comprehensive coverage of topics that could be included in an undergraduate or graduate one- or two-course sequence in statistics. The text is flexible enough so that instructors can select those topics that they desire to cover as they deem relevant in their particular discipline. In other words, chapters and sections of chapters from this text can be included in a statistics course as the instructor sees fit. Most of the popular as well as many of the lesser-known procedures and models are described in the text. A particular feature is a thorough and up-to-date discussion of assumptions, the effects of their violation, and how to deal with their violation.

The first five chapters of the text cover basic descriptive statistics, including ways of representing data graphically, statistical measures which describe a set of data, the normal distribution and other types of standard scores, and an introduction to probability and sampling.
The remainder of the text covers different inferential statistics. In Chapters 6 through 10, we deal with different inferential tests involving means (e.g., t tests), proportions, variances, and correlations. In Chapters 11 through 16, all of the basic analysis of variance (ANOVA) models are considered. Finally, in Chapters 17 through 19 we examine various regression models.

Second, the text communicates a conceptual, intuitive understanding of statistics, which requires only a rudimentary knowledge of basic algebra and emphasizes the important concepts in statistics. The most effective way to learn statistics is through the conceptual approach. Statistical concepts tend to be easy to learn because (a) concepts can be simply stated, (b) concepts can be made relevant through the use of real-life examples, (c) the same concepts are shared by many procedures, and (d) concepts can be related to one another.

This text will help you to reach these goals. The following indicators will provide some feedback as to how you are doing. First, there will be a noticeable change in your attitude toward statistics. Thus, one outcome is for you to feel that "statistics is not half bad," or "this stuff is OK." Second, you will feel comfortable using statistics in your own work. Finally, you will begin to "see the light." You will know when you have reached this highest stage of statistics development when suddenly, in the middle of the night, you wake up from a dream and say, "now I get it!" In other words, you will begin to think statistics rather than think of ways to get out of doing statistics.
Pedagogical Tools
The text contains several important pedagogical features to allow you to attain these goals. First, each chapter begins with an outline (so you can anticipate what will be covered) and a list of key concepts (which you will need in order to understand what you are doing). Second, realistic examples from education and the behavioral sciences are used to illustrate the concepts and procedures covered in each chapter. Each of these examples includes an initial vignette, an examination of the relevant procedures and necessary assumptions, how to run SPSS and develop an APA-style write-up, as well as tables, figures, and annotated SPSS output to assist you. Third, the text is based on the conceptual approach. That is, material is covered so that you obtain a good understanding of statistical concepts. If you know the concepts, then you know statistics. Finally, each chapter ends with three sets of problems: computational, conceptual, and interpretive. Pay particular attention to the conceptual problems as they provide the best assessment of your understanding of the concepts in the chapter. We strongly suggest using the example data sets and the computational and interpretive problems for additional practice through available statistics software. This will serve to reinforce the concepts covered. Answers to the odd-numbered problems are given at the end of the text.
New to This Edition
A number of changes have been made in the third edition based on the suggestions of reviewers, instructors, teaching assistants, and students. These improvements have been made in order to better achieve the goals of the text. You will note the addition of a coauthor to this edition, Debbie Hahs-Vaughn, who has contributed greatly to the further development of this text. The changes include the following: (a) additional end-of-chapter problems have been included; (b) more information on power has been added, particularly use of the G*Power software with screenshots; (c) content has been updated and numerous additional references have been provided; (d) a final chapter on logistic regression has been added for a more complete presentation of regression models; (e) numerous SPSS (version 19) screenshots on statistical techniques and their assumptions have been included to assist in the generation and interpretation of output; (f) more information on SPSS has been added to most chapters; (g) research vignettes and templates have been added to the beginning and end of each chapter, respectively; (h) a discussion of expected mean squares has been folded into the analysis of variance chapters to provide a rationale for the formation of proper F ratios; and (i) a website for the text now provides students and instructors with access to detailed solutions to the book's odd-numbered problems, chapter outlines, lists of key terms for each chapter, and SPSS datasets that correspond to the chapter examples and end-of-chapter problems and that can be used in SPSS and other packages such as SAS, HLM, STATA, and LISREL. Only instructors are granted access to the PowerPoint slides for each chapter (which include examples and APA-style write-ups, chapter outlines, and key terms), multiple-choice (approximately 25 per chapter) and short-answer (approximately 5 per chapter) test questions, and answers to the even-numbered problems. This material is available at http://www.psypress.com/an-introduction-to-statistical-concepts-9780415880053.
Acknowledgments
There are many individuals whose assistance enabled the completion of this book. We would like to thank the following individuals with whom we studied in school: Jamie Algina, Lloyd Bond, Amy Broeseker, Jim Carlson, Bill Cooley, Judy Giesen, Brian Gray, Harry Hsu, Mary Nell McNeese, Camille Ogden, Lou Pingel, Rod Roth, Charles Stegman, and Neil Timm. Next, numerous colleagues have played an important role in our personal and professional lives as statisticians. Rather than include an admittedly incomplete listing, we just say "thank you" to all of you. You know who you are.

Thanks also to all of the wonderful people at Lawrence Erlbaum Associates (LEA), in particular to Ray O'Connell for inspiring this project back in 1986, and to Debra Riegert (formerly at LEA and now at Routledge) for supporting the development of subsequent texts and editions. We are most appreciative of the insightful suggestions provided by the reviewers of this text over the years, and in particular the reviewers of this edition: Robert P. Conti, Sr. (Mount Saint Mary College), Feifei Ye (University of Pittsburgh), Nan Thornton (Capella University), and one anonymous reviewer. A special thank-you to all of the terrific students that we have had the pleasure of teaching at the University of Pittsburgh, the University of Illinois–Chicago, Louisiana State University, Boston College, Northern Illinois University, the University of Alabama, The Ohio State University, and the University of Central Florida. For all of your efforts, and the many lights that you have seen and shared with us, this book is for you. We are most grateful to our families, in particular to Lea and Kristen, and to Mark and Malani. It is because of your love and understanding that we were able to cope with such a major project. Thank you, one and all.
Richard G. Lomax
Debbie L. Hahs-Vaughn
1
Introduction
Chapter Outline
1.1 What Is the Value of Statistics?
1.2 Brief Introduction to History of Statistics
1.3 General Statistical Definitions
1.4 Types of Variables
1.5 Scales of Measurement
   1.5.1 Nominal Measurement Scale
   1.5.2 Ordinal Measurement Scale
   1.5.3 Interval Measurement Scale
   1.5.4 Ratio Measurement Scale
Key Concepts
1. General statistical concepts
   Population
   Parameter
   Sample
   Statistic
   Descriptive statistics
   Inferential statistics

2. Variable-related concepts
   Variable
   Constant
   Categorical
   Dichotomous variables
   Numerical
   Discrete variables
   Continuous variables
3. Measurement scale concepts
   Measurement
   Nominal
   Ordinal
   Interval
   Ratio
We want to welcome you to the wonderful world of statistics. More than ever, statistics are everywhere. Listen to the weather report and you hear about the measurement of variables such as temperature, rainfall, barometric pressure, and humidity. Watch a sporting event and you hear about batting averages, percentage of free throws completed, and total rushing yardage. Read the financial page and you can track the Dow Jones average, the gross national product, and bank interest rates. Turn to the entertainment section to see movie ratings, movie revenue, or the top 10 best-selling novels. These are just a few examples of the statistics that surround you in every aspect of your life.

Although you may be thinking that statistics is not the most enjoyable subject on the planet, by the end of this text, you will (a) have a more positive attitude about statistics, (b) feel more comfortable using statistics, and thus be more likely to perform your own quantitative data analyses, and (c) certainly know much more about statistics than you do now. In other words, our goal is to equip you with the skills you need to be both a better consumer and producer of research. But be forewarned: the road to statistical independence is not easy. However, we will serve as your guides along the way. When the going gets tough, we will be there to help you with advice and numerous examples and problems. Using the powers of logic, mathematical reasoning, and statistical concept knowledge, we will help you arrive at an appropriate solution to the statistical problem at hand.
Some students begin their first statistics class with some anxiety. This could be caused by not having had a quantitative course for some time, apprehension built up by delaying taking statistics, a poor past instructor or course, or less than adequate past success. Let us offer a few suggestions along these lines. First, this is not a math class or text. If you want one of those, then you need to walk over to the math department. This is a course and text on the application of statistics to education and the behavioral sciences. Second, the philosophy of the text is on the understanding of concepts rather than on the derivation of statistical formulas. It is more important to understand concepts than to derive or memorize various and sundry formulas. If you understand the concepts, you can always look up the formulas if need be. If you do not understand the concepts, then knowing the formulas will only allow you to operate in a cookbook mode without really understanding what you are doing. Third, the calculator and computer are your friends. These devices are tools that allow you to complete the necessary computations and obtain the results of interest. If you are performing hand computations, find a calculator that you are comfortable with; it need not have 800 functions, as the four basic operations and the sum and square root functions are sufficient (one of our personal calculators is one of those little credit card calculators, although we often use the calculator on our computers). If you are using a statistical software program, find one that you are comfortable with (most instructors will have you using a program such as SPSS, SAS, or Statistica). In this text, we use SPSS to illustrate statistical applications. Finally, this text will take you from raw data to results using realistic examples. These can then be followed up using the problems at the end of each chapter. Thus, you will not be on your own but will have the text, a computer/calculator, as well as your course and instructor, to help guide you.
The intent and philosophy of this text is to be conceptual and intuitive in nature. Thus, the text does not require a high level of mathematics but rather emphasizes the important concepts in statistics. Most statistical concepts really are fairly easy to learn because (a) concepts can be simply stated, (b) concepts can be related to real-life examples, (c) many of the same concepts run through much of statistics, and therefore (d) many concepts can be related.

In this introductory chapter, we describe the most basic statistical concepts. We begin with the question, "What is the value of statistics?" We then look at a brief history of statistics by mentioning a few of the more important and interesting statisticians. Then we consider the concepts of population, parameter, sample and statistic, descriptive and inferential statistics, types of variables, and scales of measurement. Our objectives are that, by the end of this chapter, you will (a) have a better sense of why statistics are necessary, (b) see that statisticians are an interesting group of people, and (c) have an understanding of several basic statistical concepts.
1.1 What Is the Value of Statistics?
Let us start off with a reasonable rhetorical question: why do we need statistics? In other words, what is the value of statistics, either in your research or in your everyday life? As a way of thinking about these questions, consider the following headlines, which have probably appeared in your local newspaper.
Cigarette Smoking Causes Cancer—Tobacco Industry Denies Charges
A study conducted at Ivy-Covered University Medical School, recently published in the New England Journal of Medicine, has definitively shown that cigarette smoking causes cancer. In interviews with 100 randomly selected smokers and nonsmokers over 50 years of age, 30% of the smokers have developed some form of cancer, while only 10% of the nonsmokers have cancer. "The higher percentage of smokers with cancer in our study clearly indicates that cigarettes cause cancer," said Dr. Jason P. Smythe. On the contrary, "this study doesn't even suggest that cigarettes cause cancer," said tobacco lobbyist Cecil B. Hacker. "Who knows how these folks got cancer; maybe it is caused by the aging process or by the method in which individuals were selected for the interviews," Mr. Hacker went on to say.
North Carolina Congressional Districts Gerrymandered—African-Americans Slighted
A study conducted at the National Center for Legal Research indicates that congressional districts in the state of North Carolina have been gerrymandered to minimize the impact of the African-American vote. "From our research, it is clear that the districts are apportioned in a racially biased fashion. Otherwise, how could there be no single district in the entire state which has a majority of African-American citizens when over 50% of the state's population is African-American? The districting system absolutely has to be changed," said Dr. I. M. Researcher. A spokesman for the American Bar Association countered with the statement, "According to a decision rendered by the United States Supreme Court in 1999 (No. 98-85), intent or motive must be shown for racial bias to be shown in the creation of congressional districts. The decision states a 'facially neutral law … warrants strict scrutiny only if it can be proved that the law was motivated by a racial purpose or object.' The data in this study do not show intent or motive. To imply that these data indicate racial bias is preposterous."
Global Warming—Myth According to the President
Research conducted at the National Center for Global Warming (NCGW) has shown the negative consequences of global warming on the planet Earth. As summarized by Dr. Noble Pryze, "Our studies at NCGW clearly demonstrate that if global warming is not halted in the next 20 years, the effects on all aspects of our environment and climatology will be catastrophic." A different view is held by U.S. President Harold W. Tree. He stated in a recent address that "the scientific community has not convinced me that global warming even exists. Why should our administration spend millions of dollars on an issue that has not been shown to be a real concern?"
How is one to make sense of the studies described by these headlines? How is one to decide which side of the issue these data support, so as to take an intellectual stand? In other words, do the interview data clearly indicate that cigarette smoking causes cancer? Do the congressional district percentages of African-Americans necessarily imply that there is racial bias? Have scientists convinced us that global warming is a problem? These studies are examples of situations where the appropriate use of statistics is clearly necessary. Statistics will provide us with an intellectually acceptable method for making decisions in such matters. For instance, a certain type of research, statistical analysis, and set of results are all necessary to make causal inferences about cigarette smoking. Another type of research, statistical analysis, and set of results are all necessary to lead one to confidently state that the districting system is racially biased or not, or that global warming needs to be dealt with. The bottom line is that the purpose of statistics, and thus of this text, is to provide you with the tools to make important decisions in an appropriate and confident manner. You will not have to trust a statement made by some so-called expert on an issue, which may or may not have any empirical basis or validity; you can make your own judgments based on the statistical analyses of data. For you, the value of statistics can include (a) the ability to read and critique articles in both professional journals and the popular press and (b) the ability to conduct statistical analyses for your own research (e.g., thesis or dissertation).
1.2 Brief Introduction to History of Statistics
As a way of getting to know the topic of statistics, we want to briefly introduce you to a few famous statisticians. The purpose of this section is not to provide a comprehensive history of statistics, as those already exist (e.g., Heyde, Seneta, Crepel, Fienberg, & Gani, 2001; Pearson, 1978; Stigler, 1986). Rather, the purpose of this section is to show that famous statisticians not only are interesting but are human beings just like you and me.
One of the fathers of probability (see Chapter 5) is acknowledged to be Blaise Pascal, who worked in the mid-1600s. One of Pascal's contributions was that he worked out the probabilities for each dice roll in the game of craps, enabling his friend, a member of royalty, to become a consistent winner. He also developed Pascal's triangle, which you may remember from your early mathematics education. The statistical development of the normal or bell-shaped curve (see Chapter 4) is interesting. For many years, this development was attributed to Carl Friedrich Gauss (early 1800s), and the curve was actually known for some time as the Gaussian curve. Later historians found that Abraham DeMoivre actually developed the normal curve in the 1730s. As statistics was not thought of as a true academic discipline until the late 1800s, people like Pascal and DeMoivre were consulted by the wealthy on odds about games of chance and by insurance underwriters to determine mortality rates.
Karl Pearson is one of the most famous statisticians to date (late 1800s to early 1900s). Among his many accomplishments is the Pearson product–moment correlation coefficient, still in use today (see Chapter 10). You may know of Florence Nightingale (1820–1910) as an important figure in the field of nursing. However, you may not know of her importance in the field of statistics. Nightingale believed that statistics and theology were linked and that by studying statistics we might come to understand God's laws.
A quite interesting statistical personality is William Sealy Gosset, who was employed by the Guinness Brewery in Ireland. The brewery wanted to select a sample of people from Dublin in 1906 for purposes of taste testing. Gosset was asked how large a sample was needed in order to make an accurate inference about the entire population (see the next section). The brewery would not let Gosset publish any of his findings under his own name, so he used the pseudonym Student. Today, the t distribution is still known as Student's t distribution. Sir Ronald A. Fisher is another of the most famous statisticians of all time. Working in the early 1900s, Fisher introduced the analysis of variance (ANOVA) (see Chapters 11 through 16) and Fisher's z transformation for correlations (see Chapter 10). In fact, the major statistic in the ANOVA is referred to as the F ratio in honor of Fisher. These individuals represent only a fraction of the many famous and interesting statisticians over the years. For further information about these and other statisticians, we suggest you consult references such as Pearson (1978), Stigler (1986), and Heyde et al. (2001), which consist of many interesting stories about statisticians.
1.3 General Statistical Definitions
In this section, we define some of the most basic concepts in statistics. Included here are definitions and examples of the following concepts: population, parameter, sample, statistic, descriptive statistics, and inferential statistics.
The first four concepts are tied together, so we discuss them together. A population is defined as consisting of all members of a well-defined group. A population may be large in scope, such as when a population is defined as all of the employees of IBM worldwide. A population may be small in scope, such as when a population is defined as all of the IBM employees at the building on Main Street in Atlanta. Thus, a population could be large or small in scope. The key is that the population is well defined such that one could determine specifically who all of the members of the group are, and then information or data could be collected from all such members. Thus, if our population is defined as all members working in a particular office building, then our study would consist of collecting data from all employees in that building. It is also important to remember that you, the researcher, define the population.
A parameter is defined as a characteristic of a population. For instance, parameters of our office building example might be the number of individuals who work in that building (e.g., 154), the average salary of those individuals (e.g., $49,569), and the range of ages of those individuals (e.g., 21–68 years of age). When we think about characteristics of a population, we are thinking about population parameters. Those two terms are often linked together.
A sample is defined as consisting of a subset of a population. A sample may be large in scope, such as when a population is defined as all of the employees of IBM worldwide and 20% of those individuals are included in the sample. A sample may be small in scope, such as when a population is defined as all of the IBM employees at the building on Main Street in Atlanta and 10% of those individuals are included in the sample. Thus, a sample could be large or small in scope and consist of any portion of the population. The key is that the sample consists of some, but not all, of the members of the population; that is, anywhere from one individual to all but one individual from the population is included in the sample. Thus, if our population is defined as all members working in the IBM building on Main Street in Atlanta, then our study would consist of collecting data from a sample of some of the employees in that building. It follows that if we, the researchers, define the population, then we also determine what the sample will be.
A statistic is defined as a characteristic of a sample. For instance, statistics of our office building example might be the number of individuals who work in the building that we sampled (e.g., 77), the average salary of those individuals (e.g., $54,090), and the range of ages of those individuals (e.g., 25–62 years of age). Notice that the statistics of a sample need not be equal to the parameters of a population (more about this in Chapter 5). When we think about characteristics of a sample, we are thinking about sample statistics. Those two terms are often linked together. Thus, we have population parameters and sample statistics, but no other combinations of those terms exist. The field has become known as statistics simply because we are almost always dealing with sample statistics, because population data are rarely obtained.
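The parameter-versus-statistic distinction can be illustrated with a short simulation. The sketch below is in Python rather than SPSS, and the salary figures are invented for illustration; it simply shows that a statistic computed on a sample need not equal the corresponding population parameter.

```python
import random
import statistics

# Hypothetical population: annual salaries of all 154 employees in an
# office building (values invented for illustration).
random.seed(42)
population = [round(random.gauss(49569, 8000), 2) for _ in range(154)]

# A parameter is a characteristic of the population.
population_mean = statistics.mean(population)

# A statistic is a characteristic of a sample; here we sample 77 employees.
sample = random.sample(population, 77)
sample_mean = statistics.mean(sample)

print(f"population parameter (mean salary): ${population_mean:,.2f}")
print(f"sample statistic (mean salary):     ${sample_mean:,.2f}")
# The sample statistic is close to, but generally not equal to, the parameter.
```

Rerunning with a different seed (or a different sample of 77) gives a different sample mean, while the population parameter stays fixed, which is exactly the distinction drawn above.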
The final two concepts are also tied together and thus considered together. The field of statistics is generally divided into two types of statistics: descriptive statistics and inferential statistics. Descriptive statistics are defined as techniques which allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. In other words, the purpose of descriptive statistics is to allow us to talk about (or describe) a collection of data without having to look at the entire collection. For example, say we have just collected a set of data from 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). If you were to ask us about the data, we could do one of two things. On the one hand, we could carry around the entire collection of data everywhere we go, and when someone asks us about the data, simply say, "Here are the data; take a look at them yourself." On the other hand, we could summarize the data in an abbreviated fashion, and when someone asks us about the data, simply say, "Here are a table and a graph about the data; they summarize the entire collection." So, rather than viewing 100,000 sheets of paper, perhaps we would only have to view two sheets of paper. Since statistics is largely a system of communicating information, descriptive statistics are considerably more useful to a consumer than an entire collection of data. Descriptive statistics are discussed in Chapters 2 through 4.
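The idea of condensing a large collection of data into a page of summary numbers can be sketched in a few lines of Python (the 100,000 simulated heights are invented for illustration; the book itself uses SPSS for such summaries).

```python
import random
import statistics

# Hypothetical data: heights (in centimeters) for 100,000 graduate students.
random.seed(7)
heights = [random.gauss(170, 10) for _ in range(100_000)]

# Descriptive statistics reduce 100,000 values to a handful of numbers
# that fit in a single small table.
summary = {
    "n": len(heights),
    "mean": round(statistics.mean(heights), 1),
    "std dev": round(statistics.stdev(heights), 1),
    "minimum": round(min(heights), 1),
    "maximum": round(max(heights), 1),
}
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

A reader of the five-line summary learns almost everything a consumer needs without ever seeing the 100,000 raw values.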
Inferential statistics are defined as techniques which allow us to employ inductive reasoning to infer the properties of an entire group or collection of individuals, a population, from a small number of those individuals, a sample. In other words, the purpose of inferential statistics is to allow us to collect data from a sample of individuals and then infer the properties of that sample back to the population of individuals. In case you have forgotten about logic, inductive reasoning is where you infer from the specific (here the sample) to the general (here the population). For example, say we have just collected a set of sample data from 5,000 of the population of 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). If you were to ask us about the data, we could compute various sample statistics and then infer with some confidence that these would be similar to the population parameters. In other words, this allows us to collect data from a subset of the population yet still make inferential statements about the population without collecting data from the entire population. So, rather than collecting data from all 100,000 graduate students in the population, we could collect data on a sample of, say, 5,000 students.

As another example, Gosset (aka Student) was asked to conduct a taste test of Guinness beer for a sample of Dublin residents. Because the brewery could not afford to do this with the entire population of Dublin, Gosset collected data from a sample of Dublin residents and was able to make an inference from these sample results back to the population. A discussion of inferential statistics begins in Chapter 5. In summary, the field of statistics is roughly divided into descriptive statistics and inferential statistics. Note, however, that many further distinctions are made among the types of statistics, but more about that later.
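The inferential logic of the 5,000-of-100,000 example can be sketched as a simulation. The grade point averages below are invented for illustration; the point is only that a sample statistic from 5,000 students lands very close to the population parameter, which is what licenses the inference from sample to population.

```python
import random
import statistics

# Hypothetical population: grade point averages for 100,000 graduate students.
random.seed(1)
population = [random.gauss(3.4, 0.3) for _ in range(100_000)]
mu = statistics.mean(population)  # the parameter, normally unknown to us

# Draw a few samples of 5,000 and compare each sample statistic to mu.
for trial in range(3):
    sample = random.sample(population, 5_000)
    x_bar = statistics.mean(sample)
    print(f"trial {trial + 1}: sample mean = {x_bar:.3f} "
          f"(population mean = {mu:.3f})")
# Each sample mean falls very close to the population mean, so a single
# sample supports a confident inference about the whole population.
```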
1.4 Types of Variables
There are several terms we need to define about variables. First, it might be useful to define the term variable itself. A variable is defined as any characteristic of persons or things that is observed to take on different values. In other words, the values for a particular characteristic vary across the individuals observed. For example, the annual salary of the families in your neighborhood varies because not every family earns the same annual salary. One family might earn $50,000 while the family right next door might earn $65,000. Thus, annual family salary is a variable because it varies across families.
In contrast, a constant is defined as any characteristic of persons or things that is observed to take on only a single value. In other words, the values for a particular characteristic are the same for all individuals observed. For example, say every family in your neighborhood has a lawn. Although the nature of the lawns may vary, everyone has a lawn. Thus, whether a family has a lawn in your neighborhood is a constant and therefore would not be a very interesting characteristic to study. When designing a study, you (i.e., the researcher) can determine what is a constant. This is part of the process of delimiting, or narrowing the scope of, your study. As an example, you may be interested in studying the career paths of girls who complete AP science courses. In designing your study, you are only interested in girls, and thus gender would be a constant. This is not to say that the researcher wholly determines when a characteristic is a constant. It is sometimes the case that we find that a characteristic is a constant after we conduct the study. In other words, one of the measures has no variation—everyone or everything scored or remained the same on that particular characteristic.
There are different typologies for describing variables. One typology is categorical (or qualitative) versus numerical (or quantitative), and, within numerical, discrete versus continuous. A categorical variable is a qualitative variable that describes categories of a characteristic or attribute. Examples of categorical variables include political party affiliation (Republican = 1, Democrat = 2, Independent = 3), religious affiliation (e.g., Methodist = 1, Baptist = 2, Roman Catholic = 3), and course letter grade (A = 4, B = 3, C = 2, D = 1, F = 0). A dichotomous variable is a special, restricted type of categorical variable and is defined as a variable that can take on only one of two values. For example, biologically determined gender is a variable that can only take on the values of male or female and is often coded numerically as 0 (e.g., for males) or 1 (e.g., for females). Other dichotomous variables include pass/fail, true/false, living/dead, and smoker/nonsmoker. Dichotomous variables will take on special importance as we study binary logistic regression (Chapter 19).
A numerical variable is a quantitative variable. Numerical variables can further be classified as either discrete or continuous. A discrete variable is defined as a variable that can only take on certain values. For example, the number of children in a family can only take on certain values. Many values are not possible, such as negative values (e.g., the Joneses cannot have −2 children) or decimal values (e.g., the Smiths cannot have 2.2 children). In contrast, a continuous variable is defined as a variable that can take on any value within a certain range, given a precise enough measurement instrument. For example, the distance between two cities can be measured in miles, with miles estimated in whole numbers. However, given a more precise instrument with which to measure, distance can even be measured down to the inch or millimeter. When considering the difference between a discrete and a continuous variable, keep in mind that discrete variables arise from the counting process and continuous variables arise from the measuring process. For example, the number of students enrolled in your statistics class is a discrete variable. If we were to measure (i.e., count) the number of students in the class, it would not matter if we counted first names alphabetically from A to Z or if we counted beginning with whoever sat in the front row to the last person in the back row—either way, we would arrive at the same value. In other words, how we "measure" (again, count) the students in the class does not matter—we will always arrive at the same result. In comparison, the value of a continuous variable is dependent on how precise the measuring instrument is. Weighing yourself on a scale that rounds to whole numbers will give one measure of weight. However, weighing on another, more precise scale that rounds to three decimal places will provide a more precise measure of weight.
Here are a few additional examples of discrete and continuous variables. Other discrete variables include the number of CDs owned, the number of credit hours enrolled, and the number of teachers employed at a school. Other continuous variables include salary (from zero to billions in dollars and cents), age (from zero up, in millisecond increments), height (from zero up, in increments of fractions of millimeters), weight (from zero up, in increments of fractions of ounces), and time (from zero up, in millisecond increments). Variable type is often important in terms of selecting an appropriate statistic, as shown later.
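The counting-versus-measuring distinction above can be made concrete in a few lines of Python (the class roster and the weight are invented for illustration).

```python
# Discrete variables arise from counting: the order of counting is
# irrelevant, so every counting scheme yields the same value.
roster = ["Ana", "Ben", "Carla", "Dev", "Eli"]
count_alphabetical = len(sorted(roster))           # count names A to Z
count_back_to_front = len(list(reversed(roster)))  # count back row first
print(count_alphabetical, count_back_to_front)     # same value either way

# Continuous variables arise from measuring: the recorded value depends
# on the precision of the instrument.
true_weight_kg = 72.4816239  # the "true" weight, never observed exactly
for decimals in (0, 1, 3):
    reading = round(true_weight_kg, decimals)
    print(f"scale precise to {decimals} decimal places reads {reading} kg")
```

However the roster is traversed, the count is the same; however precise the scale, the reading is only an approximation of the underlying continuous quantity.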
1.5 Scales of Measurement
Another concept useful for selecting an appropriate statistic is the scale of measurement of the variables. First, however, we define measurement as the assignment of numerical values to persons or things according to explicit rules. For example, how do we measure a person's weight? Well, there are rules that individuals commonly follow. Currently, weight is measured on some sort of balance or scale in pounds or grams. In the old days, weight was measured by different rules, such as the number of stones or gold coins. These explicit rules were developed so that there was a standardized and generally agreed-upon method of measuring weight. Thus, if you weighed 10 stones in Coventry, England, then that meant the same as 10 stones in Liverpool, England.
9 Introduction
In 1951, the psychologist S. S. Stevens developed four types of measurement scales that could be used for assigning these numerical values. In other words, the type of rule used was related to the measurement scale. The four types of measurement scales are the nominal, ordinal, interval, and ratio scales. They are presented in order of increasing complexity and of increasing information (remembering the acronym NOIR might be helpful). It is worth restating the importance of understanding the measurement scales of variables, as the measurement scales will dictate what statistical procedures can be performed with the data.
1.5.1 Nominal Measurement Scale
The simplest scale of measurement is the nominal scale. Here individuals or objects are classified into categories so that all of those in a single category are equivalent with respect to the characteristic being measured. For example, the country of birth of an individual is a nominally scaled variable. Everyone born in France is equivalent with respect to this variable, whereas two people born in different countries (e.g., France and Australia) are not equivalent with respect to this variable. The categories are truly qualitative in nature, not quantitative. Categories are typically given names or numbers. For our example, the country name would be an obvious choice for categories, although numbers could also be assigned to each country (e.g., Brazil = 5, India = 34). The numbers do not represent the amount of the attribute possessed. An individual born in India does not possess any more of the “country of birth origin” attribute than an individual born in Brazil (which would not make sense anyway). The numbers merely identify to which category an individual or object belongs. The categories are also mutually exclusive. That is, an individual can belong to one and only one category, such as a person being born in only one country.
The statistics of a nominal scale variable are quite simple, as they can only be based on the frequencies that occur within each of the categories. For example, we may be studying characteristics of various countries in the world. A nominally scaled variable could be the hemisphere in which the country is located (northern, southern, eastern, or western). While it is possible to count the number of countries that belong to each hemisphere, that is all that we can do. The only mathematical property that the nominal scale possesses is that of equality versus inequality. In other words, two individuals or objects are either in the same category (equal) or in different categories (unequal). For the hemisphere variable, we can either use the hemisphere name or assign numerical values to each hemisphere. We might perhaps assign each hemisphere a number alphabetically from 1 to 4. Countries that are in the same hemisphere are equal with respect to this characteristic. Countries that are in different hemispheres are unequal with respect to this characteristic. Again, these particular numerical values are meaningless and could arbitrarily be any values. The numerical values assigned only serve to keep the categories distinct from one another. Many other numerical values could be assigned for the hemispheres and still maintain the equality versus inequality property. For example, the northern hemisphere could easily be categorized as 1000 and the southern hemisphere as 2000 with no change in information. Other examples of nominal scale variables include hair color, eye color, neighborhood, gender, ethnic background, religious affiliation, political party affiliation, type of life insurance owned (e.g., term, whole life), blood type, psychological clinical diagnosis, Social Security number, and type of headache medication prescribed. The term nominal is derived from “giving a name.” Nominal variables are considered categorical or qualitative.
1.5.2 Ordinal Measurement Scale
The next most complex scale of measurement is the ordinal scale. Ordinal measurement is determined by the relative size or position of individuals or objects with respect to the characteristic being measured. That is, the individuals or objects are rank-ordered according to the amount of the characteristic that they possess. For example, say a high school graduating class had 250 students. Students could then be assigned class ranks according to their academic performance (e.g., grade point average) in high school. The student ranked 1 in the class had the highest relative performance, and the student ranked 250 had the lowest relative performance.
However, equal differences between the ranks do not imply equal distance in terms of the characteristic being measured. For example, the students ranked 1 and 2 in the class may have a different distance in terms of actual academic performance than the students ranked 249 and 250, even though both pairs of students differ by a rank of 1. In other words, here a rank difference of 1 does not imply the same actual performance distance. The pairs of students may be very, very close or be quite distant from one another. As a result of equal differences not implying equal distances, the statistics that we can use are limited due to these unequal intervals. The ordinal scale then consists of two mathematical properties: equality versus inequality again; and, if two individuals or objects are unequal, then we can determine greater than or less than. That is, if two individuals have different class ranks, then we can determine which student had a greater or lesser class rank. Although the greater than or less than property is evident, an ordinal scale cannot tell us how much greater than or less than because of the unequal intervals. Thus, the student ranked 250 could be farther away from the student ranked 249 than the student ranked 2 is from the student ranked 1.
When we have untied ranks, as shown on the left side of Table 1.1, assigning ranks is straightforward. What do we do if there are tied ranks? For example, suppose there are two students with the same grade point average of 3.8, as given on the right side of Table 1.1. How do we assign them class ranks? It is clear that they have to be assigned the same rank, as that would be the only fair method. However, there are at least two methods for dealing with tied ranks. One method would be to assign each of them a rank of 2, as that is the next available rank. However, there are two problems with that method. First, the sum of the ranks for the same number of scores would be different depending on whether there
Table 1.1
Untied Ranks and Tied Ranks for Ordinal Data

  Untied Ranks          Tied Ranks
  Grade Point           Grade Point
  Average      Rank     Average      Rank
  4.0          1        4.0          1
  3.9          2        3.8          2.5
  3.8          3        3.8          2.5
  3.6          4        3.6          4
  3.2          5        3.0          6
  3.0          6        3.0          6
  2.7          7        3.0          6
  Sum = 28              Sum = 28
were ties or not. Statistically, this is not a satisfactory solution. Second, what rank would the next student, having the 3.6 grade point average, be given: a rank of 3 or 4?
The second and preferred method is to take the average of the available ranks and assign that value to each of the tied individuals. Thus, the two persons tied at a grade point average of 3.8 have as available ranks 2 and 3. Both would then be assigned the average rank of 2.5. Also, the three persons tied at a grade point average of 3.0 have as available ranks 5, 6, and 7. These all would be assigned the average rank of 6. You also see in the table that with this method the sum of the ranks for 7 scores is always equal to 28, regardless of the number of ties. Statistically, this is a satisfactory solution and the one we prefer, whether we are using a statistical software package or hand computations. Other examples of ordinal scale variables include course letter grades, order of finish in the Boston Marathon, socioeconomic status, hardness of minerals (1 = softest to 10 = hardest), faculty rank (assistant, associate, and full professor), student class (freshman, sophomore, junior, senior, graduate student), ranking on a personality trait (e.g., extreme intrinsic to extreme extrinsic motivation), and military rank. The term ordinal is derived from “ordering” individuals or objects. Ordinal variables are most often considered categorical or qualitative.
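The preferred tied-ranks method just described can be sketched in Python (our illustration, not part of the text). The function below mirrors the average-rank handling that statistical packages typically apply, using the GPAs from the tied-ranks side of Table 1.1.

```python
# Sketch of the average-rank method for ties: each tied group receives
# the average of the ranks it would have occupied.

def average_ranks(values):
    """Rank values from largest to smallest, averaging ranks within ties."""
    order = sorted(values, reverse=True)   # e.g., 4.0, 3.8, 3.8, 3.6, ...
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1                         # positions i..j-1 hold tied values
        ranks[order[i]] = (i + 1 + j) / 2  # average of ranks i+1 through j
        i = j
    return [ranks[v] for v in values]

gpas = [4.0, 3.8, 3.8, 3.6, 3.0, 3.0, 3.0]   # tied-ranks column of Table 1.1
print(average_ranks(gpas))       # [1.0, 2.5, 2.5, 4.0, 6.0, 6.0, 6.0]
print(sum(average_ranks(gpas)))  # 28.0, the same sum as with untied ranks
```

As the text notes, the sum of the ranks for 7 scores stays at 28 regardless of the number of ties, which is why this method is the statistically satisfactory one.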
1.5.3 Interval Measurement Scale
The next most complex scale of measurement is the interval scale. An interval scale is one where individuals or objects can be ordered, and equal differences between the values do imply equal distance in terms of the characteristic being measured. That is, order and distance relationships are meaningful. However, there is no absolute zero point. Absolute zero, if it exists, implies the total absence of the property being measured. The zero point of an interval scale, if it exists, is arbitrary and does not reflect the total absence of the property being measured. Here the zero point merely serves as a placeholder. For example, suppose that we gave you the final exam in advanced statistics right now. If you were to be so unlucky as to obtain a score of 0, this score does not imply a total lack of knowledge of statistics. It would merely reflect the fact that your statistics knowledge is not that advanced yet (or perhaps the questions posed on the exam just did not capture those concepts that you do understand). You do have some knowledge of statistics, but just at an introductory level in terms of the topics covered so far.
Take as an example the Fahrenheit temperature scale, which has a freezing point of 32 degrees. A temperature of zero is not the total absence of heat, just a point slightly colder than 1 degree and slightly warmer than −1 degree. In terms of the equal distance notion, consider the following example. Say that we have two pairs of Fahrenheit temperatures, the first pair being 55 and 60 degrees and the second pair being 25 and 30 degrees. The difference of 5 degrees is the same for both pairs and is also the same everywhere along the Fahrenheit scale. Thus, every 5-degree interval is an equal interval. However, we cannot say that 60 degrees is twice as warm as 30 degrees, as there is no absolute zero. In other words, we cannot form true ratios of values (i.e., 60/30 = 2). This property only exists for the ratio scale of measurement. The interval scale has as mathematical properties equality versus inequality, greater than or less than if unequal, and equal intervals. Other examples of interval scale variables include the Centigrade temperature scale, calendar time, restaurant ratings by the health department (on a 100-point scale), year (since 1 AD), and, arguably, many educational and psychological assessment devices (although statisticians have been debating this one for many years; e.g., on occasion there is a fine line between whether an assessment is measured along the ordinal or the interval scale). Interval variables are considered numerical and primarily continuous.
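Why ratios fail on an interval scale can be shown with a short computation (our illustration, not from the text): re-expressing the same Fahrenheit temperatures in Celsius preserves equal differences but changes the apparent ratio, because the zero point of each scale is arbitrary.

```python
# Sketch: on an interval scale, differences are meaningful but ratios are not.

def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Equal 5-degree intervals are equal on either scale.
assert (60 - 55) == (30 - 25)
assert round(f_to_c(60) - f_to_c(55), 6) == round(f_to_c(30) - f_to_c(25), 6)

# But the "twice as warm" claim is an artifact of the arbitrary zero:
print(60 / 30)                  # 2.0 in Fahrenheit
print(f_to_c(60) / f_to_c(30))  # a very different ratio in Celsius
```

If temperature were ratio-scaled, the 60/30 ratio would survive a change of units; it does not, which is exactly the point made above about 60 degrees not being twice as warm as 30 degrees.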
1.5.4 Ratio Measurement Scale
The most complex scale of measurement is the ratio scale. A ratio scale has all of the properties of the interval scale, plus an absolute zero point exists. Here a measurement of 0 indicates a total absence of the property being measured. Because an absolute zero point exists, true ratios of values can be formed which actually reflect ratios in the amounts of the characteristic being measured. Thus, if concepts such as “one-half as big” or “twice as large” make sense, then that may be a good indication that the variable is ratio in scale.
For example, the height of individuals is a ratio scale variable. There is an absolute zero point of zero height. We can also form ratios, such that 6′0″ Sam is twice as tall as his 3′0″ daughter Samantha. The ratio scale of measurement is not observed frequently in education and the behavioral sciences, with certain exceptions. Motor performance variables (e.g., speed in the 100 meter dash, distance driven in 24 hours), elapsed time, calorie consumption, and physiological characteristics (e.g., weight, height, age, pulse rate, blood pressure) are ratio scale measures (and are all also examples of continuous variables). Discrete variables, those that arise from the counting process, are also examples of ratio variables, since zero indicates an absence of what is measured (e.g., the number of children in a family or the number of trees in a park). A summary of the measurement scales, their characteristics, and some examples is given in Table 1.2. Ratio variables are considered numerical and can be either discrete or continuous.
Table 1.2
Summary of the Scales of Measurement

Nominal: Classify into categories; categories are given names or numbers, but the numbers are arbitrary. Mathematical property: (1) equal versus unequal. Examples: hair or eye color, ethnic background, neighborhood, gender, country of birth, Social Security number, type of life insurance, religious or political affiliation, blood type, clinical diagnosis.

Ordinal: Rank-ordered according to relative size or position. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than. Examples: letter grades, order of finish in race, class rank, SES, hardness of minerals, faculty rank, student class, military rank, rank on personality trait.

Interval: Rank-ordered, and equal differences between values imply equal distances in the attribute. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than; (3) equal intervals. Examples: temperature, calendar time, most assessment devices, year, restaurant ratings.

Ratio: Rank-ordered, with equal intervals, and an absolute zero that allows ratios to be formed. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than; (3) equal intervals; (4) absolute zero. Examples: speed in 100 meter dash, height, weight, age, distance driven, elapsed time, pulse rate, blood pressure, calorie consumption.
1.6 Summary
In this chapter, an introduction to statistics was given. First, we discussed the value and need for knowledge about statistics and how it assists in decision making. Next, a few of the more colorful and interesting statisticians of the past were mentioned. Then, we defined the following general statistical terms: population, parameter, sample, statistic, descriptive statistics, and inferential statistics. We then defined variable-related terms, including variables, constants, categorical variables, and continuous variables. For a summary of these definitions, see Box 1.1. Finally, we examined the four classic types of measurement scales: nominal, ordinal, interval, and ratio. By now, you should have met the following objectives: (a) have a better sense of why statistics are necessary; (b) see that statisticians are an interesting group of people; and (c) have an understanding of the basic statistical concepts of population, parameter, sample, and statistic; descriptive and inferential statistics; types of variables; and scales of measurement. The next chapter begins to address some of the details of descriptive statistics when we consider how to represent data in terms of tables and graphs. In other words, rather than carrying our data around with us everywhere we go, we examine ways to display data in tabular and graphical forms to foster communication.
Stop and Think Box 1.1
Summary of Definitions

Population: All members of a well-defined group. Example: all employees of IBM Atlanta.
Parameter: A characteristic of a population. Example: average salary of a population.
Sample: A subset of a population. Example: some employees of IBM Atlanta.
Statistic: A characteristic of a sample. Example: average salary of a sample.
Descriptive statistics: Techniques which allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. Example: a table or graph summarizing data.
Inferential statistics: Techniques which allow us to employ inductive reasoning to infer the properties of a population from a sample. Example: taste test statistics from a sample of Dublin residents.
Variable: Any characteristic of persons or things that is observed to take on different values. Example: salary of the families in your neighborhood.
Constant: Any characteristic of persons or things that is observed to take on only a single value. Example: every family has a lawn in your neighborhood.
Categorical variable: A qualitative variable. Example: political party affiliation.
Dichotomous variable: A categorical variable that can take on only one of two values. Example: biologically determined gender.
Numerical variable: A quantitative variable that is either discrete or continuous. Examples: number of children in a family; the distance between two cities.
Discrete variable: A numerical variable that arises from the counting process and that can take on only certain values. Example: number of children in a family.
Continuous variable: A numerical variable that can take on any value within a certain range given a precise enough measurement instrument. Example: distance between two cities.
Problems
Conceptual problems
1.1 A mental health counselor is conducting a research study on satisfaction that married couples have with their marriage. “Marital status” (e.g., single, married, divorced, widowed), in this scenario, is which one of the following?
  a. Constant
  b. Variable
1.2 Belle randomly samples 100 library patrons and gathers data on the genre of the “first book” that they checked out from the library. She finds that 85 library patrons checked out a fiction book and 15 library patrons checked out a nonfiction book. Which of the following best characterizes the type of “first book” checked out in this study?
  a. Constant
  b. Variable
1.3 For interval level variables, which of the following properties does not apply?
  a. Jim is two units greater than Sally.
  b. Jim is greater than Sally.
  c. Jim is twice as good as Sally.
  d. Jim differs from Sally.
1.4 Which of the following properties is appropriate for ordinal but not for nominal variables?
  a. Sue differs from John.
  b. Sue is greater than John.
  c. Sue is 10 units greater than John.
  d. Sue is twice as good as John.
1.5 Which scale of measurement is implied by the following statement: “Jill's score is three times greater than Eric's score”?
  a. Nominal
  b. Ordinal
  c. Interval
  d. Ratio
1.6 Which scale of measurement is implied by the following statement: “Bubba had the highest score”?
  a. Nominal
  b. Ordinal
  c. Interval
  d. Ratio
1.7 A band director collects data on the number of years in which students in the band have played a musical instrument. Which scale of measurement is implied by this scenario?
  a. Nominal
  b. Ordinal
  c. Interval
  d. Ratio
1.8 Kristen has an IQ of 120. I assert that Kristen is 20% more intelligent than the average person having an IQ of 100. Am I correct?
1.9 Population is to parameter as sample is to statistic. True or false?
1.10 Every characteristic of a sample of 100 persons constitutes a variable. True or false?
1.11 A dichotomous variable is also a categorical variable. True or false?
1.12 The amount of time spent studying in 1 week for a population of students is an inferential statistic. True or false?
1.13 For ordinal level variables, which of the following properties does not apply?
  a. IBM differs from Apple.
  b. IBM is greater than Apple.
  c. IBM is two units greater than Apple.
  d. All of the aforementioned properties apply.
1.14 A sample of 50 students take an exam, and the instructor decides to give the top 5 scores a bonus of 5 points. Compared to the original set of scores (no bonus), I assert that the ranks of the new set of scores (including bonus) will be exactly the same. Am I correct?
1.15 Johnny and Buffy have class ranks of 5 and 6. Ingrid and Toomas have class ranks of 55 and 56. I assert that the GPAs of Johnny and Buffy are the same distance apart as are the GPAs of Ingrid and Toomas. Am I correct?
Computational problems
1.1 Rank the following values of the number of CDs owned, assigning rank 1 to the largest value:
10 15 12 8 20 17 5 21 3 19
1.2 Rank the following values of the number of credits earned, assigning rank 1 to the largest value:
10 16 10 8 19 16 5 21 3 19
1.3 Rank the following values of the number of pairs of shoes owned, assigning rank 1 to the largest value:
8 6 3 12 19 7 10 25 4 42
Interpretive problems
Consider the following class survey:
1.1 What is your gender?
1.2 What is your height in inches?
1.3 What is your shoe size (length)?
1.4 Do you smoke?
1.5 Are you left- or right-handed? Your mother? Your father?
1.6 How much did you spend at your last hair appointment (including tip)?
1.7 How many CDs do you own?
1.8 What was your quantitative GRE score?
1.9 What is your current GPA?
1.10 On average, how much exercise do you get per week (in hours)?
1.11 On a 5-point scale, what is your political view (1 = very liberal, 3 = moderate, 5 = very conservative)?
1.12 On average, how many hours of TV do you watch per week?
1.13 How many cups of coffee did you drink yesterday?
1.14 How many hours did you sleep last night?
1.15 On average, how many alcoholic drinks do you have per week?
1.16 Can you tell the difference between Pepsi and Coke?
1.17 What is the natural color of your hair (black, blonde, brown, red, other)?
1.18 What is the natural color of your eyes (black, blue, brown, green, other)?
1.19 How far do you live from this campus (in miles)?
1.20 On average, how many books do you read for pleasure each month?
1.21 On average, how many hours do you study per week?
1.22 Which question on this survey is the most interesting to you? The least interesting?
Possible Activities
1. For each item, determine the most likely scale of measurement (nominal, ordinal, interval, or ratio) and the type of variable [categorical or numerical (if numerical, discrete or continuous)].
2. Create scenarios in which one or more of the variables in this survey would be a constant, given the delimitations that you define for your study. For example, we are designing a study to measure study habits (as measured by Question 1.21) for students who do not exercise (Question 1.10). In this sample study, our constant is the number of hours per week that a student exercises (in this case, we are delimiting that to be zero—and thus, Question 1.10 will be a constant; all students in our study will have answered Question 1.10 as “zero”).
3. Collect data from a sample of individuals. In subsequent chapters, you will be asked to analyze these data for different procedures.

NOTE: An actual sample dataset using this survey is contained on the website (SPSS file: survey1) and is utilized in later chapters.
2
Data Representation
Chapter Outline
2.1 Tabular Display of Distributions
  2.1.1 Frequency Distributions
  2.1.2 Cumulative Frequency Distributions
  2.1.3 Relative Frequency Distributions
  2.1.4 Cumulative Relative Frequency Distributions
2.2 Graphical Display of Distributions
  2.2.1 Bar Graph
  2.2.2 Histogram
  2.2.3 Frequency Polygon
  2.2.4 Cumulative Frequency Polygon
  2.2.5 Shapes of Frequency Distributions
  2.2.6 Stem-and-Leaf Display
2.3 Percentiles
  2.3.1 Percentiles
  2.3.2 Quartiles
  2.3.3 Percentile Ranks
  2.3.4 Box-and-Whisker Plot
2.4 SPSS
2.5 Templates for Research Questions and APA-Style Paragraph
Key Concepts
1. Frequencies, cumulative frequencies, relative frequencies, and cumulative relative frequencies
2. Ungrouped and grouped frequency distributions
3. Sample size
4. Real limits and intervals
5. Frequency polygons
6. Normal, symmetric, and skewed frequency distributions
7. Percentiles, quartiles, and percentile ranks
In Chapter 1, we introduced the wonderful world of statistics. There, we discussed the value of statistics, met a few of the more interesting statisticians, and defined several basic statistical concepts. The concepts included population, parameter, sample and statistic, descriptive and inferential statistics, types of variables, and scales of measurement. In this chapter, we begin our examination of descriptive statistics, which we previously defined as techniques that allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. We used the example of collecting data from 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). Rather than having to carry around the entire collection of data in order to respond to questions, we mentioned that you could summarize the data in an abbreviated fashion through the use of tables and graphs. This way, we could communicate features of the data through a few tables or figures without having to carry around the entire dataset.
This chapter deals with the details of the construction of tables and figures for purposes of describing data. Specifically, we first consider the following types of tables: frequency distributions (ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions. Next we look at the following types of figures: bar graph, histogram, frequency polygon, cumulative frequency polygon, and stem-and-leaf display. We also discuss common shapes of frequency distributions. Then we examine the use of percentiles, quartiles, percentile ranks, and box-and-whisker plots. Finally, we look at the use of SPSS and develop an APA-style paragraph of results. Concepts to be discussed include frequencies, cumulative frequencies, relative frequencies, and cumulative relative frequencies; ungrouped and grouped frequency distributions; sample size; real limits and intervals; frequency polygons; normal, symmetric, and skewed frequency distributions; and percentiles, quartiles, and percentile ranks. Our objectives are that by the end of this chapter, you will be able to (1) construct and interpret statistical tables, (2) construct and interpret statistical graphs, and (3) determine and interpret percentile-related information.
2.1 Tabular Display of Distributions
Consider the following research scenario:
Marie, a graduate student pursuing a master's degree in educational research, has been assigned her first task as a research assistant. Her faculty mentor has given Marie quiz data collected from 25 students enrolled in an introductory statistics course and has asked Marie to summarize the data. In addition to the data, the faculty mentor has shared the following research questions that should guide Marie in her analysis of the data: How can the quiz scores of students enrolled in an introductory statistics class be represented in a table? In a figure? What is the distributional shape of the statistics quiz scores? What is the 50th percentile of the quiz scores?
In this section, we consider ways in which data can be represented in the form of tables. More specifically, we are interested in how the data for a single variable can be represented (the representation of data for multiple variables is covered in later chapters). The methods described here include frequency distributions (both ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions.
2.1.1 Frequency Distributions
Let us use an example set of data in this chapter to illustrate ways in which data can be represented. We have selected a small dataset for purposes of simplicity, although datasets are typically larger in size. Note that there is a larger dataset (based on the survey from the Chapter 1 interpretive problem) utilized in the end-of-chapter problems and available on our website as “survey1.” As shown in Table 2.1, the smaller dataset consists of a sample of 25 student scores on a statistics quiz, where the maximum score is 20 points. If a colleague asked a question about these data, again a response could be, “take a look at the data yourself.” This would not be very satisfactory to the colleague, as the person would have to eyeball the data to answer his or her question. Alternatively, one could present the data in the form of a table so that questions could be more easily answered. One question might be: which score occurred most frequently? In other words, what score occurred more than any other score? Other questions might be: which scores were the highest and lowest scores in the class? and where do most of the scores tend to fall? In other words, how well did the students tend to do as a class? These and other questions can be easily answered by looking at a frequency distribution.
Let us first look at how an ungrouped frequency distribution can be constructed for these and other data. By following these steps, we develop the ungrouped frequency distribution shown in Table 2.2. The first step is to arrange the unique scores on a list from the lowest score to the highest score. The lowest score is 9 and the highest score is 20. Even though scores such as 15 were observed more than once, the value of 15 is only entered in this column once. This is what we mean by unique. Note that if the score of 15 had not been observed, it could still be entered as a value in the table to serve as a placeholder within
Table 2.1
Statistics Quiz Data
9 11 20 15 19 10 19 18 14 12 17 11 13
16 17 19 18 17 13 17 15 18 17 19 15
Table 2.2
Ungrouped Frequency Distribution of Statistics Quiz Data

 X     f    cf   rf                 crf
 9     1     1   f/n = 1/25 = .04    .04
10     1     2   .04                 .08
11     2     4   .08                 .16
12     1     5   .04                 .20
13     2     7   .08                 .28
14     1     8   .04                 .32
15     3    11   .12                 .44
16     1    12   .04                 .48
17     5    17   .20                 .68
18     3    20   .12                 .80
19     4    24   .16                 .96
20     1    25   .04                1.00
   n = 25        1.00
the�distribution�of�scores�observed��We�label�this�column�as�“raw�score”�or�“X,”�as�shown�by�
the�first�column�in�the�table��Raw scores�are�a�set�of�scores�in�their�original�form;�that�is,�the�
scores�have�not�been�altered�or�transformed�in�any�way��X�is�often�used�in�statistics�to�denote�
a�variable,�so�you�see�X�quite�a�bit�in�this�text��(As�a�side�note,�whenever�upper�or�lowercase�
letters�are�used�to�denote�statistical�notation,�the�letter�is�always�italicized�)
The second step is to determine for each unique score the number of times it was observed. We label this second column as "frequency" or by the abbreviation "f." The frequency column tells us how many times, or how frequently, each unique score was observed. For instance, the score of 20 was only observed one time, whereas the score of 17 was observed five times. Now we have some information with which to answer the questions of our colleague. The most frequently observed score is 17, the lowest score is 9, and the highest score is 20. We can also see that scores tended to be closer to 20 (the highest score) than to 9 (the lowest score).

Two other concepts need to be introduced that are included in Table 2.2. The first concept is sample size. At the bottom of the second column, you see n = 25. From now on, n will be used to denote sample size, that is, the total number of scores obtained for the sample. Thus, because 25 scores were obtained here, n = 25.
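The tabulation just described can be sketched in a few lines of Python (shown here purely for illustration, rather than SPSS):

```python
from collections import Counter

# Statistics quiz scores from Table 2.1
scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

n = len(scores)         # sample size: n = 25
freq = Counter(scores)  # frequency f of each unique score

# Unique scores from lowest to highest with their frequencies
for x in sorted(freq):
    print(x, freq[x])
```

The loop prints the X and f columns of Table 2.2, for example one frequency for the score of 9 and five for the score of 17.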
The second concept is related to real limits and intervals. Although the scores obtained for this dataset happened to be whole numbers, not fractions or decimals, we still need a system that will cover that possibility. For example, what would we do if a student obtained a score of 18.25? One option would be to list that as another unique score, which would probably be more confusing than useful. A second option would be to include it with one of the other unique scores somehow; this is our option of choice. The system that all researchers use to cover the possibility of any score being obtained is through the concepts of real limits and intervals. Each value of X in Table 2.2 can be thought of as being the midpoint of an interval. Each interval has an upper and a lower real limit. The upper real limit of an interval is halfway between the midpoint of the interval under consideration and the midpoint of the next larger interval. For example, the value of 18 represents the midpoint of an interval. The next larger interval has a midpoint of 19. Therefore, the upper real limit of the interval containing 18 would be 18.5, halfway between 18 and 19. The lower real limit of an interval is halfway between the midpoint of the interval under consideration and the midpoint of the next smaller interval. Following the example interval of 18 again, the next smaller interval has a midpoint of 17. Therefore, the lower real limit of the interval containing 18 would be 17.5, halfway between 17 and 18. Thus, the interval of 18 has 18.5 as an upper real limit and 17.5 as a lower real limit. Other intervals have their upper and lower real limits as well.
Notice that adjacent intervals (i.e., those next to one another) touch at their respective real limits. For example, the 18 interval has 18.5 as its upper real limit, and the 19 interval has 18.5 as its lower real limit. This implies that any possible score can be placed into some interval and no score can fall between two intervals. If someone obtains a score of 18.25, that will be covered in the 18 interval. The only limitation to this procedure is that, because adjacent intervals must touch in order to deal with every possible score, what do we do when a score falls precisely where two intervals touch at their real limits (e.g., at 18.5)? There are two possible solutions. The first solution is to assign the score to one interval or the other based on some rule. For instance, we could randomly assign such scores to one interval or the other by flipping a coin. Alternatively, we could arbitrarily assign such scores always into either the larger or smaller of the two intervals. The second solution is to construct intervals such that the number of values falling at the real limits is minimized. For example, say that most of the scores occur at .5 (e.g., 15.5, 16.5, 17.5). We could construct the intervals with .5 as the midpoint and .0 as the real limits. Thus, the 15.5 interval would have 15.5 as the midpoint, 16.0 as the upper real limit, and 15.0 as the lower real limit. It should also be noted that, strictly speaking, real limits are only appropriate for continuous variables, not for discrete variables. That is, because discrete variables can take on only limited values, we probably do not need to worry about real limits for them (e.g., there is not really an interval for two children).
Finally, the width of an interval is defined as the difference between the upper and lower real limits of the interval. We can denote this as w = URL − LRL, where w is the interval width and URL and LRL are the upper and lower real limits, respectively. In the case of our example interval again, we see that w = URL − LRL = 18.5 − 17.5 = 1.0. For Table 2.2, then, all intervals have the same interval width of 1.0. For each interval, we have a midpoint, a lower real limit that is one-half unit below the midpoint, and an upper real limit that is one-half unit above the midpoint. In general, we want all of the intervals to have the same width for consistency as well as for equal interval reasons. The only exception might be if the largest or smallest intervals were open-ended, that is, above a certain value (e.g., greater than 20) or below a certain value (e.g., less than 9), respectively.
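For intervals centered on whole-number scores, the real limits and width can be sketched as follows (the helper name `real_limits` is just an illustrative choice):

```python
def real_limits(midpoint, width=1.0):
    """Return (LRL, URL): the limits lie half an interval width
    on either side of the midpoint."""
    return midpoint - width / 2, midpoint + width / 2

lrl, url = real_limits(18)   # the interval containing the score 18
w = url - lrl                # w = URL - LRL
print(lrl, url, w)           # 17.5 18.5 1.0

# A score of 18.25 falls within the 18 interval
assert lrl < 18.25 < url
```

The same helper covers other widths, e.g., `real_limits(17.5, 2.0)` gives the limits 16.5 and 18.5 used later for the grouped 17–18 interval.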
A frequency distribution with an interval width of 1.0 is often referred to as an ungrouped frequency distribution, as the intervals have not been grouped together. Does the interval width always have to be equal to 1.0? The answer, of course, is no. We could group intervals together and form what is often referred to as a grouped frequency distribution. For our example data, we can construct a grouped frequency distribution with an interval width of 2.0, as shown in Table 2.3. The largest interval now contains the scores of 19 and 20, the second largest interval the scores of 17 and 18, and so on down to the smallest interval with the scores of 9 and 10. Correspondingly, the largest interval contains a frequency of 5, the second largest interval a frequency of 8, and the smallest interval a frequency of 2. All we have really done is collapse the intervals from Table 2.2, where the interval width was 1.0, into intervals of width 2.0, as shown in Table 2.3. If we take, for example, the interval containing the scores of 17 and 18, then the midpoint of the interval is 17.5, the URL is 18.5, the LRL is 16.5, and thus w = 2.0. The interval width could actually be any value, including .20 or 100, depending on what best suits the data.
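Collapsing Table 2.2 into the width-2 intervals of Table 2.3 can be sketched as:

```python
from collections import Counter

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

width = 2
# Map each score to the interval 9-10, 11-12, ..., 19-20 it falls in
grouped = Counter((s - 9) // width for s in scores)
for k in sorted(grouped):
    low = 9 + k * width
    print(f"{low}-{low + width - 1}: {grouped[k]}")
```

Changing `width` lets you try several groupings quickly, which is the kind of experimentation recommended later in this section.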
Table 2.3
Grouped Frequency Distribution of Statistics Quiz Data

X        f
 9–10    2
11–12    3
13–14    3
15–16    4
17–18    8
19–20    5
n = 25

How does one determine what the proper interval width should be? If there are many frequencies for each score and fewer than 15 or 20 intervals, then an ungrouped frequency distribution with an interval width of 1 is appropriate (and this is the default in SPSS for determining frequency distributions). If there are either minimal frequencies per score (say 1 or 2) or a large number of unique scores (say more than 20), then a grouped frequency distribution with some other interval width is appropriate. For a first example, say that there are 100 unique scores ranging from 0 to 200. An ungrouped frequency distribution would not really summarize the data very well, as the table would be quite large. The reader would have to eyeball the table and actually do some quick grouping in his or her head so as to gain any information about the data. An interval width of perhaps 10–15 would be more useful. In a second example, say that there are only 20 unique scores ranging from 0 to 30, but each score occurs only once or twice. An ungrouped frequency distribution would not be very useful here either, as the reader would again have to collapse intervals in his or her head. Here an interval width of perhaps 2–5 would be appropriate.
Ultimately, deciding on the interval width, and thus the number of intervals, becomes a trade-off between good communication of the data and the amount of information contained in the table. As interval width increases, more and more information is lost from the original data. For the example where scores range from 0 to 200, using an interval width of 10, some precision about the 15 scores contained in the 30–39 interval is lost. In other words, the reader would not know from the frequency distribution where in that interval the 15 scores actually fall. If you want that information (you may not), you would need to return to the original data. At the same time, an ungrouped frequency distribution for those data would not have much of a message for the reader. Ultimately, the decisive factor is the adequacy with which information is communicated to the reader. The nature of the interval grouping comes down to whatever form best represents the data. With today's powerful statistical software, it is easy for the researcher to try several different interval widths before deciding which one works best for a particular set of data. Note also that the frequency distribution can be used with variables of any measurement scale, from nominal (e.g., the frequencies for eye color of a group of children) to ratio (e.g., the frequencies for the height of a group of adults).
2.1.2 Cumulative Frequency Distributions
A second type of frequency distribution is known as the cumulative frequency distribution. For the example data, this is depicted in the third column of Table 2.2 and labeled as "cf." To put it simply, the cumulative frequency for a particular interval is the number of scores contained in that interval and all of the smaller intervals. Thus, the 9 interval contains one frequency, and there are no frequencies smaller than that interval, so the cumulative frequency is simply 1. The 10 interval contains one frequency, and there is one frequency in a smaller interval, so the cumulative frequency is 2. The 11 interval contains two frequencies, and there are two frequencies in smaller intervals; thus, the cumulative frequency is 4. In other words, four people had scores in the 11 interval and smaller intervals. One way to think about determining the cumulative frequency column is to take the frequency column and accumulate downward (i.e., from the top down, yielding 1; 1 + 1 = 2; 1 + 1 + 2 = 4; etc.). Just as a check, the cf in the largest interval (i.e., the interval largest in value) should be equal to n, the number of scores in the sample, 25 in this case. Note also that the cumulative frequency distribution can be used with variables of measurement scales from ordinal (e.g., the number of students receiving a B or less) to ratio (e.g., the number of adults who are 5′7″ or less), but it cannot be used with nominal data, as there is not at least rank order to nominal data (and thus accumulating information from one nominal category to another does not make sense).
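Accumulating the frequency column downward, as just described, is a one-liner with `itertools.accumulate` (again a Python sketch rather than SPSS):

```python
from collections import Counter
from itertools import accumulate

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

freq = Counter(scores)
xs = sorted(freq)              # unique scores, lowest to highest
f = [freq[x] for x in xs]      # frequency column of Table 2.2
cf = list(accumulate(f))       # cumulative frequency column

print(cf)                      # last entry equals n = 25
```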
2.1.3 Relative Frequency Distributions
A third type of frequency distribution is known as the relative frequency distribution. For the example data, this is shown in the fourth column of Table 2.2 and labeled as "rf." Relative frequency is simply the proportion of scores contained in an interval. Computationally, rf = f/n. For example, the proportion of scores occurring in the 17 interval is computed as rf = 5/25 = .20. Relative frequencies take sample size into account, allowing us to make statements about the number of individuals in an interval relative to the total sample. Thus, rather than stating that 5 individuals had scores in the 17 interval, we could say that 20% of the scores were in that interval. In the popular press, relative frequencies (which they call percentages) are quite often reported in tables without the frequencies. Note that the sum of the relative frequencies should be 1.00 (or 100%) within rounding error. Also note that the relative frequency distribution can be used with variables of any measurement scale, from nominal (e.g., the percent of children with blue eye color) to ratio (e.g., the percent of adults who are 5′7″).
2.1.4 Cumulative Relative Frequency Distributions
A fourth and final type of frequency distribution is known as the cumulative relative frequency distribution. For the example data, this is depicted in the fifth column of Table 2.2 and labeled as "crf." The cumulative relative frequency for a particular interval is the proportion of scores in that interval and smaller. Thus, the 9 interval has a relative frequency of .04, and there are no relative frequencies smaller than that interval, so the cumulative relative frequency is simply .04. The 10 interval has a relative frequency of .04, and the relative frequencies less than that interval total .04, so the cumulative relative frequency is .08. The 11 interval has a relative frequency of .08, and the relative frequencies less than that interval total .08, so the cumulative relative frequency is .16. Thus, 16% of the people had scores in the 11 interval and smaller. In other words, 16% of people scored 11 or less. One way to think about determining the cumulative relative frequency column is to take the relative frequency column and accumulate downward (i.e., from the top down, yielding .04; .04 + .04 = .08; .04 + .04 + .08 = .16; etc.). Just as a check, the crf in the largest interval should be equal to 1.0, within rounding error, just as the sum of the relative frequencies is equal to 1.0. Also note that the cumulative relative frequency distribution can be used with variables of measurement scales from ordinal (e.g., the percent of students receiving a B or less) to ratio (e.g., the percent of adults who are 5′7″ or less). As with cumulative frequency distributions, cumulative relative frequency distributions cannot be used with nominal data.
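The rf and crf columns of Table 2.2 follow the same pattern (a Python sketch for illustration):

```python
from collections import Counter
from itertools import accumulate

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

n = len(scores)
freq = Counter(scores)
xs = sorted(freq)

rf = [freq[x] / n for x in xs]   # rf = f/n for each interval
crf = list(accumulate(rf))       # accumulate rf downward

print(rf[xs.index(17)])          # .20 for the 17 interval
print(crf[-1])                   # 1.0 within (floating-point) rounding error
```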
2.2 Graphical Display of Distributions
In this section, we consider several types of graphs for viewing a distribution of scores. Again, we are still interested in how the data for a single variable can be represented, but now in a graphical display rather than a tabular display. The methods described here include the bar graph; the histogram; the frequency, relative frequency, cumulative frequency, and cumulative relative frequency polygons; and the stem-and-leaf display. Common shapes of distributions will also be discussed.
2.2.1 Bar Graph
A popular method used for displaying nominal-scale data in graphical form is the bar graph. As an example, say that we have data on the eye color of a sample of 20 children. Ten children are blue eyed, six are brown eyed, three are green eyed, and one is black eyed. A bar graph for these data is shown in Figure 2.1 (SPSS generated). The horizontal axis, going from left to right on the page, is often referred to in statistics as the X axis (for variable X; in this example our variable is eye color). On the X axis of Figure 2.1, we have labeled the different eye colors that were observed from individuals in our sample. The order of the colors is not relevant (remember, these are nominal data, so order or rank is irrelevant). The vertical axis, going from bottom to top on the page, is often referred to in statistics as the Y axis (the Y label will be more relevant in later chapters when we have a second variable Y). On the Y axis of Figure 2.1, we have labeled the frequencies. Finally, a bar is drawn for each eye color, where the height of the bar denotes the frequency for that particular eye color (i.e., the number of times that particular eye color was observed in our sample). For example, the height of the bar for the blue-eyed category is 10 frequencies. Thus, we see in the graph which eye color is most popular in this sample (i.e., blue) and which eye color occurs least (i.e., black).

Note that the bars are separated by some space and do not touch one another, reflecting the nature of nominal data. As there are no intervals or real limits here, we do not want the bars to touch one another. One could also plot relative frequencies on the Y axis to reflect the percentage of children in the sample who belong to each category of eye color. Here we would see that 50% of the children had blue eyes, 30% brown eyes, 15% green eyes, and 5% black eyes. Another method for displaying nominal data graphically is the pie chart, where the pie is divided into slices whose sizes correspond to the frequencies or relative frequencies of each category. However, for numerous reasons (e.g., it contains little information when there are few categories, is unreadable when there are many categories, and visually assessing the sizes of each slice is difficult at best), the pie chart is statistically problematic, such that Tufte (1992) states, "the only worse design than a pie chart is several of them" (p. 178). The bar graph is the recommended graphic for nominal data.
FIGURE 2.1
Bar graph of eye-color data. (X axis: eye color, showing black, blue, brown, and green; Y axis: frequency.)
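As a quick check on the relative frequencies quoted above (a plain Python sketch; the bar graph itself would be drawn in SPSS or a plotting library):

```python
# Eye-color frequencies for the sample of 20 children
freq = {"blue": 10, "brown": 6, "green": 3, "black": 1}
n = sum(freq.values())   # 20 children

# Relative frequency (proportion) for each category; a bar graph of
# these proportions has the same shape as the frequency bar graph
rf = {color: f / n for color, f in freq.items()}
print(rf)   # blue .50, brown .30, green .15, black .05
```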
2.2.2 Histogram
A method somewhat similar to the bar graph that is appropriate for data that are at least ordinal (i.e., ordinal, interval, or ratio) is the histogram. Because the data are at least theoretically continuous (even though they may be measured in whole numbers), the main difference in the histogram (as compared to the bar graph) is that the bars touch one another, much like intervals touching one another at their real limits. An example of a histogram for the statistics quiz data is shown in Figure 2.2 (SPSS generated). As you can see, along the X axis we plot the values of the variable X and along the Y axis the frequencies for each interval. The height of the bar again corresponds to the frequency for a particular value of X. This figure represents an ungrouped histogram, as the interval size is 1. That is, along the X axis the midpoint of each bar is the midpoint of the interval; the bar begins on the left at the lower real limit of the interval, ends on the right at the upper real limit, and is one unit wide. If we wanted to use an interval size of 2, for example, using the grouped frequency distribution in Table 2.3, then we could construct a grouped histogram in the same way; the differences would be that the bars would be two units wide and the heights of the bars would obviously change. Try this one on your own for practice.

One could also plot relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval. In reality, all that we have to change is the scale of the Y axis. The height of the bars would remain the same. For this particular dataset, each frequency corresponds to a relative frequency of .04.
FIGURE 2.2
Histogram of statistics quiz data. (X axis: quiz score; Y axis: frequency.)

2.2.3 Frequency Polygon

Another graphical method appropriate for data that have at least some rank order (i.e., ordinal, interval, or ratio) is the frequency polygon (line graph in SPSS terminology). A polygon is defined simply as a many-sided figure. The frequency polygon is set up in a fashion similar to the histogram. However, rather than plotting a bar for each interval, points are plotted for each interval and then connected together, as shown in Figure 2.3 (SPSS generated). The axes are the same as with the histogram. A point is plotted at the intersection (or coordinates) of the midpoint of each interval along the X axis and the frequency for that interval along the Y axis. Thus, for the 15 interval, a point is plotted at the midpoint of the interval, 15.0, and at a frequency of three. Once the points are plotted for each interval, we "connect the dots."

One could also plot relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval. This is known as the relative frequency polygon. As with the histogram, all we have to change is the scale of the Y axis. The position of the polygon would remain the same. For this particular dataset, each frequency corresponds to a relative frequency of .04.

Note also that because the histogram and frequency polygon contain exactly the same information, Figures 2.2 and 2.3 can be superimposed on one another. If you did this, you would see that the points of the frequency polygon are plotted at the top of each bar of the histogram. Neither the histogram nor the frequency polygon has an advantage over the other; however, the histogram is more frequently used due to its availability in all statistical software.
FIGURE 2.3
Frequency polygon of statistics quiz data. (X axis: quiz score; Y axis: frequency; markers/lines show count.)

2.2.4 Cumulative Frequency Polygon

Cumulative frequencies of data that have at least some rank order (i.e., ordinal, interval, or ratio) can be displayed as a cumulative frequency polygon (sometimes referred to as the ogive curve). As shown in Figure 2.4 (SPSS generated), the differences between the frequency polygon and the cumulative frequency polygon are that (a) the cumulative frequency polygon involves plotting cumulative frequencies along the Y axis, (b) the points should be plotted at the upper real limit of each interval (although SPSS plots the points at the interval midpoints by default), and (c) the polygon cannot be closed on the right-hand side.
Let us discuss each of these differences. First, the Y axis represents the cumulative frequencies from the cumulative frequency distribution. The X axis is the usual set of raw scores. Second, to reflect the cumulative nature of this type of frequency, the points must be plotted at the upper real limit of each interval. For example, the cumulative frequency for the 16 interval is 12, indicating that there are 12 scores in that interval and smaller. Finally, the polygon cannot be closed on the right-hand side. Notice that as you move from left to right in the cumulative frequency polygon, the height of the points always increases or stays the same. Because of the nature of accumulating information, there will never be a decrease in the accumulation of the frequencies. For example, there is an increase in cumulative frequency from the 16 to the 17 interval, as five new frequencies are included. Beyond the 20 interval, the number of cumulative frequencies remains at 25, as no new frequencies are included.

One could also plot cumulative relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval and smaller. This is known as the cumulative relative frequency polygon. All we have to change is the scale of the Y axis to cumulative relative frequency. The position of the polygon would remain the same. For this particular dataset, each cumulative frequency corresponds to a cumulative relative frequency of .04. Thus, a cumulative relative frequency polygon of the example data would look exactly like Figure 2.4, except on the Y axis we plot cumulative relative frequencies ranging from 0 to 1.
FIGURE 2.4
Cumulative frequency polygon of statistics quiz data. (X axis: quiz score; Y axis: cumulative frequency, from 0 to 25.)

2.2.5 Shapes of Frequency Distributions

There are several common shapes of frequency distributions that you are likely to encounter, as shown in Figure 2.5. These are briefly described here and more fully in later chapters. Figure 2.5a is a normal distribution (or bell-shaped curve) where most of the scores are in the center of the distribution, with fewer higher and lower scores. The normal distribution plays a large role in statistics, both for descriptive statistics (as we show beginning in Chapter 4) and particularly as an assumption for many inferential statistics (as we show beginning in Chapter 6). This distribution is also known as symmetric because if we divide the distribution into two equal halves vertically, the left half is a mirror image of the right half (see Chapter 4). Figure 2.5b is a positively skewed distribution where most of the scores are fairly low and there are a few higher scores (see Chapter 4). Figure 2.5c is a negatively skewed distribution where most of the scores are fairly high and there are a few lower scores (see Chapter 4). Skewed distributions are not symmetric, as the left half is not a mirror image of the right half.
FIGURE 2.5
Common shapes of frequency distributions: (a) normal, (b) positively skewed, and (c) negatively skewed. (Each panel plots frequency f against X.)

2.2.6 Stem-and-Leaf Display

A refined form of the grouped frequency distribution is the stem-and-leaf display, developed by John Tukey (1977). This is shown in Figure 2.6 (SPSS generated) for the example statistics quiz data. The stem-and-leaf display was originally developed to be constructed on a typewriter using lines and numbers in a minimal amount of space. In a way, the stem-and-leaf display looks like a grouped type of histogram on its side. The vertical value on the left is the stem and, in this example, represents all but the last digit (i.e., the tens digit). The leaf represents, in this example, the remaining digit of each score (i.e., the units digit). Note that SPSS has grouped values in increments of five. For example, the second line ("1 . 0112334") indicates that there are 7 scores from 10 to 14; thus, the "0" leaf on the "1" stem means that there is one frequency for the score of 10. The fact that two leaves of "1" occur on that stem indicates that the score of 11 occurred twice. Interpreting the rest of this stem, we see that 12 occurred once (i.e., there is only one 2 on the stem), 13 occurred twice (i.e., there are two 3s on the stem), and 14 occurred once (i.e., only one 4 on the stem). From the stem-and-leaf display, one can determine every one of the raw scores; this is not possible with a typical grouped frequency distribution (i.e., no information is lost in a stem-and-leaf display). However, with a large sample the display can become rather unwieldy. Consider what a stem-and-leaf display would look like for 100,000 GRE scores!

FIGURE 2.6
Stem-and-leaf display of statistics quiz data.

Quiz Stem-and-Leaf Plot

Frequency     Stem and Leaf
 1.00         0 .  9
 7.00         1 .  0112334
16.00         1 .  5556777778889999
 1.00         2 .  0

Stem width:  10.0
Each leaf:   1 case(s)
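A rough Python sketch that reproduces the SPSS-style display (stems split into leaves 0–4 and 5–9, i.e., increments of five):

```python
scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

# Each key is (tens digit, whether the units digit is 5-9)
keys = sorted({(s // 10, s % 10 >= 5) for s in scores})
for stem, high in keys:
    leaves = "".join(sorted(str(s % 10) for s in scores
                            if (s // 10, s % 10 >= 5) == (stem, high)))
    print(f"{stem} . {leaves}")
```

Because every leaf is a single digit of an actual score, the raw data could be reconstructed exactly from the printed lines.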
In summary, this section included the most basic types of statistical graphics, although more advanced graphics are described in later chapters. Note, however, that there are a number of publications on how to properly display graphics, that is, "how to do graphics right." While a detailed discussion of statistical graphics is beyond the scope of this text, the following publications are recommended: Chambers, Cleveland, Kleiner, and Tukey (1983), Schmid (1983), Wainer (e.g., 1984, 1992, 2000), Tufte (1992), Cleveland (1993), Wallgren, Wallgren, Persson, Jorner, and Haaland (1996), Robbins (2004), and Wilkinson (2005).
2.3 Percentiles
In this section, we consider several concepts, and the necessary computations, for the area of percentiles, including percentiles, quartiles, percentile ranks, and the box-and-whisker plot. For instance, you might be interested in determining what percentage of the distribution of the GRE-Quantitative subtest fell below a score of 600, or in what score divides the distribution of the GRE-Quantitative subtest into two equal halves.
2.3.1 Percentiles
Let us define a percentile as that score below which a certain percentage of the distribution lies. For instance, you may be interested in that score below which 50% of the distribution of the GRE-Quantitative subscale lies. Say that this score is computed as 480; this would mean that 50% of the scores fell below a score of 480. Because percentiles are scores, they are continuous values and can take on any value of those possible. The 30th percentile could be, for example, the score of 387.6750. For notational purposes, a percentile will be known as Pi, where the i subscript denotes the particular percentile of interest, between 0 and 100. Thus, the 30th percentile for the previous example would be denoted as P30 = 387.6750.
Let us now consider how percentiles are computed. The formula for computing the Pi percentile is

    Pi = LRL + [i%(n) − cf]/f × (w)    (2.1)

where
LRL is the lower real limit of the interval containing Pi
i% is the percentile desired (expressed as a proportion from 0 to 1)
n is the sample size
cf is the cumulative frequency less than but not including the interval containing Pi (known as cf below)
f is the frequency of the interval containing Pi
w is the interval width
As an example, consider computing the 25th percentile of our statistics quiz data. This would correspond to that score below which 25% of the distribution falls. For the example data in the form presented in Table 2.2, using Equation 2.1, we compute P25 as follows:

    P25 = LRL + [i%(n) − cf]/f × (w) = 12.5 + [25%(25) − 5]/2 × (1) = 12.5 + 0.625 = 13.125
Conceptually, let us discuss how the equation works. First, we have to determine which interval contains the percentile of interest. This is easily done by looking in the crf column of the frequency distribution for the interval that contains a crf of .25 somewhere within the interval. We see that for the 13 interval the crf = .28, which means that the interval spans a crf of .20 (the URL of the 12 interval) up to .28 (the URL of the 13 interval) and thus contains .25. The next largest interval of 14 takes us from a crf of .28 up to a crf of .32 and thus is too large for this particular percentile. The next smallest interval of 12 takes us from a crf of .16 up to a crf of .20 and thus is too small. The LRL of 12.5 indicates that P25 is at least 12.5. The rest of the equation adds some positive amount to the LRL.

Next we have to determine how far into that interval we need to go in order to reach the desired percentile. We take i percent of n, or in this case 25% of the sample size of 25, which is 6.25. So we need to go one-fourth of the way into the distribution, or 6.25 scores, to reach the 25th percentile. Another way to think about this: because the scores have been rank-ordered from lowest or smallest (top of the frequency distribution) to highest or largest (bottom of the frequency distribution), we need to go 25%, or 6.25 scores, into the distribution from the top (or smallest value) to reach the 25th percentile. We then subtract out all cumulative frequencies smaller than (or below) the interval we are looking in, where cf below = 5. Again we just want to determine how far into this interval we need to go, and thus we subtract out all of the frequencies smaller than this interval, or cf below. The numerator then becomes 6.25 − 5 = 1.25. Then we divide by the number of frequencies in the interval containing the percentile we are looking for. This forms the ratio of how far into the interval we go. In this case, we needed to go 1.25 scores into the interval, and the interval contains 2 scores; thus, the ratio is 1.25/2 = .625. In other words, we need to go .625 unit into the interval to reach the desired percentile. Now that we know how far into the interval to go, we need to weight this by the width of the interval. Here we need to go 1.25 scores into an interval containing 2 scores that is 1 unit wide, and thus we go .625 unit into the interval [(1.25/2)(1) = .625]. If the interval width were instead 10, then 1.25 scores into the interval would be equal to 6.25 units.
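Equation 2.1 is mechanical enough to script. The sketch below is our own code, not part of the text, and the function and argument names are invented; it implements the formula and reproduces the worked value of P25.

```python
def percentile(p, lrl, cf_below, f, w, n):
    """Equation 2.1: percentile = LRL + [p%(n) - cf below]/f * w.

    p        -- the desired percentile (e.g., 25 for P25)
    lrl      -- lower real limit of the interval containing the percentile
    cf_below -- cumulative frequency below that interval
    f        -- frequency of that interval
    w        -- interval width
    n        -- sample size
    """
    return lrl + ((p / 100) * n - cf_below) / f * w

# Worked example from the text (statistics quiz data, n = 25):
p25 = percentile(25, lrl=12.5, cf_below=5, f=2, w=1, n=25)
print(p25)  # 13.125
```

The same call, with the values read off the frequency distribution for the relevant interval, gives any other percentile.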
Consider two more worked examples to try on your own, either through statistical software or by hand. The 50th percentile, P50, is

P50 = 16.5 + [50%(25) − 12]/5 × 1 = 16.5 + 0.100 = 16.600

while the 75th percentile, P75, is

P75 = 17.5 + [75%(25) − 17]/3 × 1 = 17.5 + 0.583 = 18.083
We have only examined a few example percentiles of the many possibilities that exist. For example, we could also have determined P55.5 or even P99.5. Thus, we could determine any percentile, in whole numbers or decimals, between 0 and 100. Next we examine three particular percentiles that are often of interest, the quartiles.

2.3.2 Quartiles

One common way of dividing a distribution of scores into equal groups of scores is known as quartiles. This is done by dividing a distribution into fourths, or quartiles, where there are four equal groups, each containing 25% of the scores. In the previous examples, we determined P25, P50, and P75, which divided the distribution into four equal groups, from 0 to 25, from 25 to 50, from 50 to 75, and from 75 to 100. Thus, the quartiles are special cases of percentiles. A different notation, however, is often used for these particular percentiles, where we denote P25 as Q1, P50 as Q2, and P75 as Q3. Thus, the Qs represent the quartiles.
An interesting aspect of quartiles is that they can be used to determine whether a distribution of scores is positively or negatively skewed. This is done by comparing the values of the quartiles as follows. If (Q3 − Q2) > (Q2 − Q1), then the distribution of scores is positively skewed, as the scores are more spread out at the high end of the distribution and more bunched up at the low end of the distribution (remember the shapes of the distributions from Figure 2.5). If (Q3 − Q2) < (Q2 − Q1), then the distribution of scores is negatively skewed, as the scores are more spread out at the low end of the distribution and more bunched up at the high end of the distribution. If (Q3 − Q2) = (Q2 − Q1), then the distribution of scores is obviously not skewed, but is symmetric (see Chapter 4). For the example statistics quiz data, (Q3 − Q2) = 1.4833 and (Q2 − Q1) = 3.4750; thus, (Q3 − Q2) < (Q2 − Q1) and we know that the distribution is negatively skewed. This should already have been evident from examining the frequency distribution in Figure 2.3, as scores are more spread out at the low end of the distribution and more bunched up at the high end. Examining the quartiles is a simple method for getting a general sense of the skewness of a distribution of scores.
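This quartile comparison is easy to capture in a small helper function (a sketch of ours, not from the text; the function name is invented). Feeding it the quartiles computed earlier for the quiz data reproduces the verdict of negative skew.

```python
def skew_direction(q1, q2, q3):
    """Compare (Q3 - Q2) with (Q2 - Q1) for a rough sense of skewness."""
    upper = q3 - q2  # spread in the upper half of the distribution
    lower = q2 - q1  # spread in the lower half of the distribution
    if upper > lower:
        return "positively skewed"
    if upper < lower:
        return "negatively skewed"
    return "symmetric"

# Statistics quiz data: Q1 = 13.125, Q2 = 16.600, Q3 = 18.083
print(skew_direction(13.125, 16.600, 18.083))  # negatively skewed
```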
2.3.3 Percentile Ranks

Let us define a percentile rank as the percentage of a distribution of scores that falls below (or is less than) a certain score. For instance, you may be interested in the percentage of scores of the GRE-Quantitative subscale that falls below the score of 480. Say that the percentile rank for the score of 480 is computed to be 50; then this would mean that 50% of the scores fell below a score of 480. If this sounds familiar, it should. The 50th percentile was previously stated to be 480. Thus, we have logically determined that the percentile rank of 480 is 50. This is because percentile and percentile rank are actually opposite sides of the same coin. Many are confused by this and equate percentiles and percentile ranks; however, they are related but different concepts. Recall earlier we said that percentiles are scores. Percentile ranks are percentages, as they are continuous values and can take on any value from 0 to 100. The score of 400 can have a percentile rank of 42.6750. For notational purposes, a percentile rank will be known as PR(Pi), where Pi is the particular score
whose percentile rank, PR, you wish to determine. Thus, the percentile rank of the score 400 would be denoted as PR(400) = 42.6750. In other words, about 43% of the distribution falls below the score of 400.
Let us now consider how percentile ranks are computed. The formula for computing the PR(Pi) percentile rank is

PR(Pi) = {[cf + f(Pi − LRL)/w] / n} × 100%   (2.2)
where
PR(Pi) indicates that we are looking for the percentile rank PR of the score Pi
cf is the cumulative frequency up to but not including the interval containing PR(Pi) (again known as cf below)
f is the frequency of the interval containing PR(Pi)
LRL is the lower real limit of the interval containing PR(Pi)
w is the interval width
n is the sample size, and finally we multiply by 100% to place the percentile rank on a scale from 0 to 100 (and also to remind us that the percentile rank is a percentage)

As an example, consider computing the percentile rank for the score of 17. This would correspond to the percentage of the distribution that falls below a score of 17. For the example data again, using Equation 2.2, we compute PR(17) as follows:
PR(17) = {[12 + 5(17 − 16.5)/1] / 25} × 100% = [(12 + 2.5)/25] × 100% = 58.00%
Conceptually, let us discuss how the equation works. First, we have to determine which interval contains the percentile rank of interest. This is easily done because we already know the score is 17, and we simply look in the interval containing 17. The cf below the 17 interval is 12, and n is 25. Thus, we know that we need to go at least 12/25, or 48%, of the way into the distribution to obtain the desired percentile rank. We know that Pi = 17 and the LRL of that interval is 16.5. There are 5 frequencies in that interval, so we need to go 2.5 scores into the interval to obtain the proper percentile rank. In other words, because 17 is the midpoint of an interval with a width of 1, we need to go halfway, or 2.5/5 of the way, into the interval to obtain the percentile rank. In the end, we need to go 14.5/25 (or .58) of the way into the distribution to obtain our percentile rank, which translates to 58%.
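Equation 2.2 can be scripted in the same spirit as the percentile formula. Again, this is a sketch of ours rather than anything in the text, and the names are invented; it reproduces the worked value of PR(17).

```python
def percentile_rank(score, lrl, cf_below, f, w, n):
    """Equation 2.2: PR(Pi) = [cf below + f(Pi - LRL)/w] / n * 100%."""
    return (cf_below + f * (score - lrl) / w) / n * 100

# Worked example from the text: PR(17) for the statistics quiz data
pr17 = percentile_rank(17, lrl=16.5, cf_below=12, f=5, w=1, n=25)
print(round(pr17, 2))  # 58.0
```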
As another example, we have already determined that P50 = 16.6000. Therefore, you should be able to determine on your own that PR(16.6000) = 50%. This verifies that percentiles and percentile ranks are two sides of the same coin. The computation of a percentile starts with a percentage and identifies a specific score, whereas the computation of a percentile rank starts with a score and identifies its percentage. You can further verify this by determining that PR(13.1250) = 25.00% and PR(18.0833) = 75.00%. Next we consider the box-and-whisker plot, where quartiles and percentiles are used graphically to depict a distribution of scores.
2.3.4 Box-and-Whisker Plot

A simplified form of the frequency distribution is the box-and-whisker plot (often referred to simply as a "box plot"), developed by John Tukey (1977). This is shown in Figure 2.7 (SPSS generated) for the example data. The box-and-whisker plot was originally developed to be constructed on a typewriter using lines in a minimal amount of space. The box in the center of the figure displays the middle 50% of the distribution of scores. The left-hand edge, or hinge, of the box represents the 25th percentile (or Q1). The right-hand edge, or hinge, of the box represents the 75th percentile (or Q3). The middle vertical line in the box represents the 50th percentile (or Q2). The lines extending from the box are known as the whiskers. The purpose of the whiskers is to display data outside of the middle 50%. The left-hand whisker can extend down to the lowest score (as is the case with SPSS), or to the 5th or the 10th percentile (by other means), to display more extreme low scores, and the right-hand whisker correspondingly can extend up to the highest score (SPSS), or to the 95th or 90th percentile (elsewhere), to display more extreme high scores. The choice of where to extend the whiskers is the preference of the researcher and/or the software. Scores that fall beyond the end of the whiskers, known as outliers due to their extremeness relative to the bulk of the distribution, are often displayed by dots and/or asterisks. Box-and-whisker plots can be used to examine such things as skewness (through the quartiles), outliers, and where most of the scores tend to fall.
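The pieces a box-and-whisker plot displays (the three quartiles plus SPSS-style whiskers at the minimum and maximum) can be computed from raw scores with the Python standard library. This is a sketch using made-up scores, not the quiz data, and note that statistics.quantiles interpolates somewhat differently than Equation 2.1.

```python
import statistics

scores = [9, 11, 12, 13, 14, 15, 16, 16, 17, 17, 18, 19, 20]  # hypothetical data

q1, q2, q3 = statistics.quantiles(scores, n=4)  # the three quartiles
low, high = min(scores), max(scores)            # SPSS-style whisker endpoints

print(f"box from Q1={q1} to Q3={q3}, median line at Q2={q2}")
print(f"whiskers from {low} to {high}")
```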
2.4 SPSS

The purpose of this section is to briefly consider applications of SPSS for the topics covered in this chapter (including important screenshots). The following SPSS procedures will be illustrated: "Frequencies" and "Graphs."
FIGURE 2.7
Box-and-whisker plot of statistics quiz data.
Frequencies

Frequencies: Step 1. For the types of tables discussed in this chapter, in SPSS go to "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then select "Frequencies." Following the screenshot for "Frequencies: Step 1" will produce the "Frequencies" dialog box.
[Screenshot: "Frequencies: Step 1." Note: stem-and-leaf plots (and many other statistics) can be generated using the "Explore" program.]
Frequencies: Step 2. The "Frequencies" dialog box will open (see screenshot for "Frequencies: Step 2"). From this main "Frequencies" dialog box, click the variable of interest from the list on the left (e.g., quiz) and move it into the "Variables" box by clicking on the arrow button. By default, there is a checkmark in the box for "Display frequency tables," and we will keep this checked. This (i.e., selecting "Display frequency tables") will generate a table of frequencies, relative frequencies, and cumulative relative frequencies. There are three buttons on the right side of the "Frequencies" dialog box ("Statistics," "Charts," and "Format"). Let us first cover the options available through "Statistics."
[Screenshot: "Frequencies: Step 2." Select the variable of interest from the list on the left and use the arrow to move it to the "Variable" box on the right. "Display frequency tables" is checked by default and will produce a frequency distribution table in the output; clicking on the buttons on the right will allow you to select various statistics and graphs.]
Frequencies: Step 3a. If you click on the "Statistics" button from the main "Frequencies" dialog box (see "Frequencies: Step 2"), a new box labeled "Frequencies: Statistics" will appear (see screenshot for "Frequencies: Step 3a"). From here, you can obtain quartiles and selected percentiles, as well as numerous other descriptive statistics, simply by placing a checkmark in the boxes for the statistics that you want to generate. For better accuracy when generating the median, quartiles, and percentiles, check the box for "Values are group midpoints." However, it should be noted that these values are not always as precise as those from the formula given earlier in this chapter.
[Screenshot: "Frequencies: Step 3a," showing the options available when clicking on "Statistics" from the main Frequencies dialog box. Placing a checkmark will generate the respective statistic in the output; checking "Values are group midpoints" gives better accuracy with the median, quartiles, and percentiles.]
Frequencies: Step 3b. If you click on the "Charts" button from the main "Frequencies" dialog box (see screenshot for "Frequencies: Step 2"), a new box labeled "Frequencies: Charts" will appear (see screenshot for "Frequencies: Step 3b"). From here, you can select options to generate bar graphs, pie charts, or histograms. If you select bar graphs or pie charts, you can plot either frequencies or percentages (relative frequencies). Thus, the "Frequencies" program enables you to do much of what this chapter has covered. In addition, stem-and-leaf plots are available in the "Explore" program (see "Frequencies: Step 1" for a screenshot of where the "Explore" program can be accessed).
[Screenshot: "Frequencies: Step 3b," showing the options available when clicking on "Charts" from the main Frequencies dialog box.]
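Outside SPSS, the frequency table produced here (frequencies, relative frequencies, and cumulative relative frequencies) can also be tallied directly, for instance with Python's collections.Counter. The scores below are hypothetical, not the quiz data.

```python
from collections import Counter

scores = [9, 11, 11, 12, 13, 13, 13, 15, 17, 17]  # hypothetical data
n = len(scores)

freq = Counter(scores)  # frequency of each distinct score
cum = 0
for value in sorted(freq):
    f = freq[value]
    cum += f
    # columns: score, frequency, relative frequency, cumulative relative frequency
    print(f"{value:>3}  f={f}  rf={f / n:.2f}  crf={cum / n:.2f}")
```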
Graphs

There are multiple graphs that can be generated in SPSS. We will examine how to generate histograms, boxplots, bar graphs, and more using the "Graphs" procedure in SPSS.
Histograms

Histograms: Step 1. For other ways to generate the types of graphical displays covered in this chapter, go to "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Histogram" (see screenshot for "Graphs: Step 1"). Another option for creating a histogram, although not shown here, starts again from the "Graphs" option in the top pulldown menu, where you select "Legacy Dialogs," then "Graphboard Template Chooser," and finally "Histogram."
[Screenshot: "Graphs: Step 1," showing the options available when clicking on "Legacy Dialogs" from the main pulldown menu for graphs.]
Histograms: Step 2. This will bring up the "Histogram" dialog box (see screenshot for "Histograms: Step 2"). Click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box by clicking on the arrow. Place a checkmark in "Display normal curve," and then click "OK." This will generate the same histogram as was produced through the "Frequencies" program already mentioned.

[Screenshot: "Histograms: Step 2."]
Boxplots

Boxplots: Step 1. To produce a boxplot for individual variables, click on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Boxplot" (see "Graphs: Step 1" for a screenshot of this step). Another option for creating a boxplot (although not shown here) starts again from the "Graphs" option in the top pulldown menu, where you select "Graphboard Template Chooser," then "Boxplots."
Boxplots: Step 2. This will bring up the "Boxplot" dialog box (see screenshot for "Boxplots: Step 2"). Select the "Simple" option (by default, this will already be selected). To generate a separate boxplot for individual variables, click on the "Summaries of separate variables" radio button. Then click "Define."

[Screenshot: "Boxplots: Step 2."]
Boxplots: Step 3. This will bring up the "Define Simple Boxplot: Summaries of Separate Variables" dialog box (see screenshot for "Boxplots: Step 3"). Click the variable of interest (e.g., quiz) into the "Variable(s)" box. Then click "OK." This will generate a boxplot.

[Screenshot: "Boxplots: Step 3."]
Bar Graphs

Bar Graphs: Step 1. To produce a bar graph for individual variables, click on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Bar" (see "Graphs: Step 1" for a screenshot of this step).

Bar Graphs: Step 2. From the main "Bar Chart" dialog box, select "Simple" (which will be selected by default) and click on the "Summaries for groups of cases" radio button (see screenshot for "Bar Graphs: Step 2").

[Screenshot: "Bar Graphs: Step 2."]
Bar Graphs: Step 3. A new box labeled "Define Simple Bar: Summaries for Groups of Cases" will appear. Click the variable of interest (e.g., eye color) and move it into the "Variable" box by clicking the arrow button. Then a decision must be made for how the bars will be displayed. Several types of displays for bar graph data are available, including "N of cases" for frequencies, "Cum. N" for cumulative frequencies, "% of cases" for relative frequencies, and "Cum. %" for cumulative relative frequencies (see screenshot for "Bar Graphs: Step 3"). Additionally, other statistics can be selected through the "Other statistic (e.g., mean)" option. The most common bar graph is one which simply displays the frequencies (i.e., selecting the radio button for "N of cases"). Once your selections are made, click "OK." This will generate a bar graph.
[Screenshot: "Bar Graphs: Step 3." When "Other statistic (e.g., mean)" is selected, a dialog box (shown here as "Statistic") will appear, listing all other statistics that can be represented by the bars in the graph. Clicking on a radio button will select the statistic; once the selection is made, click on "Continue" to return to the "Define Simple Bar: Summaries for Groups of Cases" dialog box.]
Frequency Polygons

Frequency Polygons: Step 1. Frequency polygons can be generated by clicking on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Line" (see "Graphs: Step 1" for a screenshot of this step).

Frequency Polygons: Step 2. From the main "Line Charts" dialog box, select "Simple" and click on the "Summaries for groups of cases" radio button; both will be selected by default (see screenshot for "Frequency Polygons: Step 2").

[Screenshot: "Frequency Polygons: Step 2."]
Frequency Polygons: Step 3. A new box labeled "Define Simple Line: Summaries for Groups of Cases" will appear. Click the variable of interest (e.g., quiz) and move it into the "Variable" box by clicking the arrow button. Then a decision must be made for how the lines will be displayed. Several types of displays for line graph (i.e., frequency polygon) data are available, including "N of cases" for frequencies, "Cum. N" for cumulative frequencies, "% of cases" for relative frequencies, and "Cum. %" for cumulative relative frequencies (see screenshot for "Frequency Polygons: Step 3"). Additionally, other statistics can be selected through the "Other statistic (e.g., mean)" option. The most common frequency polygon is one which simply displays the frequencies (i.e., selecting the radio button for "N of cases"). Once your selections are made, click "OK." This will generate a frequency polygon.
[Screenshot: "Frequency Polygons: Step 3." When "Other statistic (e.g., mean)" is selected, a dialog box (shown here as "Statistic") will appear, listing all other statistics that can be represented in the graph. Clicking on a radio button will select the statistic; once the selection is made, click on "Continue" to return to the "Define Simple Line: Summaries for Groups of Cases" dialog box.]
Editing Graphs

Once a graph or table is created, double-clicking on the table or graph produced in the output will allow the user to make changes, such as changing the X and/or Y axis, colors, and more. An illustration of the options available in the chart editor is presented here.
[Screenshot: Chart Editor, displaying a histogram of the quiz data (Mean = 15.56, Std. Dev. = 3.163, N = 25).]
2.5 Templates for Research Questions and APA-Style Paragraph

Depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions, and thus no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary. At this time, let us revisit our graduate research assistant, Marie, who was introduced at the beginning of the chapter. As you may recall, her task was to summarize data from 25 students enrolled in a statistics course. The questions that Marie's faculty mentor shared with her were as follows: How can the quiz scores of students enrolled in an introductory
statistics class be graphically represented in a table? In a figure? What is the distributional shape of the statistics quiz scores? What is the 50th percentile of the quiz scores? A template for writing descriptive research questions for summarizing data may be as follows. Please note that these are just a few examples. Given the multitude of descriptive statistics that can be generated, these are not meant to be exhaustive.

How can [variable] be graphically represented in a table? In a figure? What is the distributional shape of the [variable]? What is the 50th percentile of [variable]?
Next, we present an APA-like paragraph summarizing the results of the statistics quiz data example.

As shown in Table 2.2 and Figure 2.2, scores ranged from 9 to 20, with more students achieving a score of 17 than any other score (20%). From Figure 2.2, we also know that the distribution of scores was negatively skewed, with the bulk of the scores being at the high end of the distribution. Skewness was also evident as the quartiles were not equally spaced, as shown in Figure 2.7. Thus, overall the sample of students tended to do rather well on this particular quiz (must have been the awesome teaching), although a few low scores should be troubling (as 20% did not pass the quiz and need some remediation).
2.6 Summary

In this chapter, we considered both tabular and graphical methods for representing data. First, we discussed the tabular display of distributions in terms of frequency distributions (ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions. Next, we examined various methods for depicting data graphically, including bar graphs, histograms (ungrouped and grouped), frequency polygons, cumulative frequency polygons, shapes of distributions, and stem-and-leaf displays. Then, concepts and procedures related to percentiles were covered, including percentiles, quartiles, percentile ranks, and box-and-whisker plots. Finally, an overview of SPSS for these procedures was included, as well as a summary APA-style paragraph of the quiz dataset. We include Box 2.1 as a summary of which data representation techniques are most appropriate for each type of measurement scale. At this point, you should have met the following objectives: (a) be able to construct and interpret statistical tables, (b) be able to construct and interpret statistical graphs, and (c) be able to determine and interpret percentile-related information. In the next chapter, we address the major population parameters and sample statistics useful for looking at a single variable. In particular, we are concerned with measures of central tendency and measures of dispersion.
STOP AND THINK BOX 2.1
Appropriate Data Representation Techniques

Nominal
• Tables: frequency distribution; relative frequency distribution
• Figures: bar graph

Ordinal, interval, or ratio
• Tables: frequency distribution; cumulative frequency distribution; relative frequency distribution; cumulative relative frequency distribution
• Figures: histogram; frequency polygon; relative frequency polygon; cumulative frequency polygon; cumulative relative frequency polygon; stem-and-leaf display; box-and-whisker plot
Problems

Conceptual Problems

2.1 For a distribution where the 50th percentile is 100, what is the percentile rank of 100?
 a. 0
 b. .50
 c. 50
 d. 100
2.2 Which of the following frequency distributions will generate the same relative frequency distribution?
X f Y f Z f
100 2 100 6 100 8
99 5 99 15 99 18
98 8 98 24 98 28
97 5 97 15 97 18
96 2 96 6 96 8
 a. X and Y only
 b. X and Z only
 c. Y and Z only
 d. X, Y, and Z
 e. None of the above
2.3 Which of the following frequency distributions will generate the same cumulative relative frequency distribution?
X f Y f Z f
100 2 100 6 100 8
99 5 99 15 99 18
98 8 98 24 98 28
97 5 97 15 97 18
96 2 96 6 96 8
 a. X and Y only
 b. X and Z only
 c. Y and Z only
 d. X, Y, and Z
 e. None of the above
2.4 In a histogram, 48% of the area lies below the score whose percentile rank is 52. True or false?
2.5 Among the following, the preferred method of graphing data pertaining to the ethnicity of a sample would be
 a. A histogram
 b. A frequency polygon
 c. A cumulative frequency polygon
 d. A bar graph
2.6 The proportion of scores between Q1 and Q3 may be less than .50. True or false?
2.7 The values of Q1, Q2, and Q3 in a positively skewed population distribution are calculated. What is the expected relationship between (Q2 − Q1) and (Q3 − Q2)?
 a. (Q2 − Q1) is greater than (Q3 − Q2).
 b. (Q2 − Q1) is equal to (Q3 − Q2).
 c. (Q2 − Q1) is less than (Q3 − Q2).
 d. Cannot be determined without examining the data.
2.8 If the percentile rank of a score of 72 is 65, we may say that 35% of the scores exceed 72. True or false?
2.9 In a negatively skewed distribution, the proportion of scores between Q1 and Q2 is less than .25. True or false?
2.10 A group of 200 sixth-grade students was given a standardized test and obtained scores ranging from 42 to 88. If the scores tended to "bunch up" in the low 80s, the shape of the distribution would be which one of the following:
 a. Symmetrical
 b. Positively skewed
 c. Negatively skewed
 d. Normal
2.11 The preferred method of graphing data on the eye color of a sample is which one of the following?
 a. Bar graph
 b. Frequency polygon
 c. Cumulative frequency polygon
 d. Relative frequency polygon
2.12 If Q2 = 60, then what is P50?
 a. 50
 b. 60
 c. 95
 d. Cannot be determined with the information provided
2.13 With the same data and using an interval width of 1, the frequency polygon and histogram will display the same information. True or false?
2.14 A researcher develops a histogram based on an interval width of 2. Can she reconstruct the raw scores using only this histogram? Yes or no?
2.15 Q2 = 50 for a positively skewed variable, and Q2 = 50 for a negatively skewed variable. I assert that Q1 will not necessarily be the same for both variables. Am I correct? True or false?
2.16 Which of the following statements is correct for a continuous variable?
 a. The proportion of the distribution below the 25th percentile is 75%.
 b. The proportion of the distribution below the 50th percentile is 25%.
 c. The proportion of the distribution above the third quartile is 25%.
 d. The proportion of the distribution between the 25th and 75th percentiles is 25%.
2.17 For a dataset with four unique values (55, 70, 80, and 90), the relative frequency for the value 55 is 20%, the relative frequency for 70 is 30%, the relative frequency for 80 is 20%, and the relative frequency for 90 is 30%. What is the cumulative relative frequency for the value 70?
 a. 20%
 b. 30%
 c. 50%
 d. 100%
2.18 In examining data collected over the past 10 years, researchers at a theme park find the following for 5000 first-time guests: 2250 visited during the summer months; 675 visited during the fall; 1300 visited during the winter; and 775 visited during the spring. What is the relative frequency for guests who visited during the spring?
 a. .135
 b. .155
 c. .260
 d. .450
Computational Problems

2.1 The following scores were obtained from a statistics exam:
47 50 47 49 46 41 47 46 48 44
46 47 45 48 45 46 50 47 43 48
47 45 43 46 47 47 43 46 42 47
49 44 44 50 41 45 47 44 46 45
42 47 44 48 49 43 45 49 49 46
Using an interval size of 1, construct or compute each of the following:
 a. Frequency distribution
 b. Cumulative frequency distribution
 c. Relative frequency distribution
 d. Cumulative relative frequency distribution
 e. Histogram and frequency polygon
 f. Cumulative frequency polygon
 g. Quartiles
 h. P10 and P90
 i. PR(41) and PR(49.5)
 j. Box-and-whisker plot
 k. Stem-and-leaf display
2.2 The following data were obtained from classroom observations and reflect the number of incidences that preschool children shared during an 8-hour period.
4 8 10 5 12 10 14 5
10 14 12 14 8 5 0 8
12 8 12 5 4 10 8 5
Using an interval size of 1, construct or compute each of the following:
 a. Frequency distribution
 b. Cumulative frequency distribution
 c. Relative frequency distribution
 d. Cumulative relative frequency distribution
 e. Histogram and frequency polygon
 f. Cumulative frequency polygon
 g. Quartiles
 h. P10 and P90
 i. PR(10)
 j. Box-and-whisker plot
 k. Stem-and-leaf display
2.3 A sample distribution of variable X is as follows:
X f
2 1
3 2
4 5
5 8
6 4
7 3
8 4
9 1
10 2
Calculate or draw each of the following for the sample distribution of X:
 a. Q1
 b. Q2
 c. Q3
 d. P44.5
 e. PR(7.0)
 f. Box-and-whisker plot
 g. Histogram (ungrouped)
2.4 A sample distribution of classroom test scores is as follows:
X f
70 1
75 2
77 3
79 2
80 6
82 5
85 4
90 4
96 3
Calculate or draw each of the following for the sample distribution of X:
 a. Q1
 b. Q2
 c. Q3
 d. P44.5
 e. PR(82)
 f. Box-and-whisker plot
 g. Histogram (ungrouped)
Interpretive Problems

Select two variables from the survey1 dataset on the website, one that is nominal and one that is not.

2.1 Write research questions that will be answered from these data using descriptive statistics (you may want to review the research question template in this chapter).
2.2 Construct the relevant tables and figures to answer the questions you posed.
2.3 Write a paragraph which summarizes the findings for each variable (you may want to review the writing template in this chapter).
3
Univariate Population Parameters and Sample Statistics
Chapter Outline

3.1 Summation Notation
3.2 Measures of Central Tendency
 3.2.1 Mode
 3.2.2 Median
 3.2.3 Mean
 3.2.4 Summary of Measures of Central Tendency
3.3 Measures of Dispersion
 3.3.1 Range
 3.3.2 H Spread
 3.3.3 Deviational Measures
 3.3.4 Summary of Measures of Dispersion
3.4 SPSS
3.5 Templates for Research Questions and APA-Style Paragraph

Key Concepts

1. Summation
2. Central tendency
3. Outliers
4. Dispersion
5. Exclusive versus inclusive range
6. Deviation scores
7. Bias
In the second chapter, we began our discussion of descriptive statistics, previously defined as techniques which allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. There we considered various methods for representing data for purposes of communicating something to the reader or audience. In particular, we were concerned with ways of representing data in an abbreviated fashion through both tables and figures.
In this chapter, we delve more into the field of descriptive statistics in terms of three general topics. First, we examine summation notation, which is important for much of the chapter and, to some extent, the remainder of the text. Second, measures of central tendency allow us to boil down a set of scores into a single value, a point estimate, which somehow represents the entire set. The most commonly used measures of central tendency are the mode, median, and mean. Finally, measures of dispersion provide us with information about the extent to which the set of scores varies; in other words, whether the scores are spread out quite a bit or are pretty much the same. The most commonly used measures of dispersion are the range (exclusive and inclusive ranges), H spread, and variance and standard deviation. In summary, concepts to be discussed in this chapter include summation, central tendency, and dispersion. Within this discussion, we also address outliers and bias. Our objectives are that by the end of this chapter, you will be able to do the following: (a) understand and utilize summation notation, (b) determine and interpret the three commonly used measures of central tendency, and (c) determine and interpret different measures of dispersion.
3.1 Summation Notation
We were introduced to the following research scenario in Chapter 2 and revisit Marie in this chapter.

Marie, a graduate student pursuing a master's degree in educational research, has been assigned to her first task as a research assistant. Her faculty mentor has given Marie quiz data collected from 25 students enrolled in an introductory statistics course and has asked Marie to summarize the data. The faculty member was pleased with the descriptive analysis and presentation of results previously shared, and has asked Marie to conduct additional analysis related to the following research questions: How can quiz scores of students enrolled in an introductory statistics class be summarized using measures of central tendency? Measures of dispersion?
Many areas of statistics, including many methods of descriptive and inferential statistics, require the use of summation notation. Say we have collected heart rate scores from 100 students. Many statistics require us to develop "sums" or "totals" in different ways. For example, what is the simple sum or total of all 100 heart rate scores? Summation (i.e., addition) is not only quite tedious to do computationally by hand, but we also need a system of notation to communicate how we have conducted this summation process. This section describes such a notational system.
For simplicity, let us utilize a small set of scores, keeping in mind that this system can be used for a set of numerical values of any size. In other words, while we speak in terms of "scores," this could just as easily be a set of heights, distances, ages, or other measures. Specifically in this example, we have a set of five ages: 7, 11, 18, 20, and 24. Recall from Chapter 2 the use of X to denote a variable. Here we define Xi as the score for variable X (in this example, age) for a particular individual or object i. The subscript i serves to identify one individual or object from another. These scores would then be denoted as follows: X1 = 7, X2 = 11, X3 = 18, X4 = 20, and X5 = 24. To interpret, X1 = 7 means that for variable X and individual 1, the value of the variable age is 7. In other words, individual 1 is 7 years of age.
With five individuals measured on age, then i = 1, 2, 3, 4, 5. However, with a large set of values, this notation can become quite unwieldy, so as shorthand we abbreviate this as i = 1,…, 5, meaning that i ranges or goes from 1 to 5.
Next we need a system of notation to denote the summation or total of a set of scores. The standard notation used is

$$\sum_{i=a}^{b} X_i$$

where Σ is the Greek capital letter sigma and merely means "the sum of," Xi is the variable we are summing across for each of the i individuals, i = a indicates that a is the lower limit (or beginning) of the summation (i.e., the first value with which we begin our addition), and b indicates the upper limit (or end) of the summation (i.e., the last value added). For our example set of ages, the sum of all of the ages would be denoted as $\sum_{i=1}^{5} X_i$ in shorthand version and as $\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$ in longhand version. For the example data, the sum of all of the ages is computed as follows:

$$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 7 + 11 + 18 + 20 + 24 = 80$$
Thus, the sum of the age variable across all five individuals is 80.

For large sets of values, the longhand version is rather tedious, and, thus, the shorthand version is almost exclusively used. A general form of the longhand version is as follows:

$$\sum_{i=a}^{b} X_i = X_a + X_{a+1} + \cdots + X_{b-1} + X_b$$

The ellipsis notation (i.e., …) indicates that there are as many values in between the two values on either side of the ellipsis as are necessary. The ellipsis notation is then just shorthand for "there are some values in between here." The most frequently used values for a and b with sample data are a = 1 and b = n (as you may recall, n is the notation used to represent our sample size). Thus, the most frequently used summation notation for sample data is $\sum_{i=1}^{n} X_i$.
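The summation notation above maps directly onto code; here is a minimal Python sketch using the five ages from the text (the variable names are ours):

```python
# The five ages from the text: X1 through X5 in the chapter's notation
ages = [7, 11, 18, 20, 24]

# sum() carries out the summation over i = 1, ..., 5,
# i.e., X1 + X2 + X3 + X4 + X5
total = sum(ages)
print(total)  # 80
```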
3.2 Measures of Central Tendency
One method for summarizing a set of scores is to construct a single index or value that can somehow be used to represent the entire collection of scores. In this section, we consider the three most popular indices, known as measures of central tendency. Although other indices exist, the most popular ones are the mode, the median, and the mean.
3.2.1 Mode
The simplest method to use for measuring central tendency is the mode. The mode is defined as that value in a distribution of scores that occurs most frequently. Consider the example frequency distributions of the number of hours of TV watched per week, as shown in Table 3.1. In distribution (a), the mode is easy to determine, as the interval for value 8 contains the most scores, 3 (i.e., the mode number of hours of TV watched is 8). In distribution (b), the mode is a bit more complicated, as two adjacent intervals each contain the most scores; that is, the 8 and 9 hour intervals each contain three scores. Strictly speaking, this distribution is bimodal, that is, containing two modes, one at 8 and one at 9. This is our personal preference for reporting this particular situation. However, because the two modes are in adjacent intervals, some individuals make an arbitrary decision to average these intervals and report the mode as 8.5.
Distribution (c) is also bimodal; however, here the two modes at 7 and 11 hours are not in adjacent intervals. Thus, one cannot justify taking the average of these intervals, as the average of 9 hours [i.e., (7 + 11)/2] is not representative of the most frequently occurring score. The score of 9 occurs less frequently than any other score observed. We recommend reporting both modes here as well. Obviously, there are other possible situations for the mode (e.g., a trimodal distribution), but these examples cover the basics. As one further example, the example data on the statistics quiz from Chapter 2 are shown in Table 3.2 and are used to illustrate the methods in this chapter. The mode is equal to 17 because that interval contains more scores (5) than any other interval. Note also that the mode is determined in
Table 3.2
Frequency Distribution of Statistics Quiz Data

X    f    cf    rf    crf
9    1     1   .04    .04
10   1     2   .04    .08
11   2     4   .08    .16
12   1     5   .04    .20
13   2     7   .08    .28
14   1     8   .04    .32
15   3    11   .12    .44
16   1    12   .04    .48
17   5    17   .20    .68
18   3    20   .12    .80
19   4    24   .16    .96
20   1    25   .04   1.00
     n = 25    1.00
Table 3.1
Example Frequency Distributions

X    f(a)  f(b)  f(c)
6     1     1     2
7     2     2     3
8     3     3     2
9     2     3     1
10    1     2     2
11    0     1     3
12    0     0     2
precisely the same way whether we are talking about the population mode (i.e., the population parameter) or the sample mode (i.e., the sample statistic).
Let us turn to a discussion of the general characteristics of the mode, as well as whether a particular characteristic is an advantage or a disadvantage in a statistical sense. The first characteristic of the mode is that it is simple to obtain. The mode is often used as a quick-and-dirty method for reporting central tendency. This is an obvious advantage. The second characteristic is that the mode does not always have a unique value. We saw this in distributions (b) and (c) of Table 3.1. This is generally a disadvantage, as we initially stated we wanted a single index that could be used to represent the collection of scores. The mode cannot guarantee a single index.
Third, the mode is not a function of all of the scores in the distribution, and this is generally a disadvantage. The mode is strictly determined by which score or interval contains the most frequencies. In distribution (a), as long as the other intervals have fewer frequencies than the interval for value 8, then the mode will always be 8. That is, if the interval for value 8 contains three scores and all of the other intervals contain fewer than three scores, then the mode will be 8. The number of frequencies for the remaining intervals is not relevant as long as it is less than 3. Also, the location or value of the other scores is not taken into account.
The fourth characteristic of the mode is that it is difficult to deal with mathematically. For example, the mode is not very stable from one sample to another, especially with small samples. We could have two nearly identical samples except for one score, which can alter the mode. For example, in distribution (a), if a second similar sample contains the same scores except that an 8 is replaced with a 7, then the mode is changed from 8 to 7. Thus, changing a single score can change the mode, and this is considered to be a disadvantage. A fifth and final characteristic is that the mode can be used with any type of measurement scale, from nominal to ratio, and is the only measure of central tendency appropriate for nominal data.
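For data stored as raw scores, the mode can be found with Python's standard library; `statistics.multimode` returns every value tied for the highest frequency, which also covers the bimodal cases discussed above. A sketch (the raw-score lists are expanded from the frequencies in Table 3.1):

```python
from statistics import multimode

# Distribution (a) from Table 3.1, expanded into raw scores
dist_a = [6, 7, 7, 8, 8, 8, 9, 9, 10]
print(multimode(dist_a))  # [8]

# Distribution (b): the 8 and 9 intervals are tied, so the data are bimodal
dist_b = [6, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 11]
print(multimode(dist_b))  # [8, 9]
```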
3.2.2 Median
A second measure of central tendency represents a concept that you are already familiar with. The median is that score which divides a distribution of scores into two equal parts. In other words, one-half of the scores fall below the median, and one-half of the scores fall above the median. We already know this from Chapter 2 as the 50th percentile or Q2. In other words, the 50th percentile, or Q2, represents the median value. The formula for computing the median is

$$\text{Median} = LRL + \left[\frac{50\%(n) - cf}{f}\right] w \qquad (3.1)$$

where the notation is the same as previously described in Chapter 2. Just as a reminder, LRL is the lower real limit of the interval containing the median, 50% is the percentile desired, n is the sample size, cf is the cumulative frequency of all intervals less than but not including the interval containing the median (cf below), f is the frequency of the interval containing the median, and w is the interval width. For the example quiz data, the median is computed as follows:

$$\text{Median} = 16.5 + \left[\frac{50\%(25) - 12}{5}\right](1) = 16.5 + 0.1000 = 16.6000$$
Occasionally, you will run into simple distributions of scores where the median is easy to identify. If you have an odd number of untied scores, then the median is the middle-ranked score. For example, say we have measured individuals on the number of CDs owned and find values of 1, 3, 7, 11, and 21. For these data, the median is 7 (i.e., 7 CDs is the middle-ranked value or score). If you have an even number of untied scores, then the median is the average of the two middle-ranked scores. For example, a different sample reveals the following number of CDs owned: 1, 3, 5, 11, 21, and 32. The two middle scores are 5 and 11, and, thus, the median is 8 CDs owned [i.e., (5 + 11)/2]. In most other situations where there are tied scores, the median is not as simple to locate, and Equation 3.1 is necessary. Note also that the median is computed in precisely the same way whether we are talking about the population median (i.e., the population parameter) or the sample median (i.e., the sample statistic).
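Equation 3.1 can be sketched as a small Python function (the helper name and argument names are ours, not from the text):

```python
def grouped_median(lrl, n, cf_below, f, w):
    """Equation 3.1: LRL + [(50% of n - cf below) / f] * w."""
    return lrl + ((0.50 * n - cf_below) / f) * w

# Statistics quiz data from Table 3.2: the median falls in the interval
# for the score 17 (LRL = 16.5), with cf below = 12, f = 5, and w = 1.
print(grouped_median(16.5, 25, 12, 5, 1))  # 16.6
```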
The general characteristics of the median are as follows. First, the median is not influenced by extreme scores (scores far away from the middle of the distribution are known as outliers). Because the median is defined conceptually as the middle score, the actual size of an extreme score is not relevant. For the example statistics quiz data, imagine that the extreme score of 9 was somehow actually 0 (e.g., incorrectly scored). The median would still be 16.6, as half of the scores are still above this value and half below. Because the extreme score under consideration here still remained below the 50th percentile, the median was not altered. This characteristic is an advantage, particularly when extreme scores are observed. As another example using salary data, say that all but one of the individual salaries are below $100,000 and the median is $50,000. The remaining extreme observation has a salary of $5,000,000. The median is not affected by this millionaire; the extreme individual is simply treated as every other observation above the median, no more and no less than, say, the salary of $65,000.
A second characteristic is that the median is not a function of all of the scores. Because we already know that the median is not influenced by extreme scores, we know that the median does not take such scores into account. Another way to think about this is to examine Equation 3.1 for the median. The equation only deals with information for the interval containing the median. The specific information for the remaining intervals is not relevant so long as we are looking in the interval containing the median. We could, for instance, take the top 25% of the scores and make them even more extreme (say we add 10 bonus points to the top quiz scores). The median would remain unchanged. As you probably surmised, this characteristic is generally thought to be a disadvantage. If you really think about the first two characteristics, no measure could possibly possess both. That is, if a measure is a function of all of the scores, then extreme scores must also be taken into account. If a measure does not take extreme scores into account, like the median, then it cannot be a function of all of the scores.
A third characteristic is that the median is difficult to deal with mathematically, a disadvantage as with the mode. The median is somewhat unstable from sample to sample, especially with small samples. As a fourth characteristic, the median always has a unique value, another advantage. This is unlike the mode, which does not always have a unique value. Finally, the fifth characteristic of the median is that it can be used with all types of measurement scales except the nominal. Nominal data cannot be ranked, and, thus, percentiles and the median are inappropriate.
3.2.3 Mean
The final measure of central tendency to be considered is the mean, sometimes known as the arithmetic mean or "average" (although the term average is used rather loosely by laypeople). Statistically, we define the mean as the sum of all of the scores divided by the number of scores. Thought of in those terms, you may have been computing the mean for many years and may not have even known it.
The population mean is denoted by μ (Greek letter mu) and computed as follows:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

For sample data, the sample mean is denoted by $\bar{X}$ (read "X bar") and computed as follows:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

For the example quiz data, the sample mean is computed as follows:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{389}{25} = 15.5600$$
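The sample-mean computation for the quiz data can be verified in a few lines of Python; a sketch using the standard library, with the raw scores expanded from the frequencies in Table 3.2:

```python
from statistics import mean

# Raw quiz scores reconstructed from the frequency column of Table 3.2
scores = ([9] + [10] + [11] * 2 + [12] + [13] * 2 + [14] + [15] * 3
          + [16] + [17] * 5 + [18] * 3 + [19] * 4 + [20])
print(sum(scores))   # 389
print(mean(scores))  # 15.56
```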
Here are the general characteristics of the mean. First, the mean is a function of every score, a definite advantage in terms of a measure of central tendency representing all of the data. If you look at the numerator of the mean, you see that all of the scores are clearly taken into account in the sum. The second characteristic of the mean is that it is influenced by extreme scores. Because the numerator sum takes all of the scores into account, it also includes the extreme scores, which is a disadvantage. Let us return for a moment to a previous example of salary data where all but one of the individuals have an annual salary under $100,000, and the one outlier is making $5,000,000. Because this one outlying value is so extreme, the mean will be greatly influenced. In fact, the mean could easily fall somewhere between the second highest salary and the millionaire, which does not represent the collection of scores well.

Third, the mean always has a unique value, another advantage. Fourth, the mean is easy to deal with mathematically. The mean is the most stable measure of central tendency from sample to sample, and because of that is the measure most often used in inferential statistics (as we show in later chapters). Finally, the fifth characteristic of the mean is that it is only appropriate for interval and ratio measurement scales. This is because the mean implicitly assumes equal intervals, which of course the nominal and ordinal scales do not possess.
3.2.4 Summary of Measures of Central Tendency
To summarize the measures of central tendency, then:
1. The mode is the only appropriate measure for nominal data.
2. The median and mode are both appropriate for ordinal data (and conceptually the median fits the ordinal scale, as both deal with ranked scores).
3. All three measures are appropriate for interval and ratio data.
A summary of the advantages and disadvantages of each measure is presented in Box 3.1.
Stop and Think Box 3.1
Advantages and Disadvantages of Measures of Central Tendency

Mode
• Advantages: quick and easy method for reporting central tendency; can be used with any measurement scale of variable.
• Disadvantages: does not always have a unique value; not a function of all scores in the distribution; difficult to deal with mathematically due to its instability.

Median
• Advantages: not influenced by extreme scores; has a unique value; can be used with ordinal, interval, and ratio measurement scales of variables.
• Disadvantages: not a function of all scores in the distribution; difficult to deal with mathematically due to its instability; cannot be used with nominal data.

Mean
• Advantages: function of all scores in the distribution; has a unique value; easy to deal with mathematically; can be used with interval and ratio measurement scales of variables.
• Disadvantages: influenced by extreme scores; cannot be used with nominal or ordinal variables.
3.3 Measures of Dispersion
In the previous section, we discussed one method for summarizing a collection of scores, the measures of central tendency. Central tendency measures are useful for describing a collection of scores in terms of a single index or value (with one exception: the mode for distributions that are not unimodal). However, what do they tell us about the distribution of scores? Consider the following example. If we know that a sample has a mean of 50, what do we know about the distribution of scores? Can we infer from the mean what the distribution looks like? Are most of the scores fairly close to the mean of 50, or are they spread out quite a bit? Perhaps most of the scores are within two points of the mean. Perhaps most are within 10 points of the mean. Perhaps most are within 50 points of the mean. Do we know? The answer, of course, is that the mean provides us with no information about what the distribution of scores looks like; any of the possibilities mentioned, and many others, can occur. The same goes if we only know the mode or the median.
Another method for summarizing a set of scores is to construct an index or value that can be used to describe the amount of spread among the collection of scores. In other words, we need measures that can be used to determine whether the scores fall fairly close to the central tendency measure, are fairly well spread out, or are somewhere in between. In this section, we consider the four most popular such indices, which are known as measures of dispersion (i.e., the extent to which the scores are dispersed or spread out). Although other indices exist, the most popular ones are the range (exclusive and inclusive), H spread, the variance, and the standard deviation.
3.3.1 Range
The simplest measure of dispersion is the range. The term range is one that is in common use outside of statistical circles, so you have some familiarity with it already. For instance, say you are at the mall shopping for a new pair of shoes. You find six stores have the same pair of shoes that you really like, but the prices vary somewhat. At this point, you might actually make the statement "the price for these shoes ranges from $59 to $75." In a way, you are talking about the range.
Let us be more specific as to how the range is measured. In fact, there are actually two different definitions of the range, exclusive and inclusive, which we consider now. The exclusive range is defined as the difference between the largest and smallest scores in a collection of scores. For notational purposes, the exclusive range (ER) is shown as ER = Xmax − Xmin, where Xmax is the largest or maximum score obtained, and Xmin is the smallest or minimum score obtained. For the shoe example then, ER = Xmax − Xmin = 75 − 59 = 16. In other words, the actual exclusive range of the scores is 16 because the price varies from 59 to 75 (in dollar units).
A limitation of the exclusive range is that it fails to account for the width of the intervals being used. For example, if we use an interval width of 1 dollar, then the 59 interval really has 59.5 as the upper real limit and 58.5 as the lower real limit. If the least expensive shoe is $58.95, then the exclusive range covering from $59 to $75 actually excludes the least expensive shoe. Hence the term exclusive range means that scores can be excluded from this range. The same would go for a shoe priced at $75.25, as it would fall outside of the exclusive range at the high end of the distribution.
Because of this limitation, a second definition of the range was developed, known as the inclusive range. As you might surmise, the inclusive range takes into account the interval width so that all scores are included in the range. The inclusive range is defined as the difference between the upper real limit of the interval containing the largest score and the lower real limit of the interval containing the smallest score in a collection of scores. For notational purposes, the inclusive range (IR) is shown as IR = URL of Xmax − LRL of Xmin. If you think about it, what we are actually doing is extending the range by one-half of an interval width at each extreme: one-half an interval width at the maximum value and one-half an interval width at the minimum value. In notational form, IR = ER + w. For the shoe example, using an interval width of 1, then IR = URL of Xmax − LRL of Xmin = 75.5 − 58.5 = 17. In other words, the actual inclusive range of the scores is 17 (in dollar units). If the interval width was instead 2, then we would add 1 unit to each extreme rather than the .5 unit that we previously added to each extreme. The inclusive range would instead be 18. For the example quiz data (presented in Table 3.2), note that the exclusive range is 11 and the inclusive range is 12 (as the interval width is 1).
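Both definitions of the range are straightforward to sketch in Python. The function names are ours, and the four intermediate shoe prices below are hypothetical (the text only gives the minimum of $59 and maximum of $75):

```python
def exclusive_range(scores):
    """ER = Xmax - Xmin."""
    return max(scores) - min(scores)

def inclusive_range(scores, w=1):
    """IR = ER + w, extending half an interval width at each extreme."""
    return exclusive_range(scores) + w

# Hypothetical prices at the six stores; only 59 and 75 come from the text
shoe_prices = [59, 62, 65, 68, 72, 75]
print(exclusive_range(shoe_prices))  # 16
print(inclusive_range(shoe_prices))  # 17
```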
Finally, we need to examine the general characteristics of the range (they are the same for both definitions of the range). First, the range is simple to compute, which is a definite advantage. One can look at a collection of data and almost immediately, even without a computer or calculator, determine the range.
The second characteristic is that the range is influenced by extreme scores, a disadvantage. Because the range is computed from the two most extreme scores, this characteristic is quite obvious. This might be a problem, for instance, if all of the salary data range from $10,000 to $95,000 except for one individual with a salary of $5,000,000. Without this outlier, the exclusive range is $85,000. With the outlier, the exclusive range is $4,990,000. Thus, the millionaire's salary has a drastic impact on the range.
Third, the range is only a function of two scores, another disadvantage. Obviously, the range is computed from the largest and smallest scores and thus is only a function of those two scores. The spread of the distribution of scores between those two extreme scores is not at all taken into account. In other words, for the same maximum ($5,000,000) and minimum ($10,000) salaries, the range is the same whether the salaries are mostly near the maximum salary, mostly near the minimum salary, or spread out evenly. The fourth characteristic is that the range is unstable from sample to sample, another disadvantage. Say a second sample of salary data yielded the exact same data except for the maximum salary now being a less extreme $100,000. The range is now dramatically different. Also, in statistics we tend to worry about measures that are not stable from sample to sample, as that implies the results are not very reliable. Finally, the range is appropriate for data that are ordinal, interval, or ratio in measurement scale.
3.3.2 H Spread
The next measure of dispersion is H spread, a variation on the range measure with one major exception. Although the range relies upon the two extreme scores, resulting in certain disadvantages, H spread relies upon the difference between the third and first quartiles. To be more specific, H spread is defined as Q3 − Q1, the simple difference between the third and first quartiles. The term H spread was developed by Tukey (1977), H being short for hinge from the box-and-whisker plot, and is also known as the interquartile range.
For the example statistics quiz data (presented in Table 3.2), we already determined in Chapter 2 that Q3 = 18.0833 and Q1 = 13.1250. Therefore, H = Q3 − Q1 = 18.0833 − 13.1250 = 4.9583. H measures the range of the middle 50% of the distribution. The larger the value, the greater is the spread in the middle of the distribution. The size or magnitude of any of the range measures takes on more meaning when making comparisons across samples. For example, you might find with salary data that the range of salaries for middle management is smaller than the range of salaries for upper management. As another example, we might expect the salary range to increase over time.
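Given the quartiles already computed in Chapter 2, H spread is a one-line subtraction; a minimal sketch:

```python
# Quartiles for the statistics quiz data, as computed in Chapter 2
q1 = 13.1250
q3 = 18.0833

# H spread (interquartile range): the range of the middle 50% of scores
h_spread = q3 - q1
print(round(h_spread, 4))  # 4.9583
```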
What are the characteristics of H spread? The first characteristic is that H is unaffected by extreme scores, an advantage. Because we are looking at the difference between the third and first quartiles, extreme observations will be outside of this range. Second, H is not a function of every score, a disadvantage. The precise placement of where scores fall above Q3, below Q1, and between Q3 and Q1 is not relevant. All that matters is that 25% of the scores fall above Q3, 25% fall below Q1, and 50% fall between Q3 and Q1. Thus, H is not a function of very many of the scores at all, just those around Q3 and Q1. Third, H is not very stable from sample to sample, another disadvantage, especially in terms of inferential statistics and one's ability to be confident about a sample estimate of a population parameter. Finally, H is appropriate for all scales of measurement except for nominal.
3.3.3 Deviational Measures
In this section, we examine deviation scores, population variance and standard deviation, and sample variance and standard deviation, all methods that deal with deviations from the mean.
3.3.3.1 Deviation Scores
In the last category of measures of dispersion are those that utilize deviations from the mean. Let us define a deviation score as the difference between a particular raw score and the mean of the collection of scores (population or sample; either will work). For population data, we define a deviation as di = Xi − μ. In other words, we can compute the deviation from the mean for each individual or object. Consider the credit card dataset as shown in Table 3.3. To make matters simple, we only have a small population of data, five values to be exact. The first column lists the raw scores, which are in this example the number of credit cards owned by five individuals, and, at the bottom of the first column, indicates the sum (Σ = 30), population size (N = 5), and population mean (μ = 6.0). The second column provides the deviation scores for each observation from the population mean and, at the bottom of the second column, indicates the sum of the deviation scores, denoted by

$$\sum_{i=1}^{N} (X_i - \mu)$$
From the second column, we see that two of the observations have positive deviation scores, as their raw score is above the mean; one observation has a zero deviation score, as that raw score is at the mean; and two other observations have negative deviation scores, as their raw score is below the mean. However, when we sum the deviation scores, we obtain a value of zero. This will always be the case, as follows:

$$\sum_{i=1}^{N} (X_i - \mu) = 0$$
The positive deviation scores will exactly offset the negative deviation scores. Thus, any measure involving simple deviation scores will be useless in that the sum of the deviation scores will always be zero, regardless of the spread of the scores.

What other alternatives are there for developing a deviational measure that will yield a sum other than zero? One alternative is to take the absolute value of the deviation scores (i.e., where the sign is ignored). Unfortunately, however, this is not very useful mathematically in terms of deriving other statistics, such as inferential statistics. As a result, this deviational measure is rarely used in statistics.
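That the deviations from the mean always sum to zero is easy to demonstrate numerically; a short Python sketch using the credit card data:

```python
# Credit card data from Table 3.3 (a small population of N = 5)
x = [1, 5, 6, 8, 10]
mu = sum(x) / len(x)  # population mean = 6.0

# Deviation score for each observation: di = Xi - mu
deviations = [xi - mu for xi in x]
print(deviations)       # [-5.0, -1.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0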
3.3.3.2 Population Variance and Standard Deviation
So far, we found the sum of the deviations and the sum of the absolute deviations not to be very useful in describing the spread of the scores from the mean. What other alternative
Table 3.3
Credit Card Data

X      X − μ    (X − μ)²
1       −5        25
5       −1         1
6        0         0
8        2         4
10       4        16
Σ = 30  Σ = 0    Σ = 46

N = 5
μ = 6
might be useful? As shown in the third column of Table 3.3, one could square the deviation scores to remove the sign problem. The sum of the squared deviations is shown at the bottom of the column as Σ = 46 and denoted as

$$\sum_{i=1}^{N} (X_i - \mu)^2$$
As you might suspect, with more scores, the sum of the squared deviations will increase. So we have to weigh the sum by the number of observations in the population. This yields a deviational measure known as the population variance, which is denoted as σ² (σ is the lowercase Greek letter sigma) and computed by the following formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

For the credit card example, the population variance is σ² = 46/5 = 9.2. We refer to this particular formula for the population variance as the definitional formula, as conceptually that is how we define the variance. Conceptually, the variance is a measure of the area of a distribution. That is, the more spread out the scores, the more area or space the distribution takes up and the larger is the variance. The variance may also be thought of as an average squared distance from the mean. The variance has nice mathematical properties and is useful for deriving other statistics, such as inferential statistics.
The�computational formula�for�the�population�variance�is
$$\sigma^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}{N^2}$$
This method is computationally easier to deal with than the definitional formula. Imagine if you had a population of 100 scores. Using hand computations, the definitional formula would take considerably more time than the computational formula. With the computer, this is a moot point, obviously. But if you do have to compute the population variance by hand, then the easiest formula to use is the computational one.

Exactly how does this formula work? For the first summation in the numerator, we square each score first, then sum all the squared scores. This value is then multiplied by the population size. For the second summation in the numerator, we sum all the scores first, then square the summed scores. After subtracting the values computed in the numerator, we divide by the squared population size.
The two quantities derived by the summation operations in the numerator are computed in much different ways and generally yield different values.

Let us return to the credit card dataset and see if the computational formula actually yields the same value for σ² as the definitional formula did earlier (σ² = 9.2). The computational formula shows σ² to be as follows:
$$\sigma^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}{N^2} = \frac{5(226) - (30)^2}{5^2} = \frac{1130 - 900}{25} = 9.2000$$
which is precisely the value we computed previously.
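Readers who want to verify both formulas themselves can do so in a few lines of Python (our illustration, not the authors'); the credit card data reproduce σ² = 9.2 either way:

```python
scores = [1, 5, 6, 8, 10]   # credit card data, N = 5
N = len(scores)
mu = sum(scores) / N        # population mean = 6.0

# Definitional formula: sum of squared deviations divided by N
var_def = sum((x - mu) ** 2 for x in scores) / N

# Computational formula: [N * sum(X^2) - (sum X)^2] / N^2
var_comp = (N * sum(x ** 2 for x in scores) - sum(scores) ** 2) / N ** 2

print(var_def, var_comp)    # 9.2 9.2
```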
A few individuals (none of us, of course) are a bit bothered about the variance for the following reason. Say you are measuring the height of children in inches. The raw scores are measured in terms of inches, the mean is measured in terms of inches, but the variance is measured in terms of inches squared. Squaring the scale is bothersome to some as the scale is no longer in the original units of measure, but rather a squared unit of measure, making interpretation a bit difficult. To generate a deviational measure in the original scale of inches, we can take the square root of the variance. This is known as the standard deviation and is the final measure of dispersion we discuss. The population standard deviation is defined as the positive square root of the population variance and is denoted by σ (i.e., $\sigma = +\sqrt{\sigma^2}$). The standard deviation, then, is measured in the original scale of inches. For the credit card data, the standard deviation is computed as follows:

$$\sigma = +\sqrt{\sigma^2} = +\sqrt{9.2} = 3.0332$$
What are the major characteristics of the population variance and standard deviation? First, the variance and standard deviation are a function of every score, an advantage. An examination of either the definitional or computational formula for the variance (and standard deviation as well) indicates that all of the scores are taken into account, unlike the range or H spread. Second, therefore, the variance and standard deviation are affected by extreme scores, a disadvantage. As we said earlier, if a measure takes all of the scores into account, then it must take into account the extreme scores as well. Thus, a child much taller than all of the rest of the children will dramatically increase the variance, as the area or size of the distribution will be much more spread out. Another way to think about this is that the deviation score for such an outlier will be large, and it will then be squared and summed with the rest of the squared deviation scores. Thus, an outlier can really increase the variance. Also, it goes without saying that it is always a good idea when using the computer to verify your data. A data entry error can cause an outlier and therefore a larger variance (e.g., that child coded as 700 inches tall instead of 70 will surely inflate your variance).

Third, the variance and standard deviation are only appropriate for interval and ratio measurement scales. Like the mean, this is due to the implicit requirement of equal intervals. A fourth and final characteristic of the variance and standard deviation is that they are quite useful for deriving other statistics, particularly in inferential statistics, another advantage. In fact, Chapter 9 is all about making inferences about variances, and many other inferential statistics make assumptions about the variance. Thus, the variance is quite important as a measure of dispersion. It is also interesting to compare the measures of central tendency with the measures of dispersion, as they do share some important characteristics. The mode
and the range share certain characteristics. Both only take some of the data into account, are simple to compute, and are unstable from sample to sample. The median shares certain characteristics with H spread. These are not influenced by extreme scores, are not a function of every score, are difficult to deal with mathematically due to their instability from sample to sample, and can be used with all measurement scales except the nominal scale. The mean shares many characteristics with the variance and standard deviation. These all are a function of every score, are influenced by extreme scores, are useful for deriving other statistics, and are only appropriate for interval and ratio measurement scales.

To complete this section of the chapter, we take a look at the sample variance and standard deviation and how they are computed for large samples of data (i.e., larger than our credit card dataset).
3.3.3.3 Sample Variance and Standard Deviation
Most of the time, we are interested in computing the sample variance and standard deviation; we also often have large samples of data with multiple frequencies for many of the scores. Here we consider these last aspects of the measures of dispersion. Recall when we computed the sample statistics of central tendency. The computations were exactly the same as with the population parameters (although the notation for the population and sample means was different). There are also no differences between the sample and population values for the range or H spread. However, there is a difference between the sample and population values for the variance and standard deviation, as we see next.
Recall the definitional formula for the population variance as follows:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Why not just take this equation and convert everything to sample statistics? In other words, we could simply change N to n and μ to X̄. What could be wrong with that? The answer is that there is a problem which prevents us from simply changing the notation in the formula from population notation to sample notation.

Here is the problem. First, the sample mean, X̄, may not be exactly equal to the population mean, μ. In fact, for most samples, the sample mean will be somewhat different from the population mean. Second, we cannot use the population mean anyway, as it is unknown (in most instances anyway). Instead, we have to substitute the sample mean into the equation (i.e., the sample mean, X̄, is the sample estimate for the population mean, μ). Because the sample mean is different from the population mean, the deviations will all be affected. Also, the sample variance that would be obtained in this fashion would be a biased estimate of the population variance. In statistics, bias means that something is systematically off. In this case, the sample variance obtained in this manner would be systematically too small.
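The claim that dividing by n makes the estimate systematically too small can be demonstrated by simulation. The following Python sketch is our own illustration (the population parameters, sample size, and number of trials are arbitrary choices): it draws many small samples and averages the naive variance estimates.

```python
import random

random.seed(1)

# An arbitrary population with known variance (roughly 100 here)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

n, trials = 5, 20_000
naive = []
for _ in range(trials):
    sample = random.sample(population, n)
    xbar = sum(sample) / n
    # Naive estimate: the population formula with the sample mean plugged in
    naive.append(sum((x - xbar) ** 2 for x in sample) / n)

avg_naive = sum(naive) / trials
# On average, the naive estimate runs near (n - 1)/n of sigma^2,
# i.e., systematically too small; multiplying by n/(n - 1) removes the bias.
print(sigma2, avg_naive, avg_naive * n / (n - 1))
```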
In order to obtain an unbiased sample estimate of the population variance, the following adjustments have to be made in the definitional and computational formulas, respectively:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
$$s^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}$$
In terms of the notation,

s² is the sample variance
n has been substituted for N
X̄ has been substituted for μ

These changes are relatively minor and expected. The major change is in the denominator, where instead of N for the definitional formula we have n − 1, and instead of N² for the computational formula we have n(n − 1). This turns out to be the correction that early statisticians discovered was necessary to obtain an unbiased estimate of the population variance.

It should be noted that (a) when sample size is relatively large (e.g., n = 1000), the correction will be quite small, and (b) when sample size is relatively small (e.g., n = 5), the correction will be quite a bit larger. One suggestion is that when computing the variance on a calculator or computer, you might want to be aware of whether the sample or population variance is being computed, as it can make a difference (typically the sample variance is computed). The sample standard deviation is denoted by s and computed as the positive square root of the sample variance s² (i.e., $s = +\sqrt{s^2}$).
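As a check on these formulas (again a sketch of our own, not from the text), Python's standard statistics module makes the n − 1 versus N distinction explicit. Treating the five credit card scores as a sample gives s² = 46/4 = 11.5, versus the population value of 9.2:

```python
import statistics

data = [1, 5, 6, 8, 10]
n = len(data)
xbar = statistics.mean(data)  # 6.0

# Definitional sample variance: divide by n - 1
s2_def = sum((x - xbar) ** 2 for x in data) / (n - 1)

# Computational sample variance: [n * sum(X^2) - (sum X)^2] / [n(n - 1)]
s2_comp = (n * sum(x ** 2 for x in data) - sum(data) ** 2) / (n * (n - 1))

print(s2_def, s2_comp)             # 11.5 11.5
print(statistics.variance(data))   # 11.5 (sample: n - 1 in the denominator)
print(statistics.pvariance(data))  # 9.2  (population: N in the denominator)
```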
For our example statistics quiz data (presented in Table 3.2), we have multiple frequencies for many of the raw scores which need to be taken into account. A simple procedure for dealing with this situation when using hand computations is shown in Table 3.4. Here we see that in the third and fifth columns, the scores and squared scores are multiplied by their respective frequencies. This allows us to take into account, for example, that the score of 19 occurred four times. Note for the fifth column that the frequencies are not squared; only the scores are squared. At the bottom of the third and fifth columns are the sums we need to compute the statistics of interest.
Table 3.4
Sums for Statistics Quiz Data

X     f    fX    X²     fX²
9     1     9     81     81
10    1    10    100    100
11    2    22    121    242
12    1    12    144    144
13    2    26    169    338
14    1    14    196    196
15    3    45    225    675
16    1    16    256    256
17    5    85    289   1445
18    3    54    324    972
19    4    76    361   1444
20    1    20    400    400
n = 25     Σ = 389      Σ = 6293
The computations are as follows. We compute the sample mean to be

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{n} = \frac{389}{25} = 15.5600$$
The sample variance is computed to be as follows:

$$s^2 = \frac{n \sum_{i=1}^{n} f_i X_i^2 - \left(\sum_{i=1}^{n} f_i X_i\right)^2}{n(n - 1)} = \frac{25(6{,}293) - (389)^2}{25(24)} = \frac{157{,}325 - 151{,}321}{600} = \frac{6{,}004}{600} = 10.0067$$
Therefore, the sample standard deviation is

$$s = +\sqrt{s^2} = +\sqrt{10.0067} = 3.1633$$
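The frequency-table bookkeeping in Table 3.4 is easy to mirror in code. This Python sketch (our own; it assumes the same quiz frequencies) reproduces the mean, sample variance, and sample standard deviation just computed:

```python
# Frequency table for the statistics quiz data (Table 3.4)
freq = {9: 1, 10: 1, 11: 2, 12: 1, 13: 2, 14: 1,
        15: 3, 16: 1, 17: 5, 18: 3, 19: 4, 20: 1}

n = sum(freq.values())                              # 25
sum_fx = sum(f * x for x, f in freq.items())        # 389
sum_fx2 = sum(f * x ** 2 for x, f in freq.items())  # 6293

xbar = sum_fx / n                                   # 15.5600
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))    # 10.0067 (rounded)
s = s2 ** 0.5                                       # 3.1633 (rounded)
print(xbar, round(s2, 4), round(s, 4))
```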
3.3.4 Summary of Measures of Dispersion

To summarize the measures of dispersion then:

1. The range and H spread can be used with ordinal, interval, or ratio data; the variance and standard deviation can be used only with interval or ratio measurement scales.
2. There are no measures of dispersion appropriate for nominal data.

A summary of the advantages and disadvantages of each measure is presented in Box 3.2.
STOP AND THINK BOX 3.2
Advantages and Disadvantages of Measures of Dispersion

Range
  Advantages:
  • Simple to compute
  • Can be used with ordinal, interval, and ratio measurement scales of variables
  Disadvantages:
  • Influenced by extreme scores
  • Function of only two scores
  • Unstable from sample to sample
  • Cannot be used with nominal data

H spread
  Advantages:
  • Unaffected by extreme scores
  • Can be used with ordinal, interval, and ratio measurement scales of variables
  Disadvantages:
  • Not a function of all scores in the distribution
  • Difficult to deal with mathematically due to its instability
  • Cannot be used with nominal data

Variance and standard deviation
  Advantages:
  • Function of all scores in the distribution
  • Useful for deriving other statistics
  • Can be used with interval and ratio measurement scales of variables
  Disadvantages:
  • Influenced by extreme scores
  • Cannot be used with nominal or ordinal variables
3.4 SPSS

The purpose of this section is to see what SPSS has to offer in terms of computing measures of central tendency and dispersion. In fact, SPSS provides us with many different ways to obtain such measures. The three programs that we have found to be most useful for generating the descriptive statistics covered in this chapter are “Explore,” “Descriptives,” and “Frequencies.” Instructions for using each are provided as follows.
Explore

Explore: Step 1. The first program, “Explore,” can be invoked by clicking on “Analyze” in the top pulldown menu, then “Descriptive Statistics,” and then “Explore.” Following the screenshot, as follows, will produce the “Explore” dialog box. For brevity, we have not reproduced this initial screenshot when we discuss the “Descriptives” and “Frequencies” programs; however, you can see here where they can be found from the pulldown menus.
[Screenshot: Explore: Step 1. The “Descriptives” and “Frequencies” procedures can also be invoked from this menu.]
Explore: Step 2. Next, from the main “Explore” dialog box, click the variable of interest from the list on the left (e.g., quiz), and move it into the “Dependent List” box by clicking on the arrow button (see screenshot for “Explore: Step 2”). Then click on the “OK” button.
[Screenshot: Explore: Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the “Dependent List” box on the right.]
This will automatically generate the mean, median (approximate), variance, standard deviation, minimum, maximum, exclusive range, and interquartile range (H) (plus skewness and kurtosis, to be covered in Chapter 4). The SPSS output from “Explore” is shown in the top panel of Table 3.5.
Table 3.5
Select SPSS Output for Statistics Quiz Data Using “Explore,” “Descriptives,” and “Frequencies”

Descriptives output from the “Explore” procedure (by default, a stem-and-leaf plot and boxplot are also generated by “Explore” but are not presented here):

Quiz                                  Statistic   Std. Error
Mean                                  15.5600     .63267
95% Confidence interval for mean
    Lower bound                       14.2542
    Upper bound                       16.8658
5% Trimmed mean                       15.6778
Median                                17.0000
Variance                              10.007
Std. deviation                        3.16333
Minimum                               9.00
Maximum                               20.00
Range                                 11.00
Interquartile range                   5.00
Skewness                              −.598       .464
Kurtosis                              −.741       .902

Descriptive Statistics output from the “Descriptives” procedure:

                     N    Range   Minimum   Maximum   Mean      Std. Deviation   Variance
Quiz                 25   11.00   9.00      20.00     15.5600   3.16333          10.007
Valid N (listwise)   25
Table 3.5 (continued)
Select SPSS Output for Statistics Quiz Data Using “Explore,” “Descriptives,” and “Frequencies”

Statistics output from the “Frequencies” procedure (by default, a frequency table is also generated by “Frequencies” but is not presented here):

Quiz
N   Valid        25
    Missing      0
Mean             15.5600
Median           16.3333 (a)
Mode             17.00
Std. deviation   3.16333
Variance         10.007
Range            11.00
Minimum          9.00
Maximum          20.00

(a) Calculated from grouped data.
Descriptives

Descriptives: Step 1. The second program to consider is “Descriptives.” It can also be accessed by going to “Analyze” in the top pulldown menu, then selecting “Descriptive Statistics,” and then “Descriptives” (see “Explore: Step 1” for a screenshot of this step).

Descriptives: Step 2. This will bring up the “Descriptives” dialog box (see “Descriptives: Step 2” screenshot). From the main “Descriptives” dialog box, click the variable of interest (e.g., quiz) and move it into the “Variable(s)” box by clicking on the arrow. Next, click on the “Options” button.
[Screenshot: Descriptives: Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the “Variable(s)” box on the right. Clicking on “Options” will allow you to select various statistics to be generated.]
Descriptives: Step 3. A new box called “Descriptives: Options” will appear (see “Descriptives: Step 3” screenshot), and you can simply place a checkmark in the boxes for the statistics that you want to generate. From here, you can obtain the mean, variance, standard deviation, minimum, maximum, and exclusive range (among others available). The SPSS output from “Descriptives” is shown in the middle panel of Table 3.5. After making your selections, click on “Continue.” You will then be returned to the main “Descriptives” dialog box. From there, click “OK.”
[Screenshot: Descriptives: Step 3. Statistics available when clicking on “Options” from the main dialog box for “Descriptives.” Placing a checkmark will generate the respective statistic in the output.]
Frequencies

Frequencies: Step 1. The final program to consider is “Frequencies.” Go to “Analyze” in the top pulldown menu, then “Descriptive Statistics,” and then select “Frequencies.” See “Explore: Step 1” for a screenshot of this step.

Frequencies: Step 2. The “Frequencies” dialog box will open (see screenshot for “Frequencies: Step 2”). From this main “Frequencies” dialog box, click the variable of interest from the list on the left (e.g., quiz) and move it into the “Variables” box by clicking on the arrow button. By default, there is a checkmark in the box for “Display frequency tables,” and we will keep this checked. This (i.e., selecting “Display frequency tables”) will generate a table of frequencies, relative frequencies, and cumulative relative frequencies. Then click on “Statistics” located in the top right corner.
[Screenshot: Frequencies: Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the “Variables” box on the right. Clicking on “Statistics” will allow you to select various statistics to be generated.]
Frequencies: Step 3. A new dialog box labeled “Frequencies: Statistics” will appear (see screenshot for “Frequencies: Step 3”). Here you can obtain the mean, median (approximate), mode, variance, standard deviation, minimum, maximum, and exclusive range (among others). In order to obtain the closest approximation to the median, check the “Values are group midpoints” box, as shown. However, it should be noted that these values are not always as precise as those from the formula given earlier in this chapter. The SPSS output from “Frequencies” is shown in the bottom panel of Table 3.5. After making your selections, click on “Continue.” You will then be returned to the main “Frequencies” dialog box. From there, click “OK.”
[Screenshot: Frequencies: Step 3. Options available when clicking on “Statistics” from the main dialog box for “Frequencies.” Placing a checkmark will generate the respective statistic in the output. Check “Values are group midpoints” for better accuracy with quartiles and percentiles (e.g., the median).]
3.5 Templates for Research Questions and APA-Style Paragraph

As we stated in Chapter 2, depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions and thus no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary. At this time, let us revisit our graduate research assistant, Marie, who was introduced at the beginning of the chapter. As you may recall, her task was to summarize data from 25 students enrolled in a statistics course. The questions that Marie's faculty mentor shared with her were as follows: How can quiz scores of students enrolled in an introductory statistics class be summarized using measures of central tendency? Measures of dispersion? A template for writing descriptive research questions for summarizing data with measures of central tendency and dispersion is presented as follows:
How can [variable] be summarized using measures of central tendency?
Measures of dispersion?
Next, we present an APA-like paragraph summarizing the results of the statistics quiz data example, answering the questions posed to Marie:
As shown in Table 3.5, scores ranged from 9 to 20. The mean was
15.56, the approximate median was 17.00 (or 16.33 when calculated from
grouped data), and the mode was 17.00. Thus, the scores tended to
lump together at the high end of the scale. A negatively skewed dis-
tribution is suggested given that the mean was less than the median
and mode. The exclusive range was 11, H spread (interquartile range)
was 5.0, variance was 10.007, and standard deviation was 3.1633. From
this, we can tell that the scores tended to be quite variable. For
example, the middle 50% of the scores had a range of 5 (H spread)
indicating that there was a reasonable spread of scores around the
median. Thus, despite a high “average” score, there were some low
performing students as well. These results are consistent with those
described in Section 2.4.
3.6 Summary

In this chapter, we continued our exploration of descriptive statistics by considering some basic univariate population parameters and sample statistics. First, we examined summation notation, which is necessary in many areas of statistics. Then we looked at the most commonly used measures of central tendency: the mode, the median, and the mean. The next section of the chapter dealt with the most commonly used measures of dispersion. Here we discussed the range (both exclusive and inclusive ranges), H spread, and the population variance and standard deviation, as well as the sample variance and standard deviation. We concluded the chapter with a look at SPSS, a template for writing research questions for summarizing data using measures of central tendency and dispersion, and then developed an APA-style paragraph of results. At this point, you should have met the following objectives: (a) be able to understand and utilize summation notation, (b) be able to determine and interpret the three commonly used measures of central tendency, and (c) be able to determine and interpret different measures of dispersion. A summary of when these descriptive statistics are most appropriate for each of the scales of measurement is shown in Box 3.3. In the next chapter, we will have a more extended discussion of the normal distribution (previously introduced in Chapter 2), as well as the use of standard scores as an alternative to raw scores.
STOP AND THINK BOX 3.3
Appropriate Descriptive Statistics

Measurement Scale   Measure of Central Tendency   Measure of Dispersion
Nominal             • Mode                        (none)
Ordinal             • Mode, • Median              • Range, • H spread
Interval/ratio      • Mode, • Median, • Mean      • Range, • H spread, • Variance and standard deviation
Problems

Conceptual Problems

3.1 Adding just one or two extreme scores to the low end of a large distribution of scores will have a greater effect on which one of the following?
  a. Q than the variance.
  b. The variance than Q.
  c. The mode than the median.
  d. None of the above will be affected.
3.2 The variance of a distribution of scores is which one of the following?
  a. Always 1.
  b. May be any number, negative, 0, or positive.
  c. May be any number greater than 0.
  d. May be any number equal to or greater than 0.
3.3 A 20-item statistics test was graded using the following procedure: a correct response is scored +1, a blank response is scored 0, and an incorrect response is scored −1. The highest possible score is +20; the lowest score possible is −20. Because the variance of the test scores for the class was −3, we conclude which one of the following?
  a. The class did very poorly on the test.
  b. The test was too difficult for the class.
  c. Some students received negative scores.
  d. A computational error certainly was made.
3.4 Adding just one or two extreme scores to the high end of a large distribution of scores will have a greater effect on which one of the following?
  a. The mode than the median.
  b. The median than the mode.
  c. The mean than the median.
  d. None of the above will be affected.
3.5 In a negatively skewed distribution, the proportion of scores between Q1 and the median is less than .25. True or false?
3.6 Median is to ordinal as mode is to nominal. True or false?

3.7 I assert that it is appropriate to utilize the mean in dealing with class rank data. Am I correct?

3.8 For a perfectly symmetrical distribution of data, the mean, median, and mode are calculated. I assert that the values of all three measures are necessarily equal. Am I correct?

3.9 In a distribution of 100 scores, the top 10 examinees received an additional bonus of 5 points. Compared to the original median, I assert that the median of the new (revised) distribution will be the same value. Am I correct?

3.10 A set of eight scores was collected, and the variance was found to be 0. I assert that a computational error must have been made. Am I correct?

3.11 Researcher A and Researcher B are using the same dataset (n = 10), where Researcher A computes the sample variance, and Researcher B computes the population variance. The values are found to differ by more than rounding error. I assert that a computational error must have been made. Am I correct?

3.12 For a set of 10 test scores, which of the following values will be different for the sample statistic and population parameter?
  a. Mean
  b. H
  c. Range
  d. Variance

3.13 Median is to H as mean is to standard deviation. True or false?

3.14 The inclusive range will be greater than the exclusive range for any data. True or false?

3.15 For a set of IQ test scores, the median was computed to be 95 and Q1 to be 100. I assert that the statistician is to be commended for their work. Am I correct?
3.16 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that the most frequently occurring number of minutes required for children to participate in physical education classes is 22.00 minutes. Which measure of central tendency does this statement represent?
  a. Mean
  b. Median
  c. Mode
  d. Range
  e. Standard deviation
3.17 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that the fewest number of minutes required per week is 15 minutes and the maximum number of minutes is 45. Which measure of dispersion do these values reflect?
  a. Mean
  b. Median
  c. Mode
  d. Range
  e. Standard deviation
3.18 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that 50% of schools required 20 or more minutes of participation in physical education classes. Which measure of central tendency does this statement represent?
  a. Mean
  b. Median
  c. Mode
  d. Range
  e. Standard deviation
3.19 One item on a survey of recent college graduates asks students to indicate if they plan to live within a 50 mile radius of the university. Responses to the question include “yes” or “no.” The researcher who gathers these data computes the variance of this variable. Is this appropriate given the measurement scale of this variable?
3.20 A marriage and family counselor randomly samples 250 clients and collects data on the number of hours they spent in counseling during the past year. What is the most stable measure of central tendency to compute given the measurement scale of this variable?
  a. Mean
  b. Median
  c. Mode
  d. Range
  e. Standard deviation
Computational Problems

3.1 For the population data in Computational Problem 2.1, and again assuming an interval width of 1, compute the following:
  a. Mode
  b. Median
  c. Mean
  d. Exclusive and inclusive range
  e. H spread
  f. Variance and standard deviation
3.2 Given a negatively skewed distribution with a mean of 10, a variance of 81, and N = 500, what is the numerical value of the following?

$$\sum_{i=1}^{N} (X_i - \mu)$$
3.3 For the sample data in Computational Problem 2.2, and again assuming an interval width of 1, compute the following:
  a. Mode
  b. Median
  c. Mean
  d. Exclusive and inclusive range
  e. H spread
  f. Variance and standard deviation
3.4 For the sample data in Computational Problem 4 (classroom test scores) of Chapter 2, and again assuming an interval width of 1, compute the following:
  a. Mode
  b. Median
  c. Mean
  d. Exclusive and inclusive range
  e. H spread
  f. Variance and standard deviation
3.5 A sample of 30 test scores is as follows:
X f
8 1
9 4
10 3
11 7
12 9
13 0
14 0
15 3
16 0
17 0
18 2
19 0
20 1
Compute each of the following statistics:
  a. Mode
  b. Median
  c. Mean
  d. Exclusive and inclusive range
  e. H spread
  f. Variance and standard deviation
3.6 Without doing any computations, which of the following distributions has the largest variance?
X f Y f Z f
15 6 15 4 15 2
16 7 16 7 16 7
17 9 17 11 17 13
18 9 18 11 18 13
19 7 19 7 19 7
20 6 20 4 20 2
3.7 Without doing any computations, which of the following distributions has the largest variance?
X f Y f Z f
5 3 5 1 5 6
6 2 6 0 6 2
7 4 7 4 7 3
8 3 8 3 8 1
9 5 9 2 9 0
10 2 10 1 10 7
Interpretive Problems

3.1 Select one interval or ratio variable from the survey1 sample dataset on the website.
  a. Calculate all of the measures of central tendency and dispersion discussed in this chapter that are appropriate for this measurement scale.
  b. Write an APA-style paragraph which summarizes the findings.

3.2 Select one ordinal variable from the survey1 sample dataset on the website.
  a. Calculate the measures of central tendency and dispersion discussed in this chapter that are appropriate for this measurement scale.
  b. Write an APA-style paragraph which summarizes the findings.
4
Normal Distribution and Standard Scores
Chapter Outline
4.1 Normal Distribution
4.1.1 History
4.1.2 Characteristics
4.2 Standard Scores
4.2.1 z Scores
4.2.2 Other Types of Standard Scores
4.3 Skewness and Kurtosis Statistics
4.3.1 Symmetry
4.3.2 Skewness
4.3.3 Kurtosis
4.4 SPSS
4.5 Templates for Research Questions and APA-Style Paragraph
Key Concepts
 1. Normal distribution (family of distributions, unit normal distribution, area under the curve, points of inflection, asymptotic curve)
 2. Standard scores [z, College Entrance Examination Board (CEEB), T, IQ]
 3. Symmetry
 4. Skewness (positively skewed, negatively skewed)
 5. Kurtosis (leptokurtic, platykurtic, mesokurtic)
 6. Moments around the mean
In Chapter 3, we continued our discussion of descriptive statistics, previously defined as techniques that allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. There we considered the following three topics: summation notation (a method for summing a set of scores), measures of central tendency (measures for boiling down a set of scores into a single value used to represent the data), and measures of dispersion (measures dealing with the extent to which a collection of scores vary).
In this chapter, we delve further into the field of descriptive statistics in terms of three additional topics. First, we consider the most commonly used distributional shape, the normal distribution. Although in this chapter we discuss the major characteristics of the normal distribution and how it is used descriptively, in later chapters we see how the normal distribution is used inferentially as an assumption for certain statistical tests. Second, several types of standard scores are considered. To this point, we have looked at raw scores and deviation scores. Here we consider scores that are often easier to interpret, known as standard scores. Then we examine two other measures useful for describing a collection of data, namely, skewness and kurtosis. As we show shortly, skewness refers to the lack of symmetry of a distribution of scores, and kurtosis refers to the peakedness of a distribution of scores. Finally, we provide a template for writing research questions, develop an APA-style paragraph of results for an example dataset, and also illustrate the use of SPSS. Concepts to be discussed include the normal distribution (i.e., family of distributions, unit normal distribution, area under the curve, points of inflection, asymptotic curve), standard scores (e.g., z, CEEB, T, IQ), symmetry, skewness (positively skewed, negatively skewed), kurtosis (leptokurtic, platykurtic, mesokurtic), and moments around the mean. Our objectives are that by the end of this chapter, you will be able to (a) understand the normal distribution and utilize the normal table, (b) determine and interpret different types of standard scores, particularly z scores, and (c) understand and interpret skewness and kurtosis statistics.
4.1 Normal Distribution
You may remember the following research scenario, first introduced in Chapter 2. We will revisit Marie in this chapter.
Marie, a graduate student pursuing a master's degree in educational research, has been assigned to her first task as a research assistant. Her faculty mentor has given Marie quiz data collected from 25 students enrolled in an introductory statistics course and has asked Marie to summarize the data. The faculty member, who continues to be pleased with the descriptive analysis and presentation of results previously shared, has asked Marie to revisit the following research question related to distributional shape: What is the distributional shape of the statistics quiz score? Additionally, Marie's faculty mentor has asked Marie to standardize the quiz score and compare student 1 to student 3 relative to the mean. The corresponding research question that Marie is provided for this analysis is as follows: In standard deviation units, what is the relative standing to the mean of student 1 compared to student 3?
Recall from Chapter 2 that there are several commonly seen distributions. The most commonly observed and used distribution is the normal distribution. It has many uses in both descriptive and inferential statistics, as we show. In this section, we discuss the history of the normal distribution and its major characteristics.
4.1.1 History
Let us first consider a brief history of the normal distribution. From the time that data were collected and distributions examined, a particular bell-shaped distribution occurred quite often for many variables in many disciplines (e.g., many physical, cognitive, physiological,
and motor attributes). This has come to be known as the normal distribution. Back in the 1700s, mathematicians were called on to develop an equation that could be used to approximate the normal distribution. If such an equation could be found, then the probability associated with any point on the curve could be determined, and the amount of space or area under any portion of the curve could also be determined. For example, one might want to know the probability of being taller than 6′2″ for a male, given that height is normally distributed for each gender. Until the 1920s, the development of this equation was commonly attributed to Carl Friedrich Gauss, and until that time, this distribution was known as the Gaussian curve. However, in the 1920s, Karl Pearson found this equation in an earlier article written by Abraham DeMoivre in 1733 and renamed the curve the normal distribution. Today the normal distribution is, appropriately, attributed to DeMoivre.
4.1.2 Characteristics
There are seven important characteristics of the normal distribution. Because the normal distribution occurs frequently, features of the distribution are standard across all normal distributions. This "standard curve" allows us to make comparisons across two or more normal distributions as well as look at areas under the curve, as becomes evident.
4.1.2.1 Standard Curve
First, the normal distribution is a standard curve because it is always (a) symmetric around the mean, (b) unimodal, and (c) bell-shaped. As shown in Figure 4.1, if we split the distribution in one-half at the mean (μ), the left-hand half (below the mean) is the mirror image of the right-hand half (above the mean). Also, the normal distribution has only one mode, and the general shape of the distribution is bell-shaped (some even call it the bell-shaped curve). Given these conditions, the mean, median, and mode will always be equal to one another for any normal distribution.
[Figure 4.1: The normal distribution. The X axis is marked in standard deviation units (−3 to +3) around the mean; the areas under successive portions of the curve are 34.13%, 13.59%, and 2.14% on each side of the mean.]
4.1.2.2 Family of Curves
Second, there is no single normal distribution; rather, the normal distribution is a family of curves. For instance, one particular normal curve has a mean of 100 and a variance of 225 (recall that the standard deviation is the square root of the variance; thus, the standard deviation in this instance is 15). This normal curve is exemplified by the Wechsler intelligence scales. Another specific normal curve has a mean of 50 and a variance of 100 (standard deviation of 10). This normal curve is used with most behavior rating scales. In fact, there are an infinite number of normal curves, one for every distinct pair of values for the mean and variance. Every member of the family of normal curves has the same characteristics; however, the scale of X, the mean of X, and the variance (and standard deviation) of X can differ across different variables and/or populations.
To keep the members of the family distinct, we use the following notation. If the variable X is normally distributed, we write X ∼ N(μ, σ²). This is read as "X is distributed normally with population mean μ and population variance σ²." This is the general notation; for notation specific to a particular normal distribution, the mean and variance values are given. For our examples, the Wechsler intelligence scales are denoted by X ∼ N(100, 225), whereas the behavior rating scales are denoted by X ∼ N(50, 100). Narratively speaking, therefore, the Wechsler intelligence scale is distributed normally with a population mean of 100 and a population variance of 225. A similar interpretation can be made for the behavior rating scale.
4.1.2.3 Unit Normal Distribution
Third, there is one particular member of the family of normal curves that deserves additional attention. This member has a mean of 0 and a variance (and standard deviation) of 1 and thus is denoted by X ∼ N(0, 1). This is known as the unit normal distribution (unit referring to the variance of 1) or as the standard unit normal distribution. On a related matter, let us define a z score as follows:

zi = (Xi − μ)/σ
The numerator of this equation is actually a deviation score, previously described in Chapter 3, and indicates how far above or below the mean an individual's score falls. When we divide the deviation from the mean (i.e., the numerator) by the standard deviation (i.e., the denominator), the value derived indicates how many standard deviations above or below the mean an individual's score falls. If one individual has a z score of +1.00, then the person falls one standard deviation above the mean. If another individual has a z score of −2.00, then that person falls two standard deviations below the mean. There is more to say about this as we move along in this section.
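The z score computation just defined can be sketched in a few lines of Python; the scores, mean, and standard deviation below are hypothetical values used only for illustration.

```python
# Minimal sketch of the z score formula: z_i = (X_i - mu) / sigma.
def z_score(x, mu, sigma):
    """Number of standard deviations that x falls above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical scale with mean 100 and standard deviation 15:
print(z_score(115, 100, 15))  # 1.0 -> one standard deviation above the mean
print(z_score(70, 100, 15))   # -2.0 -> two standard deviations below the mean
```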
4.1.2.4 Area
The fourth characteristic of the normal distribution is the ability to determine any area under the curve. Specifically, we can determine the area above any value, the area below any value, or the area between any two values under the curve. Let us chat about what we mean by area. If you return to Figure 4.1, areas for different portions of the curve are listed.
Here area is defined as the percentage or amount of space of a distribution, either above a certain score, below a certain score, or between two different scores. For example, we see that the area between the mean and one standard deviation above the mean is 34.13%. In other words, roughly a third of the entire distribution falls into that region. The entire area under the curve represents 100%, and smaller portions of the curve represent somewhat less than that.
For example, say you wanted to know what percentage of adults had an IQ score over 120, what percentage of adults had an IQ score under 107, or what percentage of adults had an IQ score between 107 and 120. How can we compute these areas under the curve? A table of the unit normal distribution has been developed for this purpose. Although similar tables could also be developed for every member of the normal family of curves, these are unnecessary, as any normal distribution can be converted to a unit normal distribution. The unit normal table is given in Table A.1.
Turn to Table A.1 now and familiarize yourself with its contents. To help illustrate, a portion of the table is presented in Figure 4.2. The first column simply lists the values of z. These are standardized scores on the X axis. Note that the values of z only range from 0 to 4.0. There are two reasons for this. First, values above 4.0 are rather unlikely, as the area under that portion of the curve is negligible (less than .003%). Second, values below 0 (i.e., negative z scores) are not really necessary to present in the table, as the normal distribution is symmetric around the mean of 0. Thus, that portion of the table would be redundant and is not shown here (we show how to deal with this situation for some example problems in a bit).
The second column, labeled P(z), gives the area below the respective value of z, in other words, the area between that value of z and the most extreme left-hand portion of the curve [i.e., −∞ (negative infinity) on the far negative or left-hand side of 0]. So if we wanted to know what the area was below z = +1.00, we would look in the first column under z = 1.00 and then look in the second column [P(z)] to find the area of .8413. This value, .8413, represents the percentage of the distribution that is smaller than z of +1.00. It also represents the probability that a score will be smaller than z of +1.00. In other words, about 84% of the distribution is less than z of +1.00, and the probability that a value will be less than z of +1.00 is about 84%. More examples are considered later in this section.
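Values like those in the P(z) column can also be computed directly from the cumulative distribution function of the unit normal distribution; a minimal sketch using only the Python standard library (the error function in the math module) follows.

```python
import math

def normal_cdf(z):
    """P(z): proportion of the unit normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area below z = +1.00, matching the table value of .8413:
print(round(normal_cdf(1.00), 4))  # 0.8413
```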
[Figure 4.2: Portion of the z table. The z values are standardized scores on the X axis; the P(z) values indicate the percentage of the z distribution that is smaller than the respective z value and also represent the probability that a value will be less than that z value.]
4.1.2.5 Transformation to Unit Normal Distribution
A fifth characteristic is that any normally distributed variable, regardless of the mean and variance, can be converted into a unit normally distributed variable. Thus, our Wechsler intelligence scales as denoted by X ∼ N(100, 225) can be converted into z ∼ N(0, 1). Conceptually, this transformation is done by moving the curve along the X axis until it is centered at a mean of 0 (by subtracting out the original mean) and then by stretching or compressing the distribution until it has a variance of 1 (remember, however, that the shape of the distribution does not change during the standardization process; only the values on the X axis change). This allows us to make the same interpretation about any individual's score on any normally distributed variable. If z = +1.00, then for any variable, this implies that the individual falls one standard deviation above the mean.
This also allows us to make comparisons between two different individuals or across two different variables. If we wanted to make comparisons between two different individuals on the same variable X, then rather than comparing their individual raw scores, X1 and X2, we could compare their individual z scores, z1 and z2, where

z1 = (X1 − μ)/σ and z2 = (X2 − μ)/σ

This is the reason we only need the unit normal distribution table to determine areas under the curve rather than a table for every member of the normal distribution family. In another situation, we may want to compare scores on the Wechsler intelligence scales [X ∼ N(100, 225)] to scores on behavior rating scales [X ∼ N(50, 100)] for the same individual. We would convert the two variables to z scores, and then direct comparisons could be made.
It is important to note that in standardizing a variable, it is only the values on the X axis that change. The shape of the distribution (e.g., skewness and kurtosis) remains the same.
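The standardization just described can be sketched with a small hypothetical set of population scores; after subtracting the mean and dividing by the standard deviation, the transformed scores are centered at 0 with a variance of 1, while their relative positions (and thus the shape of the distribution) are unchanged.

```python
# Standardizing a hypothetical set of population scores.
scores = [85, 100, 100, 115, 130]
n = len(scores)
mu = sum(scores) / n
sigma = (sum((x - mu) ** 2 for x in scores) / n) ** 0.5  # population standard deviation

z_scores = [(x - mu) / sigma for x in scores]
print([round(z, 2) for z in z_scores])

# The standardized variable has mean 0 and variance 1 (to floating-point precision):
print(abs(sum(z_scores) / n) < 1e-9)                      # True
print(abs(sum(z ** 2 for z in z_scores) / n - 1) < 1e-9)  # True
```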
4.1.2.6 Constant Relationship with Standard Deviation
The sixth characteristic is that the normal distribution has a constant relationship with the standard deviation. Consider Figure 4.1 again. Along the X axis, we see values represented in standard deviation increments. In particular, from left to right, the values shown are three, two, and one standard deviation units below the mean and one, two, and three standard deviation units above the mean. Under the curve, we see the percentage of scores that fall under different portions of the curve. For example, the area between the mean and one standard deviation above or below the mean is 34.13%. The area between one standard deviation and two standard deviations on the same side of the mean is 13.59%, the area between two and three standard deviations on the same side is 2.14%, and the area beyond three standard deviations is .13%.
In addition, three other areas are often of interest. The area within one standard deviation of the mean, from one standard deviation below the mean to one standard deviation above the mean, is approximately 68% (or roughly two-thirds of the distribution). The area within two standard deviations of the mean, from two standard deviations below the mean to two standard deviations above the mean, is approximately 95%. The area within three standard deviations of the mean, from three standard deviations below the mean to three standard deviations above the mean, is approximately 99%. In other words, nearly all of the scores will be within two or three standard deviations of the mean for any normal curve.
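These benchmark areas can be checked numerically with the cumulative distribution function of the unit normal distribution; a minimal sketch using the Python standard library follows.

```python
import math

def normal_cdf(z):
    """Proportion of the unit normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area within k standard deviations of the mean:
for k in (1, 2, 3):
    area = normal_cdf(k) - normal_cdf(-k)
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```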
4.1.2.7 Points of Inflection and Asymptotic Curve
The seventh and final characteristic of the normal distribution is as follows. The points of inflection are where the curve changes from sloping down (concave) to sloping up (convex). These points occur precisely at one standard deviation unit above and below the mean. This is more a matter of mathematical elegance than of statistical application. The curve also never touches the X axis. This is because with the theoretical normal curve, all values from negative infinity to positive infinity have a nonzero probability of occurring. Thus, while the curve continues to slope ever downward toward more extreme scores, it approaches, but never quite touches, the X axis. The curve is referred to here as being asymptotic. This allows for the possibility of extreme scores.
Examples: Now for the long-awaited examples for finding area using the unit normal distribution. These examples require the use of Table A.1. Our personal preference is to draw a picture of the normal curve so that the proper area is determined. Let us consider four examples of finding area: (1) below z = −2.50, (2) below z = 0, (3) below z = 1.00, and (4) between z = −2.50 and z = 1.00.
To determine the area below z = −2.50, we draw a picture as shown in Figure 4.3a. We draw a vertical line at the value of z and then shade in the area we want to find. Because the shaded region is relatively small, we know the area must be considerably smaller than .50. In the unit normal table, we already know negative values of z are not included. However, because the normal distribution is symmetric, we know the area below −2.50 is the same as the area above +2.50. Thus, we look up the area below +2.50, find the value of .9938, subtract this from 1.0000, and obtain the value of .0062, or .62%, a very small area indeed.
How do we determine the area below z = 0 (i.e., the mean)? As shown in Figure 4.3b, we already know from reading this section that the area has to be .5000, or one-half of the total area under the curve. Indeed, if we look in the table for the area below z = 0, we find the area is .5000. How do we determine the area below z = 1.00? As shown in Figure 4.3c, this region exists on both sides of 0 and actually constitutes two smaller areas, the first area below 0 and the second area between 0 and 1. For this example, we use the table directly and find the value of .8413. We leave you with two other problems to solve on your own. First, what is the area below z = .50 (answer: .6915)? Second, what is the area below z = 1.96 (answer: .9750)?
Because the unit normal distribution is symmetric, finding the area above a certain value of z is solved in a similar fashion as the area below a certain value of z. We need not devote any further attention to that particular situation. However, how do we determine the area between two values of z? This is a little different and needs some additional discussion. Consider as an example finding the area between z = −2.50 and z = 1.00, as depicted in Figure 4.3d. Here we see that the shaded region consists of two smaller areas, the area between the mean and −2.50 and the area between the mean (z = 0) and 1.00. Using the table again, we find the area below 1.00 is .8413 and the area below −2.50 is .0062. Thus, the shaded region is the difference as computed by .8413 − .0062 = .8351. On your own, determine the area between z = −1.27 and z = .50 (answer: .5895).
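The worked examples above can be reproduced with the cumulative distribution function of the unit normal distribution; a minimal sketch using the Python standard library follows.

```python
import math

def normal_cdf(z):
    """Proportion of the unit normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(normal_cdf(-2.50), 4))                     # 0.0062 (area below z = -2.50)
print(round(normal_cdf(1.00), 4))                      # 0.8413 (area below z = 1.00)
print(round(normal_cdf(1.00) - normal_cdf(-2.50), 4))  # 0.8351 (area between -2.50 and 1.00)
# The last practice problem, computed at full precision rather than from
# rounded table values (.6915 - .1020 = .5895), differs in the final digit:
print(round(normal_cdf(0.50) - normal_cdf(-1.27), 4))  # 0.5894
```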
Finally, what if we wanted to determine areas under the curve for values of X rather than z? The answer here is simple, as you might have guessed. First we convert the value of X to a z score; then we use the unit normal table to determine the area. Because the normal curve is standard for all members of the family of normal curves, the scale of the variable, X or z, is irrelevant in terms of determining such areas. In the next section, we deal more with such transformations.
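For instance, the IQ questions posed earlier can be answered this way; assuming the Wechsler scale distribution X ∼ N(100, 225), a sketch using the Python standard library follows.

```python
import math

def normal_cdf(z):
    """Proportion of the unit normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100, 15  # Wechsler scales: X ~ N(100, 225)

def area_below(x):
    # Convert the raw value of X to a z score, then find the area.
    return normal_cdf((x - mu) / sigma)

print(round(1 - area_below(120), 2))                # 0.09 -> about 9% of adults score over 120
print(round(area_below(107), 2))                    # 0.68 -> about 68% score under 107
print(round(area_below(120) - area_below(107), 2))  # 0.23 -> about 23% score between 107 and 120
```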
4.2 Standard Scores
We have already devoted considerable attention to z scores, which are one type of standard score. In this section, we describe an application of z scores leading up to a discussion of other types of standard scores. As we show, the major purpose of standard scores is to place scores on the same standard scale so that comparisons can be made across individuals and/or variables. Without some standard scale, comparisons across individuals and/or variables would be difficult to make. Examples are coming right up.
4.2.1 z Scores
A child comes home from school with the results of two tests taken that day. On the math test, she receives a score of 75, and on the social studies test, she receives a score of 60. As a parent, the natural question to ask is, "Which performance was the stronger one?"
[Figure 4.3: Examples of area under the unit normal distribution: (a) area below z = −2.5 is .0062; (b) area below z = 0 is .5000; (c) area below z = 1.0 is .8413; (d) area between z = −2.5 and z = 1.0 is .8351.]
No information about any of the following is available: the maximum score possible, the mean of the class (or any other central tendency measure), or the standard deviation of the class (or any other dispersion measure). It is possible that the two tests had a different number of possible points, different means, and/or different standard deviations. How can we possibly answer our question?
The answer, of course, is to use z scores, if the data are assumed to be normally distributed, once the relevant information is obtained. Let us take a minor digression before we return to answer our question in more detail. Recall the formula for standardizing variable X into a z score:

zi = (Xi − μX)/σX
where the X subscript has been added to the mean and standard deviation for purposes of clarifying which variable is being considered. If variable X is the number of items correct on a test, then the numerator is the deviation of a student's raw score from the class mean (i.e., the numerator is a deviation score as previously defined in Chapter 3), measured in terms of items correct, and the denominator is the standard deviation of the class, also measured in terms of items correct. Because both the numerator and denominator are measured in terms of items correct, the resultant z score has no units (the units of the numerator and denominator essentially cancel out). As z scores have no units (i.e., the z score is interpreted as the number of standard deviation units above or below the mean), this allows us to compare two different raw score variables with different scales, means, and/or standard deviations. By converting our two variables to z scores, the transformed variables are now on the same z score scale with a mean of 0 and a variance and standard deviation of 1.
Let us return to our previous situation where the math test score is 75 and the social studies test score is 60. In addition, we are provided with information that the standard deviation for the math test is 15 and the standard deviation for the social studies test is 10. Consider the following three examples. In the first example, the means are 60 for the math test and 50 for the social studies test. The z scores are then computed as follows:

zmath = (75 − 60)/15 = 1.0    zss = (60 − 50)/10 = 1.0
The conclusion for the first example is that the performance on both tests is the same; that is, the child scored one standard deviation above the mean on both tests.
In the second example, the means are 60 for the math test and 40 for the social studies test. The z scores are then computed as follows:

zmath = (75 − 60)/15 = 1.0    zss = (60 − 40)/10 = 2.0
The conclusion for the second example is that performance is better on the social studies test; that is, the child scored two standard deviations above the mean on the social studies test and only one standard deviation above the mean on the math test.
In the third example, the means are 60 for the math test and 70 for the social studies test. The z scores are then computed as follows:

zmath = (75 − 60)/15 = 1.0    zss = (60 − 70)/10 = −1.0
The conclusion for the third example is that performance is better on the math test; that is, the child scored one standard deviation above the mean on the math test and one standard deviation below the mean on the social studies test. These examples serve to illustrate a few of the many possibilities, depending on the particular combination of raw score, mean, and standard deviation for each variable.
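The three comparisons above can be reproduced in a few lines; the raw scores, means, and standard deviations are those given in the text.

```python
def z_score(x, mu, sigma):
    """Standard deviation units above (+) or below (-) the mean."""
    return (x - mu) / sigma

math_score, ss_score = 75, 60  # raw scores from the text
sd_math, sd_ss = 15, 10        # standard deviations from the text

# Means for the three examples: (math mean, social studies mean)
for mu_math, mu_ss in [(60, 50), (60, 40), (60, 70)]:
    z_math = z_score(math_score, mu_math, sd_math)
    z_ss = z_score(ss_score, mu_ss, sd_ss)
    print(f"z_math = {z_math:+.1f}, z_ss = {z_ss:+.1f}")
# z_math = +1.0, z_ss = +1.0
# z_math = +1.0, z_ss = +2.0
# z_math = +1.0, z_ss = -1.0
```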
Let us conclude this section by mentioning the major characteristics of z scores. The first characteristic is that z scores provide us with comparable distributions, as we just saw in the previous examples. Second, z scores take into account the entire distribution of raw scores. All raw scores can be converted to z scores such that every raw score will have a corresponding z score. Third, we can evaluate an individual's performance relative to the scores in the distribution. For example, saying that an individual's score is one standard deviation above the mean is a measure of relative performance. This implies that approximately 84% of the scores fall below the performance of that individual. Finally, negative values (i.e., below 0) and decimal values (e.g., z = 1.55) are obviously possible (and will most certainly occur) with z scores. On average, about one-half of the z scores for any distribution will be negative, and some decimal values are quite likely. This last characteristic is bothersome to some individuals and has led to the development of other types of standard scores, as described in the next section.
4.2.2 Other Types of Standard Scores
Over the years, other standard scores besides z scores have been developed, either to alleviate the concern over negative and/or decimal values associated with z scores or to obtain a particular mean and standard deviation. Let us examine three common examples. The first additional standard score is known as the College Entrance Examination Board (CEEB) score. This standard score is used in exams such as the SAT and the GRE. The subtests for these exams all have a mean of 500 and a standard deviation of 100. A second additional standard score is known as the T score and is used in tests such as most behavior rating scales, as previously mentioned. T scores have a mean of 50 and a standard deviation of 10. A third additional standard score is known as the IQ score and is used in the Wechsler intelligence scales. The IQ score has a mean of 100 and a standard deviation of 15 (the Stanford–Binet intelligence scales have a mean of 100 and a standard deviation of 16).
Say we want to develop our own type of standard score, where we determine in advance the mean and standard deviation that we would like to have. How would that be done? As the equation for z scores is

zi = (Xi − μX)/σX

then algebraically the following can be shown:

Xi = μX + σX zi
If, for example, we want to develop our own "stat" standardized score, then the following equation would be used:

stati = μstat + σstat zi

where
stati is the "stat" standardized score for a particular individual
μstat is the desired mean of the "stat" distribution
σstat is the desired standard deviation of the "stat" distribution

If we want to have a mean of 10 and a standard deviation of 2, then our equation becomes

stati = 10 + 2zi

We would then have the computer simply plug in a z score and compute an individual's "stat" score. Thus, a z score of 1.0 would yield a "stat" standardized score of 12.0.
Consider a realistic example where we have a raw score variable we want to transform into a standard score, and we want to control the mean and standard deviation. For example, we have statistics midterm raw scores with 225 points possible. We want to develop a standard score with a mean of 50 and a standard deviation of 5. We also have scores on other variables that are on different scales with different means and different standard deviations (e.g., statistics final exam scores worth 175 points, a set of 20 lab assignments worth a total of 200 points, a statistics performance assessment worth 100 points). We can standardize each of those variables by placing them on the same scale with the same mean and same standard deviation, thereby allowing comparisons across variables. This is precisely the rationale used by testing companies and researchers when they develop standard scores. In short, from z scores, we can develop a CEEB, T, IQ, "stat," or any other type of standard score.
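The conversion chain raw score → z score → new standard score can be sketched as follows; the midterm raw score, class mean, and class standard deviation are hypothetical values chosen for illustration, while the target means and standard deviations are those given in the text.

```python
def z_score(x, mu, sigma):
    return (x - mu) / sigma

def standard_score(z, new_mu, new_sigma):
    """X_i = mu + sigma * z_i, with the desired mean and standard deviation."""
    return new_mu + new_sigma * z

# Hypothetical midterm score: 180 out of 225, class mean 150, class SD 30.
z = z_score(180, 150, 30)  # z = 1.0

print(standard_score(z, 500, 100))  # 600.0 -> CEEB scale (mean 500, SD 100)
print(standard_score(z, 50, 10))    # 60.0  -> T scale (mean 50, SD 10)
print(standard_score(z, 100, 15))   # 115.0 -> Wechsler IQ scale (mean 100, SD 15)
print(standard_score(z, 10, 2))     # 12.0  -> "stat" scale from the text (mean 10, SD 2)
```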
4.3 Skewness and Kurtosis Statistics
In previous chapters, we discussed the distributional concepts of symmetry, skewness, central tendency, and dispersion. In this section, we more closely define symmetry as well as the statistics commonly used to measure skewness and kurtosis.
4.3.1 Symmetry
Conceptually, we define a distribution as being symmetric if, when we divide the distribution precisely in one-half, the left-hand half is a mirror image of the right-hand half. That is, the distribution above the mean is a mirror image of the distribution below the mean. To put it another way, a distribution is symmetric around the mean if for every score q units below the mean, there is a corresponding score q units above the mean.
Two examples of symmetric distributions are shown in Figure 4.4. In Figure 4.4a, we have a normal distribution, which is clearly symmetric around the mean. In Figure 4.4b, we have a symmetric distribution that is bimodal, unlike the previous example. From these and other numerous examples, we can make the following two conclusions. First, if a distribution is symmetric, then the mean is equal to the median. Second, if a distribution is symmetric and unimodal, then the mean, median, and mode are all equal. This indicates we can determine whether a distribution is symmetric by simply comparing the measures of central tendency.
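This mean-versus-median check is easy to verify on toy data; a quick Python sketch (the data are invented for illustration):

```python
from statistics import mean, median

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # unimodal, symmetric around 3
skewed = [1, 1, 2, 2, 3, 10]             # a long right tail pulls the mean up

is_symmetric = mean(symmetric) == median(symmetric)  # mean and median agree
has_positive_skew = mean(skewed) > median(skewed)    # mean exceeds median
```

For the symmetric set, the mean, median, and mode all equal 3; for the skewed set, the single extreme score raises the mean above the median.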
4.3.2 Skewness
We define skewness as the extent to which a distribution of scores deviates from perfect symmetry. This is important as perfectly symmetrical distributions rarely occur with actual sample data (i.e., "real" data). A skewed distribution is known as being asymmetrical. As shown in Figure 4.5, there are two general types of skewness: distributions that are negatively skewed, as in Figure 4.5a, and those that are positively skewed, as in Figure 4.5b. Negatively skewed distributions, which are skewed to the left, occur when most of the scores are toward the high end of the distribution and only a few scores are toward the low end. If you make a fist with your thumb pointing to the left (skewed to the left), you have graphically defined a negatively skewed distribution. For a negatively skewed distribution, we also find the following: mode > median > mean. This indicates that we can determine whether a distribution is negatively skewed by simply comparing the measures of central tendency.

FIGURE 4.4 Symmetric distributions: (a) Normal distribution. (b) Bimodal distribution.

FIGURE 4.5 Skewed distributions: (a) Negatively skewed distribution. (b) Positively skewed distribution.

Positively skewed distributions, which are skewed to the right, occur when most of the scores are toward the low end of the distribution and only a few scores are toward the high end. If you make a fist with your thumb pointing to the right (skewed to the right), you have graphically defined a positively skewed distribution. For a positively skewed distribution, we also find the following: mode < median < mean. This indicates that we can determine whether a distribution is positively skewed by simply comparing the measures of central tendency.
The most commonly used measure of skewness is known as γ1 (Greek letter gamma), which is mathematically defined as follows:

    γ1 = (1/N) Σ_{i=1}^{N} z_i^3

where we take the z score for each individual, cube it, sum across all N individuals, and then divide by the number of individuals N. This measure is available in nearly all computer packages, so hand computations are not necessary. The characteristics of this measure of skewness are as follows: (a) a perfectly symmetrical distribution has a skewness value of 0, (b) the range of values for the skewness statistic is approximately from −3 to +3, (c) negatively skewed distributions have negative skewness values, and (d) positively skewed distributions have positive skewness values.
There are different rules of thumb for determining how extreme skewness can be and still retain a relatively normal distribution. One simple rule of thumb is that skewness values within ±2.0 are considered relatively normal, with more conservative researchers applying a ±3.0 guideline, and more stringent researchers using ±1.0. Another rule of thumb for determining how extreme a skewness value must be for the distribution to be considered nonnormal is as follows: Skewness values outside the range of ± two standard errors of skewness suggest a distribution that is nonnormal. Applying this rule of thumb, if the standard error of skewness is .85, then anything outside of −2(.85) to +2(.85), or −1.7 to +1.7, would be considered nonnormal. It is important to note that this second rule of thumb is sensitive to small sample sizes and should only be considered as a general guide.
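The γ1 formula and the two-standard-error screen can be sketched in a few lines of Python (a hand illustration only, on invented data; note that SPSS reports a sample-adjusted skewness statistic, so its values will not match this raw γ1 exactly):

```python
from statistics import mean, pstdev

def gamma1(scores):
    """Skewness: sum of cubed z scores divided by N (z via the population SD)."""
    m, s = mean(scores), pstdev(scores)
    return sum(((x - m) / s) ** 3 for x in scores) / len(scores)

def outside_two_se(value, se):
    """Screening rule: flag a statistic falling outside +/- 2 standard errors."""
    return abs(value) > 2 * se

symmetric = [1, 2, 3, 4, 5]        # gamma1 = 0
right_skewed = [1, 1, 1, 2, 2, 3, 9]  # gamma1 > 0
```

With a standard error of skewness of .85, a value of 2.0 falls outside ±1.7 and is flagged, while 1.5 is not.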
4.3.3 Kurtosis
Kurtosis is the fourth and final property of a distribution (these four properties are often referred to as the moments around the mean). The four properties are central tendency (first moment), dispersion (second moment), skewness (third moment), and kurtosis (fourth moment). Kurtosis is conceptually defined as the "peakedness" of a distribution (kurtosis is Greek for peakedness). Some distributions are rather flat, and others have a rather sharp peak. Specifically, there are three general types of peakedness, as shown in Figure 4.6. A distribution that is very peaked is known as leptokurtic ("lepto" meaning slender or narrow) (Figure 4.6a). A distribution that is relatively flat is known as platykurtic ("platy" meaning flat or broad) (Figure 4.6b). A distribution that is somewhere in between is known as mesokurtic ("meso" meaning intermediate) (Figure 4.6c).
The most commonly used measure of kurtosis is known as γ2, which is mathematically defined as

    γ2 = (1/N) Σ_{i=1}^{N} z_i^4 − 3

where we take the z score for each individual, take it to the fourth power (being the fourth moment), sum across all N individuals, divide by the number of individuals N, and then subtract 3. This measure is available in nearly all computer packages, so hand computations are not necessary. The characteristics of this measure of kurtosis are as follows: (a) a perfectly mesokurtic distribution, which would be a normal distribution, has a kurtosis value of 0, (b) platykurtic distributions have negative kurtosis values (being flat rather than peaked), and (c) leptokurtic distributions have positive kurtosis values (being peaked). Kurtosis values can range upward without bound, although for this measure they cannot fall below −2.
There are different rules of thumb for determining how extreme kurtosis can be and still retain a relatively normal distribution. One simple rule of thumb is that kurtosis values within ±2.0 are considered relatively normal, with more conservative researchers applying a ±3.0 guideline, and more stringent researchers using ±1.0. A rule of thumb for determining how extreme a kurtosis value may be for the distribution to be considered nonnormal is as follows: Kurtosis values outside the range of ± two standard errors of kurtosis suggest a distribution that is nonnormal. Applying this rule of thumb, if the standard error of kurtosis is 1.20, then anything outside of −2(1.20) to +2(1.20), or −2.40 to +2.40, would be considered nonnormal. It is important to note that this second rule of thumb is sensitive to small sample sizes and should only be considered as a general guide.

FIGURE 4.6 Distributions of different kurtosis: (a) Leptokurtic distribution. (b) Platykurtic distribution. (c) Mesokurtic distribution.
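A parallel Python sketch for γ2 (again a hand illustration on invented data; SPSS reports a sample-adjusted kurtosis, so its values will differ slightly):

```python
from statistics import mean, pstdev

def gamma2(scores):
    """Kurtosis: sum of z scores to the fourth power over N, minus 3."""
    m, s = mean(scores), pstdev(scores)
    return sum(((x - m) / s) ** 4 for x in scores) / len(scores) - 3

flat = [1, 2, 3, 4, 5]             # evenly spread: platykurtic, negative gamma2
peaked = [3, 3, 3, 3, 3, 3, 1, 5]  # piled at the center with rare extremes: positive gamma2
```

Worked by hand for the flat set: the mean is 3 and the population variance is 2, so the z scores to the fourth power sum to 8.5, giving γ2 = 8.5/5 − 3 = −1.3.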
Skewness and kurtosis statistics are useful for the following two reasons: (a) as descriptive statistics used to describe the shape of a distribution of scores and (b) in inferential statistics, which often assume a normal distribution, so that the researcher has some indication of whether the assumption has been met (more about this beginning in Chapter 6).
4.4 SPSS
Here we review what SPSS has to offer for examining distributional shape and computing standard scores. The following programs have proven to be quite useful for these purposes: "Explore," "Descriptives," "Frequencies," "Graphs," and "Transform." Instructions for using each are provided as follows.
Explore
Explore: Step 1. The first program, "Explore," can be invoked by clicking on "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then "Explore." Following the screenshot (step 1), as follows, produces the "Explore" dialog box. For brevity, we have not reproduced this initial screenshot when we discuss the "Descriptives" and "Frequencies" programs; however, you see here where they can be found from the pulldown menus.
[Screenshot: Explore: Step 1 — "Frequencies" and "Descriptives" can also be invoked from this menu.]
Explore: Step 2. Next, from the main "Explore" dialog box, click the variable of interest from the list on the left (e.g., quiz), and move it into the "Dependent List" box by clicking on the arrow button. Next, click on the "Statistics" button located in the top right corner of the main dialog box.
[Screenshot: Explore: Step 2 — select the variable of interest from the list on the left and use the arrow to move it to the "Dependent List" box on the right. Clicking on "Statistics" will allow you to select descriptive statistics.]
Explore: Step 3. A new box labeled "Explore: Statistics" will appear. Simply place a checkmark in the "Descriptives" box. Next click "Continue." You will then be returned to the main "Explore" dialog box. From there, click "OK." This will automatically generate the skewness and kurtosis values, as well as the measures of central tendency and dispersion which were covered in Chapter 3. The output from this was previously shown in the top panel of Table 3.5.
[Screenshot: Explore: Step 3.]
Descriptives
Descriptives: Step 1. The second program to consider is "Descriptives." It can also be accessed by going to "Analyze" in the top pulldown menu, then selecting "Descriptive Statistics," and then "Descriptives" (see "Explore: Step 1" for screenshots of these steps).

Descriptives: Step 2. This will bring up the "Descriptives" dialog box (see screenshot, step 2). From the main "Descriptives" dialog box, click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box by clicking on the arrow. If you want to obtain z scores for this variable for each case (e.g., person or object that was measured—your unit of analysis), check the "Save standardized values as variables" box located in the bottom left corner of the main "Descriptives" dialog box. This will insert a new variable into your dataset for subsequent analysis (see screenshot for how this will appear in "Data View"). Next, click on the "Options" button.
[Screenshot: Descriptives: Step 2 — select the variable of interest from the list on the left and use the arrow to move it to the "Variable(s)" box on the right. Placing a checkmark next to "Save standardized values as variables" will generate a new, standardized variable in your datafile for each variable selected. Clicking on "Options" will allow you to select various statistics to be generated.]
Descriptives: Step 3. A new box called "Descriptives: Options" will appear (see screenshot, step 3), and you can simply place a checkmark in the boxes for the statistics that you want to generate. This will allow you to obtain the skewness and kurtosis values, as well as the measures of central tendency and dispersion discussed in Chapter 3. After making your selections, click on "Continue." You will then be returned to the main "Descriptives" dialog box. From there, click "OK."
[Screenshot: Descriptives: Step 3 — statistics available when clicking on "Options" from the main "Descriptives" dialog box. Placing a checkmark will generate the respective statistic in the output.]
[Screenshot: Descriptives: Saving standardized variable — if "Save standardized values as variables" was checked on the main "Descriptives" dialog box, a new standardized variable will be created. By default, this variable name is the name of the original variable prefixed with a "Z" (denoting its standardization). It is computed using the unit normal formula: z = (X − μ)/σ.]
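What SPSS saves here can be reproduced by hand; a brief Python sketch of the unit normal formula applied to hypothetical quiz data (the variable name mirrors SPSS's "Z" prefix convention; the sample standard deviation is used here, as is typical when standardizing sample data):

```python
from statistics import mean, stdev

quiz = [9, 11, 13, 15, 17]           # hypothetical raw quiz scores
m, s = mean(quiz), stdev(quiz)       # sample mean and sample SD
zquiz = [(x - m) / s for x in quiz]  # the new standardized column

# the standardized variable is centered at 0; scores below the mean get
# negative z values and scores above the mean get positive z values
```

After standardization, the middle score (which equals the mean) sits at exactly z = 0.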
Frequencies
Frequencies: Step 1. The third program to consider is "Frequencies," which is also accessible by clicking on "Analyze" in the top pulldown menu, then clicking on "Descriptive Statistics," and then selecting "Frequencies" (see "Explore: Step 1" for screenshots of these steps).

Frequencies: Step 2. This will bring up the "Frequencies" dialog box. Click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box, then click on the "Statistics" button.
[Screenshot: Frequencies: Step 2 — select the variable of interest from the list on the left and use the arrow to move it to the "Variable(s)" box on the right. Clicking on "Charts" will allow you to generate a histogram with normal curve (and other types of graphs). Clicking on "Statistics" will allow you to select various statistics to be generated.]
Frequencies: Step 3. A new box labeled "Frequencies: Statistics" will appear. Again, you can simply place a checkmark in the boxes for the statistics that you want to generate. Here you can obtain the skewness and kurtosis values, as well as the measures of central tendency and dispersion from Chapter 3. If you click on the "Charts" button, you can also obtain a histogram with a normal curve overlay by clicking the "Histogram" radio button and checking the "With normal curve" box. This histogram output is shown in Figure 4.7. After making your selections, click on "Continue." You will then be returned to the main "Frequencies" dialog box. From there, click "OK."
FIGURE 4.7 SPSS histogram of statistics quiz data with normal distribution overlay.
[Screenshot: Frequencies: Step 3 — options available when clicking on "Statistics" from the main "Frequencies" dialog box. Placing a checkmark will generate the respective statistic in the output. Checking the indicated option yields better accuracy with quartiles and percentiles (i.e., the median).]
Graphs
Graphs: Two other programs also yield a histogram with a normal curve overlay. Both can be accessed by first going to "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Histogram." Another option for creating a histogram, starting again from the "Graphs" option in the top pulldown menu, is to select "Legacy Dialogs," then "Interactive," and finally "Histogram." From there, both work similarly to the "Frequencies" program described earlier.
[Screenshot: Graphs: Step 1.]
Transform
Transform: Step 1. A final program that comes in handy is for transforming variables, such as creating a standardized version of a variable (most notably standardizations other than the application of the unit normal formula, since the unit normal standardization can be easily performed, as seen previously, by using "Descriptives"). Go to "Transform" from the top pulldown menu, and then select "Compute Variables." A dialog box labeled "Compute Variables" will appear.
[Screenshot: Transform: Step 1.]
Transform: Step 2. The "Target Variable" is the name of the new variable you are creating, and the "Numeric Expression" box is where you insert the commands specifying which original variable to transform and how to transform it (e.g., the stat variable). When you are done defining the formula, simply click "OK" to generate the new variable in the data file.
[Screenshot: Transform: Step 2 — the name specified as the "Target Variable" becomes the column header in "Data View." This name must begin with a letter, and no spaces can be included. "Numeric Expression" is where you enter the formula for your new variable. For the user's convenience, a number of formulas are already defined within SPSS and accessible through the "Function group" list.]
4.5 Templates for Research Questions and APA-Style Paragraph
As stated in the previous chapter, depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions, and, thus, no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary.
It is time again to revisit our graduate research assistant, Marie, who was reintroduced at the beginning of the chapter. As a reminder, her task was to continue to summarize data from 25 students enrolled in a statistics course, this time paying particular attention to distributional shape and standardization. The questions posed this time by Marie's faculty mentor were as follows: What is the distributional shape of the statistics quiz score? In standard deviation units, what is the relative standing to the mean of student 1 compared to student 3? A template for writing a descriptive research question for summarizing distributional shape is presented as follows (this may sound familiar, as this was first presented in Chapter 2 when we initially discussed distributional shape). This is followed by a template for writing a research question related to standardization:
What is the distributional shape of the [variable]? In standard devi-
ation units, what is the relative standing to the mean of [unit 1]
compared to [unit 3]?
Next, we present an APA-style paragraph summarizing the results of the statistics quiz data example, answering the questions posed to Marie:
As shown in the top panel of Table 3.5, the skewness value is −.598
(SE = .464) and the kurtosis value is −.741 (SE = .902). Skewness and
kurtosis values within the range of +/−2(SE) are generally considered
normal. Given our values, skewness is within the range of −.928 to
+.928 and kurtosis is within the range of −1.804 to +1.804, and these
would be considered normal. Another rule of thumb is that the skew-
ness and kurtosis values should fall within an absolute value of 2.0
to be considered normal. Applying this rule, normality is still evi-
dent. The histogram with a normal curve overlay is depicted in Figure
4.7. Taken with the skewness and kurtosis statistics, these results
indicate that the quiz scores are reasonably normally distributed.
There is a slight negative skew such that there are more scores at
the high end of the distribution than a typical normal distribu-
tion. There is also a slight negative kurtosis indicating that the
distribution is slightly flatter than a normal distribution, with a
few more extreme scores at the low end of the distribution. Again,
however, the values are within the range of what is considered a
reasonable approximation to the normal curve.
The quiz score data were standardized using the unit normal formula.
After standardization, student 1’s score was −2.07 and student 3’s score
was 1.40. This suggests that student 1 was slightly more than two stan-
dard deviation units below the mean on the statistics quiz score, while
student 3 was nearly 1.5 standard deviation units above the mean.
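The arithmetic behind Marie's normality screen is simple enough to verify directly; a small Python check using the skewness and kurtosis values reported above:

```python
# values reported in the APA-style paragraph
skew, se_skew = -0.598, 0.464
kurt, se_kurt = -0.741, 0.902

# rule of thumb 1: within +/- 2 standard errors of the statistic
skew_ok = abs(skew) <= 2 * se_skew   # range: -0.928 to +0.928
kurt_ok = abs(kurt) <= 2 * se_kurt   # range: -1.804 to +1.804

# rule of thumb 2: absolute value within 2.0
simple_rule_ok = abs(skew) <= 2.0 and abs(kurt) <= 2.0
```

Both rules agree here, which is why the paragraph concludes the quiz scores are reasonably normally distributed.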
4.6 Summary
In this chapter, we continued our exploration of descriptive statistics by considering an important distribution, the normal distribution, standard scores, and other characteristics of a distribution of scores. First we discussed the normal distribution, with its history and important characteristics. In addition, the unit normal table was introduced and used to determine various areas under the curve. Next we examined different types of standard scores, in particular z scores, as well as CEEB scores, T scores, and IQ scores. Examples of types of standard scores are summarized in Box 4.1. The next section of the chapter included a detailed description of symmetry, skewness, and kurtosis. The different types of skewness and kurtosis were defined and depicted. We finished the chapter by examining SPSS for these statistics as well as how to write up an example set of results. At this point, you should have met the following objectives: (a) understand the normal distribution and utilize the normal table; (b) determine and interpret different types of standard scores, particularly z scores; and (c) understand and interpret skewness and kurtosis statistics. In the next chapter, we move toward inferential statistics through an introductory discussion of probability as well as a more detailed discussion of sampling and estimation.
STOP AND THINK BOX 4.1
Examples of Types of Standard Scores

Standard Score                       Distribution^a
z (unit normal)                      N(0, 1)
CEEB score                           N(500, 10,000)
T score                              N(50, 100)
Wechsler intelligence scale          N(100, 225)
Stanford–Binet intelligence scale    N(100, 256)

^a N(μ, σ²).
Problems
Conceptual problems
4.1 For which of the following distributions will the skewness value be 0?
 a. N(0, 1)
 b. N(0, 2)
 c. N(10, 50)
 d. All of the above
4.2 For which of the following distributions will the kurtosis value be 0?
 a. N(0, 1)
 b. N(0, 2)
 c. N(10, 50)
 d. All of the above
4.3 A set of 400 scores is approximately normally distributed with a mean of 65 and a standard deviation of 4.5. Approximately 95% of the scores would fall within which range of scores?
 a. 60.5 and 69.5
 b. 56 and 74
 c. 51.5 and 78.5
 d. 64.775 and 65.225
4.4 What is the percentile rank of 60 in the distribution of N(60, 100)?
 a. 10
 b. 50
 c. 60
 d. 100
4.5 Which of the following parameters can be found on the X axis for a frequency polygon of a population distribution?
 a. Skewness
 b. Median
 c. Kurtosis
 d. Q
4.6 The skewness value is calculated for a set of data and is found to be equal to +2.75. This indicates that the distribution of scores is which one of the following?
 a. Highly negatively skewed
 b. Slightly negatively skewed
 c. Symmetrical
 d. Slightly positively skewed
 e. Highly positively skewed
4.7 The kurtosis value is calculated for a set of data and is found to be equal to +2.75. This indicates that the distribution of scores is which one of the following?
 a. Mesokurtic
 b. Platykurtic
 c. Leptokurtic
 d. Cannot be determined
4.8 For a normal distribution, all percentiles above the 50th must yield positive z scores. True or false?
4.9 If one knows the raw score, the mean, and the z score, then one can calculate the value of the standard deviation. True or false?
4.10 In a normal distribution, a z score of 1.0 has a percentile rank of 34. True or false?
4.11 The mean of a normal distribution of scores is always 1. True or false?
4.12 If in a distribution of 200 IQ scores, the mean is considerably above the median, then the distribution is which one of the following?
 a. Negatively skewed
 b. Symmetrical
 c. Positively skewed
 d. Bimodal
4.13 Which of the following is indicative of a distribution that has a skewness value of −3.98 and a kurtosis value of −6.72?
 a. A left tail that is pulled to the left and a very flat distribution
 b. A left tail that is pulled to the left and a distribution that is neither very peaked nor very flat
 c. A right tail that is pulled to the right and a very peaked distribution
 d. A right tail that is pulled to the right and a very flat distribution
4.14 Which of the following is indicative of a distribution that has a kurtosis value of +4.09?
 a. Leptokurtic distribution
 b. Mesokurtic distribution
 c. Platykurtic distribution
 d. Positive skewness
 e. Negative skewness
4.15 For which of the following distributions will the kurtosis value be greatest?
X     A (f)   B (f)   C (f)   D (f)
11    3       4       1       1
12    4       4       3       5
13    6       4       12      8
14    4       4       3       5
15    3       4       1       1
 a. Distribution A
 b. Distribution B
 c. Distribution C
 d. Distribution D
4.16 The distribution of variable X has a mean of 10 and is positively skewed. The distribution of variable Y has the same mean of 10 and is negatively skewed. I assert that the medians for the two variables must also be the same. Am I correct?
4.17 The variance of z scores is always equal to the variance of the raw scores for the same variable. True or false?
4.18 The mode has the largest value of the central tendency measures in a positively skewed distribution. True or false?
4.19 Which of the following represents the highest performance in a normal distribution?
 a. P90
 b. z = +1.00
 c. Q3
 d. IQ = 115
4.20 Suzie Smith came home with two test scores, z = +1 in math and z = −1 in biology. For which test did Suzie perform better?
4.21 A psychologist analyzing data from creative intelligence scores finds a relatively normal distribution with a population mean of 100 and population standard deviation of 10. When standardized into a unit normal distribution, what is the mean of the (standardized) creative intelligence scores?
 a. 0
 b. 70
 c. 100
 d. Cannot be determined from the information provided
Computational problems
4.1 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = −1.66
 b. The proportion of the area between z = −1.03 and z = +1.03
 c. The fifth percentile of N(20, 36)
 d. The 99th percentile of N(30, 49)
 e. The percentile rank of the score 25 in N(20, 36)
 f. The percentile rank of the score 24.5 in N(30, 49)
 g. The proportion of the area in N(36, 64) between the scores of 18 and 42
4.2 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = −.80
 b. The proportion of the area between z = −1.49 and z = +1.49
 c. The 2.5th percentile of N(50, 81)
 d. The 50th percentile of N(40, 64)
 e. The percentile rank of the score 45 in N(50, 81)
 f. The percentile rank of the score 53 in N(50, 81)
 g. The proportion of the area in N(36, 64) between the scores of 19.7 and 45.1
4.3 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = +1.50
 b. The proportion of the area between z = −.75 and z = +2.25
 c. The 15th percentile of N(12, 9)
 d. The 80th percentile of N(100,000, 5,000)
 e. The percentile rank of the score 300 in N(200, 2500)
 f. The percentile rank of the score 61 in N(60, 9)
 g. The proportion of the area in N(500, 1600) between the scores of 350 and 550
Interpretive problems
4.1 Select one interval or ratio variable from the survey 1 dataset on the website (e.g., one idea is to select the same variable you selected for the interpretive problem from Chapter 3).
 a. Determine the measures of central tendency, dispersion, skewness, and kurtosis.
 b. Write a paragraph which summarizes the findings, particularly commenting on the distributional shape.
4.2 Using the same variable selected in the previous problem, standardize it using SPSS.
 a. Determine the measures of central tendency, dispersion, skewness, and kurtosis for the standardized variable.
 b. Determine the measures of central tendency, dispersion, skewness, and kurtosis for the variable in its original scale (i.e., the unstandardized variable).
 c. Compare and contrast the differences between the standardized and unstandardized variables.
5
Introduction to Probability and Sample Statistics
Chapter Outline
5.1 Brief Introduction to Probability
 5.1.1 Importance of Probability
 5.1.2 Definition of Probability
 5.1.3 Intuition Versus Probability
5.2 Sampling and Estimation
 5.2.1 Simple Random Sampling
 5.2.2 Estimation of Population Parameters and Sampling Distributions
Key Concepts
 1. Probability
 2. Inferential statistics
 3. Simple random sampling (with and without replacement)
 4. Sampling distribution of the mean
 5. Variance and standard error of the mean (sampling error)
 6. Confidence intervals (CIs) (point vs. interval estimation)
 7. Central limit theorem
In Chapter 4, we extended our discussion of descriptive statistics. There we considered the following three general topics: the normal distribution, standard scores, and skewness and kurtosis. In this chapter, we begin to move from descriptive statistics into inferential statistics (in which normally distributed data play a major role). The two basic topics described in this chapter are probability, and sampling and estimation. First, as a brief introduction to probability, we discuss the importance of probability in statistics, define probability in a conceptual and computational sense, and discuss the notion of intuition versus probability. Second, under sampling and estimation, we formally move into inferential statistics by considering the following topics: simple random sampling (and briefly other types of sampling), and estimation of population parameters and sampling distributions. Concepts to be discussed include probability, inferential statistics, simple random sampling (with and without replacement), the sampling distribution of the mean, the variance and standard error of the mean (sampling error), CIs (point vs. interval estimation), and the central limit theorem. Our objectives are that by the end of this chapter, you will be able to (a) understand the most basic concepts of probability; (b) understand and conduct simple random sampling; and (c) understand, determine, and interpret the results from the estimation of population parameters via a sample.
5.1 Brief Introduction to Probability
The area of probability became important and began to be developed during the seventeenth and eighteenth centuries, when royalty and other well-to-do gamblers consulted with mathematicians for advice on games of chance. For example, in poker, if you hold two jacks, what are your chances of drawing a third jack? Or in craps, what is the chance of rolling a "7" with two dice? During that time, probability was also used for more practical purposes, such as to help determine life expectancy to underwrite life insurance policies. Considerable development in probability has obviously taken place since that time. In this section, we discuss the importance of probability, provide a definition of probability, and consider the notion of intuition versus probability. Although there is much more to the topic of probability, here we simply discuss those aspects of probability necessary for the remainder of the text. For additional information on probability, take a look at texts by Rudas (2004) or Tijms (2004).
5.1.1 Importance of Probability
Let us first consider why probability is important in statistics. A researcher is out collecting some sample data from a group of individuals (e.g., students, parents, teachers, voters, corporations, animals). Some descriptive statistics are generated from the sample data. Say the sample mean, X̄, is computed for several variables (e.g., number of hours of study time per week, grade point average, confidence in a political candidate, widget sales, animal food consumption). To what extent can we generalize from these sample statistics to their corresponding population parameters? For example, if the mean amount of study time per week for a given sample of graduate students is X̄ = 10 hours, to what extent are we able to generalize to the population of graduate students on the value of the population mean μ?

As we see, beginning in this chapter, inferential statistics involve making an inference about population parameters from sample statistics. We would like to know (a) how much uncertainty exists in our sample statistics as well as (b) how much confidence to place in our sample statistics. These questions can be addressed by assigning a probability value to an inference. As we show beginning in Chapter 6, probability can also be used to make statements about areas under a distribution of scores (e.g., the normal distribution). First, however, we need to provide a definition of probability.
5.1.2 Definition of Probability
In order to more easily define probability, consider a simple example of rolling a six-sided die (there are dice with different numbers of sides). Each of the six sides, of course, has anywhere from one to six dots, and each side has a different number of dots. What is the probability of rolling a "4"? Technically, there are six possible outcomes or events that can occur. One can also determine how many times a specific outcome or event actually can occur. These two concepts are used to define and compute the probability of a particular outcome or event by
    p(A) = S / T

where
p(A) is the probability that outcome or event A will occur
S is the number of times that the specific outcome or event A can occur
T is the total number of outcomes or events possible
Let us revisit our example, the probability of rolling a "4." A "4" can occur only once, thus S = 1. There are six possible values that can be rolled, thus T = 6. Therefore the probability of rolling a "4" is determined by

    p(4) = S / T = 1/6
This assumes, however, that the die is unbiased, which means that the die is fair and that the probability of obtaining any of the six outcomes is the same. For a fair, unbiased die, the probability of obtaining any outcome is 1/6. Gamblers have been known to possess an unfair, biased die such that the probability of obtaining a particular outcome is different from 1/6 (e.g., to cheat their opponent by shaving one side of the die).
Consider one other classic probability example. Imagine you have an urn (or other container). Inside the urn and out of view are a total of nine balls (thus T = 9), six of the balls being red (event A; S = 6) and the other three balls being green (event B; S = 3). Your task is to draw one ball out of the urn (without looking) and then observe its color. The probability of each of these two events occurring on the first draw is as follows:

    p(A) = S / T = 6/9 = 2/3

    p(B) = S / T = 3/9 = 1/3

Thus the probability of drawing a red ball is 2/3, and the probability of drawing a green ball is 1/3.
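For readers who like to check such ratios mechanically, the definition p(A) = S/T can be sketched in a few lines of Python; the `probability` helper below is purely illustrative, not part of any statistics package.

```python
from fractions import Fraction

def probability(favorable, total):
    """p(A) = S / T: favorable outcomes over total possible outcomes."""
    return Fraction(favorable, total)

# Rolling a "4" with a fair six-sided die: S = 1, T = 6
p_four = probability(1, 6)       # 1/6

# Urn with nine balls: six red (event A), three green (event B)
p_red = probability(6, 9)        # reduces to 2/3
p_green = probability(3, 9)      # reduces to 1/3
```

Using exact fractions makes it easy to confirm the point made below, that the probabilities of all distinct events must sum to 1 (here, p_red + p_green == 1).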
Two notions become evident in thinking about these examples. First, the sum of the probabilities for all distinct or independent events is precisely 1. In other words, if we take each distinct event and compute its probability, then the sum of those probabilities must be equal to one so as to account for all possible outcomes. Second, the probability of any given event (a) cannot exceed one and (b) cannot be less than zero. Part (a) should be obvious in that the sum of the probabilities for all events cannot exceed one, and therefore the probability of any one event cannot exceed one either (it makes no sense to talk about an event occurring more than all of the time). An event would have a probability of one if no other event can possibly occur, such as the probability that you are currently breathing. For part (b), no event can have a negative probability (it makes no sense to talk about an event occurring less than never); however, an event could have a zero probability if the event can never occur. For instance, in our urn example, one could never draw a purple ball.
5.1.3 Intuition Versus Probability
At this point, you are probably thinking that probability is an interesting topic. However, without extensive training to think in a probabilistic fashion, people tend to let their intuition guide them. This is all well and good, except that intuition can often guide you to a different conclusion than probability. Let us examine two classic examples to illustrate this dilemma. The first classic example is known as the "birthday problem." Imagine you are in a room of 23 people. You ask each person to write down their birthday (month and day) on a piece of paper. What do you think is the probability that in a room of 23 people at least two will have the same birthday?
Assume first that we are dealing with 365 different possible birthdays, where leap year (February 29) is not considered. Also assume the sample of 23 people is randomly drawn from some population of people. Taken together, this implies that each of the 365 different possible birthdays has the same probability (i.e., 1/365). An intuitive thinker might reason as follows: "There are 365 different birthdays in a year and there are 23 people in the sample. Therefore the probability of two people having the same birthday must be close to zero." We try this on our introductory students each year, and their guesses are usually around zero.
Intuition has led us astray, and we have not used the proper reasoning. True, there are 365 days and 23 people. However, the question really deals with pairs of people. There is a fairly large number of different possible pairs of people [i.e., person 1 with 2, 1 with 3, etc., where the total number of different pairs of people is equal to n(n − 1)/2 = 23(22)/2 = 253]. All we need is for one pair to have the same birthday. While the probability computations are a little complex (see Appendix), the probability that at least two individuals will have the same birthday in a group of 23 is equal to .507. That is right, about one-half of the time a group of 23 people will have two or more with the same birthday. Our introductory classes typically have between 20 and 40 students. More often than not, we are able to find two students with the same birthday. One year one of us wrote each birthday on the board so that students could see the data. The first two students selected actually had the same birthday, so our point was very quickly shown. What was the probability of that event occurring?
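The Appendix gives the exact computation; as a sketch, the complement rule behind it (one minus the probability that all n birthdays are distinct) is easy to verify in Python:

```python
def birthday_match_probability(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming all birthdays are equally likely and leap year is ignored."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i + 1)th person must avoid the i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

print(round(birthday_match_probability(23), 3))  # 0.507
```

With n = 366 the function returns exactly 1.0, as it must by the pigeonhole principle.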
The second classic example is the "gambler's fallacy," sometimes referred to as the "law of averages." This works for any game of chance, so imagine you are flipping a coin. Obviously there are two possible outcomes from a coin flip, heads and tails. Assume the coin is fair and unbiased such that the probability of flipping a head is the same as flipping a tail, that is, .5. After flipping the coin nine times, you have observed a tail every time. What is the probability of obtaining a head on the next flip?
An intuitive thinker might reason as follows: "I have just observed a tail on each of the last nine flips. According to the law of averages, the probability of observing a head on the next flip must be near certainty. The probability must be nearly one." We also try this on our introductory students every year, and their guesses are almost always near one.
Intuition has led us astray once again, as we have not used the proper reasoning. True, we have just observed nine consecutive tails. However, the question really deals with the probability of the 10th flip being a head, not the probability of obtaining 10 consecutive tails. The probability of a head is always .5 with a fair, unbiased coin. The coin has no memory; thus the probability of tossing a head after nine consecutive tails is the same as the probability of tossing a head after nine consecutive heads, .5. In technical terms, the probabilities of each event (each toss) are independent of one another. In other words, the probability of flipping a head is the same regardless of the preceding flips. This is not the same as the probability of tossing 10 consecutive heads, which is rather small (approximately .0010). So when you are gambling at the casino and have lost the last nine games, do not believe that you are guaranteed to win the next game. You can just as easily lose game 10 as you did game 1. The same goes if you have won a number of games. You can just as easily win the next game as you did game 1. To some extent, the casinos count on their customers playing the gambler's fallacy to make a profit.
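Independence is easy to demonstrate by simulation. The sketch below (illustrative Python, with an arbitrary seed) flips a fair coin ten times over many trials and looks only at the runs that happened to start with nine tails; the tenth flip still comes up heads about half the time, while ten consecutive heads remain rare:

```python
import random

random.seed(42)
trials = 100_000
streaks = 0            # runs whose first nine flips were all tails
heads_on_tenth = 0     # ...and whose tenth flip was a head

for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(10)]  # True = head
    if not any(flips[:9]):
        streaks += 1
        if flips[9]:
            heads_on_tenth += 1

# Conditional relative frequency: close to .5 despite the streak
print(heads_on_tenth / streaks)

# Probability of ten consecutive heads, by contrast:
print(0.5 ** 10)  # 0.0009765625, i.e., approximately .0010
```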
5.2 Sampling and Estimation
In Chapter 3, we spent some time discussing sample statistics, including the measures of central tendency and dispersion. In this section, we expand upon that discussion by defining inferential statistics, describing different types of sampling, and then moving into the implications of such sampling in terms of estimation and sampling distributions.
Consider the situation where we have a population of graduate students. Population parameters (characteristics of a population) could be determined, such as the population size N, the population mean μ, the population variance σ², and the population standard deviation σ. Through some method of sampling, we then take a sample of students from this population. Sample statistics (characteristics of a sample) could be determined, such as the sample size n, the sample mean X̄, the sample variance s², and the sample standard deviation s.
How often do we actually ever deal with population data? Except when dealing with very small, well-defined populations, we almost never deal with population data. The main reason for this is cost, in terms of time, personnel, and economics. This means that we are almost always dealing with sample data. With descriptive statistics, dealing with sample data is very straightforward, and we only need to make sure we are using the appropriate sample statistic equation. However, what if we want to take a sample statistic and make some generalization about its relevant population parameter? For example, you have computed a sample mean on grade point average (GPA) of X̄ = 3.25 for a sample of 25 graduate students at State University. You would like to make some generalization from this sample mean to the population mean μ at State University. How do we do this? To what extent can we make such a generalization? How confident are we that this sample mean actually represents the population mean?
This brings us to the field of inferential statistics. We define inferential statistics as statistics that allow us to make an inference or generalization from a sample to the population. In terms of reasoning, inductive reasoning is used to infer from the specific (the sample) to the general (the population). Thus inferential statistics is the answer to all of our preceding questions about generalizing from sample statistics to population parameters. How the sample is derived, however, is important in determining the extent to which the statistical results we derive can be inferred from the sample back to the population. Thus, it is important to spend a little time talking about simple random sampling, the only sampling procedure that allows generalizations to be made from the sample to the population. (Although there are statistical means to correct for non-simple random samples, they are beyond the scope of this textbook.) In the remainder of this section, and in much of the remainder of this text, we take up the details of inferential statistics for many different procedures.
5.2.1 Simple Random Sampling
There are several different ways in which a sample can be drawn from a population. In this section we introduce simple random sampling, which is a commonly used type of sampling and which is also assumed for many inferential statistics (beginning in Chapter 6). Simple random sampling is defined as the process of selecting sample observations from a population so that each observation has an equal and independent probability of being selected. If the sampling process is truly random, then (a) each observation in the population has an equal chance of being included in the sample, and (b) each observation selected into the sample is independent of (or not affected by) every other selection. Thus a volunteer or "street-corner" sample would not meet the first condition because members of the population who do not frequent that particular street corner have no chance of being included in the sample.
In addition, if the selection of spouses required the corresponding selection of their respective mates, then the second condition would not be met. For example, if the selection of Mr. Joe Smith III also required the selection of his wife, then these two selections are not independent of one another. Because we selected Mr. Joe Smith III, we must also therefore select his wife. Note that through independent sampling it is possible for Mr. Smith and his wife to both be sampled, but it is not required. Thus, independence implies that each observation is selected without regard to any other observation sampled.
We also would fail to have equal and independent probability of selection if the sampling procedure employed was something other than a simple random sample, because it is only with a simple random sample that we have met conditions (a) and (b) presented earlier in the paragraph. This concept of independence is an important assumption that we will become more acquainted with in the remaining chapters. If we have independence, then generalizations from the sample back to the population can be made (you may remember this as external validity, which was likely introduced in your research methods course) (see Figure 5.1). Because of the connection between simple random sampling and independence, let us expand our discussion on the two types of simple random sampling.
5.2.1.1 Simple Random Sampling With Replacement
There are two specific types of simple random sampling. Simple random sampling with replacement is conducted as follows. The first observation is selected from the population into the sample, and that observation is then replaced back into the population. The second observation is selected and then replaced in the population. This continues until a sample of the desired size is obtained. The key here is that each observation sampled is placed back into the population and could be selected again.
This scenario makes sense in certain applications and not in others. For example, return to our coin flipping example, where we now want to flip a coin 100 times (i.e., a sample size of 100). How does this operate in the context of sampling? We flip the coin (e.g., heads) and record the result. This "head" becomes the first observation in our sample. This observation is then placed back into the population. Then a second observation is made and is placed back into the population. This continues until our sample size requirement of 100 is reached. In this particular scenario we always sample with replacement, and we automatically do so even if we have never heard of sampling with replacement. If no replacement took place, then we could only ever have a sample size of two, one "head" and one "tail."
5.2.1.2 Simple Random Sampling Without Replacement
In other scenarios, sampling with replacement does not make sense. For example, say we are conducting a poll for the next major election by randomly selecting 100 students (the sample) at a local university (the population). As each student is selected into the sample, they are removed and cannot be sampled again. It simply would make no sense if our sample of 100 students only contained 78 different students due to replacement (as some students were polled more than once). Our polling example represents the other type of simple random sampling, this time without replacement. Simple random sampling without replacement is conducted in a similar fashion except that once an observation is selected for inclusion in the sample, it is not replaced and cannot be selected a second time.
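The distinction maps directly onto the two sampling functions in Python's standard library; the five-name population below is a made-up example:

```python
import random

population = ["Ann", "Ben", "Cal", "Dee", "Eva"]
random.seed(1)

# With replacement: an observation can be selected more than once,
# so the sample size may even exceed the population size (like coin flips).
with_replacement = random.choices(population, k=8)

# Without replacement: once selected, an observation cannot recur,
# so the sample size cannot exceed the population size (like a poll).
without_replacement = random.sample(population, k=4)
```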
5.2.1.3 Other Types of Sampling
There are several other types of sampling. These include convenience sampling (i.e., the volunteer or "street-corner" sampling previously mentioned), systematic sampling (e.g., select every 10th observation from the population into the sample), cluster sampling (i.e., sample groups or clusters of observations and include all members of the selected clusters in the sample), stratified sampling (i.e., sampling within subgroups or strata to ensure adequate representation of each stratum), and multistage sampling (e.g., stratify at one stage and randomly sample at another stage). These types of sampling are beyond the scope of this text, and the interested reader is referred to sampling texts such as Sudman (1976), Kalton (1983), Jaeger (1984), Fink (1995), or Levy and Lemeshow (1999).
FIGURE 5.1
Cycle of inference. Step 1: Population → Step 2: Draw simple random sample → Step 3: Compute sample statistics → Step 4: Make inference back to the population.
5.2.2 Estimation of Population Parameters and Sampling Distributions
Take as an example the situation where we select one random sample of n females (e.g., n = 20), measure their weight, and then compute the mean weight of the sample. We find the mean of this first sample to be 102 pounds and denote it by X̄₁ = 102, where the subscript identifies the first sample. This one sample mean is known as a point estimate of the population mean μ, as it is simply one value or point. We can then proceed to collect weight data from a second sample of n females and find that X̄₂ = 110. Next we collect weight data from a third sample of n females and find that X̄₃ = 119. Imagine that we go on to collect such data from many other samples of size n and compute a sample mean for each of those samples.
5.2.2.1 Sampling Distribution of the Mean
At this point, we have a collection of sample means, which we can use to construct a frequency distribution of sample means. This frequency distribution is formally known as the sampling distribution of the mean. To better illustrate this new distribution, let us take a very small population from which we can take many samples. Here we define our population of observations as follows: 1, 2, 3, 5, 9 (in other words, we have five values in our population). As the entire population is known here, we can better illustrate the important underlying concepts. We can determine that the population mean μ_X = 4 and the population variance σ_X² = 8, where X indicates the variable we are referring to. Let us first take all possible samples from this population of size 2 (i.e., n = 2) with replacement. As there are only five observations, there will be 25 possible samples, as shown in the upper portion of Table 5.1, called "Samples." Each entry represents the two observations for a particular sample. For instance, in row 1 and column 4, we see 1,5. This indicates that the first observation is a 1 and the second observation is a 5. If sampling was done without replacement, then the diagonal of the table from upper left to lower right would not exist. For instance, a 1,1 sample could not be selected if sampling without replacement.
This is a matter for some discussion, so consider the following three points. First, the distribution of X̄ for all possible samples of size n is known as the sampling distribution of the mean. In other words, if we were to take all of the "sample mean" values in Table 5.1 and construct a histogram of those values, then that is what is referred to as a "sampling distribution of the mean." It is simply the distribution (i.e., histogram) of all the "sample mean" values. Second, the mean of the sampling distribution of the mean for all possible samples of size n is equal to μ_X̄. As the mean of the sampling distribution of the mean is denoted by μ_X̄ (the mean of the X̄s), we see for the example that μ_X̄ = μ_X = 4. In other words, the mean of the sampling distribution of the mean is simply the average of all of the "sample means" in Table 5.1. The mean of the sampling distribution of the mean will always be equal to the population mean.
Third, we define sampling error in this context as the difference (or deviation) between a particular sample mean and the population mean, denoted as X̄ − μ_X. A positive sampling error indicates a sample mean greater than the population mean, where the sample mean is known as an overestimate of the population mean. A zero sampling error indicates a sample mean exactly equal to the population mean. A negative sampling error indicates a sample mean less than the population mean, where the sample mean is known as an underestimate of the population mean. As researchers, we want the sampling error to be as close to zero as possible to suggest that the sample reflects the population well.
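The 25-sample example can be reproduced by brute-force enumeration; this is an illustrative Python sketch, not anything from the chapter's software:

```python
from itertools import product
from statistics import mean

population = [1, 2, 3, 5, 9]

# All 25 possible samples of size n = 2, drawn with replacement
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu = mean(population)               # population mean: 4
mean_of_means = mean(sample_means)  # also 4: the sampling distribution
                                    # of the mean is centered on mu

# Sampling error for one particular sample, say (1, 5):
error = mean((1, 5)) - mu           # 3 - 4 = -1, an underestimate
```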
5.2.2.2 Variance Error of the Mean
Now that we have a measure of the mean of the sampling distribution of the mean, let us consider the variance of this distribution. We define the variance of the sampling distribution of the mean, known as the variance error of the mean, as σ_X̄². This will provide us with a dispersion measure of the extent to which the sample means vary and will also provide some indication of the confidence we can place in a particular sample mean. The variance error of the mean is computed as

    σ_X̄² = σ_X² / n

where
σ_X² is the population variance of X
n is the sample size
Table 5.1
All Possible Samples and Sample Means for n = 2 from the Population of 1, 2, 3, 5, 9

                       Second Observation
First Observation     1      2      3      5      9

Samples
1                    1,1    1,2    1,3    1,5    1,9
2                    2,1    2,2    2,3    2,5    2,9
3                    3,1    3,2    3,3    3,5    3,9
5                    5,1    5,2    5,3    5,5    5,9
9                    9,1    9,2    9,3    9,5    9,9

Sample Means
1                    1.0    1.5    2.0    3.0    5.0
2                    1.5    2.0    2.5    3.5    5.5
3                    2.0    2.5    3.0    4.0    6.0
5                    3.0    3.5    4.0    5.0    7.0
9                    5.0    5.5    6.0    7.0    9.0
Column sums ΣX̄:    12.5   15.0   17.5   22.5   32.5

Mean of the sample means:

    μ_X̄ = ΣX̄ / (number of samples) = 100/25 = 4.0

Variance of the sample means:

    σ_X̄² = [(number of samples)(ΣX̄²) − (ΣX̄)²] / (number of samples)² = [25(500) − (100)²] / (25)² = (12,500 − 10,000)/625 = 4.0
For the example, we have already determined that σ_X² = 8 and that n = 2; therefore,

    σ_X̄² = σ_X² / n = 8/2 = 4

This is verified in the bottom portion of Table 5.1, called "Variance of the sample means," where the variance error is computed from the collection of sample means.
What will happen if we increase the size of the sample? If we increase the sample size to n = 4, then the variance error is reduced to 2. Thus we see that as the size of the sample n increases, the magnitude of the sampling error decreases. Why? Conceptually, as sample size increases, we are sampling a larger portion of the population. In doing so, we are also obtaining a sample that is likely more representative of the population. In addition, the larger the sample size, the less likely it is to obtain a sample mean that is far from the population mean. Thus, as sample size increases, we home in closer and closer to the population mean and have less and less sampling error.
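Since the population here has only five values, the relationship σ_X̄² = σ_X²/n can be confirmed by enumerating every possible sample (an illustrative sketch in Python):

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3, 5, 9]
pop_var = pvariance(population)    # sigma_X^2 = 8

var_errors = {}
for n in (2, 4):
    # Variance of the means of ALL samples of size n (with replacement)
    means = [mean(s) for s in product(population, repeat=n)]
    var_errors[n] = pvariance(means)

# var_errors[2] is 4 (= 8/2) and var_errors[4] is 2 (= 8/4)
```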
For example, say we are sampling from a voting district with a population of 5000 voters. A survey is developed to assess how satisfied the district voters are with their local state representative. Assume the survey generates a 100-point satisfaction scale. First we determine that the population mean of satisfaction is 75. Next we take samples of different sizes. For a sample size of 1, we find sample means that range from 0 to 100 (i.e., each mean really only represents a single observation). For a sample size of 10, we find sample means that range from 50 to 95. For a sample size of 100, we find sample means that range from 70 to 80. We see then that as sample size increases, our sample means become closer and closer to the population mean, and the variability of those sample means becomes smaller and smaller.
5.2.2.3 Standard Error of the Mean
We can also compute the standard deviation of the sampling distribution of the mean, known as the standard error of the mean, by

    σ_X̄ = σ_X / √n

Thus for the example we have

    σ_X̄ = σ_X / √n = 2.8284 / √2 = 2
Because the applied researcher typically does not know the population variance, the population variance error of the mean and the population standard error of the mean can be estimated by the following, respectively:

    s_X̄² = s_X² / n

and

    s_X̄ = s_X / √n
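In practice one plugs a sample's own variance into these estimators. A minimal Python sketch, using a hypothetical sample of ten study-time observations:

```python
from math import sqrt
from statistics import stdev, variance

sample = [10, 12, 9, 11, 14, 8, 12, 10, 11, 13]  # hypothetical hours of study
n = len(sample)

est_var_error = variance(sample) / n     # s_X-bar^2 = s_X^2 / n
est_std_error = stdev(sample) / sqrt(n)  # s_X-bar   = s_X / sqrt(n)
```

Note that `statistics.variance` and `statistics.stdev` are the sample (n − 1 denominator) estimators, which is what these formulas call for.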
5.2.2.4 Confidence Intervals
Thus far we have illustrated how a sample mean is a point estimate of the population mean and how a variance error gives us some sense of the variability among the sample means. Putting these concepts together, we can also build an interval estimate for the population mean to give us a sense of how confident we are in our particular sample mean. We can form a confidence interval (CI) around a particular sample mean as follows. As we learned in Chapter 4, for a normal distribution, 68% of the distribution falls within one standard deviation of the mean. A 68% CI for a sample mean can be formed as follows:

    68% CI = X̄ ± σ_X̄
Conceptually, this means that if we form 68% CIs for 100 sample means, then 68 of those 100 intervals would contain or include the population mean (it does not mean that there is a 68% probability of the interval containing the population mean; the interval either contains it or does not). Because the applied researcher typically only has one sample mean and does not know the population mean, he or she has no way of knowing whether this one CI actually contains the population mean. If one wanted to be more confident in a sample mean, then a 90% CI, a 95% CI, or a 99% CI could be formed as follows:

    90% CI = X̄ ± 1.645 σ_X̄

    95% CI = X̄ ± 1.96 σ_X̄

    99% CI = X̄ ± 2.5758 σ_X̄
Thus for the 90% CI, the population mean will be contained in 90 out of 100 CIs; for the 95% CI, the population mean will be contained in 95 out of 100 CIs; and for the 99% CI, the population mean will be contained in 99 out of 100 CIs. The critical values of 1.645, 1.96, and 2.5758 come from the standard unit normal distribution table (Table A.1) and indicate the width of the CI. Wider CIs, such as the 99% CI, enable greater confidence. For example, with a sample mean of 70 and a standard error of the mean of 3, the following CIs result: 68% CI = (67, 73) [i.e., ranging from 67 to 73]; 90% CI = (65.065, 74.935); 95% CI = (64.12, 75.88); and 99% CI = (62.2726, 77.7274). We can see here that to be assured that 99% of the CIs contain the population mean, our interval must be wider (i.e., ranging from about 62.27 to 77.73, a range of about 15) than the lower-confidence CIs (e.g., the 95% CI ranges from 64.12 to 75.88, a range of about 11).
In general, a CI for any level of confidence (i.e., XX% CI) can be computed by the following general formula:

    XX% CI = X̄ ± z_cv σ_X̄

where z_cv is the critical value taken from the standard unit normal distribution table for that particular level of confidence, and the other values are as before.
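The general formula, applied to the example above (sample mean 70, standard error 3), can be checked with a small sketch; the `confidence_interval` helper is purely illustrative:

```python
def confidence_interval(sample_mean, std_error, z_cv):
    """XX% CI = X-bar +/- z_cv * sigma_X-bar."""
    half_width = z_cv * std_error
    return (sample_mean - half_width, sample_mean + half_width)

# Critical values from the standard unit normal table
for label, z_cv in [("68%", 1.0), ("90%", 1.645), ("95%", 1.96), ("99%", 2.5758)]:
    low, high = confidence_interval(70, 3, z_cv)
    print(label, "CI =", (low, high))
```

This reproduces the intervals given earlier: (67, 73), (65.065, 74.935), (64.12, 75.88), and (62.2726, 77.7274).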
5.2.2.5 Central Limit Theorem
In our discussion of CIs, we used the normal distribution to help determine the width of the intervals. Many inferential statistics assume the population distribution is normal in shape. Because we are looking at sampling distributions in this chapter, does the shape of the original population distribution have any relationship to the sampling distribution of the mean we obtain? For example, if the population distribution is nonnormal, what form does the sampling distribution of the mean take (i.e., is the sampling distribution of the mean also nonnormal)? There is a nice concept, known as the central limit theorem, to assist us here. The central limit theorem states that as sample size n increases, the sampling distribution of the mean from a random sample of size n more closely approximates a normal distribution. If the population distribution is normal in shape, then the sampling distribution of the mean is also normal in shape. If the population distribution is not normal in shape, then the sampling distribution of the mean becomes more nearly normal as sample size increases. This concept is graphically depicted in Figure 5.2.
The�top�row�of�the�figure�depicts�two�population�distributions,�the�left�one�being�normal�
and�the�right�one�being�positively�skewed��The�remaining�rows�are�for�the�various�sam-
pling� distributions,� depending� on� the� sample� size�� The� second� row� shows� the� sampling�
distributions�of�the�mean�for�n�=�1��Note�that�these�sampling�distributions�look�precisely�
like�the�population�distributions,�as�each�observation�is�literally�a�sample�mean��The�next�
row�gives�the�sampling�distributions�for�n�=�2;�here�we�see�for�the�skewed�population�that�
the�sampling�distribution�is�slightly�less�skewed��This�is�because�the�more�extreme�obser-
vations�are�now�being�averaged�in�with�less�extreme�observations,�yielding�less�extreme�
Normal Positively skewed
Population
------------------------------------------------------------------
n = 1
n = 2
n = 4
n = 25
FIGuRe 5.2
Central�limit�theorem�for�normal�and�positively�skewed�population�distributions�
117Introduction to Probability and Sample Statistics
means. For n = 4, the sampling distribution in the skewed case is even less skewed than for n = 2. Eventually we reach the n = 25 sampling distribution, where the sampling distribution for the skewed case is nearly normal and nearly matches the sampling distribution for the normal case. This phenomenon will occur for other nonnormal population distributions as well (e.g., negatively skewed). The moral of the story here is a good one: if the population distribution is nonnormal, then this will have minimal effect on the sampling distribution of the mean except for rather small samples. This can come into play with inferential statistics when the assumption of normality is not satisfied, as we see in later chapters.
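The pattern shown in Figure 5.2 is easy to verify by simulation. The sketch below (our own illustration, not from the text; the seed, population, and replication counts are arbitrary choices) draws repeated samples from a positively skewed population and shows that the skewness of the sample means shrinks as n grows:

```python
import random
from statistics import mean

random.seed(1)

def skewness(xs):
    # Standardized third central moment
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m3 = mean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

# A positively skewed population (exponential, population skewness about 2)
population = [random.expovariate(1.0) for _ in range(10_000)]

skew_by_n = {}
for n in (1, 2, 4, 25):
    # 2,000 samples of size n; record the skewness of the sample means
    means = [mean(random.sample(population, n)) for _ in range(2_000)]
    skew_by_n[n] = skewness(means)
    print(n, round(skew_by_n[n], 2))
```

The printed skewness values should drop markedly between n = 1 (roughly the population's own skewness) and n = 25 (close to 0), mirroring the figure.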
5.3 Summary

In this chapter, we began to move from descriptive statistics to the realm of inferential statistics. The two main topics we considered were probability, and sampling and estimation. First we briefly introduced probability by looking at the importance of probability in statistics, defining probability, and comparing conclusions often reached by intuition versus probability. The second topic involved sampling and estimation, a topic we return to in most of the remaining chapters. In the sampling section, we defined and described simple random sampling, both with and without replacement, and briefly outlined other types of sampling. In the estimation section, we examined the sampling distribution of the mean, the variance and standard error of the mean, CIs around the mean, and the central limit theorem. At this point you should have met the following objectives: (a) be able to understand the most basic concepts of probability, (b) be able to understand and conduct simple random sampling, and (c) be able to understand, determine, and interpret the results from the estimation of population parameters via a sample. In the next chapter we formally discuss our first inferential statistics situation, testing hypotheses about a single mean.
Appendix: Probability That at Least Two Individuals Have the Same Birthday

This probability can be shown by either of the following equations. Note that there are n = 23 individuals in the room. One method is as follows:

1 − [365 × 364 × 363 × ⋯ × (365 − n + 1)] / 365^n = 1 − [365 × 364 × 363 × ⋯ × 343] / 365^23 = .5073
An equivalent method is as follows:

1 − [(365/365) × (364/365) × (363/365) × ⋯ × ((365 − n + 1)/365)] = 1 − [(365/365) × (364/365) × (363/365) × ⋯ × (343/365)] = .5073
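The appendix computation is easy to reproduce by machine. Here is a small sketch (ours, not the authors') that multiplies the successive no-match probabilities and takes the complement:

```python
def birthday_match_probability(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_no_match = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid all k earlier birthdays
        p_no_match *= (days - k) / days
    return 1 - p_no_match

print(round(birthday_match_probability(23), 4))  # → 0.5073
```

With n = 23 the probability first exceeds one half, which is why 23 is the usual "birthday problem" answer.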
Problems

Conceptual problems

5.1 The standard error of the mean is which one of the following?
 a. Standard deviation of a sample distribution
 b. Standard deviation of the population distribution
 c. Standard deviation of the sampling distribution of the mean
 d. Mean of the sampling distribution of the standard deviation

5.2 An unbiased six-sided die is tossed on two consecutive trials, and the first toss results in a "2." What is the probability that a "2" will result on the second toss?
 a. Less than 1/6
 b. 1/6
 c. More than 1/6
 d. Cannot be determined

5.3 An urn contains 9 balls: 3 green, 4 red, and 2 blue. The probability that a ball selected at random is blue is equal to which one of the following?
 a. 2/9
 b. 5/9
 c. 6/9
 d. 7/9

5.4 Sampling error is which one of the following?
 a. The amount by which a sample mean is greater than the population mean
 b. The amount of difference between a sample statistic and a population parameter
 c. The standard deviation divided by the square root of n
 d. When the sample is not drawn randomly

5.5 What does the central limit theorem state?
 a. The means of many random samples from a population will be normally distributed.
 b. The raw scores of many natural events will be normally distributed.
 c. z scores will be normally distributed.
 d. None of the above.

5.6 For a normal population, the variance of the sampling distribution of the mean increases as sample size increases. True or false?

5.7 All other things being equal, as the sample size increases, the standard error of a statistic decreases. True or false?

5.8 I assert that the 95% CI has a larger (or wider) range than the 99% CI for the same parameter using the same data. Am I correct?

5.9 I assert that the 90% CI has a smaller (or more narrow) range than the 68% CI for the same parameter using the same data. Am I correct?
5.10 I assert that the mean and median of any random sample drawn from a symmetric population distribution will be equal. Am I correct?

5.11 A random sample is to be drawn from a symmetric population with mean 100 and variance 225. I assert that the sample mean is more likely to have a value larger than 105 if the sample size is 16 than if the sample size is 25. Am I correct?

5.12 A gambler is playing a card game where the known probability of winning is .40 (win 40% of the time). The gambler has just lost 10 consecutive hands. What is the probability of the gambler winning the next hand?
 a. Less than .40
 b. Equal to .40
 c. Greater than .40
 d. Cannot be determined without observing the gambler

5.13 On the evening news, the anchorwoman announces that the state's lottery has reached $72 billion and reminds the viewing audience that there has not been a winner in over 5 years. In researching lottery facts, you find a report that states the probability of winning the lottery is 1 in 2 million (i.e., a very, very small probability). What is the probability that you will win the lottery?
 a. Less than 1 in 2 million
 b. Equal to 1 in 2 million
 c. Greater than 1 in 2 million
 d. Cannot be determined without additional statistics

5.14 The probability of being selected into a sample is the same for every individual in the population for the convenient method of sampling. True or false?

5.15 Malani is conducting research on elementary teacher attitudes toward changes in mathematics standards. Malani's population consists of all elementary teachers within one district in the state. Malani wants her sampling method to be such that every teacher in the population has an equal and independent probability of selection. Which of the following is the most appropriate sampling method?
 a. Convenient sampling
 b. Simple random sampling with replacement
 c. Simple random sampling without replacement
 d. Systematic sampling

5.16 Sampling error increases with larger samples. True or false?

5.17 If a population distribution is highly positively skewed, then the distribution of the sample means for samples of size 500 will be
 a. Highly negatively skewed
 b. Highly positively skewed
 c. Approximately normally distributed
 d. Cannot be determined without further information
Computational problems

5.1 The population distribution of variable X, the number of pets owned, consists of the five values of 1, 4, 5, 7, and 8.
 a. Calculate the values of the population mean and variance.
 b. List all possible samples of size 2 where samples are drawn with replacement.
 c. Calculate the values of the mean and variance of the sampling distribution of the mean.
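One way to check answers to parts (a) through (c) is to enumerate the samples by computer. The sketch below (a suggestion of ours, not part of the text) lists all 25 samples of size 2 drawn with replacement and confirms that the mean of the sampling distribution equals the population mean and that the variance error equals the population variance divided by n:

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 4, 5, 7, 8]
mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divide by N)

# All 5 * 5 = 25 possible samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu_of_means = mean(sample_means)      # equals the population mean
var_error = pvariance(sample_means)   # equals sigma2 / n for n = 2

print(mu, sigma2, mu_of_means, var_error)
```

Here the population mean is 5 and the population variance is 6, so the mean of the sampling distribution is 5 and the variance error of the mean is 6/2 = 3.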
5.2 The following is a random sampling distribution of the mean number of children for samples of size 3, where samples are drawn with replacement.

Sample Mean   f
1             1
2             2
3             4
4             2
5             1

 a. What is the population mean?
 b. What is the population variance?
 c. What is the mean of the sampling distribution of the mean?
 d. What is the variance error of the mean?

5.3 In a study of the entire student body of a large university, if the standard error of the mean is 20 for n = 16, what must the sample size be to reduce the standard error to 5?

5.4 A random sample of 13 statistics texts had a mean number of pages of 685 and a standard deviation of 42. First calculate the standard error of the mean. Then calculate the 95% CI for the mean length of statistics texts.

5.5 A random sample of 10 high schools employed a mean number of guidance counselors of 3 and a standard deviation of 2. First calculate the standard error of the mean. Then calculate the 90% CI for the mean number of guidance counselors.
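For problems like 5.3 through 5.5, the standard error of the mean is the standard deviation divided by the square root of n, and a CI takes the form mean ± (critical value × standard error). A quick sketch (ours; note that with a small sample and unknown σ a t critical value is more appropriate than the z value used here, so treat the interval as an approximation):

```python
from statistics import NormalDist

def standard_error(s, n):
    """Standard error of the mean: s divided by the square root of n."""
    return s / n ** 0.5

def z_confidence_interval(ybar, s, n, level=0.95):
    # z-based CI; a t-based CI would be wider for small n
    z = NormalDist().inv_cdf((1 + level) / 2)
    se = standard_error(s, n)
    return ybar - z * se, ybar + z * se

print(round(standard_error(42, 13), 2))    # standard error for problem 5.4
print(z_confidence_interval(685, 42, 13))  # approximate 95% CI for problem 5.4
```

The interval is symmetric around the sample mean, which is a useful sanity check on any hand computation.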
Interpretive problems

5.1 Take a six-sided die, where the population values are obviously 1, 2, 3, 4, 5, and 6. Take 20 samples, each of size 2 (e.g., every two rolls is one sample). For each sample, calculate the mean. Then determine the mean of the sampling distribution of the mean and the variance error of the mean. Compare your results to those of your colleagues.

5.2 You will need 20 plain M&M candy pieces and one cup. Put the candy pieces in the cup and toss them onto a flat surface. Count the number of candy pieces that land with the "M" facing up. Write down that number. Repeat these steps five times. These steps will constitute one sample. Next, generate four additional samples (i.e., repeat the process of tossing the candy pieces, counting the "Ms," and writing down that number). Then determine the mean of the sampling distribution of the mean and the variance error of the mean. Compare your results to those of your colleagues.
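Interpretive problem 5.1 can also be run as a quick simulation, which is handy when comparing results with colleagues. A sketch (ours; seed and layout are arbitrary). For a fair die the theoretical targets are a mean of 3.5 and a variance error of (35/12)/2 ≈ 1.458, though 20 samples will scatter noticeably around those values:

```python
import random
from statistics import mean, pvariance

random.seed(7)

sample_means = []
for _ in range(20):                                    # 20 samples...
    rolls = [random.randint(1, 6) for _ in range(2)]   # ...each of size 2
    sample_means.append(mean(rolls))

print(round(mean(sample_means), 3))       # mean of the sampling distribution
print(round(pvariance(sample_means), 3))  # variance error of the mean
```

Rerunning with different seeds (or more samples) shows the estimates settling toward the theoretical values.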
6
Introduction to Hypothesis Testing:
Inferences About a Single Mean
Chapter Outline

6.1 Types of Hypotheses
6.2 Types of Decision Errors
  6.2.1 Example Decision-Making Situation
  6.2.2 Decision-Making Table
6.3 Level of Significance (α)
6.4 Overview of Steps in Decision-Making Process
6.5 Inferences About μ When σ Is Known
  6.5.1 z Test
  6.5.2 Example
  6.5.3 Constructing Confidence Intervals Around the Mean
6.6 Type II Error (β) and Power (1 − β)
  6.6.1 Full Decision-Making Context
  6.6.2 Power Determinants
6.7 Statistical Versus Practical Significance
6.8 Inferences About μ When σ Is Unknown
  6.8.1 New Test Statistic t
  6.8.2 t Distribution
  6.8.3 t Test
  6.8.4 Example
6.9 SPSS
6.10 G*Power
6.11 Template and APA-Style Write-Up
Key Concepts

1. Null or statistical hypothesis versus scientific or research hypothesis
2. Type I error (α), Type II error (β), and power (1 − β)
3. Two-tailed versus one-tailed alternative hypotheses
4. Critical regions and critical values
5. z test statistic
6. Confidence interval (CI) around the mean
7. t test statistic
8. t distribution, degrees of freedom, and table of t distributions
In Chapter 5, we began to move into the realm of inferential statistics. There we considered the following general topics: probability, sampling, and estimation. In this chapter, we move totally into the domain of inferential statistics, where the concepts involved in probability, sampling, and estimation can be implemented. The overarching theme of the chapter is the use of a statistical test to make inferences about a single mean. In order to properly cover this inferential test, a number of basic foundational concepts are described in this chapter. Many of these concepts are utilized throughout the remainder of this text. The topics described include the following: types of hypotheses, types of decision errors, level of significance (α), overview of steps in the decision-making process, inferences about μ when σ is known, Type II error (β) and power (1 − β), statistical versus practical significance, and inferences about μ when σ is unknown. Concepts to be discussed include the following: null or statistical hypothesis versus scientific or research hypothesis; Type I error (α), Type II error (β), and power (1 − β); two-tailed versus one-tailed alternative hypotheses; critical regions and critical values; z test statistic; confidence interval (CI) around the mean; t test statistic; and t distribution, degrees of freedom, and table of t distributions. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts of hypothesis testing; (b) utilize the normal and t tables; and (c) understand, determine, and interpret the results from the z test, t test, and CI procedures.
6.1 Types of Hypotheses

You may remember Marie from previous chapters. We now revisit Marie in this chapter.

Marie, a graduate student pursuing a master's degree in educational research, has completed her first tasks as a research assistant—determining a number of descriptive statistics on data provided to her by her faculty mentor. The faculty member was so pleased with the descriptive analyses and presentation of results previously shared that she has asked Marie to consult with a local hockey coach, Oscar, who is interested in examining team skating performance. Based on Oscar's research question: Is the mean skating speed of a hockey team different from the league mean speed of 12 seconds? Marie suggests a one-sample test of means as the test of inference. Her task is to assist Oscar in generating the test of inference to answer his research question.

Hypothesis testing is a decision-making process where two possible decisions are weighed in a statistical fashion. In a way, this is much like any other decision involving two possibilities, such as whether to carry an umbrella with you today or not. In statistical decision-making, the two possible decisions are known as hypotheses. Sample data are then used to help us select one of these decisions. The two types of hypotheses competing against one another are known as the null or statistical hypothesis, denoted by H0, and the scientific, alternative, or research hypothesis, denoted by H1.
The null or statistical hypothesis is a statement about the value of an unknown population parameter. Considering the procedure we are discussing in this chapter, the one-sample mean test, one example H0 might be that the population mean IQ score is 100, which we denote as

H0: μ = 100  or  H0: μ − 100 = 0
Mathematically, both equations say the same thing. The version on the left is the more traditional form of the null hypothesis involving a single mean. However, the version on the right makes clear to the reader why the term "null" is appropriate. That is, there is no difference or a "null" difference between the population mean and the hypothesized mean value of 100. In general, the hypothesized mean value is denoted by μ0 (here μ0 = 100). Another H0 might be that the statistics exam population means are the same for male and female students, which we denote as

H0: μ1 − μ2 = 0

where
μ1 is the population mean for males
μ2 is the population mean for females

Here there is no difference or a "null" difference between the two population means. The test of the difference between two means is presented in Chapter 7. As we move through subsequent chapters, we become familiar with null hypotheses that involve other population parameters such as proportions, variances, and correlations.
The null hypothesis is basically set up by the researcher in an attempt to reject it in favor of our own scientific, alternative, or research hypothesis. In other words, the scientific hypothesis is what we believe the outcome of the study will be, based on previous theory and research. Thus, we are trying to reject the null hypothesis and find evidence in favor of our scientific hypothesis. The scientific hypotheses H1 for our two examples are

H1: μ ≠ 100  or  H1: μ − 100 ≠ 0

and

H1: μ1 − μ2 ≠ 0
Based on the sample data, hypothesis testing involves making a decision as to whether the null or the research hypothesis is supported. Because we are dealing with sample statistics in our decision-making process, and trying to make an inference back to the population parameter(s), there is always some risk of making an incorrect decision. In other words, the sample data might lead us to make a decision that is not consistent with the population. We might decide to take an umbrella and it does not rain, or we might decide to leave the umbrella at home and it rains. Thus, as in any decision, the possibility always exists that an incorrect decision may be made. This uncertainty is due to sampling error, which, we will see, can be described by a probability statement. That is, because the decision is made based on sample data, the sample may not be very representative of the population and therefore leads us to an incorrect decision. If we had population data, we would always
make the correct decision about a population parameter. Because we usually do not, we use inferential statistics to help make decisions from sample data and infer those results back to the population. The nature of such decision errors and the probabilities we can attribute to them are described in the next section.
6.2 Types of Decision Errors

In this section, we consider more specifically the types of decision errors that might be made in the decision-making process. First an example decision-making situation is presented. This is followed by a decision-making table whereby the types of decision errors are easily depicted.
6.2.1 Example Decision-Making Situation

Let us propose an example decision-making situation using an adult intelligence instrument. It is known somehow that the population standard deviation of the instrument is 15 (i.e., σ² = 225, σ = 15). (In the real world, it is rare that the population standard deviation is known, and we return to reality later in the chapter when the basic concepts have been covered. But for now, assume that we know the population standard deviation.) Our null and alternative hypotheses, respectively, are as follows:

H0: μ = 100  or  H0: μ − 100 = 0
H1: μ ≠ 100  or  H1: μ − 100 ≠ 0

Thus, we are interested in testing whether the population mean for the intelligence instrument is equal to 100, our hypothesized mean value, or not equal to 100.
Next we take several random samples of individuals from the adult population. We find for our first sample Ȳ1 = 105 (i.e., denoting the mean for sample 1). Eyeballing the information for sample 1, the sample mean is one-third of a standard deviation above the hypothesized value [i.e., by computing a z score of (105 − 100)/15 = .3333], so our conclusion would probably be to fail to reject H0. In other words, if the population mean actually is 100, then we believe that one is quite likely to observe a sample mean of 105. Thus, our decision for sample 1 is to fail to reject H0; however, there is some likelihood or probability that our decision is incorrect.
We take a second sample and find Ȳ2 = 115 (i.e., denoting the mean for sample 2). Eyeballing the information for sample 2, the sample mean is one standard deviation above the hypothesized value [i.e., z = (115 − 100)/15 = 1.0000], so our conclusion would probably be to fail to reject H0. In other words, if the population mean actually is 100, then we believe that it is somewhat likely to observe a sample mean of 115. Thus, our decision for sample 2 is to fail to reject H0. However, there is an even greater likelihood or probability that our decision is incorrect than was the case for sample 1; this is because the sample mean is further away from the hypothesized value.
We take a third sample and find Ȳ3 = 190 (i.e., denoting the mean for sample 3). Eyeballing the information for sample 3, the sample mean is six standard deviations above the hypothesized value [i.e., z = (190 − 100)/15 = 6.0000], so our conclusion would probably be to reject H0. In other words, if the population mean actually is 100, then we believe that it is quite unlikely to observe a sample mean of 190. Thus, our decision for sample 3 is to reject H0; however, there is some small likelihood or probability that our decision is incorrect.
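The "eyeballing" in these three examples just standardizes each sample mean against the hypothesized value. A sketch (ours) reproducing the three z scores; note that, following the text at this point, it divides by σ itself rather than by the standard error of the mean, which is introduced later:

```python
mu0, sigma = 100, 15  # hypothesized mean and known population standard deviation

for ybar in (105, 115, 190):  # the three sample means from the text
    z = (ybar - mu0) / sigma
    print(ybar, round(z, 4))  # → 0.3333, 1.0, 6.0 respectively
```

The further the sample mean sits from μ0 in standard-deviation units, the less plausible H0 becomes.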
6.2.2 Decision-Making Table

Let us consider Table 6.1 as a mechanism for sorting out the possible outcomes in the statistical decision-making process. The table consists of the general case and a specific case. First, in part (a) of the table, we have the possible outcomes for the general case. For the state of nature or reality (i.e., how things really are in the population), there are two distinct possibilities as depicted by the rows of the table. Either H0 is indeed true or H0 is indeed false. In other words, according to the real-world conditions in the population, either H0 is actually true or H0 is actually false. Admittedly, we usually do not know what the state of nature truly is; however, it does exist in the population data. It is the state of nature that we are trying to best approximate when making a statistical decision based on sample data.

For our statistical decision, there are two distinct possibilities as depicted by the columns of the table. Either we fail to reject H0 or we reject H0. In other words, based on our sample data, we either fail to reject H0 or reject H0. As our goal is usually to reject H0 in favor of our research hypothesis, we prefer the term fail to reject rather than accept. Accept implies you are willing to throw out your research hypothesis and admit defeat based on one sample. Fail to reject implies you still have some hope for your research hypothesis, despite evidence from a single sample to the contrary.
If we look inside of the table, we see four different outcomes based on a combination of our statistical decision and the state of nature. Consider the first row of the table where H0 is in actuality true. First, if H0 is true and we fail to reject H0, then we have made a correct decision; that is, we have correctly failed to reject a true H0. The probability of this first outcome is known as 1 − α (where α represents alpha). Second, if H0 is true and we reject H0, then we have made a decision error known as a Type I error. That is, we have incorrectly rejected a true H0. Our sample data have led us to a different conclusion than the population data would have. The probability of this second outcome is known as α. Therefore, if H0 is actually true, then our sample data lead us to one of two conclusions: either we correctly fail to reject H0, or we incorrectly reject H0. The sum of the probabilities for these two outcomes when H0 is true is equal to 1 [i.e., (1 − α) + α = 1].
Consider now the second row of the table where H0 is in actuality false. First, if H0 is really false and we fail to reject H0, then we have made a decision error known as a Type II
Table 6.1 Statistical Decision Table

                             Decision:
State of Nature (Reality)    Fail to Reject H0                            Reject H0

(a) General case
H0 is true                   Correct decision (1 − α)                     Type I error (α)
H0 is false                  Type II error (β)                            Correct decision (1 − β) = power

(b) Example rain case
H0 is true (no rain)         Correct decision (do not take umbrella       Type I error (take umbrella and
                             and no umbrella needed) (1 − α)              look silly) (α)
H0 is false (rains)          Type II error (do not take umbrella          Correct decision (take umbrella and
                             and get wet) (β)                             stay dry) (1 − β) = power
error. That is, we have incorrectly failed to reject a false H0. Our sample data have led us to a different conclusion than the population data would have. The probability of this outcome is known as β (beta). Second, if H0 is really false and we reject H0, then we have made a correct decision; that is, we have correctly rejected a false H0. The probability of this second outcome is known as 1 − β, or power (to be more fully discussed later in this chapter). Therefore, if H0 is actually false, then our sample data lead us to one of two conclusions: either we incorrectly fail to reject H0, or we correctly reject H0. The sum of the probabilities for these two outcomes when H0 is false is equal to 1 [i.e., β + (1 − β) = 1].
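The bookkeeping in Table 6.1 can be summarized in a few lines of code. The α and β values below are illustrative assumptions of ours, not values from the text:

```python
alpha = 0.05  # assumed Type I error rate
beta = 0.20   # assumed Type II error rate

outcomes = {
    ("H0 true", "fail to reject"): 1 - alpha,  # correct decision
    ("H0 true", "reject"): alpha,              # Type I error
    ("H0 false", "fail to reject"): beta,      # Type II error
    ("H0 false", "reject"): 1 - beta,          # correct decision = power
}

# Within each state of nature, the two decision probabilities sum to 1
for state in ("H0 true", "H0 false"):
    total = sum(p for (s, _), p in outcomes.items() if s == state)
    print(state, total)
```

Each row of the table is a probability distribution over the two possible decisions, which is exactly what the two sums confirm.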
As an application of this table, consider the following specific case, as shown in part (b) of Table 6.1. We wish to test the following hypotheses about whether or not it will rain tomorrow:

H0: no rain tomorrow
H1: rains tomorrow

We collect some sample data from prior years for the same month and day, and go to make our statistical decision. Our two possible statistical decisions are (a) we do not believe it will rain tomorrow and therefore do not bring an umbrella with us, or (b) we do believe it will rain tomorrow and therefore do bring an umbrella.
Again there are four potential outcomes. First, if H0 is really true (no rain) and we do not carry an umbrella, then we have made a correct decision, as no umbrella is necessary (probability = 1 − α). Second, if H0 is really true (no rain) and we carry an umbrella, then we have made a Type I error, as we look silly carrying that umbrella around all day (probability = α). Third, if H0 is really false (rains) and we do not carry an umbrella, then we have made a Type II error and we get wet (probability = β). Fourth, if H0 is really false (rains) and we carry an umbrella, then we have made the correct decision, as the umbrella keeps us dry (probability = 1 − β).
Let us make two concluding statements about the decision table. First, one can never prove the truth or falsity of H0 in a single study. One only gathers evidence in favor of or in opposition to the null hypothesis. Something is proven in research when an entire collection of studies or evidence reaches the same conclusion time and time again. Scientific proof is difficult to achieve in the social and behavioral sciences, and we should not use the term prove or proof loosely. As researchers, we gather multiple pieces of evidence that eventually lead to the development of one or more theories. When a theory is shown to be unequivocally true (i.e., in all cases), then proof has been established.
Second, let us consider the decision errors in a different light. One can totally eliminate the possibility of a Type I error by deciding to never reject H0. That is, if we always fail to reject H0 (do not ever carry the umbrella), then we can never make a Type I error (look silly with an unnecessary umbrella). Although this strategy sounds fine, it totally takes the decision-making power out of our hands. With this strategy, we do not even need to collect any sample data, as we have already decided to never reject H0.

One can totally eliminate the possibility of a Type II error by deciding to always reject H0. That is, if we always reject H0 (always carry the umbrella), then we can never make a Type II error (get wet without the umbrella). Although this strategy also sounds fine, it totally takes the decision-making power out of our hands. With this strategy, we do not even need to collect any sample data, as we have already decided to always reject H0. Taken together, one can never totally eliminate the possibility of both a Type I and a Type II error. No matter what decision we make, there is always some possibility of making a Type I and/or Type II error. Therefore, as researchers, our job is to make conscious decisions in designing and conducting our study and in analyzing the data so that the possibility of decision error is minimized.
6.3 Level of Significance (α)

We have already stated that a Type I error occurs when the decision is to reject H0 when in fact H0 is actually true. We defined the probability of a Type I error as α, which is also known as the level of significance or significance level. We now examine α as a basis for helping us make statistical decisions. Recall from a previous example that the null and alternative hypotheses, respectively, are as follows:

H0: μ = 100  or  H0: μ − 100 = 0
H1: μ ≠ 100  or  H1: μ − 100 ≠ 0
We need a mechanism for deciding how far away a sample mean needs to be from the hypothesized mean value of μ0 = 100 in order to reject H0. In other words, at a certain point or distance away from 100, we will decide to reject H0. We use α to determine that point for us, where in this context, α is known as the level of significance. Figure 6.1a shows a sampling distribution of the mean where the hypothesized value μ0 is depicted at the center of the distribution. Toward both tails of the distribution, we see two shaded regions known as the critical regions or regions of rejection. The combined area of the two shaded regions is equal to α, and, thus, the area of either the upper or the lower tail critical region is equal to α/2 (i.e., we split α in half by dividing by two). If the sample mean
[Figure 6.1: Alternative hypotheses and critical regions: (a) two-tailed test; (b) one-tailed, right-tailed test; (c) one-tailed, left-tailed test. Each panel shows the hypothesized value μ0 with its critical value(s) and shaded critical region(s) of total area α, split as α/2 per tail in the two-tailed case.]
is far enough away from the hypothesized mean value, μ0, that it falls into either critical region, then our statistical decision is to reject H0. In this case, our decision is to reject H0 at the α level of significance. If, however, the sample mean is close enough to μ0 that it falls into the unshaded region (i.e., not into either critical region), then our statistical decision is to fail to reject H0. The precise points on the X axis at which the critical regions are divided from the unshaded region are known as the critical values. Determining critical values is discussed later in this chapter.
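On a standardized (z) scale, the critical values for a two-tailed test are simply the points that cut off α/2 in each tail of the standard normal distribution. A sketch (ours) using Python's statistics.NormalDist:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: split alpha evenly between the two critical regions
lower_cv = std_normal.inv_cdf(alpha / 2)
upper_cv = std_normal.inv_cdf(1 - alpha / 2)

print(round(lower_cv, 2), round(upper_cv, 2))  # → -1.96 1.96
```

These are the familiar ±1.96 cutoffs for α = .05; a sample mean whose z score falls beyond either cutoff lands in a critical region.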
Note that under the alternative hypothesis H1, we are willing to reject H0 when the sample mean is either significantly greater than or significantly less than the hypothesized mean value μ0. This particular alternative hypothesis is known as a nondirectional alternative hypothesis, as no direction is implied with respect to the hypothesized value. That is, we will reject the null hypothesis in favor of the alternative hypothesis in either direction, either above or below the hypothesized mean value. This also results in what is known as a two-tailed test of significance, in that we are willing to reject the null hypothesis in either tail or critical region.
Two other alternative hypotheses are also possible, depending on the researcher's scientific hypothesis; these are known as directional alternative hypotheses. One directional alternative is that the population mean is greater than the hypothesized mean value, also known as a right-tailed test, as denoted by

H1: μ > 100  or  H1: μ − 100 > 0

Mathematically, both of these equations say the same thing. With a right-tailed alternative hypothesis, the entire region of rejection is contained in the upper tail, with an area of α, known as a one-tailed test of significance (and specifically the right tail). If the sample mean is significantly greater than the hypothesized mean value of 100, then our statistical decision is to reject H0. If, however, the sample mean falls into the unshaded region, then our statistical decision is to fail to reject H0. This situation is depicted in Figure 6.1b.
A second directional alternative is that the population mean is less than the hypothesized mean value, also known as a left-tailed test, as denoted by

H1: μ < 100  or  H1: μ − 100 < 0

Mathematically, both of these equations say the same thing. With a left-tailed alternative hypothesis, the entire region of rejection is contained in the lower tail, with an area of α, also known as a one-tailed test of significance (and specifically the left tail). If the sample mean is significantly less than the hypothesized mean value of 100, then our statistical decision is to reject H0. If, however, the sample mean falls into the unshaded region, then our statistical decision is to fail to reject H0. This situation is depicted in Figure 6.1c.
There is some potential for misuse of the different alternatives, which we consider to be an ethical matter. For example, a researcher conducts a one-tailed test with an upper tail critical region and fails to reject H0. However, the researcher notices that the sample mean is considerably below the hypothesized mean value and then decides to change the alternative hypothesis to either a nondirectional test or a one-tailed test in the other tail. This is unethical, as the researcher has examined the data and changed the alternative hypothesis. The moral of the story is this: If there is previous and consistent empirical evidence to use a specific directional alternative hypothesis, then you should do so. If, however, there is minimal or inconsistent empirical evidence to use a specific directional alternative, then you should not. Instead, you should use a nondirectional alternative. Once you have decided which alternative hypothesis to go with, you need to stick with it for the duration of the statistical decision. If you find contrary evidence, then report it, as it may be an important finding, but do not change the alternative hypothesis in midstream.
6.4 Overview of Steps in Decision-Making Process
Before we get into the specific details of conducting the test of a single mean, we want to discuss the basic steps for hypothesis testing of any inferential test:

1. State the null and alternative hypotheses.
2. Select the level of significance (i.e., alpha, α).
3. Calculate the test statistic value.
4. Make a statistical decision (reject or fail to reject H0).
Step 1: The first step in the decision-making process is to state the null and alternative hypotheses. Recall from our previous example that the null and nondirectional alternative hypotheses, respectively, for a two-tailed test are as follows:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

One could also choose one of the other directional alternative hypotheses described previously.

If we choose to write our null hypothesis as H0: μ = 100, we would want to write our research hypothesis in a consistent manner, H1: μ ≠ 100 (rather than H1: μ − 100 ≠ 0). In publication, many researchers opt to present the hypotheses in narrative form (e.g., "the null hypothesis states that the population mean will equal 100, and the alternative hypothesis states that the population mean will not equal 100"). How you present your hypotheses (using statistical notation or narratively) is up to you.
Step 2: The second step in the decision-making process is to select a level of significance. There are two considerations to make in terms of selecting a level of significance. One consideration is the cost associated with making a Type I error, which is what α really is. Recall that alpha is the probability of rejecting the null hypothesis if in reality the null hypothesis is true. When a Type I error is made, that means evidence is building in favor of the research hypothesis (which is actually false). Let us take an example of a new drug. To test the efficacy of the drug, an experiment is conducted where some individuals take the new drug while others receive a placebo. The null hypothesis, stated nondirectionally, would essentially indicate that the effects of the drug and placebo are the same. Rejecting that null hypothesis would mean that the effects are not equal, suggesting that perhaps this new drug, which in reality is no better than a placebo, is being touted as effective medication. That is obviously problematic and potentially very hazardous.

Thus, if there is a relatively high cost associated with a Type I error (for example, such that lives are lost, as in the medical profession), then one would want to select a relatively small level of significance (e.g., .01 or smaller). A small alpha would translate to a very small probability of rejecting the null if it were really true (i.e., a small probability of making an incorrect decision). If there is a relatively low cost associated with a Type I error (for example, such that children have to eat the second-rated candy rather than the first), then selecting a larger level of significance may be appropriate (e.g., .05 or larger). Costs are not always known, however. A second consideration is the level of significance commonly used in your field of study. In many disciplines, the .05 level of significance has become the standard (although no one seems to have a really good rationale). This is true in many of the social and behavioral sciences. Thus, you would do well to consult the published literature in your field to see if some standard is commonly used and to consider it for your own research.
Step 3: The third step in the decision-making process is to calculate the test statistic. For the one-sample mean test, we will compute the sample mean Ȳ and compare it to the hypothesized value μ0. This allows us to determine the size of the difference between Ȳ and μ0 and, subsequently, the probability associated with the difference. The larger the difference, the more likely it is that the sample mean really differs from the hypothesized mean value and the larger the probability associated with the difference.

Step 4: The fourth and final step in the decision-making process is to make a statistical decision regarding the null hypothesis H0. That is, a decision is made whether to reject H0 or to fail to reject H0. If the difference between the sample mean and the hypothesized value is large enough relative to the critical value (we will talk about critical values in more detail later), then our decision is to reject H0. If the difference between the sample mean and the hypothesized value is not large enough relative to the critical value, then our decision is to fail to reject H0. This is the basic four-step process for hypothesis testing of any inferential test. The specific details for the test of a single mean are given in the following section.
6.5 Inferences About μ When σ Is Known
In this section, we examine how tests of hypotheses about a single mean are conducted when the population standard deviation is known. Specifically, we consider the z test, an example illustrating the use of the z test, and how to construct a CI around the mean.
6.5.1 z Test
Recall from Chapter 4 the definition of a z score as

z = (Yi − μ)/σY

where
Yi is the score on variable Y for individual i
μ is the population mean for variable Y
σY is the population standard deviation for variable Y

The z score is used to tell us how many standard deviation units an individual's score is from the mean.
In the context of this chapter, however, we are concerned with the extent to which a sample mean differs from some hypothesized mean value. We can construct a variation of the z score for testing hypotheses about a single mean. In this situation, we are concerned with the sampling distribution of the mean (introduced in Chapter 5), so the equation must reflect means rather than raw scores. Our z score equation for testing hypotheses about a single mean becomes

z = (Ȳ − μ0)/σȲ

where
Ȳ is the sample mean for variable Y
μ0 is the hypothesized mean value for variable Y
σȲ is the population standard error of the mean for variable Y
From Chapter 5, recall that the population standard error of the mean σȲ is computed by

σȲ = σY/√n

where
σY is the population standard deviation for variable Y
n is the sample size

Thus, the numerator of the z score equation is the difference between the sample mean and the hypothesized value of the mean, and the denominator is the standard error of the mean. What we are really determining here is how many standard deviation (or standard error) units the sample mean is from the hypothesized mean. Henceforth, we call this variation of the z score the test statistic for the test of a single mean, also known as the z test. This is the first of several test statistics we describe in this text; every inferential test requires some test statistic for purposes of testing hypotheses.
We need to make a statistical assumption regarding this hypothesis testing situation. We assume that z is normally distributed with a mean of 0 and a standard deviation of 1. This is written statistically as z ∼ N(0, 1), following the notation we developed in Chapter 4. Thus, the assumption is that z follows the unit normal distribution (in other words, the shape of the distribution is approximately normal). An examination of our test statistic z reveals that only the sample mean can vary from sample to sample. The hypothesized value and the standard error of the mean are constant for every sample of size n from the same population.

In order to make a statistical decision, the critical regions need to be defined. As the test statistic is z and we have assumed normality, the relevant theoretical distribution we compare the test statistic to is the unit normal distribution. We previously discussed this distribution in Chapter 4, and the table of values is given in Table A.1. If the alternative hypothesis is nondirectional, then there would be two critical regions, one in the upper tail and one in the lower tail. Here we would split the area of the critical region, known as α, in two. If the alternative hypothesis is directional, then there would be only one critical region, either in the upper tail or in the lower tail, depending on which direction one is willing to reject H0.
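Those critical values can also be computed directly from the unit normal distribution rather than read from Table A.1. The following is a minimal Python sketch using only the standard library (an illustration, not SPSS output):

```python
from statistics import NormalDist

z = NormalDist()  # the unit normal distribution, N(0, 1)

alpha = 0.05
# Nondirectional alternative: split alpha across the two tails.
lower = z.inv_cdf(alpha / 2)      # lower critical value, about -1.96
upper = z.inv_cdf(1 - alpha / 2)  # upper critical value, about +1.96
# Directional (right-tailed) alternative: all of alpha in the upper tail.
right = z.inv_cdf(1 - alpha)      # single critical value, about +1.645
```

Note that the one-tailed critical value (about +1.645) sits closer to the center than the two-tailed value (about +1.96), which is why a one-tailed test in the anticipated direction is easier to reject.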
6.5.2 Example
Let us illustrate the use of this inferential test through an example. We are interested in testing whether the population of undergraduate students from Awesome State University (ASU) has a mean intelligence test score different from the hypothesized mean value of μ0 = 100 (remember that the hypothesized mean value does not come from our sample but from another source; in this example, let us say that this value of 100 is the national norm as presented in the technical manual of this particular intelligence test).

Recall that our first step in hypothesis testing is to state the hypothesis. A nondirectional alternative hypothesis is of interest, as we simply want to know if this population has a mean intelligence different from the hypothesized value, either greater than or less than. Thus, the null and alternative hypotheses can be written, respectively, as follows:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

A sample mean of Ȳ = 103 is observed for a sample of n = 100 ASU undergraduate students. From the development of this intelligence test, we know that the theoretical population standard deviation is σY = 15 (again, for purposes of illustration, let us say that the population standard deviation of 15 was noted in the technical manual for this test).
Our second step is to select a level of significance. The standard level of significance in this field is the .05 level; thus, we perform our significance test at α = .05.

The third step is to compute the test statistic value. To compute our test statistic value, first we compute the standard error of the mean (the denominator of our test statistic formula) as follows:

σȲ = σY/√n = 15/√100 = 1.5000
Then we compute the test statistic z, where the numerator is the difference between the mean of our sample (Ȳ = 103) and the hypothesized mean value (μ0 = 100), and the denominator is the standard error of the mean:

z = (Ȳ − μ0)/σȲ = (103 − 100)/1.5000 = 2.0000
Finally, in the last step, we make our statistical decision by comparing the test statistic z to the critical values. To determine the critical values for the z test, we use the unit normal distribution in Table A.1. Since α = .05 and we are conducting a nondirectional test, we need to find critical values for the upper and lower tails, where the area of each of the two critical regions is equal to .025 (i.e., splitting alpha in half: α/2 = .05/2 = .025). From the unit normal table, we find these critical values to be +1.96 (the point on the X axis where the area above that point is equal to .025) and −1.96 (the point on the X axis where the area below that point is equal to .025). As shown in Figure 6.2, the test statistic z = 2.00 falls into the upper tail critical region, just slightly larger than the upper tail critical value of +1.96. Our decision is to reject H0 and conclude that the ASU population from which the sample was selected has a mean intelligence score that is statistically significantly different from the hypothesized mean of 100 at the .05 level of significance.
A more precise way of thinking about this process is to determine the exact probability of observing a sample mean that differs from the hypothesized mean value. From the unit normal table, the area above z = 2.00 is equal to .0228. Therefore, the area below z = −2.00 is also equal to .0228. Thus, the probability p of observing, by chance, a sample mean of 2.00 or more standard errors (i.e., z = 2.00) from the hypothesized mean value of 100, in either direction, is two times the observed probability level, or p = (2)(.0228) = .0456. To put this in the context of the values in this example, there is a relatively small probability (less than 5%) of observing a sample mean of 103 just by chance if the true population mean is really 100. As this exact probability (p = .0456) is smaller than our level of significance α = .05, we reject H0. Thus, there are two approaches to dealing with probability. One approach is a decision based solely on the critical values. We reject or fail to reject H0 at a given α level, but no other information is provided. The other approach is a decision based on comparing the exact probability to the given α level. We reject or fail to reject H0 at a given α level, but we also have information available about the closeness of, or confidence in, that decision.

For this example, the findings in a manuscript would be reported based on comparing the p value to alpha and reported either as z = 2 (p < .05) or as z = 2 (p = .0456). (You may want to refer to the style manual relevant to your discipline, such as the Publication Manual of the American Psychological Association (2010), for information on which is the recommended reporting style.) Obviously the conclusion is the same with either approach; it is just a matter of how the results are reported. Most statistical computer programs, including SPSS, report the exact probability so that the reader can make a decision based on his or her own selected level of significance. These programs do not provide the critical value(s), which are found only in the appendices of statistics textbooks.
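The four steps for the ASU example can be verified with a short computation. This Python sketch mirrors the hand calculation above (it is not SPSS; the p value differs from the tabled .0456 only because Table A.1 rounds areas to four decimal places):

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()  # unit normal distribution

# Step 1: H0: mu = 100 versus the nondirectional H1: mu != 100
# Step 2: level of significance
mu0, alpha = 100, 0.05
# Known population standard deviation and the sample result
sigma, n, ybar = 15, 100, 103

# Step 3: test statistic
se = sigma / sqrt(n)            # standard error of the mean: 1.5
z = (ybar - mu0) / se           # z test statistic: 2.0
# Step 4: statistical decision via the exact probability
p = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p, about .0455
reject = p < alpha              # True: reject H0
```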
6.5.3 Constructing Confidence Intervals around the Mean
Recall our discussion from Chapter 5 on CIs. CIs are often quite useful in inferential statistics for providing the researcher with an interval estimate of a population parameter. Although the sample mean gives us a point estimate (i.e., just one value) of a population mean, a CI gives us an interval estimate of a population mean and allows us to determine the accuracy or precision of the sample mean. For the inferential test of a single mean, a CI around the sample mean Ȳ is formed from

Ȳ ± zcvσȲ

where
zcv is the critical value from the unit normal distribution
σȲ is the population standard error of the mean
[Figure 6.2 appears here: the unit normal distribution with a critical region of area α/2 in each tail, critical values of −1.96 and +1.96, the hypothesized value μ0, and the z test statistic value of +2.00 falling in the upper tail critical region.]

FIGURE 6.2
Critical regions for example.
CIs are typically formed for nondirectional or two-tailed tests, as shown in the equation. A CI will generate a lower and an upper limit. If the hypothesized mean value falls within the lower and upper limits, then we would fail to reject H0. In other words, if the hypothesized mean is contained in (or falls within) the CI around the sample mean, then we conclude that the sample mean and the hypothesized mean are not significantly different and that the sample mean could have come from a population with the hypothesized mean. If the hypothesized mean value falls outside the limits of the interval, then we would reject H0. Here we conclude that it is unlikely that the sample mean could have come from a population with the hypothesized mean.

One way to think about CIs is as follows. Imagine we take 100 random samples of the same sample size n, compute each sample mean, and then construct each 95% CI. Then we can say that 95% of these CIs will contain the population parameter and 5% will not. In short, 95% of similarly constructed CIs will contain the population parameter. It should also be mentioned that, at a particular level of significance, one will always obtain the same statistical decision with both the hypothesis test and the CI. The two procedures use precisely the same information. The hypothesis test is based on a point estimate; the CI is based on an interval estimate, providing the researcher with a little more information.

For the ASU example situation, the 95% CI would be computed by

Ȳ ± zcvσȲ = 103 ± 1.96(1.5) = 103 ± 2.94 = (100.06, 105.94)

Thus, the 95% CI ranges from 100.06 to 105.94. Because the interval does not contain the hypothesized mean value of 100, we reject H0 (the same decision we arrived at by walking through the steps for hypothesis testing). Thus, it is quite unlikely that our sample mean could have come from a population distribution with a mean of 100.
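The same decision can be reproduced through the interval. A brief Python sketch of the 95% CI computation for the ASU example (illustrative only; SPSS reports the interval directly):

```python
from math import sqrt
from statistics import NormalDist

mu0, alpha = 100, 0.05
sigma, n, ybar = 15, 100, 103

se = sigma / sqrt(n)                                 # 1.5
zcv = NormalDist().inv_cdf(1 - alpha / 2)            # about 1.96
half_width = zcv * se                                # about 2.94
lower, upper = ybar - half_width, ybar + half_width  # about (100.06, 105.94)
reject = not (lower <= mu0 <= upper)                 # True: 100 is outside the CI
```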
6.6 Type II Error (β) and Power (1 − β)
In this section, we complete our discussion of Type II error (β) and power (1 − β). First we return to our rain example and discuss the entire decision-making context. Then we describe the factors which determine power.
6.6.1 Full Decision-Making Context
Previously, we defined Type II error as the probability of failing to reject H0 when H0 is really false. In other words, in reality, H0 is false, yet we made a decision error and did not reject H0. The probability associated with a Type II error is denoted by β. Power is a related concept and is defined as the probability of rejecting H0 when H0 is really false. In other words, in reality, H0 is false, and we made the correct decision to reject H0. The probability associated with power is denoted by 1 − β. Let us return to our "rain" example to describe Type I and Type II errors and power more completely.
The full decision-making context for the "rain" example is given in Figure 6.3. The distribution on the left-hand side of the figure is the sampling distribution when H0 is true, meaning in reality it does not rain. The vertical line represents the critical value for deciding whether to carry an umbrella or not. To the left of the vertical line, we do not carry an umbrella, and to the right side of the vertical line, we do carry an umbrella. For the no-rain sampling distribution on the left, there are two possibilities. First, we do not carry an umbrella and it does not rain. This is the unshaded portion under the no-rain sampling distribution to the left of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − α. Second, we do carry an umbrella and it does not rain. This is the shaded portion under the no-rain sampling distribution to the right of the vertical line. This is an incorrect decision, a Type I error, and the probability associated with this decision is α/2 in either the upper or lower tail, and α collectively.

The distribution on the right-hand side of the figure is the sampling distribution when H0 is false, meaning in reality, it does rain. For the rain sampling distribution, there are two possibilities. First, we do carry an umbrella and it does rain. This is the unshaded portion under the rain sampling distribution to the right of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − β, or power. Second, we do not carry an umbrella and it does rain. This is the shaded portion under the rain sampling distribution to the left of the vertical line. This is an incorrect decision, a Type II error, and the probability associated with this decision is β.
As a second illustration, consider again the example intelligence test situation. This situation is depicted in Figure 6.4. The distribution on the left-hand side of the figure is the sampling distribution of Ȳ when H0 is true, meaning in reality, μ = 100. The distribution on the right-hand side of the figure is the sampling distribution of Ȳ when H1 is true, meaning in reality, μ = 115 (and in this example, while there are two critical values, only the right tail matters, as that relates to the H1 sampling distribution). The vertical line represents the critical value for deciding whether to reject the null hypothesis or not. To the left of the vertical line, we do not reject H0, and to the right of the vertical line, we reject H0. For the H0 is true sampling distribution on the left, there are two possibilities. First, we do not reject H0 and H0 is really true. This is the unshaded portion under the H0 is true sampling distribution to the left of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − α. Second, we reject H0 and H0 is true. This is the shaded portion under the H0 is true sampling distribution to the right of the vertical line. This is an incorrect decision, a
[Figure 6.3 appears here: two sampling distributions, one when H0 "No Rain" is true and one when it is false, showing the regions for the two correct decisions, the Type I error (did not need umbrella), and the Type II error (got wet), separated by the carry/do-not-carry umbrella critical value.]

FIGURE 6.3
Sampling distributions for the rain case.
Type I error, and the probability associated with this decision is α/2 in either the upper or lower tail, and α collectively.

The distribution on the right-hand side of the figure is the sampling distribution when H0 is false, and in particular, when H1: μ = 115 is true. This is a specific sampling distribution when H0 is false, and other possible sampling distributions can also be examined (e.g., μ = 85, 110). For the H1: μ = 115 is true sampling distribution, there are two possibilities. First, we do reject H0, as H0 is really false, and H1: μ = 115 is really true. This is the unshaded portion under the H1: μ = 115 is true sampling distribution to the right of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − β, or power. Second, we do not reject H0, H0 is really false, and H1: μ = 115 is really true. This is the shaded portion under the H1: μ = 115 is true sampling distribution to the left of the vertical line. This is an incorrect decision, a Type II error, and the probability associated with this decision is β.
6.6.2 Power Determinants
Power is determined by five different factors: (1) level of significance, (2) sample size, (3) population standard deviation, (4) difference between the true population mean μ and the hypothesized mean value μ0, and (5) directionality of the test (i.e., one- or two-tailed test). Let us talk about each of these factors in more detail.

First, power is determined by the level of significance α. As α increases, power increases. Thus, if α increases from .05 to .10, then power will increase. This would occur in Figure 6.4 if the vertical line were shifted to the left (thus creating a larger critical region and thereby making it easier to reject the null hypothesis). This would increase the α level and also increase power. This factor is under the control of the researcher.

Second, power is determined by sample size. As sample size n increases, power increases. Thus, if sample size increases, meaning we have a sample that consists of a larger proportion of the population, this will cause the standard error of the mean to decrease, as there
[Figure 6.4 appears here: the sampling distribution when H0: μ = 100 is true and the sampling distribution when H1: μ = 115 is true (i.e., H0: μ = 100 is false), with the critical values, the correct decision regions (1 − α and 1 − β), the Type I error regions (α/2), and the Type II error region (β).]

FIGURE 6.4
Sampling distributions for the intelligence test case.
is less sampling error with larger samples. This would also result in the vertical line being moved to the left (again creating a larger critical region and thereby making it easier to reject the null hypothesis). This factor is also under the control of the researcher. In addition, because a larger sample yields a smaller standard error, it will be easier to reject H0 (all else being equal), and the CIs generated will also be narrower.

Third, power is determined by the size of the population standard deviation σ. Although not under the researcher's control, as σ increases, power decreases. Thus, if σ increases, meaning the variability in the population is larger, this will cause the standard error of the mean to increase, as there is more sampling error with larger variability. This would result in the vertical line being moved to the right. If σ decreases, meaning the variability in the population is smaller, this will cause the standard error of the mean to decrease, as there is less sampling error with smaller variability. This would result in the vertical line being moved to the left. Considering, for example, the one-sample mean test, the standard error of the mean is the denominator of the test statistic formula. When the standard error term decreases, the denominator is smaller, and thus the test statistic value becomes larger (making it easier to reject the null hypothesis).
Fourth, power is determined by the difference between the true population mean μ and the hypothesized mean value μ0. Although not always under the researcher's control (only in true experiments, as described in Chapter 14), as the difference between the true population mean and the hypothesized mean value increases, power increases. Thus, if the difference between the true population mean and the hypothesized mean value is large, it will be easier to correctly reject H0. This would result in greater separation between the two sampling distributions. In other words, the entire H1 is true sampling distribution would be shifted to the right. Consider, for example, the one-sample mean test. The numerator is the difference between the means. The larger the numerator (holding the denominator constant), the more likely it will be to reject the null hypothesis.

Finally, power is determined by directionality and type of statistical procedure: whether we conduct a one- or a two-tailed test, as well as the type of test of inference. There is greater power in a one-tailed test, such as when μ > 100, than in a two-tailed test. In a one-tailed test, the vertical line will be shifted to the left, creating a larger rejection region. This factor is under the researcher's control. There is also often greater power in conducting parametric as compared to nonparametric tests of inference (we will talk more about parametric versus nonparametric tests in later chapters). This factor is under the researcher's control to some extent, depending on the scale of measurement of the variables and the extent to which the assumptions of parametric tests are met.
Power has become of much greater interest and concern to the applied researcher in recent years. We begin by distinguishing between a priori power, when power is determined as a study is being planned or designed (i.e., prior to the study), and post hoc power, when power is determined after the study has been conducted and the data analyzed.

For a priori power, if you want to ensure a certain amount of power in a study, then you can determine what sample size would be needed to achieve such a level of power. This requires the input of characteristics such as α, σ, the difference between μ and μ0, and one- versus two-tailed test. Alternatively, one could determine power given each of those characteristics. This can be done either by using statistical software [such as Power and Precision, Ex-Sample, G*Power (freeware), or a CD provided with the Murphy, Myors, and Wolach (2008) text] or by using tables [the most definitive collection of tables being in Cohen (1988)].

For post hoc power (also called observed power), most statistical software packages (e.g., SPSS, SAS, STATGRAPHICS) will compute this as part of the analysis for many types of inferential statistics (e.g., analysis of variance). However, even though post hoc power is
routinely reported in some journals, it has been found to have some flaws. For example, Hoenig and Heisey (2001) concluded that it should not be used to aid in interpreting nonsignificant results. They found that low power may indicate a small effect (e.g., a small mean difference) rather than an underpowered study. Thus, increasing sample size may not make much of a difference. Yuan and Maxwell (2005) found that observed power is almost always biased (too high or too low), except when true power is .50. Thus, we do not recommend the sole use of post hoc power to determine sample size in the next study; rather, it is recommended that CIs be used in addition to post hoc power. (An example presented later in this chapter will use G*Power to illustrate both a priori sample size requirements given desired power and post hoc power analysis.)
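For the one-sample z test specifically, the power arithmetic behind tools such as G*Power can be sketched directly. The Python sketch below uses the chapter's intelligence-test values (μ0 = 100, true μ = 115, σ = 15) together with an illustrative sample size of n = 9, a value chosen here for demonstration rather than taken from the text:

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()  # unit normal distribution

def z_test_power(mu0, mu1, sigma, n, alpha=0.05, two_tailed=True):
    """Power of the one-sample z test when the true population mean is mu1."""
    se = sigma / sqrt(n)
    shift = (mu1 - mu0) / se  # how far the H1 distribution sits from H0
    if two_tailed:
        zcv = norm.inv_cdf(1 - alpha / 2)
        # Probability the test statistic lands in either critical region
        return (1 - norm.cdf(zcv - shift)) + norm.cdf(-zcv - shift)
    zcv = norm.inv_cdf(1 - alpha)
    return 1 - norm.cdf(zcv - shift)

# Hypothetical n = 9 with the intelligence-test values: power is about .85
power = z_test_power(100, 115, sigma=15, n=9)
```

The determinants discussed above behave as expected here: raising n or α, widening the mean difference, shrinking σ, or switching to a one-tailed test in the correct direction all increase the returned power.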
6.7 Statistical Versus Practical Significance
We have discussed the inferential test of a single mean in terms of statistical significance. However, are statistically significant results always practically significant? In other words, if a result is statistically significant, should we make a big deal out of this result in a practical sense? Consider again the simple example where the null and alternative hypotheses are as follows:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

A sample mean intelligence test score of Ȳ = 101 is observed for a sample size of n = 2000 and a known population standard deviation of σY = 15. If we perform the test at the .01 level of significance, we find we are able to reject H0 even though the observed mean is only 1 unit away from the hypothesized mean value. The reason is, because the sample size is rather large, a rather small standard error of the mean is computed (σȲ = 0.3354), and we thus reject H0 as the test statistic (z = 2.9815) exceeds the critical value (z = 2.5758). Holding the mean and standard deviation constant, if we had a sample size of 200 instead of 2000, the standard error becomes much larger (σȲ = 1.0607), and we thus fail to reject H0 as the test statistic (z = 0.9428) does not exceed the critical value (z = 2.5758). From this example, we can see how the sample size can drive the results of the hypothesis test, and how it is possible that statistical significance can be influenced simply as an artifact of sample size.
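The sample-size sensitivity above is easy to verify by hand. The following Python sketch (not part of the original text; it uses only the standard library) recomputes the z statistic for both sample sizes:

```python
import math

def z_test(sample_mean, mu0, sigma, n):
    """One-sample z test statistic: z = (Ybar - mu0) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Same sample mean (101), hypothesized mean (100), and sigma (15);
# only the sample size changes.
z_large = z_test(101, 100, 15, 2000)  # roughly 2.98, exceeds the .01 critical value 2.5758
z_small = z_test(101, 100, 15, 200)   # roughly 0.94, does not exceed 2.5758

print(round(z_large, 4), round(z_small, 4))
```

The identical 1-point mean difference is significant at n = 2000 but not at n = 200, which is exactly the artifact-of-sample-size point made above.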
Should we make a big deal out of an intelligence test sample mean that is 1 unit away from the hypothesized mean intelligence? The answer is "maybe not." If we gather enough sample data, any difference, no matter how small, can wind up being statistically significant. Thus, larger samples are more likely to yield statistically significant results. Practical significance is not entirely a statistical matter. It is also a matter for the substantive field under investigation. Thus, the meaningfulness of a small difference is for the substantive area to determine. All that inferential statistics can really determine is statistical significance. However, we should always keep practical significance in mind when interpreting our findings.
In recent years, a major debate has been ongoing in the statistical community about the role of significance testing. The debate centers on whether null hypothesis significance testing (NHST) best suits the needs of researchers. At one extreme, some argue that NHST is fine as is. At the other extreme, others argue that NHST should be totally abandoned. In the middle, yet others argue that NHST should be supplemented with measures of effect size. In this text, we have taken the middle road, believing that more information is a better choice.
Let us formally introduce the notion of effect size. While there are a number of different measures of effect size, the most commonly used measure is Cohen's δ (delta) or d (1988). For the population case of the one-sample mean test, Cohen's delta is computed as follows:

δ = (μ − μ0)/σ
For the corresponding sample case, Cohen's d is computed as follows:

d = (Ȳ − μ0)/s
For the one-sample mean test, d indicates how many standard deviations the sample mean is from the hypothesized mean. Thus, if d = 1.0, the sample mean is one standard deviation away from the hypothesized mean. Cohen has proposed the following subjective standards for the social and behavioral sciences as a convention for interpreting d: small effect size, d = .2; medium effect size, d = .5; large effect size, d = .8. Interpretation of effect size should always be made first based on a comparison to similar studies; what is considered a "small" effect using Cohen's rule of thumb may actually be quite large in comparison to other related studies that have been conducted. In lieu of a comparison to other studies, such as in those cases where there are no or minimal related studies, Cohen's subjective standards may be appropriate.
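To see why effect size matters, consider the earlier intelligence example in code. This minimal Python sketch (an illustration, not from the text) shows that the statistically significant result at n = 2000 corresponds to a trivially small d:

```python
def cohens_d(sample_mean, mu0, sd):
    """Cohen's d for the one-sample mean test: d = (Ybar - mu0) / s."""
    return (sample_mean - mu0) / sd

# Intelligence example: observed mean 101, hypothesized mean 100, sd 15.
d = cohens_d(101, 100, 15)
print(round(d, 3))  # about .067, far below even Cohen's "small" benchmark of .2
```

A significant z test and a d of roughly .07 together illustrate the statistical-versus-practical distinction: the difference is reliable but tiny.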
Computing CIs for effect sizes is also valuable. The benefit in creating CIs for effect size values is similar to that of creating CIs for parameter estimates: CIs for the effect size provide an added measure of precision that is not obtained from knowledge of the effect size alone. Computing CIs for effect size indices, however, is not as straightforward as simply plugging known values into a formula. This is because d is a function of both the population mean and population standard deviation (Finch & Cumming, 2009). Thus, specialized software must be used to compute the CIs for effect sizes, and interested readers are referred to appropriate sources (e.g., Algina & Keselman, 2003; Algina, Keselman, & Penfield, 2005; Cumming & Finch, 2001).
While a complete discussion of these issues is beyond this text, further information on effect sizes can be seen in special sections of Educational and Psychological Measurement (2001a, 2001b) and Grissom and Kim (2005), while additional material on NHST can be viewed in Harlow, Mulaik, and Steiger (1997) and a special section of Educational and Psychological Measurement (2000, October). Additionally, style manuals (e.g., American Psychological Association, 2010) often provide useful guidelines on reporting effect size.
6.8 Inferences About μ When σ Is Unknown
We have already considered the inferential test involving a single mean when the population standard deviation σ is known. However, rarely is σ known to the applied researcher. When σ is unknown, the z test previously discussed is no longer appropriate. In this section, we consider the following: the test statistic for inferences about the mean when the population standard deviation is unknown, the t distribution, the t test, and an example using the t test.
6.8.1 New Test Statistic t
What is the applied researcher to do then when σ is unknown? The answer is to estimate σ by the sample standard deviation s. This changes the standard error of the mean to be

sȲ = sY/√n
Now we are estimating two population parameters: (1) the population mean, μY, is being estimated by the sample mean, Ȳ; and (2) the population standard deviation, σY, is being estimated by the sample standard deviation, sY. Both Ȳ and sY can vary from sample to sample. Thus, although the sampling error of the mean is taken into account explicitly in the z test, we also need to take into account the sampling error of the standard deviation, which the z test does not at all consider. We now develop a new inferential test for the situation where σ is unknown. The test statistic is known as the t test and is computed as follows:

t = (Ȳ − μ0)/sȲ
The t test was developed by William Sealy Gosset, also known by the pseudonym Student, previously mentioned in Chapter 1. The unit normal distribution cannot be used here for the unknown σ situation. A different theoretical distribution, known as the t distribution, must be used for determining critical values for the t test.
6.8.2 t Distribution
The t distribution is the theoretical distribution used for determining the critical values of the t test. Like the normal distribution, the t distribution is actually a family of distributions. There is a different t distribution for each value of degrees of freedom. However, before we look more closely at the t distribution, some discussion of the degrees of freedom concept is necessary.
As an example, say we know a sample mean Ȳ = 6 for a sample size of n = 5. How many of those five observed scores are free to vary? The answer is that four scores are free to vary. If the four known scores are 2, 4, 6, and 8 and the mean is 6, then the remaining score must be 10. The remaining score is not free to vary, but is already totally determined. We see this in the following equation where, to arrive at a solution of 6, the sum in the numerator must equal 30, and Y5 must be 10:

Ȳ = (Σ Yi)/n = (2 + 4 + 6 + 8 + Y5)/5 = 6

Therefore, the number of degrees of freedom is equal to 4 in this particular case and n − 1 in general. For the t test being considered here, we specify the degrees of freedom as ν = n − 1 (ν is the Greek letter "nu"). We use ν often in statistics to denote some type of degrees of freedom.
Another way to think about degrees of freedom is that we know the sum of the deviations from the mean must equal 0 (recall the unsquared numerator of the conceptual formula for the variance). For example, if n = 10, there are 10 deviations from the mean. Once the mean is known, only nine of the deviations are free to vary. A final way to think about this is that, in general, df = (n − number of restrictions). For the one-sample t test, because the population variance is unknown, we have to estimate it, resulting in one restriction. Thus, df = (n − 1) for this particular inferential test.
Several members of the family of t distributions are shown in Figure 6.5. The distribution for ν = 1 has thicker tails than the unit normal distribution and a shorter peak. This indicates that there is considerable sampling error of the sample standard deviation with only two observations (as ν = 2 − 1 = 1). For ν = 5, the tails are thinner and the peak is taller than for ν = 1. As the degrees of freedom increase, the t distribution becomes more nearly normal. For ν = ∞ (i.e., infinity), the t distribution is precisely the unit normal distribution.
A few important characteristics of the t distribution are worth mentioning. First, like the unit normal distribution, the mean of any t distribution is 0, and the t distribution is symmetric around the mean and unimodal. Second, unlike the unit normal distribution, which has a variance of 1, the variance of a t distribution is as follows:

σ² = ν/(ν − 2), for ν > 2

Thus, the variance of a t distribution is somewhat greater than 1 but approaches 1 as ν increases.
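A quick numerical check of this variance formula (a sketch, not part of the text) shows the approach toward 1 as ν grows:

```python
def t_variance(nu):
    """Variance of the t distribution with nu degrees of freedom (defined for nu > 2)."""
    if nu <= 2:
        raise ValueError("the variance is undefined for nu <= 2")
    return nu / (nu - 2)

for nu in (3, 10, 30, 100):
    print(nu, round(t_variance(nu), 4))  # 3.0, 1.25, 1.0714, 1.0204
```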
The table for the t distribution is given in Table A.2, and a snapshot of the table is presented in Figure 6.6 for illustration purposes. In looking at the table, each column header has two values. The top value is the significance level for a one-tailed test, denoted by α1. Thus, if you were doing a one-tailed test at the .05 level of significance, you want to look in the second column of numbers. The bottom value is the significance level for a two-tailed test, denoted by α2. Thus, if you were doing a two-tailed test at the .05 level of significance, you want to look in the third column of numbers. The rows of the table denote the various degrees of freedom ν.
[Figure 6.5: Several members of the family of t distributions. Curves for ν = 1, ν = 5, and the normal distribution (relative frequency vs. t).]
Thus, if ν = 3, meaning n = 4, you want to look in the third row of numbers. If ν = 3 for α1 = .05, the tabled value is 2.353. This value represents the 95th percentile point in a t distribution with three degrees of freedom. This is because the table only presents the upper tail percentiles. As the t distribution is symmetric around 0, the lower tail percentiles are the same values except for a change in sign. The fifth percentile for three degrees of freedom then is −2.353. Thus, for a right-tailed directional hypothesis, the critical value will be +2.353, and for a left-tailed directional hypothesis, the critical value will be −2.353.

If ν = 120 for α1 = .05, then the tabled value is 1.658. Thus, as sample size and degrees of freedom increase, the critical value of t decreases. This makes it easier to reject the null hypothesis when sample size is large.
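Readers who prefer software to Table A.2 can reproduce these tabled values with SciPy's percent-point (inverse CDF) function `t.ppf`; this is a sketch assuming SciPy is available, as the text itself relies on the printed table:

```python
from scipy.stats import t

# One-tailed alpha1 = .05: the critical value is the 95th percentile.
print(round(t.ppf(0.95, df=3), 3))    # 2.353 for nu = 3
print(round(t.ppf(0.95, df=120), 3))  # 1.658 for nu = 120

# Two-tailed alpha2 = .05: use the 97.5th percentile instead.
print(round(t.ppf(0.975, df=15), 3))  # 2.131, the value used in the skating example
```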
6.8.3 t Test
Now that we have covered the theoretical distribution underlying the test of a single mean for an unknown σ, we can go ahead and look at the inferential test. First, the null and alternative hypotheses for the t test are written in the same fashion as for the z test presented earlier. Thus, for a two-tailed test, we have the same notation as previously presented:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

The test statistic t is written as follows:

t = (Ȳ − μ0)/sȲ
In order to use the theoretical t distribution to determine critical values, we must assume that Yi ∼ N(μ, σ²) and that the observations are independent of each other (also referred to as "independent and identically distributed" or IID). In terms of the distribution of scores on Y, in other words, we assume that the population of scores on Y is normally distributed with some population mean μ and some population variance σ². The most important assumption for the t test is normality of the population. Conventional research has shown that the t test is very robust to nonnormality for a two-tailed test except for very small samples (e.g., n < 5). The t test is not as robust to nonnormality for a one-tailed test, even for samples as large as 40 or more (e.g., Noreen, 1989; Wilcox, 1993). Recall from Chapter 5 on the central limit theorem that when sample size increases, the sampling distribution of the mean becomes more nearly normal. As the shape of a population distribution may be unknown, conservatively one would do better to conduct a two-tailed test when sample size is small, unless some normality evidence is available.
ν     α1:  .10    .05    .025    .01     .005    .0025   .001    .0005
      α2:  .20    .10    .050    .02     .010    .0050   .002    .0010
1          3.078  6.314  12.706  31.821  63.657  127.32  318.31  636.62
2          1.886  2.920  4.303   6.965   9.925   14.089  22.327  31.598
3          1.638  2.353  3.182   4.541   5.841   7.453   10.214  12.924
…

Figure 6.6 Snapshot of t distribution table.
However, recent research (e.g., Basu & DasGupta, 1995; Wilcox, 1997, 2003) suggests that small departures from normality can inflate the standard error of the mean (as the standard deviation is larger). This can reduce power and also affect control over Type I error. Thus, a cavalier attitude about ignoring nonnormality may not be the best approach, and if nonnormality is an issue, other procedures, such as the nonparametric Kolmogorov–Smirnov one-sample test, may be considered. In terms of the assumption of independence, this assumption is met when the cases or units in your sample have been randomly selected from the population. Thus, the extent to which this assumption is met is dependent on your sampling design. In reality, random selection is often difficult in education and the social sciences and may or may not be feasible given your study.
The critical values for the t distribution are obtained from the t table in Table A.2, where you take into account the α level, whether the test is one- or two-tailed, and the degrees of freedom ν = n − 1. If the test statistic falls into a critical region, as defined by the critical values, then our conclusion is to reject H0. If the test statistic does not fall into a critical region, then our conclusion is to fail to reject H0. For the t test, the critical values depend on sample size, whereas for the z test, the critical values do not.

As was the case for the z test, for the t test, a CI for μ0 can be developed. The (1 − α)% CI is formed from

Ȳ ± tcv sȲ

where tcv is the critical value from the t table. If the hypothesized mean value μ0 is not contained in the interval, then our conclusion is to reject H0. If the hypothesized mean value μ0 is contained in the interval, then our conclusion is to fail to reject H0. The CI procedure for the t test then is comparable to that for the z test.
6.8.4 Example
Let us consider an example of the entire t test process. A hockey coach wanted to determine whether the mean skating speed of his team differed from the hypothesized league mean speed of 12 seconds. The hypotheses are developed as a two-tailed test and written as follows:

H0: μ = 12 or H0: μ − 12 = 0

H1: μ ≠ 12 or H1: μ − 12 ≠ 0

Skating speed around the rink was timed for each of 16 players (data are given in Table 6.2 and on the website as chap6data). The mean speed of the team was Ȳ = 10 seconds with a standard deviation of sY = 1.7889 seconds. The standard error of the mean is then computed as follows:

sȲ = sY/√n = 1.7889/√16 = 0.4472
We wish to conduct a t test at α = .05, where we compute the test statistic t as

t = (Ȳ − μ0)/sȲ = (10 − 12)/0.4472 = −4.4722
Table 6.2
SPSS Output for Skating Example

Raw data: 8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5, 10.5, 9.5, 11.5, 12.5, 9.5, 10.5

One-Sample Statistics
       N     Mean     Std. Deviation   Std. Error Mean
Time   16    10.000   1.7889           .4472

One-Sample Test (Test Value = 12)
       t       df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                        Lower      Upper
Time   −4.472  15   .000              −2.0000           −2.953     −1.047

Notes on the output:
• The table labeled "One-Sample Statistics" provides basic descriptive statistics for the sample. The standard error of the mean is sȲ = sY/√n = 1.7889/√16 = .4472.
• "t" is the t test statistic value, computed as t = (Ȳ − μ0)/sȲ = (10 − 12)/.4472 = −4.472.
• df are the degrees of freedom. For the one-sample t test, they are calculated as n − 1.
• "Sig." is the observed p value. It is interpreted as: there is less than a 1% probability of a sample mean of 10.00 occurring by chance if the null hypothesis is really true (i.e., if the population mean is really 12).
• The mean difference is simply the difference between the sample mean value (in this case, 10) and the hypothesized mean value (in this example, 12). In other words, 10 − 12 = −2.00.
• SPSS reports the 95% confidence interval of the difference, meaning that 95% of intervals constructed in this way would capture the true population mean difference; here the interval runs from −2.953 to −1.047. It is computed as Ȳdifference ± tcv sȲ = −2.00 ± (2.131)(.4472).
• The 95% confidence interval of the mean (although not provided by SPSS) could also be calculated as Ȳ ± tcv sȲ = 10 ± 2.131(0.4472) = 10 ± .9530 = [9.047, 10.953].
We turn to the t table in Table A.2 and determine the critical values based on α2 = .05 and ν = 15 degrees of freedom. The critical values are +2.131, which defines the upper tail critical region, and −2.131, which defines the lower tail critical region. As the test statistic t (i.e., −4.4722) falls into the lower tail critical region (i.e., the test statistic is less than the lower tail critical value), our decision is to reject H0 and conclude that the mean skating speed of this team is significantly different from the hypothesized league mean speed at the .05 level of significance. A 95% CI can be computed as follows:

Ȳ ± tcv sȲ = 10 ± 2.131(0.4472) = 10 ± 0.9530 = (9.0470, 10.9530)

As the CI does not contain the hypothesized mean value of 12, our conclusion is again to reject H0. Thus, there is evidence to suggest that the mean skating speed of the team differs from the hypothesized league mean speed of 12 seconds.
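For readers working outside SPSS, the same test can be reproduced in Python with `scipy.stats.ttest_1samp` (an assumption of this sketch; the chapter itself uses SPSS):

```python
from scipy import stats

# Skating times for the 16 players (raw data from Table 6.2).
times = [8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5,
         10.5, 9.5, 11.5, 12.5, 9.5, 10.5]

# Two-tailed one-sample t test against the hypothesized league mean of 12.
result = stats.ttest_1samp(times, popmean=12)
print(round(result.statistic, 4))  # about -4.4721
print(result.pvalue < .05)         # True, so reject H0 at the .05 level
```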
6.9 SPSS
Here we consider what SPSS has to offer in the way of testing hypotheses about a single mean. As with most statistical software, the t test is included as an option in SPSS, but the z test is not. Instructions for determining the one-sample t test using SPSS are presented first. This is followed by additional steps for examining the normality assumption.
One-Sample t Test

Step 1: To conduct the one-sample t test, go to "Analyze" in the top pulldown menu, then select "Compare Means," and then select "One-Sample T Test." Following the screenshot (step 1) produces the "One-Sample T Test" dialog box.
[Screenshot: step 1]
Step 2: Next, from the main "One-Sample T Test" dialog box, click the variable of interest from the list on the left (e.g., time), and move it into the "Test Variable" box by clicking on the arrow button. At the bottom right of the screen is a box for "Test Value," where you indicate the hypothesized value (e.g., 12).
[Screenshot: step 2. Clicking on "Options" allows you to define a confidence interval percentage; the default is 95%, corresponding to an alpha of .05.]
Step 3 (Optional): The default alpha level in SPSS is .05, and, thus, the default corresponding CI is 95%. If you wish to test your hypothesis at an alpha level other than .05 (and thus obtain CIs other than 95%), then click on the "Options" button located in the top right corner of the main dialog box. From here, the CI percentage can be adjusted to correspond to the alpha level at which your hypothesis is being tested. (For purposes of this example, the test has been generated using an alpha level of .05.)
[Screenshot: step 3]
The one-sample t test output for the skating example is provided in Table 6.2.
Using Explore to Examine Normality of Sample Distribution

Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape of your variable, specifically the extent to which normality is a reasonable assumption, is important. In earlier chapters, we saw how we could use the "Explore" tool in SPSS to generate a number of useful descriptive statistics. In conducting our one-sample t test, we can again use "Explore" to examine the extent to which the assumption of normality is met for our sample distribution. As the general steps for accessing "Explore" have been presented in previous chapters (e.g., Chapter 4), they will not be reiterated here. After the variable of interest has been selected and moved to the "Dependent List" box on the main "Explore" dialog box, click on "Plots" in the upper right corner. Place a checkmark in the boxes for "Normality plots with tests" and also for "Histogram."
[Screenshot: generating normality evidence]
Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots. Using our hockey data, the skewness statistic is .299 and kurtosis is −.483, both within the range of an absolute value of 2.0, suggesting some evidence of normality. The histogram also suggests relative normality.
[Histogram of time: mean = 10.0, std. dev. = 1.789, N = 16]
There are a few other statistics that can be used to gauge normality as well. Using SPSS, we can obtain two statistical tests of normality. The Kolmogorov–Smirnov (K–S) test (Chakravarti, Laha, & Roy, 1967) with Lilliefors significance correction (Lilliefors, 1967) and the Shapiro–Wilk (S–W) test (Shapiro & Wilk, 1965) provide evidence of the extent to which our sample distribution is statistically different from a normal distribution. The K–S test tends to be conservative, whereas the S–W test is usually considered the more powerful of the two for testing normality and is recommended for use with small sample sizes (n < 50). Both of these statistics are generated from the selection of "Normality plots with tests."
The output for the K–S and S–W tests is presented as follows. As we have learned in this chapter, when the observed probability (i.e., the p value, which is reported in SPSS as "Sig.") is less than our stated alpha level, then we reject the null hypothesis. We follow those same rules of interpretation here. Regardless of which test (K–S or S–W) we examine, both provide the same evidence: our sample distribution is not statistically significantly different from what would be expected from a normal distribution.
Tests of Normality
        Kolmogorov–Smirnov(a)           Shapiro–Wilk
        Statistic   df   Sig.           Statistic   df   Sig.
Time    .110        16   .200*          .982        16   .978

a. Lilliefors significance correction.
* This is a lower bound of the true significance.
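The S–W statistic is not unique to SPSS; `scipy.stats.shapiro` (an assumption of this sketch, not part of the text) gives essentially the same result for the skating data:

```python
from scipy import stats

times = [8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5,
         10.5, 9.5, 11.5, 12.5, 9.5, 10.5]

w, p = stats.shapiro(times)
print(round(w, 3), round(p, 3))  # W about .982, p about .978
# Because p exceeds .05, we fail to reject the hypothesis of normality.
```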
Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality. Q–Q plots are graphs that plot quantiles of the sample distribution against quantiles of the theoretical normal distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of our hockey skating time provides another form of evidence of normality.
[Normal Q–Q plot of time: expected normal value vs. observed value]
The detrended normal Q–Q plot shows deviations of the observed values from the theoretical normal distribution. Evidence of normality is suggested when the points exhibit little or no pattern around 0 (the horizontal line); however, due to subjectivity in determining the extent of a pattern, this graph can often be difficult to interpret. Thus, in many cases, you may wish to rely more heavily on the other forms of evidence of normality.
[Detrended normal Q–Q plot of time: deviation from normal vs. observed value]
6.10 G*Power
In our discussion of power presented earlier in this chapter, we indicated that the sample size to achieve a desired level of power can be determined a priori (before the study is conducted), and observed power can also be determined post hoc (after the study is conducted) using statistical software or power tables. One freeware program for calculating power is G*Power (http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/), which can be used to compute both a priori sample size and post hoc power analyses (among other things). Using the results of the one-sample t test just conducted, let us utilize G*Power to first determine the required sample size given various estimated parameters and then compute the post hoc power of our test.
A Priori Sample Size Using G*Power

Step 1 (A priori sample size): As seen in step 1, there are several decisions that need to be made from the initial G*Power screen. First, the correct test family needs to be selected. In our case, we conducted a one-sample t test; therefore, the default selection of "t tests" is the correct test family. Next, we need to select the appropriate statistical test. The default is "Correlation: Point biserial model." This is not the correct option for us, and so we use the arrow to toggle to "Means: Difference from constant (one sample case)."
[Screenshot: step 1]
Step 2 (A priori sample size): The "Type of Power Analysis" desired then needs to be selected. The default is "A priori: Compute required sample size—given α, power, and effect size." For this illustration, we will first conduct an example of computing the a priori sample size (i.e., the default option), and then we will compute post hoc power. Although we do not illustrate their use here, there are also three additional forms of power analysis that can be conducted using G*Power: (1) compromise, (2) criterion, and (3) sensitivity.
[Screenshot: step 2]
Step 3 (A priori sample size): The "Input Parameters" must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test, so we use the arrow to toggle "Tails" to "Two." For a priori power, we have to indicate the anticipated effect size. Your best estimate of the effect size you can anticipate achieving usually comes from previous studies that are similar to yours. In G*Power, the default effect size is d = .50. For purposes of this illustration, let us use the default. The alpha level must also be defined. The default significance level in G*Power is .05, which is the alpha level we will be using for our example. The desired level of power must also be defined. The G*Power default for power is .95. Many researchers in education and the behavioral sciences indicate that a power of .80 or above is usually sufficient. Thus, .95 may be higher than what many would consider sufficient power. For purposes of this example, however, we will use the default power of .95. Once the parameters are specified, simply click on "Calculate" to generate the a priori power statistics.
[Screenshot: step 3. The "Input Parameters" to determine a priori sample size include: (1) one- versus two-tailed test; (2) anticipated effect size; (3) alpha level; and (4) desired power.]
Step 4 (A priori sample size): The "Output Parameters" provide the relevant statistics given the input specified. In this example, we were interested in determining the a priori sample size given a two-tailed test, with an anticipated effect size of .50, an alpha level of .05, and desired power of .95. Based on those criteria, the required sample size for our one-sample t test is 54. In other words, if we have a sample size of 54 individuals or cases in our study, testing at an alpha level of .05, with a two-tailed test, and achieving a moderate effect size of .50, then the power of our test will be .95; that is, the probability of rejecting the null hypothesis when it is really false will be 95%.
[Screenshot: step 4]
If we had anticipated a smaller effect size, say .20, but left all of the other input parameters the same, the required sample size needed to achieve a power of .95 increases greatly, to 327.
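G*Power's a priori result can be cross-checked in Python using the `statsmodels` power routines (an assumption of this sketch; the package is not part of the text):

```python
import math
from statsmodels.stats.power import TTestPower

analysis = TTestPower()

# Required n for d = .50, alpha = .05, power = .95, two-tailed.
n_medium = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.95,
                                alternative='two-sided')
print(math.ceil(n_medium))  # 54, matching G*Power

# A smaller anticipated effect (d = .20) drives the required n up sharply.
n_small = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.95,
                               alternative='two-sided')
print(math.ceil(n_small))   # about 327, matching G*Power
```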
Post Hoc Power Using G*Power

Now, let us use G*Power to compute post hoc power. Step 1, as presented earlier for a priori power, remains the same; thus, we will start from step 2.
Step 2 (Post hoc power): The "Type of Power Analysis" desired then needs to be selected. The default is "A priori: Compute required sample size—given α, power, and effect size." To compute post hoc power, we need to select "Post hoc: Compute achieved power—given α, sample size, and effect size."
Step 3 (Post hoc power): The "Input Parameters" must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test, so we use the arrow to toggle "Tails" to "Two." The achieved or observed effect size was −1.117. The alpha level we tested at was .05, and the actual sample size was 16. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.
Step 4 (Post hoc power): The "Output Parameters" provide the relevant statistics given the input specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of −1.117, an alpha level of .05, and a sample size of 16. Based on those criteria, the post hoc power was .96. In other words, with a sample size of 16 skaters in our study, testing at an alpha level of .05, with a two-tailed test, and observing a large effect size of −1.117, the power of our test was .96; that is, the probability of rejecting the null hypothesis when it is really false will be 96%, an excellent level of power. Keep in mind that conducting power analysis a priori is highly recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level).
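The text computes achieved power with G*Power. For readers who prefer to script the same idea, post hoc power for a one-sample t test can be sketched with the noncentral t distribution in Python (scipy assumed available; this is not part of the text, and exact values depend on the implementation):

```python
from math import sqrt

from scipy import stats

def post_hoc_power_one_sample_t(d, n, alpha=0.05, tails=2):
    """Achieved power of a one-sample t test given effect size d,
    sample size n, and alpha, via the noncentral t distribution."""
    df = n - 1
    ncp = abs(d) * sqrt(n)  # noncentrality parameter
    if tails == 2:
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        # P(reject H0 | H1) = P(T' > +t_crit) + P(T' < -t_crit)
        return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
    t_crit = stats.t.ppf(1 - alpha, df)
    return stats.nct.sf(t_crit, df, ncp)

# Skating example: |d| = 1.117, n = 16, alpha = .05, two-tailed
power = post_hoc_power_one_sample_t(1.117, 16, alpha=0.05, tails=2)
print(round(power, 2))
```

As in the G*Power discussion above, power grows with sample size for a fixed effect size and alpha level.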
Steps 2–4: The "Input Parameters" must be specified, including:
1. One- versus two-tailed test;
2. Actual effect size (for post hoc power);
3. Alpha level; and
4. Total sample size.
Once the parameters are specified, click on "Calculate."
155 Introduction to Hypothesis Testing: Inferences About a Single Mean
6.11 Template and APA-Style Write-Up
Let us revisit our graduate research assistant, Marie, who was working with Oscar, a local hockey coach, to assist in analyzing his team's data. As a reminder, her task was to assist Oscar in generating the test of inference to answer his research question, "Is the mean skating speed of our hockey team different from the league mean speed of 12 seconds?" Marie suggested a one-sample test of means as the test of inference. A template for writing a research question for a one-sample test of inference (i.e., one-sample t test) is presented as follows:
Is the mean of [sample variable] different from [hypothesized mean
value]?
It may be helpful to preface the results of the one-sample t test with the information we gathered to examine the extent to which the assumption of normality was met. This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
The distributional shape of skating speed was examined to determine
the extent to which the assumption of normality was met. Skewness
(.299, SE = .564), kurtosis (−.483, SE = 1.091), and the Shapiro-Wilk test
of normality (S-W = .982, df = 16, p = .978) suggest that normality is a
reasonable assumption. Visually, a relatively bell-shaped distribution
displayed in the histogram (reflected similarly in the boxplot) as
well as a Q–Q plot with points adhering closely to the diagonal line
also suggest evidence of normality. Additionally, the boxplot did not
suggest the presence of any potential outliers. These indices suggest
evidence that the assumption of normality was met.
An additional assumption of the one-sample t test is the assumption of independence. This assumption is met when the cases in our sample have been randomly selected from the population. This is an often overlooked, but important, assumption for researchers when presenting the results of their test. One or two sentences are usually sufficient to indicate if this assumption was met.
Because the skaters in this sample represented a random sample, the
assumption of independence was met.
It is also desirable to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our skating example, we find an effect size of −1.117, interpreted according to Cohen's (1988) guidelines as a large effect:

$d = \dfrac{\bar{Y} - \mu_0}{s} = \dfrac{10 - 12}{1.7889} = -1.117$
Remember that for the one-sample mean test, d indicates how many standard deviations the sample mean is from the hypothesized mean. Thus, with an effect size of −1.117, there are nearly one and one-quarter standard deviation units between our sample mean and the hypothesized mean. The negative sign simply indicates that our sample mean was the smaller mean (as it is the first value in the numerator of the formula). In this particular example, the negative effect is desired as it suggests the team's average skating time is quicker than the league mean.
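The effect size computation above takes only a few lines to verify in Python; a minimal sketch using the skating example's summary statistics (the small difference from the text's −1.117 comes from rounding the standard deviation):

```python
# Cohen's d for a one-sample design: d = (sample mean - hypothesized mean) / s
y_bar = 10.0   # sample mean skating time (seconds)
mu_0 = 12.0    # hypothesized (league) mean
s = 1.7889     # sample standard deviation (rounded; the text reports d = -1.117)

d = (y_bar - mu_0) / s
print(round(d, 3))  # -1.118, a large effect by Cohen's guidelines
```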
Here is an example APA-style paragraph of results for the skating data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).
A one-sample t test was conducted at an alpha level of .05 to answer
the research question: Is the mean skating speed of a hockey team dif-
ferent from the league mean speed of 12 seconds? The null hypothesis
stated that the team mean speed would not differ from the league mean
speed of 12. The alternative hypothesis stated that the team average
speed would differ from the league mean. As depicted in Table 6.2,
based on a random sample of 16 skaters, there was a mean time of 10
seconds, and a standard deviation of 1.7889 seconds. When compared
against the hypothesized mean of 12 seconds, the one-sample t test was
shown to be statistically significant (t = −4.472, df = 15, p < .001).
Therefore, the null hypothesis that the team average time would be
12 seconds was rejected. This provides evidence to suggest that the
sample mean skating time for this particular team was statistically
different from the hypothesized mean skating time of the league.
Additionally, the effect size d was −1.117, generally interpreted as a
large effect (Cohen, 1988), and indicating that there is more than a
one standard deviation difference between the team and league mean
skating times. The post hoc power of the test, given the sample size,
two-tailed test, alpha level, and observed effect size, was .96.
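The test statistic and p value reported in the paragraph above come from SPSS in the text, but they can also be reproduced from the summary statistics alone; a sketch in Python (scipy assumed available):

```python
from math import sqrt

from scipy import stats

# One-sample t test from summary statistics (skating example)
n, y_bar, s, mu_0 = 16, 10.0, 1.7889, 12.0

se = s / sqrt(n)                 # standard error of the mean
t = (y_bar - mu_0) / se          # test statistic
df = n - 1
p = 2 * stats.t.sf(abs(t), df)   # two-tailed p value

print(round(t, 3), df, p < .001)  # t = -4.472 with df = 15, p < .001
```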
6.12 Summary
In this chapter, we considered our first inferential testing situation, testing hypotheses about a single mean. A number of topics and new concepts were discussed. First, we introduced the types of hypotheses utilized in inferential statistics, that is, the null or statistical hypothesis versus the scientific or alternative or research hypothesis. Second, we moved on to the types of decision errors (i.e., Type I and Type II errors) as depicted by the decision table and illustrated by the rain example. Third, the level of significance was introduced, as well as the types of alternative hypotheses (i.e., nondirectional vs. directional alternative hypotheses). Fourth, an overview of the steps in the decision-making process of inferential statistics was given. Fifth, we examined the z test, which is the inferential test about a single mean when the population standard deviation is known. This was followed by a more formal description of Type II error and power. We then discussed the notion of statistical significance versus practical significance. Finally, we considered the t test, which is the inferential test about a single mean when the population standard deviation is unknown, and then completed the chapter with an example, SPSS information, a G*Power illustration, and an APA-style write-up of results. At this point, you should have met the following objectives: (a) be able to understand the basic concepts of hypothesis testing, (b) be able to utilize the normal and t tables, and (c) be able to understand, determine, and interpret the results from the z test, t test, and CI procedures. Many of the concepts in this chapter carry over into other inferential tests. In the next chapter, we discuss inferential tests involving the difference between two means. Other inferential tests will be considered in subsequent chapters.
Problems
Conceptual Problems
6.1 In hypothesis testing, the probability of failing to reject H0 when H0 is false is denoted by
 a. α
 b. 1 − α
 c. β
 d. 1 − β
6.2 The probability of observing the sample mean (or some value greater than the sample mean) by chance if the null hypothesis is really true is which one of the following?
 a. α
 b. Level of significance
 c. p value
 d. Test statistic value
6.3 When testing the hypotheses presented in the following, at a .05 level of significance with the t test, where is the rejection region?

$H_0: \mu = 100$
$H_1: \mu < 100$

 a. The upper tail
 b. The lower tail
 c. Both the upper and lower tails
 d. Cannot be determined
6.4 A research question asks, "Is the mean age of children who enter preschool different from 48 months?" Which one of the following is implied?
 a. Left-tailed test
 b. Right-tailed test
 c. Two-tailed test
 d. Cannot be determined based on this information
6.5 The probability of making a Type II error when rejecting H0 at the .05 level of significance is which one of the following?
 a. 0
 b. .05
 c. Between .05 and .95
 d. .95
6.6 If the 90% CI does not include the value for the parameter being estimated in H0, then which one of the following is a correct statement?
 a. H0 cannot be rejected at the .10 level.
 b. H0 can be rejected at the .10 level.
 c. A Type I error has been made.
 d. A Type II error has been made.
6.7 Other things being equal, which of the values of t given next is least likely to result when H0 is true, for a two-tailed test?
 a. 2.67
 b. 1.00
 c. 0.00
 d. −1.96
 e. −2.70
6.8 The fundamental difference between the z test and the t test for testing hypotheses about a population mean is which one of the following?
 a. Only z assumes the population distribution be normal.
 b. z is a two-tailed test, whereas t is one-tailed.
 c. Only t becomes more powerful as sample size increases.
 d. Only z requires the population variance be known.
6.9 If one fails to reject a true H0, one is making a Type I error. True or false?
6.10 Which one of the following is a correct interpretation of d?
 a. Alpha level
 b. CI
 c. Effect size
 d. Observed probability
 e. Power
6.11 A one-sample t test is conducted at an alpha level of .10. The researcher finds a p value of .08 and concludes that the test is statistically significant. Is the researcher correct?
6.12 When testing the following hypotheses at the .01 level of significance with the t test, a sample mean of 301 is observed. I assert that if I calculate the test statistic and compare it to the t distribution with n − 1 degrees of freedom, it is possible to reject H0. Am I correct?

$H_0: \mu = 295$
$H_1: \mu < 295$
6.13 If the sample mean exceeds the hypothesized mean by 200 points, I assert that H0 can be rejected. Am I correct?
6.14 I assert that H0 can be rejected with 100% confidence if the sample consists of the entire population. Am I correct?
6.15 I assert that the 95% CI has a larger width than the 99% CI for a population mean using the same data. Am I correct?
6.16 I assert that the critical value of z, for a test of a single mean, will increase as sample size increases. Am I correct?
6.17 The mean of the t distribution increases as degrees of freedom increase. True or false?
6.18 It is possible that the results of a one-sample t test and of the corresponding CI will differ for the same dataset and level of significance. True or false?
6.19 The width of the 95% CI does not depend on the sample mean. True or false?
6.20 The null hypothesis is a numerical statement about which one of the following?
 a. An unknown parameter
 b. A known parameter
 c. An unknown statistic
 d. A known statistic
Computational Problems
6.1 Using the same data and the same method of analysis, the following hypotheses are tested about whether mean height is 72 inches. Researcher A uses the .05 level of significance, and Researcher B uses the .01 level of significance:

$H_0: \mu = 72$
$H_1: \mu \neq 72$

 a. If Researcher A rejects H0, what is the conclusion of Researcher B?
 b. If Researcher B rejects H0, what is the conclusion of Researcher A?
 c. If Researcher A fails to reject H0, what is the conclusion of Researcher B?
 d. If Researcher B fails to reject H0, what is the conclusion of Researcher A?
6.2 Give a numerical value for each of the following descriptions by referring to the t table.
 a. The percentile rank of t5 = 1.476
 b. The percentile rank of t10 = 3.169
 c. The percentile rank of t21 = 2.518
 d. The mean of the distribution of t23
 e. The median of the distribution of t23
 f. The variance of the distribution of t23
 g. The 90th percentile of the distribution of t27
6.3 Give a numerical value for each of the following descriptions by referring to the t table.
 a. The percentile rank of t5 = 2.015
 b. The percentile rank of t20 = 1.325
 c. The percentile rank of t30 = 2.042
 d. The mean of the distribution of t10
 e. The median of the distribution of t10
 f. The variance of the distribution of t10
 g. The 95th percentile of the distribution of t14
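These problems are intended to be worked with the t table, but answers can be checked in Python with scipy (not part of the text); a percentile rank is the cumulative probability up to the observed t, and a percentile is the inverse of that function:

```python
from scipy import stats

# Percentile rank of an observed t value (e.g., part (a) of 6.3):
print(round(stats.t.cdf(2.015, df=5), 2))

# Mean and variance of a t distribution: 0 and df/(df - 2) for df > 2
print(stats.t.mean(df=10), stats.t.var(df=10))

# A percentile (e.g., the 95th) is the inverse of the cdf:
print(round(stats.t.ppf(0.95, df=14), 3))
```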
6.4 The following random sample of weekly student expenses is obtained from a normally distributed population of undergraduate students with unknown parameters:

68 56 76 75 62 81 72 69 91 84
49 75 69 59 70 53 65 78 71 87
71 74 69 65 64

 a. Test the following hypotheses at the .05 level of significance:

$H_0: \mu = 74$
$H_1: \mu \neq 74$

 b. Construct a 95% CI.
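After working a problem like 6.4 by hand, the computations can be checked with a few lines of Python (scipy assumed available; this is a check, not a substitute for the hand calculation the problem asks for):

```python
from scipy import stats

# Weekly expense data from problem 6.4
expenses = [68, 56, 76, 75, 62, 81, 72, 69, 91, 84,
            49, 75, 69, 59, 70, 53, 65, 78, 71, 87,
            71, 74, 69, 65, 64]

# Two-tailed one-sample t test against the hypothesized mean of 74
t, p = stats.ttest_1samp(expenses, popmean=74)
mean = sum(expenses) / len(expenses)
print(round(mean, 2), round(t, 3), round(p, 3))
```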
6.5 The following random sample of hours spent per day answering e-mail is obtained from a normally distributed population of community college faculty with unknown parameters:

2 3.5 4 1.25 2.5 3.25 4.5 4.25 2.75 3.25
1.75 1.5 2.75 3.5 3.25 3.75 2.25 1.5 1.25 3.25

 a. Test the following hypotheses at the .05 level of significance:

$H_0: \mu = 3.0$
$H_1: \mu \neq 3.0$

 b. Construct a 95% CI.
6.6 In the population, it is hypothesized that flags have a mean usable life of 100 days. Twenty-five flags are flown in the city of Tuscaloosa and are found to have a sample mean usable life of 200 days with a standard deviation of 216 days. Does the sample mean in Tuscaloosa differ from that of the population mean?
 a. Conduct a two-tailed t test at the .01 level of significance.
 b. Construct a 99% CI.
Interpretive Problems
6.1 Using item 7 from the survey 1 dataset accessible from the website, use SPSS to conduct a one-sample t test to determine whether the mean number of compact disks owned is significantly different from 25, at the .05 level of significance. Test for the extent to which the assumption of normality has been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph reporting your results.
6.2 Using item 14 from the survey 1 dataset accessible from the website, use SPSS to conduct a one-sample t test to determine whether the mean number of hours slept is significantly different from 8, at the .05 level of significance. Test for the extent to which the assumption of normality has been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph reporting your results.
7
Inferences About the Difference Between Two Means
Chapter Outline
7.1 New Concepts
 7.1.1 Independent Versus Dependent Samples
 7.1.2 Hypotheses
7.2 Inferences About Two Independent Means
 7.2.1 Independent t Test
 7.2.2 Welch t′ Test
 7.2.3 Recommendations
7.3 Inferences About Two Dependent Means
 7.3.1 Dependent t Test
 7.3.2 Recommendations
7.4 SPSS
7.5 G*Power
7.6 Template and APA-Style Write-Up
Key Concepts
1. Independent versus dependent samples
2. Sampling distribution of the difference between two means
3. Standard error of the difference between two means
4. Parametric versus nonparametric tests
In Chapter 6, we introduced hypothesis testing and ultimately considered our first inferential statistic, the one-sample t test. There we examined the following general topics: types of hypotheses, types of decision errors, level of significance, steps in the decision-making process, inferences about a single mean when the population standard deviation is known (the z test), power, statistical versus practical significance, and inferences about a single mean when the population standard deviation is unknown (the t test).
In this chapter, we consider inferential tests involving the difference between two means. In other words, our research question is the extent to which two sample means are statistically different and, by inference, the extent to which their respective population means are different. Several inferential tests are covered in this chapter, depending on whether the two samples are selected in an independent or dependent manner, and on whether the statistical assumptions are met. More specifically, the topics described include the following inferential tests: for two independent samples, the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test; and for two dependent samples, the dependent t test and briefly the Wilcoxon signed ranks test. We use many of the foundational concepts previously covered in Chapter 6. New concepts to be discussed include the following: independent versus dependent samples, the sampling distribution of the difference between two means, and the standard error of the difference between two means. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying the inferential tests of two means, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.
7.1 New Concepts
Remember Marie, our very capable educational research graduate student? Let us see what Marie has in store for her now…
Marie's first attempts at consulting went so well that her faculty advisor has assigned Marie two additional consulting responsibilities with individuals from their community. Marie has been asked to consult with a local nurse practitioner, JoAnn, who is studying cholesterol levels of adults and how they differ based on gender. Marie suggests the following research question: Is there a mean difference in cholesterol level between males and females? Marie suggests an independent samples t test as the test of inference. Her task is then to assist JoAnn in generating the test of inference to answer her research question.

Marie has also been asked to consult with the swimming coach, Mark, who works with swimming programs that are offered through their local Parks and Recreation Department. Mark has just conducted an intensive 2-month training program for a group of 10 swimmers. He wants to determine if, on average, their time in the 50-meter freestyle event is different after the training. The following research question is suggested by Marie: Is there a mean difference in swim time for the 50-meter freestyle event before participation in an intensive training program as compared to swim time for the 50-meter freestyle event after participation in an intensive training program? Marie suggests a dependent samples t test as the test of inference. Her task is then to assist Mark in generating the test of inference to answer his research question.
Before we proceed to inferential tests of the difference between two means, a few new concepts need to be introduced. The new concepts are the difference between the selection of independent samples and dependent samples, the hypotheses to be tested, and the sampling distribution of the difference between two means.
7.1.1 Independent Versus Dependent Samples
The first new concept to address is to make a distinction between the selection of independent samples and dependent samples. Two samples are independent when the method of sample selection is such that those individuals selected for sample 1 do not have any relationship to those individuals selected for sample 2. In other words, the selections of individuals to be included in the two samples are unrelated or uncorrelated such that they have absolutely nothing to do with one another. You might think of the samples as being selected totally separately from one another. Because the individuals in the two samples are independent of one another, their scores on the dependent variable, Y, should also be independent of one another. The independence condition leads us to consider, for example, the independent samples t test. (This should not, however, be confused with the assumption of independence, which was introduced in the previous chapter. The assumption of independence still holds for the independent samples t test, and we will talk later about how this assumption can be met with this particular procedure.)
Two samples are dependent when the method of sample selection is such that those individuals selected for sample 1 do have a relationship to those individuals selected for sample 2. In other words, the selections of individuals to be included in the two samples are related or correlated. You might think of the samples as being selected simultaneously such that there are actually pairs of individuals. Consider the following two typical examples. First, if the same individuals are measured at two points in time, such as during a pretest and a posttest, then we have two dependent samples. The scores on Y at time 1 will be correlated with the scores on Y at time 2 because the same individuals are assessed at both time points. Second, if husband-and-wife pairs are selected, then we have two dependent samples. That is, if a particular wife is selected for the study, then her corresponding husband is also automatically selected; this is an example where individuals are paired or matched in some way such that they share characteristics that make the score of one person related to (i.e., dependent on) the score of the other person. In both examples, we have natural pairs of individuals or scores. The dependence condition leads us to consider the dependent samples t test, alternatively known as the correlated samples t test or the paired samples t test. As we show in this chapter, whether the samples are independent or dependent determines the appropriate inferential test.
7.1.2 Hypotheses
The hypotheses to be evaluated for detecting a difference between two means are as follows. The null hypothesis H0 is that there is no difference between the two population means, which we denote as the following:
$H_0: \mu_1 - \mu_2 = 0$ or $H_0: \mu_1 = \mu_2$
where
μ1 is the population mean for sample 1
μ2 is the population mean for sample 2

Mathematically, both equations say the same thing. The version on the left makes it clear to the reader why the term "null" is appropriate. That is, there is no difference, or a "null" difference, between the two population means. The version on the right indicates that the population mean of sample 1 is the same as the population mean of sample 2, another way of saying that there is no difference between the means (i.e., they are the same). The nondirectional scientific or alternative hypothesis H1 is that there is a difference between the two population means, which we denote as follows:
$H_1: \mu_1 - \mu_2 \neq 0$ or $H_1: \mu_1 \neq \mu_2$
The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population means are different. As we have not specified a direction on H1, we are willing to reject either if μ1 is greater than μ2 or if μ1 is less than μ2. This alternative hypothesis results in a two-tailed test.

Directional alternative hypotheses can also be tested if we believe μ1 is greater than μ2, denoted as follows:
$H_1: \mu_1 - \mu_2 > 0$ or $H_1: \mu_1 > \mu_2$
In this case, the equation on the left tells us that when μ2 is subtracted from μ1, a positive value will result (i.e., some value greater than 0). The equation on the right makes it somewhat clearer what we hypothesize.

Or if we believe μ1 is less than μ2, the directional alternative hypotheses will be denoted as we see here:
$H_1: \mu_1 - \mu_2 < 0$ or $H_1: \mu_1 < \mu_2$
In this case, the equation on the left tells us that when μ2 is subtracted from μ1, a negative value will result (i.e., some value less than 0). The equation on the right makes it somewhat clearer what we hypothesize. Regardless of how they are denoted, directional alternative hypotheses result in a one-tailed test.

The underlying sampling distribution for these tests is known as the sampling distribution of the difference between two means. This makes sense, as the hypotheses examine the extent to which two sample means differ. The mean of this sampling distribution is 0, as that is the hypothesized difference between the two population means μ1 − μ2. The more the two sample means differ, the more likely we are to reject the null hypothesis. As we show later, the test statistics in this chapter all deal in some way with the difference between the two means and with the standard error (or standard deviation) of the difference between two means.
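The sampling distribution of the difference between two means can also be seen by simulation; a minimal sketch (numpy assumed available; the population mean of 50 and standard deviation of 10 are arbitrary illustrative choices), drawing many pairs of samples from the same population so that the null hypothesis is true:

```python
import numpy as np

rng = np.random.default_rng(12345)

# Draw many pairs of independent samples from the SAME population
# (i.e., H0 is true: mu1 - mu2 = 0) and keep each difference in means.
n1, n2, reps = 30, 30, 10_000
diffs = np.array([
    rng.normal(50, 10, n1).mean() - rng.normal(50, 10, n2).mean()
    for _ in range(reps)
])

# The simulated sampling distribution centers on 0, the hypothesized
# difference, and its spread estimates the standard error of the difference.
print(round(diffs.mean(), 2), round(diffs.std(), 2))
```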
7.2 Inferences About Two Independent Means
In this section, three inferential tests of the difference between two independent means are described: the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test. The section concludes with a list of recommendations.
7.2.1 Independent t Test
First, we need to determine the conditions under which the independent t test is appropriate. In part, this has to do with the statistical assumptions associated with the test itself. The assumptions of the independent t test are that the scores on the dependent variable Y (a) are normally distributed within each of the two populations, (b) have equal population variances (known as homogeneity of variance or homoscedasticity), and (c) are independent. (The assumptions of normality and independence should sound familiar, as they were introduced as we learned about the one-sample t test.) Later in the chapter, we more fully discuss the assumptions for this particular procedure. When these assumptions are not met, other procedures may be more appropriate, as we also show later.
The measurement scales of the variables must also be appropriate. Because this is a test of means, the dependent variable must be measured on an interval or ratio scale. The independent variable, however, must be nominal or ordinal, and only two categories or groups of the independent variable can be used with the independent t test. (In later chapters, we will learn about analysis of variance (ANOVA), which can accommodate an independent variable with more than two categories.) It is not a condition of the independent t test that the sample sizes of the two groups be the same. An unbalanced design (i.e., unequal sample sizes) is perfectly acceptable.
The test statistic for the independent t test is known as t and is denoted by the following formula:

$t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{s_{\bar{Y}_1 - \bar{Y}_2}}$

where Ȳ1 and Ȳ2 are the means for sample 1 and sample 2, respectively, and $s_{\bar{Y}_1 - \bar{Y}_2}$ is the standard error of the difference between two means.
This standard error is the standard deviation of the sampling distribution of the difference between two means and is computed as follows:

$s_{\bar{Y}_1 - \bar{Y}_2} = s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
where sp is the pooled standard deviation, computed as

$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

and where s1² and s2² are the sample variances for groups 1 and 2, respectively, and n1 and n2 are the sample sizes for groups 1 and 2, respectively.
Conceptually, the standard error $s_{\bar{Y}_1 - \bar{Y}_2}$ is a pooled standard deviation weighted by the two sample sizes; more specifically, the two sample variances are weighted by their respective sample sizes and then pooled. This is conceptually similar to the standard error for the one-sample t test, which you will recall from Chapter 6 as

$s_{\bar{Y}} = \dfrac{s_Y}{\sqrt{n}}$

where we also have a standard deviation weighted by sample size. If the sample variances are not equal, as the test assumes, then you can see why we might not want to take a pooled or weighted average (i.e., as it would not represent well the individual sample variances).
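The pooled-variance formulas above can be sketched in a few lines of Python and cross-checked against scipy's pooled-variance t test (the two small groups below are purely hypothetical toy data, chosen only to exercise the formulas):

```python
from math import sqrt
from statistics import mean, variance

from scipy import stats

def independent_t(y1, y2):
    """Independent t test via the pooled-variance formulas in the text."""
    n1, n2 = len(y1), len(y2)
    s1_sq, s2_sq = variance(y1), variance(y2)  # sample variances
    sp = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    se = sp * sqrt(1 / n1 + 1 / n2)            # standard error of the difference
    return (mean(y1) - mean(y2)) / se

# Hypothetical toy data, just to exercise the formulas:
group1, group2 = [1, 2, 3], [2, 4, 6]
t = independent_t(group1, group2)

# The hand-rolled statistic matches scipy's pooled-variance t test:
print(round(t, 3), round(stats.ttest_ind(group1, group2).statistic, 3))  # both -1.549
```

Note that scipy's `ttest_ind` pools the variances by default (`equal_var=True`), matching the formulas above.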
The test statistic t is then compared to a critical value(s) from the t distribution. For a two-tailed test, from Table A.2, we would use the appropriate α2 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom for this test are n1 + n2 − 2. Conceptually, we lose one degree of freedom from each sample for estimating the population variances (i.e., there are two restrictions along the lines of what was discussed in Chapter 6). The critical values are denoted as $\pm{}_{\alpha_2}t_{n_1+n_2-2}$. The subscript α2 of the critical values reflects the fact that this is a two-tailed test, and the subscript n1 + n2 − 2 indicates these particular degrees of freedom. (Remember that the critical value can be found based on the knowledge of the degrees of freedom and whether it is a one- or two-tailed test.) If the test statistic falls into either critical region, then we reject H0; otherwise, we fail to reject H0.

For a one-tailed test, from Table A.2, we would use the appropriate α1 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom are again n1 + n2 − 2. The critical value is denoted as $+{}_{\alpha_1}t_{n_1+n_2-2}$ for the alternative hypothesis H1: μ1 − μ2 > 0 (i.e., a right-tailed test, so the critical value will be positive), and as $-{}_{\alpha_1}t_{n_1+n_2-2}$ for the alternative hypothesis H1: μ1 − μ2 < 0 (i.e., a left-tailed test and thus a negative critical value). If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.
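The critical values described above come from Table A.2 in the text, but the same lookups can be done with scipy (not part of the text); the group sizes of 8 and 12 below are illustrative:

```python
from scipy import stats

alpha, n1, n2 = 0.05, 8, 12   # illustrative sample sizes
df = n1 + n2 - 2              # 18 degrees of freedom

# A two-tailed critical value puts alpha/2 in each tail:
t_crit_two = stats.t.ppf(1 - alpha / 2, df)
# A one-tailed (right-tailed) critical value puts all of alpha in one tail:
t_crit_one = stats.t.ppf(1 - alpha, df)

print(round(t_crit_two, 3), round(t_crit_one, 3))  # 2.101 and 1.734
```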
7.2.1.1 Confidence Interval
For the two-tailed test, a (1 − α)% confidence interval (CI) can also be examined. The CI is formed as follows:

$(\bar{Y}_1 - \bar{Y}_2) \pm {}_{\alpha_2}t_{n_1+n_2-2}\, s_{\bar{Y}_1 - \bar{Y}_2}$

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. The interpretation and use of CIs is similar to that of the one-sample test described in Chapter 6. Imagine we take 100 random samples from each of two populations and construct 95% CIs. Then 95% of the CIs will contain the true population mean difference μ1 − μ2 and 5% will not. In short, 95% of similarly constructed CIs will contain the true population mean difference.
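The CI formula can be sketched directly from summary values; the mean difference and pooled standard deviation below are hypothetical numbers chosen only to illustrate the computation (scipy assumed available):

```python
from math import sqrt

from scipy import stats

# 95% CI for the mean difference, using hypothetical summary values:
n1, n2 = 8, 12
mean_diff = -2.0   # difference between the two sample means (hypothetical)
sp = 5.0           # pooled standard deviation (hypothetical)

se = sp * sqrt(1 / n1 + 1 / n2)          # standard error of the difference
t_crit = stats.t.ppf(0.975, n1 + n2 - 2) # two-tailed critical value, alpha = .05
lower, upper = mean_diff - t_crit * se, mean_diff + t_crit * se

# If the interval contains 0 (the hypothesized difference), fail to reject H0.
print(round(lower, 2), round(upper, 2), lower < 0 < upper)
```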
7.2.1.2 Effect Size
Next we extend Cohen's (1988) sample measure of effect size d from Chapter 6 to the two independent samples situation. Here we compute d as follows:

$d = \dfrac{\bar{Y}_1 - \bar{Y}_2}{s_p}$
The numerator of the formula is the difference between the two sample means. The denominator is the pooled standard deviation, for which the formula was presented previously. Thus, the effect size d is measured in standard deviation units, and again we use Cohen's proposed subjective standards for interpreting d: small effect size, d = .2; medium effect size, d = .5; large effect size, d = .8. Conceptually, this is similar to d in the one-sample case from Chapter 6. The effect size d is considered a standardized group difference type of effect size (Huberty, 2002). There are other types of effect sizes, however. Another is eta squared (η²), also a standardized effect size, and it is considered a relationship type of effect size (Huberty, 2002). For the independent t test, eta squared can be calculated as follows:
�
η2
2
2
2
2
1 2 2
=
+
=
+ + −
t
t df
t
t n n( )
Here� the� numerator� is� the� squared� t� test� statistic� value,� and� the� denominator� is� sum� of�
the�squared�t�test�statistic�value�and�the�degrees�of�freedom��Values�for�eta�squared�range�
from�0�to�+1�00,�where�values�closer�to�one�indicate�a�stronger�association��In�terms�of�what�
this�effect�size�tells�us,�eta�squared�is�interpreted�as�the�proportion�of�variance�accounted�
for�in�the�dependent�variable�by�the�independent�variable�and�indicates�the�degree�of�the�
relationship�between�the�independent�and�dependent�variables��If�we�use�Cohen’s�(1988)�
metric�for�interpreting�eta�squared:�small�effect�size,�η2�=��01;�moderate�effect�size,�η2�=��06;�
large�effect�size,�η2�=��14�
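As a quick illustration (ours, not from the text; the function names are our own), both effect sizes can be computed from summary values in a few lines of Python:

```python
import math

def cohen_d(mean1, mean2, s_pooled):
    """Standardized group-difference effect size: d = (mean1 - mean2) / sp."""
    return (mean1 - mean2) / s_pooled

def eta_squared(t, df):
    """Relationship-type effect size for the independent t test: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

# Hypothetical illustration: t = 3.0 with df = 16 gives 9 / 25 = .36
print(round(eta_squared(3.0, 16), 2))  # 0.36, a large effect by Cohen's metric
```

Both functions simply transcribe the formulas above; no distributional work is done here.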
7.2.1.3 Example of the Independent t Test
Let us now consider an example where the independent t test is implemented. Recall from Chapter 6 the basic steps for hypothesis testing for any inferential test: (1) state the null and alternative hypotheses, (2) select the level of significance (i.e., alpha, α), (3) calculate the test statistic value, and (4) make a statistical decision (reject or fail to reject H0). We will follow these steps again in conducting our independent t test. In our example, samples of 8 female and 12 male middle-age adults are randomly and independently sampled from the populations of female and male middle-age adults, respectively. Each individual is given a cholesterol test through a standard blood sample. The null hypothesis to be tested is that males and females have equal cholesterol levels. The alternative hypothesis is that males and females will not have equal cholesterol levels, thus necessitating a nondirectional or two-tailed test. We will conduct our test using an alpha level of .05. The raw data and summary statistics are presented in Table 7.1. For the female sample (sample 1), the mean and variance are 185.0000 and 364.2857, respectively, and for the male sample (sample 2), the mean and variance are 215.0000 and 913.6363, respectively.
In order to compute the test statistic t, we first need to determine the standard error of the difference between the two means. The pooled standard deviation is computed as

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(8 - 1)(364.2857) + (12 - 1)(913.6363)}{8 + 12 - 2}} = 26.4575$$
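This pooled value can be checked numerically; a minimal Python sketch using the sample sizes and variances above:

```python
import math

n1, n2 = 8, 12
var1, var2 = 364.2857, 913.6363  # sample variances from Table 7.1

# Pooled standard deviation: square root of the weighted average
# of the two sample variances, weighted by degrees of freedom
sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
print(round(sp, 4))  # 26.4575
```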
and the standard error of the difference between two means is computed as

$$s_{\bar{Y}_1-\bar{Y}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 26.4575\sqrt{\frac{1}{8} + \frac{1}{12}} = 12.0752$$
The test statistic t can then be computed as

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{s_{\bar{Y}_1-\bar{Y}_2}} = \frac{185 - 215}{12.0752} = -2.4844$$
The next step is to use Table A.2 to determine the critical values. As there are 18 degrees of freedom (n1 + n2 − 2 = 8 + 12 − 2 = 18), using α = .05 and a two-tailed or nondirectional test, we find the critical values using the appropriate α2 column to be +2.101 and −2.101. Since the test statistic falls beyond the critical values, as shown in Figure 7.1, we reject the null hypothesis that the means are equal in favor of the nondirectional alternative that the means are not equal. Thus, we conclude that the mean cholesterol levels for males and females are not equal at the .05 level of significance (denoted by p < .05).
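The same test can be run in software; a hedged sketch using Python's scipy (an alternative to the SPSS workflow shown later in this chapter) on the Table 7.1 data:

```python
from scipy import stats

female = [205, 160, 170, 180, 190, 200, 210, 165]
male = [245, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 185]

# Independent t test with pooled (equal) variances
t, p = stats.ttest_ind(female, male, equal_var=True)
print(round(t, 2), p < .05)  # t is about -2.48; significant at the .05 level
```

Minor discrepancies in the last decimal places relative to the hand computation reflect rounding of the intermediate values in the text.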
The 95% CI can also be examined. For the cholesterol example, the CI is formed as follows:

$$(\bar{Y}_1 - \bar{Y}_2) \pm {}_{\alpha_2}t_{n_1+n_2-2}\,s_{\bar{Y}_1-\bar{Y}_2} = (185 - 215) \pm 2.101(12.0752) = -30 \pm 25.3700 = (-55.3700, -4.6300)$$
Table 7.1
Cholesterol Data for Independent Samples

Female (Sample 1)    Male (Sample 2)
205                  245
160                  170
170                  180
180                  190
190                  200
200                  210
210                  220
165                  230
                     240
                     250
                     260
                     185
Ȳ1 = 185.0000        Ȳ2 = 215.0000
s1² = 364.2857       s2² = 913.6363
Figure 7.1. Critical regions and test statistics for the cholesterol example. (The figure shows the rejection regions with α = .025 in each tail, the critical values −2.101 and +2.101, the independent t test statistic value of −2.4844, and the Welch t′ test statistic value of −2.7197.)
As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean gender difference in cholesterol levels was not equal to 0 at the .05 level of significance (p < .05). In other words, there is evidence to suggest that males and females differ, on average, on cholesterol level. More specifically, the mean cholesterol level for males is greater than the mean cholesterol level for females.
The effect size for this example is computed as follows:

$$d = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p} = \frac{185 - 215}{26.4575} = -1.1339$$
According to Cohen's recommended subjective standards, this would certainly be a rather large effect size, as the difference between the two sample means is larger than one standard deviation. Rather than d, had we wanted to compute eta squared, we would have also found a large effect:
$$\eta^2 = \frac{t^2}{t^2 + df} = \frac{(-2.4844)^2}{(-2.4844)^2 + 18} = .2553$$
An eta squared value of .26 indicates a large relationship between the independent and dependent variables, with 26% of the variance in the dependent variable (i.e., cholesterol level) accounted for by the independent variable (i.e., gender).
7.2.1.4 Assumptions
Let us return to the assumptions of normality, independence, and homogeneity of variance. For the independent t test, the assumption of normality is met when the dependent variable is normally distributed for each sample (i.e., each category or group) of the independent variable. The normality assumption is made because we are dealing with a parametric inferential test. Parametric tests assume a particular underlying theoretical population distribution, in this case, the normal distribution. Nonparametric tests do not assume a particular underlying theoretical population distribution.

Conventional wisdom tells us the following about nonnormality. When the normality assumption is violated with the independent t test, the effects on Type I and Type II errors are minimal when using a two-tailed test (e.g., Glass, Peckham, & Sanders, 1972; Sawilowsky & Blair, 1992). When using a one-tailed test, the effect of violating the normality assumption is minimal for samples larger than 10 and disappears for samples of at least 20 (Sawilowsky & Blair, 1992; Tiku & Singh, 1981). The simplest methods for detecting violation of the normality assumption are graphical methods, such as stem-and-leaf plots, box plots, histograms, or Q–Q plots; statistical procedures such as the Shapiro–Wilk (S–W) test (1965); and/or skewness and kurtosis statistics. However, more recent research by Wilcox (2003) indicates that power for both the independent t and Welch t′ can be reduced even for slight departures from normality, with outliers also contributing to the problem. Wilcox recommends several procedures not readily available and beyond the scope of this text (such as bootstrap methods, trimmed means, medians). Keep in mind, though, that the independent t test is fairly robust to nonnormality in most situations.
The independence assumption is also necessary for this particular test. For the independent t test, the assumption of independence is met when there is random assignment of individuals to the two groups or categories of the independent variable. Random assignment to the two samples being studied provides for greater internal validity: the ability to state with some degree of confidence that the independent variable caused the outcome (i.e., the dependent variable). If the independence assumption is not met, then probability statements about the Type I and Type II errors will not be accurate; in other words, the probability of a Type I or Type II error may be increased as a result of the assumption not being met. Zimmerman (1997) found that Type I error was affected even for relatively small relations or correlations between the samples (i.e., even as small as .10 or .20). In general, the assumption can be met by (a) keeping the assignment of individuals to groups separate through the design of the experiment (specifically random assignment, not to be confused with random selection), and (b) keeping the individuals separate from one another through experimental control so that the scores on the dependent variable Y for sample 1 do not influence the scores for sample 2. Zimmerman also stated that independence can be violated for supposedly independent samples due to some type of matching in the design of the experiment (e.g., matched pairs based on gender, age, and weight). If the observations are not independent, then the dependent t test, discussed later in this chapter, may be appropriate.
Of potentially more serious concern is violation of the homogeneity of variance assumption. Homogeneity of variance is met when the variances of the dependent variable for the two samples (i.e., the two groups or categories of the independent variable) are the same. Research has shown that the effect of heterogeneity (i.e., unequal variances) is minimal when the sizes of the two samples, n1 and n2, are equal; this is not the case when the sample sizes are not equal. When the larger variance is associated with the smaller sample size (e.g., group 1 has the larger variance and the smaller n), then the actual α level is larger than the nominal α level. In other words, if you set α at .05, then you are not really conducting the test at the .05 level, but at some larger value. When the larger variance is associated with the larger sample size (e.g., group 1 has the larger variance and the larger n), then the actual α level is smaller than the nominal α level. In other words, if you set α at .05, then you are not really conducting the test at the .05 level, but at some smaller value.
One can use statistical tests to detect violation of the homogeneity of variance assumption, although the most commonly used tests are somewhat problematic. These tests include Hartley's Fmax test (for equal n's, but sensitive to nonnormality; note that it is the unequal n's situation that concerns us anyway), Cochran's test (for equal n's, but even more sensitive to nonnormality than Hartley's test), Levene's test (for equal n's, but sensitive to nonnormality; available in SPSS), the Bartlett test (for unequal n's, but very sensitive to nonnormality), the Box–Scheffé–Anderson test (for unequal n's, fairly robust to nonnormality), and the Brown–Forsythe test (for unequal n's, more robust to nonnormality than the Box–Scheffé–Anderson test and therefore recommended). When the variances are unequal and the sample sizes are unequal, the usual alternative to the independent t test is the Welch t′ test described in the next section. Inferential tests for evaluating homogeneity of variance are more fully considered in Chapter 9.
7.2.2 Welch t′ Test
The Welch t′ test is usually appropriate when the population variances are unequal and the sample sizes are unequal. The Welch t′ test assumes that the scores on the dependent variable Y (a) are normally distributed in each of the two populations and (b) are independent.
The test statistic is known as t′ and is denoted by

$$t' = \frac{\bar{Y}_1 - \bar{Y}_2}{s_{\bar{Y}_1-\bar{Y}_2}} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_{\bar{Y}_1}^2 + s_{\bar{Y}_2}^2}} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where
Ȳ1 and Ȳ2 are the means for samples 1 and 2, respectively
$s_{\bar{Y}_1}^2$ and $s_{\bar{Y}_2}^2$ are the variance errors of the means for samples 1 and 2, respectively
Here we see that the denominator of this test statistic is conceptually similar to the one-sample t and the independent t test statistics. The variance errors of the mean are computed for each group by

$$s_{\bar{Y}_1}^2 = \frac{s_1^2}{n_1} \qquad\qquad s_{\bar{Y}_2}^2 = \frac{s_2^2}{n_2}$$

where s1² and s2² are the sample variances for groups 1 and 2, respectively. The square root of the variance error of the mean is the standard error of the mean (i.e., $s_{\bar{Y}_1}$ and $s_{\bar{Y}_2}$). Thus, we see that rather than take a pooled or weighted average of the two sample variances as we did with the independent t test, the two sample variances are treated separately with the Welch t′ test.
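A small Python sketch (ours, not the text's; the function name is hypothetical) computing t′ from summary statistics makes the separate treatment of the variances explicit:

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch t': the two sample variances are kept separate, not pooled."""
    se = math.sqrt(var1 / n1 + var2 / n2)  # standard error of the difference
    return (mean1 - mean2) / se

# Summary statistics from the cholesterol example
print(round(welch_t(185, 364.2857, 8, 215, 913.6363, 12), 4))  # -2.7197
```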
The test statistic t′ is then compared to a critical value(s) from the t distribution in Table A.2. We again use the appropriate α column depending on the desired level of significance and whether the test is one- or two-tailed (i.e., α1 and α2), and the appropriate row for the degrees of freedom. The degrees of freedom for this test are a bit more complicated than for the independent t test. The degrees of freedom are adjusted from n1 + n2 − 2 for the independent t test to the following value for the Welch t′ test:

$$\nu = \frac{\left(s_{\bar{Y}_1}^2 + s_{\bar{Y}_2}^2\right)^2}{\dfrac{\left(s_{\bar{Y}_1}^2\right)^2}{n_1 - 1} + \dfrac{\left(s_{\bar{Y}_2}^2\right)^2}{n_2 - 1}}$$

The degrees of freedom ν are approximated by rounding to the nearest whole number prior to using the table. If the test statistic falls into a critical region, then we reject H0; otherwise, we fail to reject H0.
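The adjusted degrees of freedom can likewise be sketched in Python (the function name is ours):

```python
def welch_df(var1, n1, var2, n2):
    """Approximate (Welch-Satterthwaite) degrees of freedom for the Welch t' test."""
    ve1, ve2 = var1 / n1, var2 / n2  # variance errors of the means
    return (ve1 + ve2) ** 2 / (ve1 ** 2 / (n1 - 1) + ve2 ** 2 / (n2 - 1))

# Cholesterol example: the adjusted df come out just under 18
print(round(welch_df(364.2857, 8, 913.6363, 12), 2))  # 17.98
```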
For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:

$$(\bar{Y}_1 - \bar{Y}_2) \pm {}_{\alpha_2}t_{\nu}\,s_{\bar{Y}_1-\bar{Y}_2}$$

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. Thus, interpretation of this CI is the same as with the independent t test.
Consider again the example cholesterol data, where the sample variances were somewhat different and the sample sizes were different. The variance errors of the mean are computed for each sample as follows:

$$s_{\bar{Y}_1}^2 = \frac{s_1^2}{n_1} = \frac{364.2857}{8} = 45.5357 \qquad\qquad s_{\bar{Y}_2}^2 = \frac{s_2^2}{n_2} = \frac{913.6363}{12} = 76.1364$$
The t′ test statistic is computed as

$$t' = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_{\bar{Y}_1}^2 + s_{\bar{Y}_2}^2}} = \frac{185 - 215}{\sqrt{45.5357 + 76.1364}} = \frac{-30}{11.0305} = -2.7197$$
Finally, the degrees of freedom ν are determined to be

$$\nu = \frac{\left(s_{\bar{Y}_1}^2 + s_{\bar{Y}_2}^2\right)^2}{\dfrac{\left(s_{\bar{Y}_1}^2\right)^2}{n_1 - 1} + \dfrac{\left(s_{\bar{Y}_2}^2\right)^2}{n_2 - 1}} = \frac{(45.5357 + 76.1364)^2}{\dfrac{(45.5357)^2}{8 - 1} + \dfrac{(76.1364)^2}{12 - 1}} = 17.9838$$
which is rounded to 18, the nearest whole number. The degrees of freedom remain 18, as they were for the independent t test, and thus the critical values are still +2.101 and −2.101. As the test statistic falls beyond the critical values, as shown in Figure 7.1, we reject the null hypothesis that the means are equal in favor of the alternative that the means are not equal. Thus, as with the independent t test, with the Welch t′ test we conclude that the mean cholesterol levels for males and females are not equal at the .05 level of significance. In this particular example, then, we see that the unequal sample variances and unequal sample sizes did not alter the outcome when comparing the independent t test result with the Welch t′ test result. However, note that the results for these two tests may differ with other data.
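For comparison, the same Welch test on the raw Table 7.1 data can be sketched with Python's scipy (rather than SPSS):

```python
from scipy import stats

female = [205, 160, 170, 180, 190, 200, 210, 165]
male = [245, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 185]

# equal_var=False requests the Welch t' test (separate variances)
result = stats.ttest_ind(female, male, equal_var=False)
print(round(result.statistic, 4))  # -2.7197, matching the hand computation
```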
Finally, the 95% CI can be examined. For the example, the CI is formed as follows:

$$(\bar{Y}_1 - \bar{Y}_2) \pm {}_{\alpha_2}t_{\nu}\,s_{\bar{Y}_1-\bar{Y}_2} = (185 - 215) \pm 2.101(11.0305) = -30 \pm 23.1751 = (-53.1751, -6.8249)$$

As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean gender difference was not equal to 0 at the .05 level of significance (p < .05).
7.2.3 Recommendations
The following four recommendations are made regarding the two independent samples case. Although there is no total consensus in the field, our recommendations take into account, as much as possible, the available research and statistical software. First, if the normality assumption is satisfied, the following recommendations are made: (a) the independent t test is recommended when the homogeneity of variance assumption is met; (b) the independent t test is recommended when the homogeneity of variance assumption is not met and when there are an equal number of observations in the samples; and (c) the Welch t′ test is recommended when the homogeneity of variance assumption is not met and when there are an unequal number of observations in the samples.
Second, if the normality assumption is not satisfied, the following recommendations are made: (a) if the homogeneity of variance assumption is met, then the independent t test using ranked scores (Conover & Iman, 1981), rather than raw scores, is recommended; and (b) if the homogeneity of variance assumption is not met, then the Welch t′ test using ranked scores is recommended, regardless of whether there are an equal number of observations in the samples. Using ranked scores means you rank order the observations from highest to lowest regardless of group membership, then conduct the appropriate t test with ranked scores rather than raw scores.
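The ranking step can be sketched as follows (a Python illustration of the idea, using the cholesterol data; scipy's `rankdata` assigns average ranks to tied scores):

```python
from scipy import stats

female = [205, 160, 170, 180, 190, 200, 210, 165]
male = [245, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 185]

# Rank all 20 scores together, ignoring group membership
ranks = stats.rankdata(female + male)
female_ranks, male_ranks = ranks[:8], ranks[8:]

# Then run the t test on the ranks instead of the raw scores
t_ranked, p_ranked = stats.ttest_ind(female_ranks, male_ranks)
print(t_ranked < 0)  # females tend to receive the lower ranks here
```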
Third, the dependent t test is recommended when there is some dependence between the groups (e.g., matched pairs or the same individuals measured on two occasions), as described later in this chapter. Fourth, the nonparametric Mann–Whitney–Wilcoxon test is not recommended. Among the disadvantages of this test are that (a) the critical values are not extensively tabled, (b) tied ranks can affect the results and no optimal procedure has yet been developed (Wilcox, 1996), and (c) Type I error appears to be inflated regardless of the status of the assumptions (Zimmerman, 2003). For these reasons, the Mann–Whitney–Wilcoxon test is not further described here. Note that most major statistical packages, including SPSS, have options for conducting the independent t test, the Welch t′ test, and the Mann–Whitney–Wilcoxon test. Alternatively, one could conduct the Kruskal–Wallis nonparametric one-factor ANOVA, which is also based on ranked data and which is appropriate for comparing the means of two or more independent groups. This test is considered more fully in Chapter 11. These recommendations are summarized in Box 7.1.
Stop and Think Box 7.1
Recommendations for the Independent and Dependent Samples Tests Based on Meeting or Violating the Assumption of Normality

Normality is met
Independent samples tests:
- Use the independent t test when homogeneity of variances is met.
- Use the independent t test when homogeneity of variances is not met, but there are equal sample sizes in the groups.
- Use the Welch t′ test when homogeneity of variances is not met and there are unequal sample sizes in the groups.
Dependent samples tests:
- Use the dependent t test.

Normality is not met
Independent samples tests:
- Use the independent t test with ranked scores when homogeneity of variances is met.
- Use the Welch t′ test with ranked scores when homogeneity of variances is not met, regardless of equal or unequal sample sizes in the groups.
- Use the Kruskal–Wallis nonparametric procedure.
Dependent samples tests:
- Use the dependent t test with ranked scores, or alternative procedures including bootstrap methods, trimmed means, medians, or Stein's method.
- Use the Wilcoxon signed ranks test when data are both nonnormal and have extreme outliers.
- Use the Friedman nonparametric procedure.
7.3 Inferences About Two Dependent Means
In this section, two inferential tests of the difference between two dependent means are described: the dependent t test and, briefly, the Wilcoxon signed ranks test. The section concludes with a list of recommendations.
7.3.1 Dependent t Test
As you may recall, the dependent t test is appropriate to use when there are two samples that are dependent; that is, the individuals in sample 1 have some relationship to the individuals in sample 2. First, we need to determine the conditions under which the dependent t test is appropriate. In part, this has to do with the statistical assumptions associated with the test itself: (a) normality of the distribution of the differences of the dependent variable Y, (b) homogeneity of variance of the two populations, and (c) independence of the observations within each sample. Like the independent t test, the dependent t test is reasonably robust to violation of the normality assumption, as we show later. Because this is a test of means, the dependent variable must be measured on an interval or ratio scale. For example, the same individuals may be measured at two points in time on the same interval-scaled pretest and posttest, or some matched pairs (e.g., twins or husbands and wives) may be assessed with the same ratio-scaled measure (e.g., weight measured in pounds).
Although there are several methods for computing the test statistic t, the most direct method, and the one most closely aligned conceptually with the one-sample t test, is as follows:

$$t = \frac{\bar{d}}{s_{\bar{d}}}$$

where
$\bar{d}$ is the mean difference
$s_{\bar{d}}$ is the standard error of the mean difference

Conceptually, this test statistic looks just like the one-sample t test statistic, except that the notation has been changed to denote that we are dealing with difference scores rather than raw scores.
The standard error of the mean difference is computed by

$$s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$

where
$s_d$ is the standard deviation of the difference scores (i.e., like any other standard deviation, only this one is computed from the difference scores rather than raw scores)
n is the total number of pairs
Conceptually, this standard error looks just like the standard error for the one-sample t test. If we were doing hand computations, we would compute a difference score for each pair of scores (i.e., Y1 − Y2). For example, if sample 1 were wives and sample 2 were their husbands, then we calculate a difference score for each couple. From this set of difference scores, we then compute the mean of the difference scores $\bar{d}$ and the standard deviation of the difference scores $s_d$. This leads us directly into the computation of the t test statistic. Note that although there are n scores in sample 1, n scores in sample 2, and thus 2n total scores, there are only n difference scores, which is what the analysis is actually based upon.
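These hand computations can be sketched in Python (the helper name and the tiny paired dataset are ours, for illustration only):

```python
import statistics

def dependent_t(sample1, sample2):
    """Dependent t: mean of the pairwise differences over its standard error."""
    d = [y1 - y2 for y1, y2 in zip(sample1, sample2)]   # one score per pair
    d_bar = statistics.mean(d)
    se = statistics.stdev(d) / len(d) ** 0.5            # s_d / sqrt(n)
    return d_bar / se

# Hypothetical paired data: three pairs, differences 2, 3, 1
print(round(dependent_t([3, 5, 4], [1, 2, 3]), 4))  # 3.4641
```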
The test statistic t is then compared with a critical value(s) from the t distribution. For a two-tailed test, from Table A.2, we would use the appropriate α2 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom for this test are n − 1. Conceptually, we lose one degree of freedom from the number of differences (or pairs) because we are estimating the population variance (or standard deviation) of the difference. Thus, there is one restriction along the lines of our discussion of degrees of freedom in Chapter 6. The critical values are denoted as $\pm{}_{\alpha_2}t_{n-1}$. The subscript α2 of the critical values reflects the fact that this is a two-tailed test, and the subscript n − 1 indicates the degrees of freedom. If the test statistic falls into either critical region, then we reject H0; otherwise, we fail to reject H0.

For a one-tailed test, from Table A.2, we would use the appropriate α1 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom are again n − 1. The critical value is denoted as $+{}_{\alpha_1}t_{n-1}$ for the alternative hypothesis H1: μ1 − μ2 > 0 and as $-{}_{\alpha_1}t_{n-1}$ for the alternative hypothesis H1: μ1 − μ2 < 0. If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.
7.3.1.1 Confidence Interval
For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:

$$\bar{d} \pm {}_{\alpha_2}t_{n-1}(s_{\bar{d}})$$

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. The interpretation of these CIs is the same as those previously discussed for the one-sample t and the independent t.
7.3.1.2 Effect Size
The effect size can be measured using Cohen's (1988) d, computed as follows:

$$\text{Cohen's } d = \frac{\bar{d}}{s_d}$$

where the label Cohen's d is simply used to distinguish among the various uses and slight differences in the computation of d. Interpretation of the value of d would be the same as for the one-sample t and the independent t previously discussed; specifically, the number of standard deviation units by which the mean(s) differ(s).
7.3.1.3 Example of the Dependent t Test
Let us consider an example for purposes of illustrating the dependent t test. Ten young swimmers participated in an intensive 2 month training program. Prior to the program, each swimmer was timed during a 50 meter freestyle event. Following the program, the same swimmers were timed in the 50 meter freestyle event again. This is a classic pretest-posttest design. For illustrative purposes, we will conduct a two-tailed test, although a case might also be made for a one-tailed test, in that the coach might want to see improvement only. However, conducting a two-tailed test allows us to examine the CI for purposes of illustration. The raw scores, the difference scores, and the mean and standard deviation of the difference scores are shown in Table 7.2. The pretest mean time was 64 seconds and the posttest mean time was 59 seconds.
To determine our test statistic value t, first we compute the standard error of the mean difference as follows:

$$s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{2.1602}{\sqrt{10}} = 0.6831$$
Next, using this value for the denominator, the test statistic t is then computed as follows:

$$t = \frac{\bar{d}}{s_{\bar{d}}} = \frac{5}{0.6831} = 7.3196$$
We then use Table A.2 to determine the critical values. As there are nine degrees of freedom (n − 1 = 10 − 1 = 9), using α = .05 and a two-tailed or nondirectional test, we find the critical values using the appropriate α2 column to be +2.262 and −2.262. Since the test statistic falls beyond the critical values, as shown in Figure 7.2, we reject the null hypothesis that the means are equal in favor of the nondirectional alternative that the means are not equal. Thus, we conclude that the mean swimming performance changed from pretest to posttest at the .05 level of significance (p < .05).
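As a cross-check, the same paired test on the Table 7.2 data can be sketched with Python's scipy (an alternative to the SPSS steps shown later):

```python
from scipy import stats

pretest = [58, 62, 60, 61, 63, 65, 66, 69, 64, 72]
posttest = [54, 57, 54, 56, 61, 59, 64, 62, 60, 63]

# Paired (dependent) t test on the ten swimmers
result = stats.ttest_rel(pretest, posttest)
print(round(result.statistic, 2), result.pvalue < .05)  # about 7.32; significant
```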
The 95% CI is computed to be the following:

$$\bar{d} \pm {}_{\alpha_2}t_{n-1}(s_{\bar{d}}) = 5 \pm 2.262(0.6831) = 5 \pm 1.5452 = (3.4548, 6.5452)$$
Table 7.2
Swimming Data for Dependent Samples

Swimmer   Pretest Time (in Seconds)   Posttest Time (in Seconds)   Difference (d)
1         58                          54                           (i.e., 58 − 54) = 4
2         62                          57                           5
3         60                          54                           6
4         61                          56                           5
5         63                          61                           2
6         65                          59                           6
7         66                          64                           2
8         69                          62                           7
9         64                          60                           4
10        72                          63                           9
d̄ = 5.0000
s_d = 2.1602
As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean pretest-posttest difference was not equal to 0 at the .05 level of significance (p < .05).
The effect size is computed to be the following:

$$\text{Cohen's } d = \frac{\bar{d}}{s_d} = \frac{5}{2.1602} = 2.3146$$
which is interpreted as follows: there is approximately a two and one-third standard deviation difference between the pretest and posttest mean swimming times, a very large effect size according to Cohen's subjective standard.
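The same value can be obtained directly from the difference scores; a brief Python sketch:

```python
import statistics

differences = [4, 5, 6, 5, 2, 6, 2, 7, 4, 9]  # pretest minus posttest, Table 7.2

# Cohen's d for dependent samples: mean difference over its standard deviation
d_effect = statistics.mean(differences) / statistics.stdev(differences)
print(round(d_effect, 4))  # 2.3146
```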
7.3.1.4 Assumptions
Let us return to the assumptions of normality, independence, and homogeneity of variance. For the dependent t test, the assumption of normality is met when the difference scores are normally distributed. Normality of the difference scores can be examined as discussed previously: graphical methods (such as stem-and-leaf plots, box plots, histograms, and/or Q–Q plots), statistical procedures such as the S–W test (1965), and/or skewness and kurtosis statistics. The assumption of independence is met when the cases in our sample have been randomly selected from the population. If the independence assumption is not met, then probability statements about the Type I and Type II errors will not be accurate; in other words, the probability of a Type I or Type II error may be increased as a result of the assumption not being met. Homogeneity of variance refers to equal variances of the two populations. In later chapters, we will examine procedures for formally testing for equal variances. For the moment, if the ratio of the smallest to largest sample variance is within 1:4, then we have evidence to suggest the assumption of homogeneity of variances is met. Research has shown that the effect of heterogeneity (i.e., unequal variances) is minimal when the sizes of the two samples, n1 and n2, are equal, as is the case with the dependent t test by definition (unless there are missing data).
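The 1:4 rule of thumb is easy to check by hand or in code; a quick Python sketch for the swimming data:

```python
import statistics

pretest = [58, 62, 60, 61, 63, 65, 66, 69, 64, 72]
posttest = [54, 57, 54, 56, 61, 59, 64, 62, 60, 63]

# Sample variances of the two occasions, then their ratio
variances = [statistics.variance(pretest), statistics.variance(posttest)]
ratio = max(variances) / min(variances)
print(ratio < 4)  # True: well within the 1:4 rule of thumb
```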
Figure 7.2. Critical regions and test statistic for the swimming example. (The figure shows the rejection regions with α = .025 in each tail, the critical values −2.262 and +2.262, and the t test statistic value of +7.3196.)
7.3.2 Recommendations
The following three recommendations are made regarding the two dependent samples case. First, the dependent t test is recommended when the normality assumption is met. Second, the dependent t test using ranks (Conover & Iman, 1981) is recommended when the normality assumption is not met. Here you rank order the difference scores from highest to lowest, then conduct the test on the ranked difference scores rather than on the raw difference scores. However, more recent research by Wilcox (2003) indicates that power for the dependent t can be reduced even for slight departures from normality. Wilcox recommends several procedures not readily available and beyond the scope of this text (bootstrap methods, trimmed means, medians, Stein's method). Keep in mind, though, that the dependent t test is fairly robust to nonnormality in most situations.

Third, the nonparametric Wilcoxon signed ranks test is recommended when the data are nonnormal with extreme outliers (one or a few observations that behave quite differently from the rest). However, among the disadvantages of this test are that (a) the critical values are not extensively tabled, and two different tables exist depending on sample size, and (b) tied ranks can affect the results and no optimal procedure has yet been developed (Wilcox, 1996). For these reasons, the details of the Wilcoxon signed ranks test are not described here. Note that most major statistical packages, including SPSS, include options for conducting the dependent t test and the Wilcoxon signed ranks test. Alternatively, one could conduct the Friedman nonparametric one-factor ANOVA, which is also based on ranked data and which is appropriate for comparing two or more dependent sample means. This test is considered more fully in Chapter 15. These recommendations are summarized in Box 7.1.
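Although the text does not detail the Wilcoxon signed ranks test, it is available in most packages; a hedged Python sketch on the swimming data (using scipy rather than SPSS):

```python
from scipy import stats

pretest = [58, 62, 60, 61, 63, 65, 66, 69, 64, 72]
posttest = [54, 57, 54, 56, 61, 59, 64, 62, 60, 63]

# Wilcoxon signed ranks test on the paired differences; every swimmer
# improved, so the sum of negative-difference ranks is 0
result = stats.wilcoxon(pretest, posttest)
print(result.pvalue < .05)  # the pretest-posttest change is significant here too
```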
7.4 SPSS
Instructions for determining the independent samples t test using SPSS are presented first. This is followed by additional steps for examining the assumption of normality for the independent t test. Next, instructions for determining the dependent samples t test using SPSS are presented, followed by additional steps for examining the assumptions of normality and homogeneity.
Independent t Test
Step 1: In order to conduct an independent t test, your dataset needs to include a dependent variable Y that is measured on an interval or ratio scale (e.g., cholesterol), as well as a grouping variable X that is measured on a nominal or ordinal scale (e.g., gender). For the grouping variable, if there are more than two categories available, only two categories can be selected when running the independent t test (the ANOVA is required for examining more than two categories). To conduct the independent t test, go to “Analyze” in the top pulldown menu, then select “Compare Means,” and then select “Independent-Samples T Test.” Following the screenshot (step 1) below produces the “Independent-Samples T Test” dialog box.
Inferences About the Difference Between Two Means

[Screenshot: Independent t test, Step 1. The “Analyze” pulldown menu, with “Compare Means” and “Independent-Samples T Test” selected.]
Step 2: Next, from the main “Independent-Samples T Test” dialog box, click the dependent variable (e.g., cholesterol) and move it into the “Test Variable” box by clicking on the arrow button. Next, click the grouping variable (e.g., gender) and move it into the “Grouping Variable” box by clicking on the arrow button. You will notice that there are two question marks next to the name of your grouping variable. This is SPSS letting you know that you need to define (numerically) which two categories of the grouping variable you want to include in your analysis. To do that, click on “Define Groups.”
[Screenshot: Independent t test, Step 2. The main “Independent-Samples T Test” dialog box. Select the variable of interest from the list on the left and use the arrow to move it to the “Test Variable” box on the right. Clicking on “Define Groups” will allow you to define the two numeric values of the categories for the independent variable. Clicking on “Options” will allow you to define a confidence interval percentage; the default is 95% (corresponding to an alpha of .05).]
An Introduction to Statistical Concepts
Step 3: From the “Define Groups” dialog box, enter the numeric value designated for each of the two categories or groups of your independent variable. Where it says “Group 1,” type in the value designated for your first group (e.g., 1, which in our case indicated that the individual was a female), and where it says “Group 2,” type in the value designated for your second group (e.g., 2, in our example, a male) (see step 3 screenshot).
[Screenshot: Independent t test, Step 3. The “Define Groups” dialog box.]
Click on “Continue” to return to the original dialog box (see step 2 screenshot) and then click on “OK” to run the analysis. The output is shown in Table 7.3.

Changing the alpha level (optional): The default alpha level in SPSS is .05, and thus, the default corresponding CI is 95%. If you wish to test your hypothesis at an alpha level other than .05 (and thus obtain CIs other than 95%), click on the “Options” button located in the top right corner of the main dialog box (see step 2 screenshot). From here, the CI percentage can be adjusted to correspond to the alpha level at which you wish your hypothesis to be tested (see Chapter 6 screenshot step 3). (For purposes of this example, the test has been generated using an alpha level of .05.)
Interpreting the output: The top table provides various descriptive statistics for each group, while the bottom box gives the results of the requested procedure. There you see the following three different inferential tests that are automatically provided: (1) Levene's test of the homogeneity of variance assumption (the first two columns of results), (2) the independent t test (which SPSS calls “Equal variances assumed”) (the top row of the remaining columns of results), and (3) the Welch t′ test (which SPSS calls “Equal variances not assumed”) (the bottom row of the remaining columns of results).

The first interpretation that must be made is for Levene's test of equal variances. The assumption of equal variances is met when Levene's test is not statistically significant. We can determine statistical significance by reviewing the p value for the F test. In this example, the p value is .090, greater than our alpha level of .05 and thus not statistically significant. Levene's test tells us that the variance for cholesterol level for males is not statistically significantly different than the variance for cholesterol level for females. Having met the assumption of equal variances, the values in the rest of the table will be drawn from the row labeled “Equal Variances Assumed.” Had we not met the assumption of equal variances (p < α), we would report Welch t′, for which the statistics are presented on the row labeled “Equal Variances Not Assumed.”
After determining that the variances are equal, the next step is to examine the results of the independent t test. The t test statistic value is −2.4842, and the associated p value is .023. Since p is less than α, we reject the null hypothesis. There is evidence to suggest that the mean cholesterol level for males is different than the mean cholesterol level for females.
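Although the book works entirely in SPSS, both tests can be reproduced from the summary statistics alone. The sketch below is a Python illustration (an assumption on our part, not part of the text) using scipy's ttest_ind_from_stats with the means, standard deviations, and sample sizes reported in Table 7.3:

```python
from scipy import stats

# Summary statistics from the SPSS output (Table 7.3)
mean_f, sd_f, n_f = 185.0, 19.08627, 8    # females
mean_m, sd_m, n_m = 215.0, 30.22642, 12   # males

# Independent t test, pooled variance (SPSS "Equal variances assumed")
t_pooled, p_pooled = stats.ttest_ind_from_stats(mean_f, sd_f, n_f,
                                                mean_m, sd_m, n_m,
                                                equal_var=True)
print(round(t_pooled, 4), round(p_pooled, 3))   # -2.4842 0.023

# Welch t' (SPSS "Equal variances not assumed")
t_welch, p_welch = stats.ttest_ind_from_stats(mean_f, sd_f, n_f,
                                              mean_m, sd_m, n_m,
                                              equal_var=False)
print(round(t_welch, 4), round(p_welch, 3))     # -2.7197 0.014
```

Both results match the SPSS rows in Table 7.3 to rounding.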
Table 7.3
SPSS Results for Independent t Test

Group Statistics

Cholesterol Level   Gender   N    Mean       Std. Deviation   Std. Error Mean
                    Female   8    185.0000   19.08627         6.74802
                    Male     12   215.0000   30.22642         8.72562

Independent Samples Test (Cholesterol Level)

                              Levene's Test for
                              Equality of Variances   t-Test for Equality of Means
                              F       Sig.            t        df       Sig. (2-Tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       3.201   .090            -2.484   18       .023              -30.00000         12.07615                -55.37104      -4.62896
Equal variances not assumed                           -2.720   17.984   .014              -30.00000         11.03051                -53.17573      -6.82427

Notes:
- The table labeled “Group Statistics” provides basic descriptive statistics for the dependent variable by group.
- The F test (and p value) of Levene's Test for Equality of Variances is reviewed to determine if the equal variances assumption has been met. The result of this test determines which row of statistics to utilize. In this case, we meet the assumption and use the statistics reported in the top row.
- “t” is the t test statistic value. The t value in the top row is used when the assumption of equal variances has been met and is calculated as t = (Ȳ1 − Ȳ2)/s(Ȳ1−Ȳ2) = (185 − 215)/12.075 = −2.484. The t value in the bottom row is the Welch t′ and is used when the assumption of equal variances has not been met.
- The standard error of the mean difference is calculated as s(Ȳ1−Ȳ2) = sp √(1/n1 + 1/n2).
- df are the degrees of freedom. For the independent samples t test, they are calculated as n1 + n2 − 2.
- “Sig.” is the observed p value for the independent t test. It is interpreted as: there is less than a 3% probability of a sample mean difference of −30 or greater occurring by chance if the null hypothesis is really true (i.e., if the population mean difference is 0).
- The mean difference is simply the difference between the sample mean cholesterol values. In other words, 185 − 215 = −30.
- SPSS reports the 95% confidence interval of the difference. This is interpreted to mean that 95% of the CIs generated across samples will contain the true population mean difference.
Using “Explore” to Examine Normality of Distribution of Dependent Variable by Categories of Independent Variable

Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important. For the independent t test, the distributional shape for the dependent variable should be normally distributed for each category/group of the independent variable. As with our one-sample t test, we can again use “Explore” to examine the extent to which the assumption of normality is met.

The general steps for accessing “Explore” have been presented in previous chapters (e.g., Chapter 4), and they will not be reiterated here. Normality of the dependent variable must be examined for each category of the independent variable, so we must tell SPSS to split the examination of normality by group. Click the dependent variable (e.g., cholesterol) and move it into the “Dependent List” box by clicking on the arrow button. Next, click the grouping variable (e.g., gender) and move it into the “Factor List” box by clicking on the arrow button. The procedures for selecting normality statistics were presented in Chapter 6, and they remain the same here: click on “Plots” in the upper right corner. Place a checkmark in the boxes for “Normality plots with tests” and also for “Histogram.” Then click “Continue” to return to the main “Explore” dialog screen. From there, click “OK” to generate the output.
[Screenshot: Generating normality evidence by group. The “Explore” dialog box. Select the dependent variable from the list on the left and use the arrow to move it to the “Dependent List” box on the right, and select the independent variable from the list on the left and use the arrow to move it to the “Factor List” box on the right. Then click on “Plots.”]
Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality including skewness and kurtosis, histograms, and boxplots. As we examine the “Descriptives” table, we see the output for the cholesterol statistics is separated for male (top portion) and female (bottom portion).
Descriptives

Cholesterol Level                            Statistic   Std. Error
Gender: Male
  Mean                                       215.0000    8.72562
  95% Confidence interval for mean
    Lower bound                              195.7951
    Upper bound                              234.2049
  5% Trimmed mean                            215.0000
  Median                                     215.0000
  Variance                                   913.636
  Std. deviation                             30.22642
  Minimum                                    170.00
  Maximum                                    260.00
  Range                                      90.00
  Interquartile range                        57.50
  Skewness                                   .000        .637
  Kurtosis                                   -1.446      1.232
Gender: Female
  Mean                                       185.0000    6.74802
  95% Confidence interval for mean
    Lower bound                              169.0435
    Upper bound                              200.9565
  5% Trimmed mean                            185.0000
  Median                                     185.0000
  Variance                                   364.286
  Std. deviation                             19.08627
  Minimum                                    160.00
  Maximum                                    210.00
  Range                                      50.00
  Interquartile range                        37.50
  Skewness                                   .000        .752
  Kurtosis                                   -1.790      1.481
The skewness statistic of cholesterol level for the males is .000 and kurtosis is −1.446, both within the range of an absolute value of 2.0, suggesting some evidence of normality of the dependent variable for males. Evidence of normality for the distributional shape of cholesterol level for females is also present: skewness = .000 and kurtosis = −1.790.

The histogram of cholesterol level for males is not exactly what most researchers would consider a classic normally shaped distribution. Although the histogram of cholesterol level for females is not presented here, it follows a similar distributional shape.
[Histogram of cholesterol level for group = Male: Mean = 215.00, Std. dev. = 30.226, N = 12]
There are a few other statistics that can be used to gauge normality as well. Our formal test of normality, the Shapiro–Wilk test (S–W) (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented in the following and suggests that our sample distribution for cholesterol level is not statistically significantly different than what would be expected from a normal distribution, and this is true for both males (S–W = .949, df = 12, p = .617) and females (S–W = .931, df = 8, p = .525).
Tests of Normality

                                Kolmogorov–Smirnov(a)        Shapiro–Wilk
Cholesterol Level   Gender      Statistic   df   Sig.        Statistic   df   Sig.
                    Male        .129        12   .200*       .949        12   .617
                    Female      .159        8    .200*       .931        8    .525

a Lilliefors significance correction
* This is a lower bound of the true significance.
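The S–W test is also available in most statistics libraries for readers working outside SPSS. The sketch below is a hypothetical Python illustration using scipy; the cholesterol values are invented stand-ins, since the book's raw data are not listed:

```python
from scipy import stats

# Hypothetical cholesterol values (NOT the book's raw data, which are not listed)
male_chol = [170, 175, 185, 195, 200, 210, 215, 225, 235, 245, 255, 260]

# Shapiro-Wilk test: a p value above alpha (.05) indicates the sample is not
# statistically significantly different from a normal distribution.
w_stat, p_value = stats.shapiro(male_chol)
print(w_stat, p_value)
```

The W statistic and p value are read exactly like the Shapiro–Wilk columns in the SPSS table above.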
Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality. Q–Q plots are graphs that plot quantiles of the theoretical normal distribution against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. Similar to what we saw with the histogram, the Q–Q plot of cholesterol level for both males and females (although not shown here) suggests some nonnormality. Keep in mind that we have a relatively small sample size. Thus, interpreting the visual graphs (e.g., histograms and Q–Q plots) can be challenging, although we have plenty of other evidence for normality.
[Normal Q–Q plot of cholesterol level for group = male: observed values plotted against expected normal values]
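The quantile-against-quantile coordinates that SPSS plots can be computed directly. The sketch below uses scipy's probplot on hypothetical data (an illustration only; the book's raw values are not listed):

```python
from scipy import stats

# Hypothetical sample values (the book's raw cholesterol data are not listed)
sample = [170, 175, 185, 195, 200, 210, 215, 225, 235, 245, 255, 260]

# probplot returns the theoretical normal quantiles paired with the ordered
# sample values, plus the least-squares line fitted through those points.
(theoretical_q, ordered_vals), (slope, intercept, r) = stats.probplot(sample, dist="norm")

# Points near the fitted line (r close to 1) suggest normality.
print(round(r, 3))
```

Plotting `ordered_vals` against `theoretical_q` reproduces the Q–Q plot; the fitted line plays the role of the diagonal reference line described above.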
Examination of the boxplots suggests a relatively normal distributional shape of cholesterol level for both males and females and no outliers.
[Boxplots of cholesterol level by gender (Male, Female)]
Considering the forms of evidence we have examined, skewness and kurtosis statistics, the S–W test, and the boxplots all suggest normality is a reasonable assumption. Although the histograms and Q–Q plots suggest some nonnormality, this is somewhat expected given the small sample size. Generally, we can be reasonably assured we have met the assumption of normality of the dependent variable for each group of the independent variable. Additionally, recall that when the assumption of normality is violated with the independent t test, the effects on Type I and Type II errors are minimal when using a two-tailed test, as we are conducting here (e.g., Glass, Peckham, & Sanders, 1972; Sawilowsky & Blair, 1992).
Dependent t Test
Step 1: To conduct a dependent t test, your dataset needs to include the two variables (i.e., for the paired samples) whose means you wish to compare (e.g., pretest and posttest). To conduct the dependent t test, go to “Analyze” in the top pulldown menu, then select “Compare Means,” and then select “Paired-Samples T Test.” Following the screenshot (step 1) below produces the “Paired-Samples T Test” dialog box.
[Screenshot: Dependent t test, Step 1. The “Analyze” pulldown menu, with “Compare Means” and “Paired-Samples T Test” selected.]
Step 2: Click both variables (e.g., pretest and posttest as variable 1 and variable 2, respectively) and move them into the “Paired Variables” box by clicking the arrow button. Both variables should now appear in the box as shown in screenshot step 2. Then click on “OK” to run the analysis and generate the output.
[Screenshot: Dependent t test, Step 2. The “Paired-Samples T Test” dialog box. Select the paired samples from the list on the left and use the arrow to move them to the “Paired Variables” box on the right. Then click on “OK.”]
The output appears in Table 7.4, where again the top box provides descriptive statistics, the middle box provides a bivariate correlation coefficient, and the bottom box gives the results of the dependent t test procedure.
Table 7.4
SPSS Results for Dependent t Test

Paired Samples Statistics

                     Mean      N    Std. Deviation   Std. Error Mean
Pair 1   Pretest     64.0000   10   4.21637          1.33333
         Posttest    59.0000   10   3.62093          1.14504

Paired Samples Correlations

                                 N    Correlation   Sig.
Pair 1   Pretest and posttest    10   .859          .001

Paired Samples Test

                               Paired Differences
                               Mean      Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t       df   Sig. (2-Tailed)
Pair 1   Pretest - posttest    5.00000   2.16025          .68313            3.45465        6.54535        7.319   9    .000

Notes:
- The table labeled “Paired Samples Statistics” provides basic descriptive statistics for the paired samples.
- The table labeled “Paired Samples Correlations” provides the Pearson product–moment correlation coefficient, a bivariate correlation coefficient, between the pretest and posttest values. In this example, there is a strong correlation (r = .859) and it is statistically significant (p = .001).
- The values in the “Paired Differences” section of the table are calculated based on paired differences (i.e., the difference values between pretest and posttest scores).
- “t” is the t test statistic value. The t value is calculated as t = d̄/s(d̄) = 5/0.6831 = 7.3196.
- df are the degrees of freedom. For the dependent samples t test, they are calculated as n − 1.
- “Sig.” is the observed p value for the dependent t test. It is interpreted as: there is less than a 1% probability of a sample mean difference of 5 or greater occurring by chance if the null hypothesis is really true (i.e., if the population mean difference is 0).
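The annotated calculation t = d̄/s(d̄) can be verified from the paired-difference summary statistics alone. The following Python sketch (an illustration on our part, not part of the text) reproduces the standard error, t statistic, and two-tailed p value:

```python
import math
from scipy import stats

# Paired-difference summary statistics from Table 7.4
mean_diff = 5.0       # mean of the (pretest - posttest) difference scores
sd_diff = 2.16025     # standard deviation of the difference scores
n = 10                # number of pairs

se_diff = sd_diff / math.sqrt(n)             # standard error of the mean difference
t_stat = mean_diff / se_diff                 # t = d-bar / s(d-bar)
df = n - 1
p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-tailed p value

print(round(se_diff, 5), round(t_stat, 3))   # 0.68313 7.319
print(p_value < 0.001)                       # True (SPSS displays .000)
```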
Using “Explore” to Examine Normality of Distribution
of Difference Scores
Generating normality evidence: As with the other t tests we have studied, understanding the distributional shape and the extent to which normality is a reasonable assumption is important. For the dependent t test, the distributional shape for the difference scores should be normally distributed. Thus, we first need to create a new variable in our dataset to reflect the difference scores (in this case, the difference between the pre- and posttest values). To do this, go to “Transform” in the top pulldown menu, then select “Compute Variable.” Following the screenshot (step 1) below produces the “Compute Variable” dialog box.
[Screenshot: Computing the difference score, Step 1. The “Transform” pulldown menu, with “Compute Variable” selected.]
From the “Compute Variable” dialog screen, we can define the column header for our variable by typing in a name in the “Target Variable” box (no spaces, no special characters, and cannot begin with a numeric value). The formula for computing our difference score is inserted in the “Numeric Expression” box. To create this formula, (1) click on “pretest” in the left list of variables and use the arrow key to move it into the “Numeric Expression” box; (2) use your keyboard or the keyboard within the dialog box to insert a minus sign (i.e., dash) after “pretest” in the “Numeric Expression” box; (3) click on “posttest” in the left list of variables and use the arrow key to move it into the “Numeric Expression” box; and (4) click on “OK” to create the new difference score variable in your dataset.
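Outside of SPSS, the same difference-score variable is a one-line computation. In the sketch below, the pretest and posttest values are hypothetical placeholders (the book's raw swim-time data are not listed), chosen only to show the mechanics:

```python
# Hypothetical pretest/posttest scores (NOT the book's raw data, which are not listed)
pretest  = [68, 62, 60, 66, 64, 65, 61, 63, 67, 64]
posttest = [61, 58, 56, 60, 59, 60, 57, 58, 62, 59]

# Difference score: pretest - posttest, one value per paired case
difference = [pre - post for pre, post in zip(pretest, posttest)]
print(difference)  # [7, 4, 4, 6, 5, 5, 4, 5, 5, 5]
```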
[Screenshot: Computing the difference score, Step 2. The “Compute Variable” dialog box.]
We can again use “Explore” to examine the extent to which the assumption of normality is met for the distributional shape of our newly created difference score. The general steps for accessing “Explore” (see, e.g., Chapter 4) and for generating normality evidence for one variable (see Chapter 6) have been presented in previous chapters, and they will not be reiterated here.
Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality including skewness and kurtosis, histograms, and boxplots. The skewness statistic for the difference score is .248 and kurtosis is .050, both within the range of an absolute value of 2.0, suggesting one form of evidence of normality of the differences.

The histogram for the difference scores (not presented here) is not necessarily what most researchers would consider a normally shaped distribution. Our formal test of normality, the S–W test (Shapiro & Wilk, 1965), suggests that our sample distribution for differences is not statistically significantly different than what would be expected from a normal distribution (S–W = .956, df = 10, p = .734). Similar to what we saw with the histogram, the Q–Q plot of differences suggests some nonnormality in the tails (as the farthest points are not falling on the diagonal line). Keep in mind that we have a small sample size. Thus, interpreting the visual graphs (e.g., histograms and Q–Q plots) can be difficult. Examination of the boxplot suggests a relatively normal distributional shape. Considering the forms of evidence we have examined, skewness and kurtosis, the S–W test of normality, and boxplots all suggest normality is a reasonable assumption. Although the histograms and Q–Q plots suggested some nonnormality, this is somewhat expected given the small sample size. Generally, we can be reasonably assured we have met the assumption of normality of the difference scores.
Generating evidence of homogeneity of variance of difference scores: Without conducting a formal test of equality of variances (as we do in Chapter 9), a rough benchmark for having met the assumption of homogeneity of variances when conducting
the dependent t test is that the ratio of the smallest to largest variance of the paired samples is no greater than 1:4. The variance can be computed easily by any number of procedures in SPSS (e.g., refer back to Chapter 3), and these steps will not be repeated here. For our paired samples, the variance of the pretest score is 17.778 and the variance of the posttest score is 13.111, well within the range of 1:4, suggesting that homogeneity of variances is reasonable.
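The 1:4 benchmark check is simple arithmetic; a brief sketch using the variances just reported:

```python
# Variances of the paired samples, as reported in the text
var_pretest = 17.778
var_posttest = 13.111

# Rough homogeneity benchmark: largest-to-smallest variance ratio no greater than 4 (i.e., 1:4)
ratio = max(var_pretest, var_posttest) / min(var_pretest, var_posttest)
print(round(ratio, 2))  # 1.36, well within the 1:4 benchmark
```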
7.5 G*Power
Using the results of the independent samples t test just conducted, let us use G*Power to compute the post hoc power of our test.
Post Hoc Power for the Independent t Test Using G*Power
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted an independent samples t test; therefore, the default selection of “t tests” is the correct test family. Next, we need to select the appropriate statistical test. We use the arrow to toggle to “Means: Difference between two independent means (two groups).” The “Type of Power Analysis” desired then needs to be selected. To compute post hoc power, we need to select “Post hoc: Compute achieved power – given α, sample size, and effect size.”

The “Input Parameters” must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test so we use the arrow to toggle to “Two.” The achieved or observed effect size was −1.1339. The alpha level we tested at was .05, and the sample size for females was 8 and for males, 12. Once the parameters are specified, simply click on “Calculate” to generate the achieved power statistics.
The “Output Parameters” provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of −1.1339, an alpha level of .05, and sample sizes of 8 (females) and 12 (males). Based on those criteria, the post hoc power was .65. In other words, with a sample size of 8 females and 12 males in our study, testing at an alpha level of .05 and observing a large effect of −1.1339, the power of our test was .65; that is, the probability of rejecting the null hypothesis when it is really false is 65%, which is only moderate power. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level). We were fortunate in this example in that we were still able to detect a statistically significant difference in cholesterol levels between males and females; however, we will likely not always be that lucky.
[Screenshot: G*Power post hoc power for the independent t test. The “Input Parameters” for computing post hoc power must be specified, including: (1) one- versus two-tailed test; (2) observed effect size d; (3) alpha level; and (4) sample size for each group of the independent variable. Once the parameters are specified, click on “Calculate.”]
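G*Power's post hoc power figure can be approximated outside the program using the noncentral t distribution, where the noncentrality parameter for two independent groups is δ = d·√(n1·n2/(n1 + n2)) and power is the probability that |t| exceeds the two-tailed critical value. The Python sketch below is our own illustration (the book itself uses only G*Power):

```python
import math
from scipy import stats

# Inputs reported in the text for the independent t test power analysis
d = 1.1339               # magnitude of the observed effect size
n1, n2 = 8, 12           # group sample sizes (females, males)
alpha = 0.05

df = n1 + n2 - 2
ncp = d * math.sqrt(n1 * n2 / (n1 + n2))     # noncentrality parameter
t_crit = stats.t.ppf(1 - alpha / 2, df)      # two-tailed critical value

# Power = P(|T| > t_crit) under the noncentral t distribution
power = stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
print(round(power, 2))  # approximately 0.65, matching G*Power
```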
Post Hoc Power for the Dependent t Test Using G*Power
Now, let us use G*Power to compute post hoc power for the dependent t test. First, the correct test family needs to be selected. In our case, we conducted a dependent samples t test; therefore, the default selection of “t tests” is the correct test family. Next, we need to select the appropriate statistical test. We use the arrow to toggle to “Means: Difference between two dependent means (matched pairs).” The “Type of Power Analysis” desired then needs to be selected. To compute post hoc power, we need to select “Post hoc: Compute achieved power – given α, sample size, and effect size.”

The “Input Parameters” must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional).
In this example, we have a two-tailed test, so we use the arrow to toggle to “Two.” The achieved or observed effect size was 2.3146. The alpha level we tested at was .05, and the total sample size was 10. Once the parameters are specified, simply click on “Calculate” to generate the achieved power statistics.

The “Output Parameters” provide the relevant statistics given the input specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of 2.3146, an alpha level of .05, and a total sample size of 10. Based on those criteria, the post hoc power was .99. In other words, with a total sample size of 10, testing at an alpha level of .05 and observing a large effect of 2.3146, the power of our test was over .99; that is, the probability of rejecting the null hypothesis when it is really false will be greater than 99%, about the strongest power that can be achieved. Again, conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level).
[Screenshot: G*Power post hoc power for the dependent t test. The “Input Parameters” for computing post hoc power must be specified, including: (1) one- versus two-tailed test; (2) observed effect size d; (3) alpha level; and (4) total sample size (the number of pairs). Once the parameters are specified, click on “Calculate.”]
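The dependent-samples power calculation works the same way, except the noncentrality parameter for matched pairs is δ = dz·√n. Again, this Python sketch is our own illustration rather than anything in the text:

```python
import math
from scipy import stats

# Inputs reported in the text for the dependent t test power analysis
d_z = 2.3146             # observed effect size for the paired differences
n = 10                   # total sample size (number of pairs)
alpha = 0.05

df = n - 1
ncp = d_z * math.sqrt(n)                     # noncentrality parameter, matched pairs
t_crit = stats.t.ppf(1 - alpha / 2, df)      # two-tailed critical value

# Power = P(|T| > t_crit) under the noncentral t distribution
power = stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)
print(power > 0.99)  # True, consistent with the "over .99" reported by G*Power
```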
7.6 Template and APA-Style Write-Up
Next we develop APA-style paragraphs describing the results for both examples. First is a paragraph describing the results of the independent t test for the cholesterol example, and this is followed by the dependent t test for the swimming example.
Independent t Test
Recall that our graduate research assistant, Marie, was working with JoAnn, a local nurse practitioner, to assist in analyzing cholesterol levels. Her task was to assist JoAnn with writing her research question (Is there a mean difference in cholesterol level between males and females?) and generating the test of inference to answer her question. Marie suggested an independent samples t test as the test of inference. A template for writing a research question for an independent t test is presented as follows:
Is there a mean difference in [dependent variable] between [group 1 of
the independent variable] and [group 2 of the independent variable]?
It may be helpful to preface the results of the independent samples t test with information on an examination of the extent to which the assumptions were met (recall there are three assumptions: normality, homogeneity of variances, and independence). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
An independent samples t test was conducted to determine if the mean
cholesterol level of males differed from females. The assumption of
normality was tested and met for the distributional shape of the
dependent variable (cholesterol level) for females. Review of the S-W
test for normality (S-W = .931, df = 8, p = .525) and skewness (.000) and
kurtosis (−1.790) statistics suggested that normality of cholesterol
levels for females was a reasonable assumption. Similar results were
found for male cholesterol levels. Review of the S-W test for normality
(S-W = .949, df = 12, p = .617) and skewness (.000) and kurtosis (−1.446)
statistics suggested that normality of male cholesterol levels was
a reasonable assumption. The boxplots suggested a relatively normal
distributional shape (with no outliers) of cholesterol levels for both
males and females. The Q–Q plots and histograms suggested some minor
nonnormality for both male and female cholesterol levels. Due to the
small sample, this was anticipated. Although normality indices gener-
ally suggest the assumption is met, even if there are slight depar-
tures from normality, the effects on Type I and Type II errors will
be minimal given the use of a two-tailed test (e.g., Glass, Peckham, &
Sanders, 1972; Sawilowsky & Blair, 1992). According to Levene’s test,
the homogeneity of variance assumption was satisfied (F = 3.2007, p =
.090). Because there was no random assignment of the individuals to
gender, the assumption of independence was not met, creating a poten-
tial for an increased probability of a Type I or Type II error.
It is also desirable to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our cholesterol example, we find an effect size d of −1.1339, which is interpreted according to Cohen's (1988) guidelines as a large effect:
d = (Ȳ1 − Ȳ2)/sp = (185 − 215)/26.4575 = −1.1339
Remember that for the two-sample mean test, d indicates how many standard deviations the mean of sample 1 is from the mean of sample 2. Thus, with an effect size of −1.1339, the mean cholesterol levels of males and females differ by more than one full standard deviation unit. The negative sign simply indicates that group 1 has the smaller mean (as it is the first value in the numerator of the formula; in our case, the mean cholesterol level of females).
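The pooled standard deviation and effect size can be verified from the group variances reported in Table 7.3. A Python sketch (our own illustration, not part of the text):

```python
import math

# Group summary statistics from Table 7.3 (variances = squared std. deviations)
mean_f, var_f, n_f = 185.0, 364.286, 8    # females
mean_m, var_m, n_m = 215.0, 913.636, 12   # males

# Pooled standard deviation
s_pooled = math.sqrt(((n_f - 1) * var_f + (n_m - 1) * var_m) / (n_f + n_m - 2))

# Cohen's d: the mean difference expressed in pooled standard deviation units
d = (mean_f - mean_m) / s_pooled
print(round(s_pooled, 4), round(d, 4))  # 26.4575 -1.1339
```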
Here is an APA-style example paragraph of results for the cholesterol level data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).
As shown in Table 7.3, cholesterol data were gathered from samples of
12 males and 8 females, with a female sample mean of 185 (SD = 19.09)
and a male sample mean of 215 (SD = 30.23). The independent t test indi-
cated that the cholesterol means were statistically significantly dif-
ferent for males and females (t = −2.4842, df = 18, p = .023). Thus, the
null hypothesis that the cholesterol means were the same by gender was
rejected at the .05 level of significance. The effect size d (calculated
using the pooled standard deviation) was −1.1339. Using Cohen’s (1988)
guidelines, this is interpreted as a large effect. The results provide
evidence to support the conclusion that males and females differ in
cholesterol levels, on average. More specifically, males were observed
to have larger cholesterol levels, on average, than females.
Parenthetically, notice that the results of the Welch t′ test were the same as for the independent t test (Welch t′ = −2.7197, rounded df = 18, p = .014). Thus, any deviation from homogeneity of variance did not affect the results.
Dependent t Test
Marie, as you recall, was also working with Mark, a local swimming coach, to assist in analyzing freestyle swimming time before and after swimmers participated in an intensive training program. Marie suggested a research question (Is there a mean difference in swim time for the 50-meter freestyle event before participation in an intensive training program as compared to swim time for the 50-meter freestyle event after participation in an intensive training program?) and assisted in generating the test of inference (specifically the dependent t test) to answer her question. A template for writing a research question for a dependent t test is presented as follows.
Is there a mean difference in [paired sample 1] as compared to
[paired sample 2]?
197 Inferences About the Difference Between Two Means
It may be helpful to preface the results of the dependent samples t test with information on the extent to which the assumptions were met (recall there are three assumptions: normality, homogeneity of variance, and independence). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
A dependent samples t test was conducted to determine if there was
a difference in the mean swim time for the 50 meter freestyle before
participation in an intensive training program as compared to the
mean swim time for the 50 meter freestyle after participation in an
intensive training program. The assumption of normality was tested
and met for the distributional shape of the paired differences. Review
of the S-W test for normality (SW = .956, df = 10, p = .734) and skew-
ness (.248) and kurtosis (.050) statistics suggested that normality of
the paired differences was reasonable. The boxplot suggested a rela-
tively normal distributional shape, and there were no outliers pres-
ent. The Q–Q plot and histogram suggested minor nonnormality. Due to
the small sample, this was anticipated. Homogeneity of variance was
tested by reviewing the ratio of the raw score variances. The ratio of
the smallest (posttest = 13.111) to largest (pretest = 17.778) variance
was less than 1:4; therefore, there is evidence that the equal variance
assumption was met. The individuals were not randomly selected; therefore,
the assumption of independence was not met, creating a potential for
an increased probability of a Type I or Type II error.
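The variance-ratio rule of thumb used in the paragraph above is easy to sketch; this minimal Python illustration (Python is used here for illustration only, not SPSS) uses the pretest and posttest variances from the write-up.

```python
# Rule-of-thumb homogeneity check: ratio of largest to smallest variance under 4
pre_var, post_var = 17.778, 13.111  # raw score variances from the swimming example
ratio = max(pre_var, post_var) / min(pre_var, post_var)
print(round(ratio, 2))  # 1.36, well under 4, so equal variances look tenable
```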
It is also important to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our swimming example, we find an effect size d of 2.3146, which is interpreted according to Cohen's (1988) guidelines as a large effect:

Cohen d = d̄/sd = 5/2.1602 = 2.3146
With an effect size of 2.3146, there are about two and a third standard deviation units between the pretraining mean swim time and the posttraining mean swim time.
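As a quick arithmetic check (a Python sketch for illustration only; the chapter computes these in SPSS), both the dependent t and its effect size follow directly from the mean difference (d̄ = 5), the standard deviation of the differences (sd = 2.1602), and n = 10.

```python
import math

def dependent_t_from_summary(d_bar, s_d, n):
    """Dependent t and Cohen's d from the mean and SD of the paired differences."""
    t = d_bar / (s_d / math.sqrt(n))  # t = mean difference / standard error of the mean difference
    d = d_bar / s_d                   # effect size: mean difference in SD-of-differences units
    return t, d

t, d = dependent_t_from_summary(5, 2.1602, 10)
print(round(t, 3), round(d, 4))  # 7.319 and 2.3146, matching the reported values
```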
Here is an APA-style example paragraph of results for the swimming data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).
From Table 7.4, we see that pretest and posttest data were collected
from a sample of 10 swimmers, with a pretest mean of 64 seconds (SD =
4.22) and a posttest mean of 59 seconds (SD = 3.62). Thus, swimming times
decreased from pretest to posttest. The dependent t test was conducted
to determine if this difference was statistically significantly dif-
ferent from 0, and the results indicate that the pretest and posttest
means were statistically different (t = 7.319, df = 9, p < .001). Thus,
the null hypothesis that the freestyle swimming means were the same at
both points in time was rejected at the .05 level of significance. The
effect size d (calculated as the mean difference divided by the standard
deviation of the difference) was 2.3146. Using Cohen’s (1988) guidelines,
this is interpreted as a large effect. The results provide evidence to
support the conclusion that the mean 50 meter freestyle swimming time
prior to intensive training is different from the mean 50 meter freestyle
swimming time after intensive training.
7.7 Summary
In this chapter, we considered a second inferential testing situation, testing hypotheses about the difference between two means. Several inferential tests and new concepts were discussed. New concepts introduced were independent versus dependent samples, the sampling distribution of the difference between two means, the standard error of the difference between two means, and parametric versus nonparametric tests. We then moved on to describe the following three inferential tests for determining the difference between two independent means: the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test. The following two tests for determining the difference between two dependent means were considered: the dependent t test and briefly the Wilcoxon signed ranks test. In addition, examples were presented for each of the t tests, and recommendations were made as to when each test is most appropriate. The chapter concluded with a look at SPSS and G*Power (for post hoc power) as well as developing an APA-style write-up of results. At this point, you should have met the following objectives: (a) be able to understand the basic concepts underlying the inferential tests of two means, (b) be able to select the appropriate test, and (c) be able to determine and interpret the results from the appropriate test. In the next chapter, we discuss inferential tests involving proportions. Other inferential tests are covered in subsequent chapters.
Problems
Conceptual problems
7.1 We test the following hypothesis:
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 ≠ 0
 The level of significance is .05 and H0 is rejected. Assuming all assumptions are met and H0 is true, the probability of committing a Type I error is which one of the following?
  a. 0
  b. 0.05
  c. Between .05 and .95
  d. 0.95
  e. 1.00
7.2 When H0 is true, the difference between two independent sample means is a function of which one of the following?
  a. Degrees of freedom
  b. The standard error
  c. The sampling distribution
  d. Sampling error
7.3 The denominator of the independent t test is known as the standard error of the difference between two means, and may be defined as which one of the following?
  a. The difference between the two group means
  b. The amount by which the difference between the two group means differs from the population mean
  c. The standard deviation of the sampling distribution of the difference between two means
  d. All of the above
  e. None of the above
7.4 In the independent t test, the homoscedasticity assumption states what?
  a. The two population means are equal.
  b. The two population variances are equal.
  c. The two sample means are equal.
  d. The two sample variances are equal.
7.5 Sampling error increases with larger samples. True or false?
7.6 At a given level of significance, it is possible that the significance test and the CI results will differ for the same dataset. True or false?
7.7 I assert that the critical value of t required for statistical significance is smaller (in absolute value or ignoring the sign) when using a directional rather than a nondirectional test. Am I correct?
7.8 If a 95% CI from an independent t test ranges from −.13 to +1.67, I assert that the null hypothesis would not be rejected at the .05 level of significance. Am I correct?
7.9 A group of 15 females was compared to a group of 25 males with respect to intelligence. To test if the sample sizes are significantly different, which of the following tests would you use?
  a. Independent t test
  b. Dependent t test
  c. z test
  d. None of the above
7.10 The mathematical ability of 10 preschool children was measured when they entered their first year of preschool and then again in the spring of their kindergarten year. To test for pre- to post-mean differences, which of the following tests would be used?
  a. Independent t test
  b. Dependent t test
  c. z test
  d. None of the above
7.11 A researcher collected data to answer the following research question: Are there mean differences in science test scores for middle school students who participate in school-sponsored athletics as compared to students who do not participate? Which of the following tests would be used to answer this question?
  a. Independent t test
  b. Dependent t test
  c. z test
  d. None of the above
7.12 The number of degrees of freedom for an independent t test with 15 females and 25 males is 40. True or false?
7.13 I assert that the critical value of t, for a test of two dependent means, will increase as the samples become larger. Am I correct?
7.14 Which of the following is NOT an assumption of the independent t test?
  a. Normality
  b. Independence
  c. Equal sample sizes
  d. Homogeneity of variance
7.15 For which of the following assumptions of the independent t test is evidence provided in the SPSS output by default?
  a. Normality
  b. Independence
  c. Equal sample sizes
  d. Homogeneity of variance
Computational problems
7.1 The following two independent samples of older and younger adults were measured on an attitude toward violence test:

Sample 1 (Older Adult) Data    Sample 2 (Younger Adult) Data
42 36 47                       45 50 57
35 46 37                       58 43 52
52 44 47                       43 60 41
51 56 54                       49 44 51
55 50 40                       49 55 56
40 46 41

 a. Test the following hypotheses at the .05 level of significance:
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 ≠ 0
 b. Construct a 95% CI.
7.2 The following two independent samples of male and female undergraduate students were measured on an English literature quiz:

Sample 1 (Male) Data    Sample 2 (Female) Data
5 7 8                   9 9 11
10 11 11                13 15 18
13 15                   19 20

 a. Test the following hypotheses at the .05 level of significance:
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 ≠ 0
 b. Construct a 95% CI.
7.3 The following two independent samples of preschool children (who were demographically similar but differed in Head Start participation) were measured on teacher-reported social skills during the spring of kindergarten:

Sample 1 (Head Start) Data    Sample 2 (Non-Head Start) Data
18 14 12                      15 12 9
16 10 17                      10 18 12
20 16 19                      11 8 11
15 13 22                      13 10 14

 a. Test the following hypotheses at the .05 level of significance:
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 ≠ 0
 b. Construct a 95% CI.
7.4 The following is a random sample of paired values of weight measured before (time 1) and after (time 2) a weight-reduction program:
Pair 1 2
1 127 130
2 126 124
3 129 135
4 123 127
5 124 127
6 129 128
7 132 136
8 125 130
9 135 131
10 126 128
 a. Test the following hypotheses at the .05 level of significance:
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 ≠ 0
 b. Construct a 95% CI.
7.5 Individuals were measured on the number of words spoken during the 1 minute prior to exposure to a confrontational situation. During the 1 minute after exposure, the individuals were again measured on the number of words spoken. The data are as follows:
Person Pre Post
1 60 50
2 80 70
3 120 80
4 100 90
5 90 100
6 85 70
7 70 40
8 90 70
9 100 60
10 110 100
11 80 100
12 100 70
13 130 90
14 120 80
15 90 50
 a. Test the following hypotheses at the .05 level of significance:
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 ≠ 0
 b. Construct a 95% CI.
7.6 The following is a random sample of scores on an attitude toward abortion scale for husband (sample 1) and wife (sample 2) pairs:
Pair 1 2
1 1 3
2 2 3
3 4 6
4 4 5
5 5 7
6 7 8
7 7 9
8 8 10
 a. Test the following hypotheses at the .05 level of significance:
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 ≠ 0
 b. Construct a 95% CI.
7.7 For two dependent samples, test the following hypothesis at the .05 level of significance:
 Sample statistics: n = 121; d̄ = 10; sd = 45.
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 > 0
7.8 For two dependent samples, test the following hypothesis at the .05 level of significance:
 Sample statistics: n = 25; d̄ = 25; sd = 14.
  H0: µ1 − µ2 = 0
  H1: µ1 − µ2 > 0
Interpretive problems
7.1 Using the survey 1 dataset from the website, use SPSS to conduct an independent t test, where gender is the grouping variable and the dependent variable is a variable of interest to you. Test for the extent to which the assumptions have been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph describing the results.
7.2 Using the survey 1 dataset from the website, use SPSS to conduct an independent t test, where the grouping variable is whether or not the person could tell the difference between Pepsi and Coke and the dependent variable is a variable of interest to you. Test for the extent to which the assumptions have been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph describing the results.
8
Inferences About Proportions
Chapter Outline
8.1 Inferences About Proportions Involving the Normal Distribution
 8.1.1 Introduction
 8.1.2 Inferences About a Single Proportion
 8.1.3 Inferences About Two Independent Proportions
 8.1.4 Inferences About Two Dependent Proportions
8.2 Inferences About Proportions Involving the Chi-Square Distribution
 8.2.1 Introduction
 8.2.2 Chi-Square Goodness-of-Fit Test
 8.2.3 Chi-Square Test of Association
8.3 SPSS
8.4 G*Power
8.5 Template and APA-Style Write-Up
Key Concepts
 1. Proportion
 2. Sampling distribution and standard error of a proportion
 3. Contingency table
 4. Chi-square distribution
 5. Observed versus expected proportions
In Chapters 6 and 7, we considered testing inferences about means, first for a single mean (Chapter 6) and then for two means (Chapter 7). The major concepts discussed in those two chapters included the following: types of hypotheses, types of decision errors, level of significance, power, confidence intervals (CIs), effect sizes, sampling distributions involving the mean, standard errors involving the mean, inferences about a single mean, inferences about the difference between two independent means, and inferences about the difference between two dependent means. In this chapter, we consider inferential tests involving proportions. We define a proportion as the percentage of scores falling into particular categories. Thus, the tests described in this chapter deal with variables that are categorical in nature and thus are nominal or ordinal variables (see Chapter 1), or have been collapsed from higher-level variables into nominal or ordinal variables (e.g., high and low scorers on an achievement test).
The tests that we cover in this chapter are considered nonparametric procedures, also sometimes referred to as distribution-free procedures, as there is no requirement that the data adhere to a particular distribution (e.g., normal distribution). Nonparametric procedures are often less preferable than parametric procedures (e.g., t tests, which assume normality of the distribution) for the following reasons: (1) parametric procedures are often robust to assumption violations; in other words, the results are often still interpretable even if there may be assumption violations; (2) nonparametric procedures have lower power relative to sample size; in other words, rejecting the null hypothesis if it is false requires a larger sample size with nonparametric procedures; and (3) the types of research questions that can be addressed by nonparametric procedures are often quite simple (e.g., while complex interactions of many different variables can be tested with parametric procedures such as factorial analysis of variance, this cannot be done with nonparametric procedures). Nonparametric procedures can still be valuable to use given the measurement scale(s) of the variable(s) and the research question; however, at the same time, it is important that researchers recognize the limitations in using these types of procedures.
Research questions to be asked of proportions include the following examples:
 1. Is the quarter in my hand a fair or biased coin; in other words, over repeated samples, is the proportion of heads equal to .50 or not?
 2. Is there a difference between the proportions of Republicans and Democrats who support the local school bond issue?
 3. Is there a relationship between ethnicity (e.g., African-American, Caucasian) and type of criminal offense (e.g., petty theft, rape, murder); in other words, is the proportion of one ethnic group different from another in terms of the types of crimes committed?
Several inferential tests are covered in this chapter, depending on (a) whether there are one or two samples, (b) whether the two samples are selected in an independent or dependent manner, and (c) whether there are one or more categorical variables. More specifically, the topics described include the following inferential tests: testing whether a single proportion is different from a hypothesized value, testing whether two independent proportions are different, testing whether two dependent proportions are different, and the chi-square goodness-of-fit test and chi-square test of association. We use many of the foundational concepts previously covered in Chapters 6 and 7. New concepts to be discussed include the following: proportion, sampling distribution and standard error of a proportion, contingency table, chi-square distribution, and observed versus expected frequencies. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying tests of proportions, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.
8.1 Inferences About Proportions Involving the Normal Distribution
We have been following Marie, an educational research graduate student, as she completes tasks assigned to her by her faculty advisor.
Marie’s advisor has received two additional calls from individuals in other states who are interested in assistance with statistical analysis. Knowing the success Marie has had with the previous consultations, Marie’s advisor requests that Marie work with Tami, a staff member in the Undergraduate Services Office at Ivy-Covered University (ICU), and Matthew, a lobbyist from a state that is considering legalizing gambling.
In conversation with Marie, Tami shares that she recently read a report that provided national statistics on the proportion of students that major in various disciplines. Tami wants to know if there are similar proportions at their institution. Marie suggests the following research question: Are the sample proportions of undergraduate student college majors at Ivy-Covered University the same as the proportions nationally? Marie suggests a chi-square goodness-of-fit test as the test of inference. Her task is then to assist Tami in generating the test of inference to answer her research question.
Marie then speaks with Matthew, a lobbyist who is lobbying against legalizing gambling in his state. Matthew wants to determine if there is a relationship between level of education and stance on a proposed gambling amendment. Matthew suspects that the proportions supporting gambling vary as a function of education level. The following research question is suggested by Marie: Is there an association between level of education and stance on gambling? Marie suggests a chi-square test of association as the test of inference. Her task is then to assist Matthew in generating the test of inference to answer his research question.
This section deals with concepts and procedures for testing inferences about proportions that involve the normal distribution. Following a discussion of the concepts related to tests of proportions, inferential tests are presented for situations when there is a single proportion, two independent proportions, and two dependent proportions.
8.1.1 Introduction
Let us examine in greater detail the concepts related to tests of proportions. First, a proportion represents the percentage of individuals or objects that fall into a particular category. For instance, the proportion of individuals who support a particular political candidate might be of interest. Thus, the variable here is a dichotomous, categorical, nominal variable, as there are only two categories represented, support or do not support the candidate.
For notational purposes, we define the population proportion π (pi) as

π = f/N

where
 f is the number of frequencies in the population who fall into the category of interest (e.g., the number of individuals in the population who support the candidate)
 N is the total number of individuals in the population
For example, if the population consists of 100 individuals and 58 support the candidate, then π = .58 (i.e., 58/100). If the proportion is multiplied by 100%, this yields the percentage of individuals in the population who support the candidate, which in the example would be 58%. At the same time, 1 − π represents the population proportion of individuals who do not support the candidate, which for this example would be 1 − .58 = .42. If this is multiplied by 100%, this yields the percentage of individuals in the population who do not support the candidate, which in the example would be 42%.
In a fashion, the population proportion is conceptually similar to the population mean if the category of interest (support of candidate) is coded as 1 and the other category (no support) is coded as 0. In the case of the example with 100 individuals, there are 58 individuals coded 1, 42 individuals coded 0, and therefore, the mean would be .58. To this point then, we have π representing the population proportion of individuals supporting the candidate and 1 − π representing the population proportion of individuals not supporting the candidate.
The population variance of a proportion can also be determined by σ² = π(1 − π), and thus, the population standard deviation of a proportion is σ = √(π(1 − π)). These provide us with measures of variability that represent the extent to which the individuals in the population vary in their support of the candidate. For the example population then, the variance is computed to be σ² = π(1 − π) = .58(1 − .58) = .58(.42) = .2436, and the standard deviation is σ = √(π(1 − π)) = √(.58(.42)) = √.2436 = .4936.
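A minimal Python sketch of these population formulas (for illustration only), using the candidate example of 58 supporters in a population of N = 100:

```python
import math

# Candidate-support example from the text: 58 of 100 population members support
f, N = 58, 100
pi = f / N                # population proportion
variance = pi * (1 - pi)  # sigma^2 = pi(1 - pi)
sd = math.sqrt(variance)  # sigma = sqrt(pi(1 - pi))
print(pi, round(variance, 4), round(sd, 4))  # 0.58 0.2436 0.4936
```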
For the population parameters, we now have the population proportion (or mean), the population variance, and the population standard deviation. The next step is to discuss the corresponding sample statistics for the proportion. The sample proportion p is defined as

p = f/n

where
 f is the number of frequencies in the sample that fall into the category of interest (e.g., the number of individuals who support the candidate)
 n is the total number of individuals in the sample
The sample proportion p is thus a sample estimate of the population proportion π. One way we can estimate the population variance is by the sample variance s² = p(1 − p), and the population standard deviation of a proportion can be estimated by the sample standard deviation s = √(p(1 − p)).
The next concept to discuss is the sampling distribution of the proportion. This is comparable to the sampling distribution of the mean discussed in Chapter 5. If one were to take many samples, and for each sample, compute the sample proportion p, then we could generate a distribution of p. This is known as the sampling distribution of the proportion. For example, imagine that we take 50 samples of size 100 and determine the proportion for each sample. That is, we would have 50 different sample proportions, each based on 100 observations. If we construct a frequency distribution of these 50 proportions, then this is actually the sampling distribution of the proportion.
In theory, the sample proportions for this example could range from .00 (p = 0/100) to 1.00 (p = 100/100) given that there are 100 observations in each sample. One could also examine the variability of these 50 sample proportions. That is, we might be interested in the extent to which the sample proportions vary. We might have, for one example, most of the sample proportions falling near the mean proportion of .60. This would indicate for the candidate data that (a) the samples generally support the candidate, as the average proportion is .60, and (b) the support for the candidate is fairly consistent across samples, as the sample proportions tend to fall close to .60. Alternatively, in a second example, we might find the sample proportions varying quite a bit around the mean of .60, say ranging from .20 to .80. This would indicate that (a) the samples generally support the candidate again, as the average proportion is .60, and (b) the support for the candidate is not very consistent across samples, leading one to believe that some groups support the candidate and others do not.
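The thought experiment of drawing 50 samples of size 100 can be simulated directly. The sketch below (Python, for illustration; the seed and the true proportion of .60 are both arbitrary choices mirroring the example) generates the 50 sample proportions and summarizes their spread:

```python
import random
from statistics import mean, stdev

random.seed(42)  # arbitrary seed, chosen only for reproducibility
TRUE_PI, SAMPLE_SIZE, NUM_SAMPLES = 0.60, 100, 50

# Each sample proportion = count of simulated "supporters" / sample size
props = [sum(random.random() < TRUE_PI for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
         for _ in range(NUM_SAMPLES)]

print(round(mean(props), 2))   # clusters near .60
print(round(stdev(props), 3))  # close to sqrt(.60 * .40 / 100), about .049
```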
The variability of the sampling distribution of the proportion can be determined as follows. The population variance of the sampling distribution of the proportion is known as the variance error of the proportion, denoted by σp². The variance error is computed as

σp² = π(1 − π)/n

where
 π is again the population proportion
 n is sample size (i.e., the number of observations in a single sample)
The population standard deviation of the sampling distribution of the proportion is known as the standard error of the proportion, denoted by σp. The standard error is an index of how variable a sample statistic (in this case, the sample proportion) is when multiple samples of the same size are drawn, and is computed as follows:

σp = √(π(1 − π)/n)

This situation is quite comparable to the sampling distribution of the mean discussed in Chapter 5. There we had the variance error and standard error of the mean as measures of the variability of the sample means.
Technically speaking, the binomial distribution is the exact sampling distribution for the proportion; binomial here refers to a categorical variable with two possible categories, which is certainly the situation here. However, except for rather small samples, the normal distribution is a reasonable approximation to the binomial distribution and is therefore typically used. The reason we can rely on the normal distribution is due to the central limit theorem, previously discussed in Chapter 5. For proportions, the central limit theorem states that as sample size n increases, the sampling distribution of the proportion from a random sample of size n more closely approximates a normal distribution. If the population distribution is normal in shape, then the sampling distribution of the proportion is also normal in shape. If the population distribution is not normal in shape, then the sampling distribution of the proportion becomes more nearly normal as sample size increases. As previously shown in Figure 5.1 in the context of the mean, the bottom line is that if the population is nonnormal, this will have a minimal effect on the sampling distribution of the proportion except for rather small samples.
Because the applied researcher nearly always has access to only a single sample, the population variance error and standard error of the proportion must be estimated. The sample variance error of the proportion is denoted by sp² and computed as

sp² = p(1 − p)/n

where
 p is again the sample proportion
 n is sample size
The sample standard error of the proportion is denoted by sp and computed as

sp = √(p(1 − p)/n)
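In Python, for illustration, the estimated standard error follows directly from p and n; the values p = .58 and n = 100 below are hypothetical, chosen to mirror the earlier candidate example:

```python
import math

def sample_standard_error(p, n):
    """s_p = sqrt(p(1 - p)/n), the estimated standard error of a proportion."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical sample: 58 of 100 respondents support the candidate
print(round(sample_standard_error(0.58, 100), 4))  # 0.0494
```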
8.1.2 Inferences About a Single Proportion
In the first inferential testing situation for proportions, the researcher would like to know whether the population proportion is equal to some hypothesized proportion or not. This is comparable to the one-sample t test described in Chapter 6, where a population mean was compared against some hypothesized mean. First, the hypotheses to be evaluated for detecting whether a population proportion differs from a hypothesized proportion are as follows. The null hypothesis H0 is that there is no difference between the population proportion π and the hypothesized proportion π0, which we denote as

H0: π = π0

Here there is no difference, or a “null” difference, between the population proportion and the hypothesized proportion. For example, if we are seeking to determine whether the quarter you are flipping is a biased coin or not, then a reasonable hypothesized value would be .50, as an unbiased coin should yield “heads” about 50% of the time.
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportion π and the hypothesized proportion π0, which we denote as

H1: π ≠ π0

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportion is different from the hypothesized proportion. As we have not specified a direction on H1, we are willing to reject H0 either if π is greater than π0 or if π is less than π0. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π is greater than π0 or that π is less than π0. In either case, the more the resulting sample proportion differs from the hypothesized proportion, the more likely we are to reject the null hypothesis.
It is assumed that the sample is randomly drawn from the population (i.e., the assumption of independence) and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as

z = (p − π0)/sp̂ = (p − π0)/√(π0(1 − π0)/n)

where sp̂ is estimated based on the hypothesized proportion π0.
The test statistic z is then compared to a critical value(s) from the unit normal distribution. For a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1. If the test statistic z falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +αz for the alternative hypothesis H1: π > π0 (i.e., a right-tailed test) and as −αz for the alternative hypothesis
H1: π < π0 (i.e., a left-tailed test). If the test statistic z falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.
For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:

p ± α/2z(s_p̂)

where
p is the observed sample proportion
±α/2z is the tabled critical value
s_p̂ is the sample standard error of the proportion

If the CI contains the hypothesized proportion π0, then the conclusion is to fail to reject H0; otherwise, we reject H0. Simulation research has shown that this CI procedure works fine for small samples when the sample proportion is near .50; that is, the normal distribution is a reasonable approximation in this situation. However, as the sample proportion moves closer to 0 or 1, larger samples are required for the normal distribution to be reasonably approximate. Alternative approaches have been developed that appear to be more widely applicable. The interested reader is referred to Ghosh (1979) and Wilcox (1996).
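The single-proportion z test and CI described above can be sketched in a few lines of Python. This is our illustration, not the authors' code; the function name `one_sample_prop_z` is our own:

```python
import math

def one_sample_prop_z(p, pi0, n, z_crit):
    """z test and CI for a single proportion.

    p: observed sample proportion; pi0: hypothesized proportion pi_0;
    n: sample size; z_crit: tabled critical value (e.g., 2.58 for a
    two-tailed test at alpha = .01). The standard error is based on the
    hypothesized proportion, as in the text.
    """
    se = math.sqrt(pi0 * (1 - pi0) / n)        # s_p-hat
    z = (p - pi0) / se                         # test statistic
    ci = (p - z_crit * se, p + z_crit * se)    # (1 - alpha)% CI
    return z, se, ci

# Values from the polling example later in this section:
z, se, ci = one_sample_prop_z(p=0.60, pi0=0.50, n=100, z_crit=2.58)
# z = 2.0, se = .0500, ci = (.471, .729)
```

Because z_crit is passed in, the same function serves any α; compare |z| to z_crit (or check whether π0 falls inside the CI) to make the decision about H0.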
Several points should be noted about each of the z tests for proportions developed in this chapter. First, the interpretation of CIs described in this chapter is the same as those in Chapter 7. Second, Cohen's (1988) measure of effect size for proportion tests using z is known as h. Unfortunately, h involves the use of arcsine transformations of the proportions, which is beyond the scope of this text. In addition, standard statistical software, such as SPSS, does not provide measures of effect size for any of these tests.
Let us consider an example to illustrate the use of the test of a single proportion. We follow the basic steps for hypothesis testing that we applied in previous chapters. These steps include the following:

 1. State the null and alternative hypotheses.
 2. Select the level of significance (i.e., alpha, α).
 3. Calculate the test statistic value.
 4. Make a statistical decision (reject or fail to reject H0).

Suppose a researcher conducts a survey in a city that is voting on whether or not to have an elected school board. Based on informal conversations with a small number of influential citizens, the researcher is led to hypothesize that 50% of the voters are in favor of an elected school board. Through the use of a scientific poll, the researcher would like to know whether the population proportion is different from this hypothesized value; thus, a nondirectional, two-tailed alternative hypothesis is utilized. The null and alternative hypotheses are denoted as follows:

H0: π = π0
H1: π ≠ π0

If the null hypothesis is rejected, this would indicate that scientific polls of larger samples yield different results and are important in this situation. If the null hypothesis is not rejected, this would indicate that informal conversations with a small sample are just as accurate as a scientific larger-sized sample.

A random sample of 100 voters is taken, and 60 indicate their support of an elected school board (i.e., p = .60). In an effort to minimize the Type I error rate, the significance level is set at α = .01. The test statistic z is computed as

z = (p − π0)/√[π0(1 − π0)/n] = (.60 − .50)/√[.50(1 − .50)/100] = .10/√[.50(.50)/100] = .10/.0500 = 2.0000
Note that the final value for the denominator is the standard error of the proportion (i.e., s_p̂ = .0500), which we will need for computing the CI. From Table A.1, we determine the critical values to be ±α/2z = ±.005z = ±2.58; in other words, the z value that corresponds to the P(z) value closest to .995 is when z is equal to 2.58. As the test statistic (i.e., z = 2.0000) does not exceed the critical values and thus fails to fall into a critical region, our decision is to fail to reject H0. Our conclusion then is that the accuracy of the scientific poll is not any different from the hypothesized value of .50 as determined informally.

The 99% CI for the example would be computed as follows:

p ± α/2z(s_p̂) = .60 ± 2.58(.0500) = .60 ± .129 = (.471, .729)
Because the CI contains the hypothesized value of .50, our conclusion is to fail to reject H0 (the same result found when we conducted the statistical test). The conclusion derived from the test statistic is always consistent with the conclusion derived from the CI. We can interpret the CI as follows: 99% of similarly constructed CIs will contain the true population proportion.
8.1.3 Inferences About Two Independent Proportions
In our second inferential testing situation for proportions, the researcher would like to know whether the population proportion for one group is different from the population proportion for a second independent group. This is comparable to the independent t test described in Chapter 7, where one population mean was compared to a second independent population mean. Once again, we have two independently drawn samples, as discussed in Chapter 7.

First, the hypotheses to be evaluated for detecting whether two independent population proportions differ are as follows. The null hypothesis H0 is that there is no difference between the two population proportions π1 and π2, which we denote as

H0: π1 − π2 = 0

Here there is no difference, or a "null" difference, between the two population proportions. For example, we may be seeking to determine whether the proportion of Democratic senators who support gun control is equal to the proportion of Republican senators who support gun control.

The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportions π1 and π2, which we denote as

H1: π1 − π2 ≠ 0

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportions are different. As we have not specified a direction on H1, we are willing to reject either if π1 is greater than π2 or if π1 is less than π2. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π1 is greater than π2 or that π1 is less than π2. In either case, the more the resulting sample proportions differ from one another, the more likely we are to reject the null hypothesis.

It is assumed that the two samples are independently and randomly drawn from their respective populations (i.e., the assumption of independence) and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as
z = (p1 − p2)/s_{p1−p2} = (p1 − p2)/√[p(1 − p)(1/n1 + 1/n2)]

where n1 and n2 are the sample sizes for samples 1 and 2, respectively, and

p = (f1 + f2)/(n1 + n2)

where f1 and f2 are the number of observed frequencies for samples 1 and 2, respectively. The denominator of the z test statistic, s_{p1−p2}, is known as the standard error of the difference between two proportions and provides an index of how variable the sample statistic (in this case, the sample proportion) is when multiple samples of the same size are drawn. This test statistic is conceptually similar to the test statistic for the independent t test.

The test statistic z is then compared to a critical value(s) from the unit normal distribution. For a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1. If the test statistic z falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +αz for the alternative hypothesis H1: π1 − π2 > 0 (i.e., a right-tailed test) and as −αz for the alternative hypothesis H1: π1 − π2 < 0 (i.e., a left-tailed test). If the test statistic z falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It should be noted that other alternatives to this test have been proposed (e.g., Storer & Kim, 1990).

For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:

(p1 − p2) ± α/2z(s_{p1−p2})

If the CI contains 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. Alternative methods are described by Beal (1987) and Coe and Tamhane (1993).
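The computations above can be sketched in Python as follows. This is our illustration, not the text's; the helper name is invented:

```python
import math

def two_indep_prop_z(f1, n1, f2, n2, z_crit):
    """z test and CI for two independent proportions (two-tailed).

    f1, f2: observed frequencies; n1, n2: sample sizes; z_crit: tabled
    critical value (e.g., 1.96 for alpha = .05).
    """
    p1, p2 = f1 / n1, f2 / n2
    p = (f1 + f2) / (n1 + n2)                         # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # s_{p1 - p2}
    z = (p1 - p2) / se
    diff = p1 - p2
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z, se, ci

# Taste-test example that follows: 68 of 100 children, 54 of 100 adults
z, se, ci = two_indep_prop_z(68, 100, 54, 100, z_crit=1.96)
# z is about 2.03, and the CI excludes 0, so H0 is rejected at alpha = .05
```

Note that the pooled proportion p is used in the standard error because the null hypothesis assumes the two population proportions are equal.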
Let us consider an example to illustrate the use of the test of two independent proportions. Suppose a researcher is taste-testing a new chocolate candy ("chocolate yummies") and wants to know the extent to which individuals would likely purchase the product. As taste in candy may be different for adults versus children, a study is conducted where independent samples of adults and children are given "chocolate yummies" to eat and asked whether they would buy them or not. The researcher would like to know whether the population proportion of individuals who would purchase "chocolate yummies" is different for adults and children. Thus, a nondirectional, two-tailed alternative hypothesis is utilized. The null and alternative hypotheses are denoted as follows:

H0: π1 − π2 = 0
H1: π1 − π2 ≠ 0

If the null hypothesis is rejected, this would indicate that interest in purchasing the product is different in the two groups, and this might result in different marketing and packaging strategies for each group. If the null hypothesis is not rejected, then this would indicate the product is equally of interest to both adults and children, and different marketing and packaging strategies are not necessary.

A random sample of 100 children (sample 1) and a random sample of 100 adults (sample 2) are independently selected. Each individual consumes the product and indicates whether or not he or she would purchase it. Sixty-eight of the children and 54 of the adults state they would purchase "chocolate yummies" if they were available. The level of significance is set at α = .05. The test statistic z is computed as follows. We know that n1 = 100, n2 = 100, f1 = 68, f2 = 54, p1 = .68, and p2 = .54. We compute p to be

p = (f1 + f2)/(n1 + n2) = (68 + 54)/(100 + 100) = 122/200 = .6100

This allows us to compute the test statistic z as

z = (p1 − p2)/√[p(1 − p)(1/n1 + 1/n2)] = (.68 − .54)/√[.61(1 − .61)(1/100 + 1/100)] = .14/√[(.61)(.39)(.02)] = .14/.0690 = 2.0290

The denominator of the z test statistic, s_{p1−p2} = .0690, is the standard error of the difference between two proportions, which we will need for computing the CI.

The test statistic z is then compared to the critical values from the unit normal distribution. As this is a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1 to be ±α/2z = ±.025z = ±1.9600. In other words, this is the z value that is closest to a P(z) of .975. As the test statistic z falls into the upper tail critical region, we reject H0 and conclude that the adults and children are not equally interested in the product.

Finally, we can compute the 95% CI as follows:

(p1 − p2) ± α/2z(s_{p1−p2}) = (.68 − .54) ± 1.96(.0690) = .14 ± .1352 = (.0048, .2752)
Because the CI does not include 0, we would again reject H0 and conclude that the adults and children are not equally interested in the product. As previously stated, the conclusion derived from the test statistic is always consistent with the conclusion derived from the CI at the same level of significance. We can interpret the CI as follows: 95% of similarly constructed CIs will contain the true population proportion difference.
8.1.4 Inferences About Two Dependent Proportions
In our third inferential testing situation for proportions, the researcher would like to know whether the population proportion for one group is different from the population proportion for a second dependent group. This is comparable to the dependent t test described in Chapter 7, where one population mean was compared to a second dependent population mean. Once again, we have two dependently drawn samples, as discussed in Chapter 7. For example, we may have a pretest-posttest situation where a comparison of proportions over time for the same individuals is conducted. Alternatively, we may have pairs of matched individuals (e.g., spouses, twins, brother-sister) for which a comparison of proportions is of interest.

First, the hypotheses to be evaluated for detecting whether two dependent population proportions differ are as follows. The null hypothesis H0 is that there is no difference between the two population proportions π1 and π2, which we denote as

H0: π1 − π2 = 0

Here there is no difference, or a "null" difference, between the two population proportions. For example, a political analyst may be interested in determining whether the approval rating of the president is the same just prior to and immediately following his annual State of the Union address (i.e., a pretest-posttest situation). As a second example, a marriage counselor wants to know whether husbands and wives equally favor a particular training program designed to enhance their relationship (i.e., a couple situation).

The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportions π1 and π2, which we denote as follows:

H1: π1 − π2 ≠ 0

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportions are different. As we have not specified a direction on H1, we are willing to reject either if π1 is greater than π2 or if π1 is less than π2. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π1 is greater than π2 or that π1 is less than π2. The more the resulting sample proportions differ from one another, the more likely we are to reject the null hypothesis.
Before we examine the test statistic, let us consider a table in which the proportions are often presented. As shown in Table 8.1, the contingency table lists proportions for each of the different possible outcomes. The columns indicate the proportions for sample 1. The left column contains those proportions related to the "unfavorable" condition (or disagree or no, depending on the situation), and the right column, those proportions related to the "favorable" condition (or agree or yes, depending on the situation). At the bottom of the columns are the marginal proportions shown for the "unfavorable" condition, denoted by 1 − p1, and for the "favorable" condition, denoted by p1. The rows indicate the proportions for sample 2. The top row contains those proportions for the "favorable" condition, and the bottom row contains those proportions for the "unfavorable" condition. To the right of the rows are the marginal proportions shown for the "favorable" condition, denoted by p2, and for the "unfavorable" condition, denoted by 1 − p2.

Table 8.1
Contingency Table for Two Samples

                           Sample 1
Sample 2               "Unfavorable"   "Favorable"   Marginal Proportions
"Favorable"                  a              b              p2
"Unfavorable"                c              d              1 − p2
Marginal proportions      1 − p1            p1
Within the box of the table are the proportions for the different combinations of conditions across the two samples. The upper left-hand cell is the proportion of observations that are "unfavorable" in sample 1 and "favorable" in sample 2 (i.e., dissimilar across samples), denoted by a. The upper right-hand cell is the proportion of observations that are "favorable" in sample 1 and "favorable" in sample 2 (i.e., similar across samples), denoted by b. The lower left-hand cell is the proportion of observations that are "unfavorable" in sample 1 and "unfavorable" in sample 2 (i.e., similar across samples), denoted by c. The lower right-hand cell is the proportion of observations that are "favorable" in sample 1 and "unfavorable" in sample 2 (i.e., dissimilar across samples), denoted by d.

It is assumed that the two samples are randomly drawn from their respective populations and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as

z = (p1 − p2)/s_{p1−p2} = (p1 − p2)/√[(d + a)/n]

where n is the total number of pairs. The denominator of the z test statistic, s_{p1−p2}, is again known as the standard error of the difference between two proportions and provides an index of how variable the sample statistic (i.e., the difference between two sample proportions) is when multiple samples of the same size are drawn. This test statistic is conceptually similar to the test statistic for the dependent t test.

The test statistic z is then compared to a critical value(s) from the unit normal distribution. For a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1. If the test statistic z falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +αz for the alternative hypothesis H1: π1 − π2 > 0 (i.e., right-tailed test) and as −αz for the alternative hypothesis H1: π1 − π2 < 0 (i.e., left-tailed test). If the test statistic z falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It should be noted that other alternatives to this test have been proposed (e.g., the chi-square test as described in the following section). Unfortunately, the z test does not yield an acceptable CI procedure.
Let us consider an example to illustrate the use of the test of two dependent proportions. Suppose a medical researcher is interested in whether husbands and wives agree on the effectiveness of a new headache medication "No-Head." A random sample of 100 husband-wife couples was selected and asked to try "No-Head" for 2 months. At the end of 2 months, each individual was asked whether the medication was effective or not at reducing headache pain. The researcher wants to know whether the medication is differentially effective for husbands and wives. Thus, a nondirectional, two-tailed alternative hypothesis is utilized.

The resulting proportions are presented as a contingency table in Table 8.2. The level of significance is set at α = .05. The test statistic z is computed as follows:

z = (p1 − p2)/s_{p1−p2} = (p1 − p2)/√[(d + a)/n] = (.40 − .65)/√[(.15 + .40)/100] = −.25/.0742 = −3.3693

The test statistic z is then compared to the critical values from the unit normal distribution. As this is a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1 to be ±α/2z = ±.025z = ±1.96. In other words, this is the z value that is closest to a P(z) of .975. As the test statistic z falls into the lower tail critical region, we reject H0 and conclude that the husbands and wives do not believe equally in the effectiveness of "No-Head."
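This dependent-proportions z test can be sketched in Python as follows (our own illustration; the cell labels follow the layout of Table 8.1):

```python
import math

def two_dep_prop_z(a, b, c, d, n):
    """z test for two dependent proportions.

    a, b, c, d: cell proportions laid out as in Table 8.1; n: number of
    pairs. p1 = b + d and p2 = a + b are the two "favorable" marginals.
    No acceptable CI procedure accompanies this test.
    """
    p1, p2 = b + d, a + b
    se = math.sqrt((d + a) / n)   # s_{p1 - p2}
    return (p1 - p2) / se

# Headache example (Table 8.2): a = .40, b = .25, c = .20, d = .15, n = 100
z = two_dep_prop_z(0.40, 0.25, 0.20, 0.15, 100)
# z is about -3.37, beyond -1.96, so H0 is rejected
```

Only the dissimilar cells a and d enter the standard error, because the pairs that agree (cells b and c) cancel out of the difference between the two marginal proportions.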
8.2 Inferences About Proportions Involving Chi-Square Distribution
This section deals with concepts and procedures for testing inferences about proportions that involve the chi-square distribution. Following a discussion of the chi-square distribution relevant to tests of proportions, inferential tests are presented for the chi-square goodness-of-fit test and the chi-square test of association.
8.2.1 Introduction
The previous tests of proportions in this chapter were based on the unit normal distribution, whereas the tests of proportions in the remainder of the chapter are based on the chi-square distribution. Thus, we need to become familiar with this new distribution. Like the normal and t distributions, the chi-square distribution is really a family of distributions. Also, like the t distribution, the chi-square distribution family members depend on the number of degrees of freedom represented. As we shall see, the degrees of freedom for the chi-square goodness-of-fit test are calculated as the number of categories (denoted as J) minus 1. For example, the chi-square distribution for one degree of freedom (i.e., for a variable which has two categories) is denoted by χ₁², as shown in Figure 8.1. This particular chi-square distribution is especially positively skewed and leptokurtic (sharp peak).
Table 8.2
Contingency Table for Headache Example

                            Husband Sample
Wife Sample            "Ineffective"    "Effective"    Marginal Proportions
"Effective"              a = .40          b = .25        p2 = .65
"Ineffective"            c = .20          d = .15        1 − p2 = .35
Marginal proportions   1 − p1 = .60      p1 = .40
The figure also describes graphically the distributions for χ₅² and χ₁₀². As you can see in the figure, as the degrees of freedom increase, the distribution becomes less skewed and less leptokurtic; in fact, the distribution becomes more nearly normal in shape as the number of degrees of freedom increases. For extremely large degrees of freedom, the chi-square distribution is approximately normal. In general, we denote a particular chi-square distribution with ν degrees of freedom as χν². The mean of any chi-square distribution is ν, the mode is ν − 2 when ν is at least 2, and the variance is 2ν. The value of chi-square can range from 0 to positive infinity. A table of different percentile values for many chi-square distributions is given in Table A.3. This table is utilized in the following two chi-square tests.

One additional point that should be noted about each of the chi-square tests of proportions developed in this chapter is that there are no CI procedures for either the chi-square goodness-of-fit test or the chi-square test of association.
8.2.2 Chi-Square Goodness-of-Fit Test
The first test to consider is the chi-square goodness-of-fit test. This test is used to determine whether the observed proportions in two or more categories of a categorical variable differ from what we would expect a priori. For example, a researcher is interested in whether the current undergraduate student body at ICU is majoring in disciplines according to an a priori or expected set of proportions. Based on research at the national level, the expected proportions of undergraduate college majors are as follows: .20 education, .40 arts and sciences, .10 communications, and .30 business. In a random sample of 100 undergraduates at ICU, the observed proportions are as follows: .25 education, .50 arts and sciences, .10 communications, and .15 business. Thus, the researcher would like to know whether the sample proportions observed at ICU fit the expected national proportions. In essence, the chi-square goodness-of-fit test is used to test proportions for a single categorical variable (i.e., nominal or ordinal measurement scale).

The observed proportions are denoted by pj, where p represents a sample proportion and j represents a particular category (e.g., education majors), where j = 1, …, J categories. The expected proportions are denoted by πj, where π represents an expected proportion and j represents a particular category. The null and alternative hypotheses are denoted as follows, where the null hypothesis states that the difference between the observed and expected proportions is 0 for all categories:

H0: (pj − πj) = 0 for all j
H1: (pj − πj) ≠ 0 for all j

FIGURE 8.1
Several members of the family of the chi-square distribution. [Figure not reproduced: relative frequency curves for 1, 5, and 10 degrees of freedom.]

The test statistic is a chi-square and is computed by

χ² = n ∑_{j=1}^{J} (pj − πj)²/πj

where n is the size of the sample. The test statistic is compared to a critical value from the chi-square table (Table A.3), αχ²ν, where ν = J − 1. The degrees of freedom are 1 less than the total number of categories J, because the proportions must total to 1.00; thus, only J − 1 are free to vary.
If the test statistic is larger than the critical value, then the null hypothesis is rejected in favor of the alternative. This would indicate that the observed and expected proportions were not equal for all categories. The larger the differences are between one or more observed and expected proportions, the larger the value of the test statistic, and the more likely it is to reject the null hypothesis. Otherwise, we would fail to reject the null hypothesis, indicating that the observed and expected proportions were approximately equal for all categories.

If the null hypothesis is rejected, one may wish to determine which sample proportions are different from their respective expected proportions. Here we recommend you conduct tests of a single proportion as described in the preceding section. If you would like to control the experimentwise Type I error rate across a set of such tests, then the Bonferroni method is recommended, where the α level is divided up among the number of tests conducted. For example, with an overall α = .05 and five categories, one would conduct five tests of a single proportion, each at the .01 level of α.
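The Bonferroni split described above can be sketched as follows (our illustration; Python's `statistics.NormalDist` supplies the two-tailed critical z without a table):

```python
from statistics import NormalDist

def bonferroni_z_crit(alpha, n_tests):
    """Per-test alpha and two-tailed critical z for n_tests follow-up
    single-proportion tests, holding the experimentwise Type I error
    rate at alpha."""
    per_test = alpha / n_tests
    z_crit = NormalDist().inv_cdf(1 - per_test / 2)
    return per_test, z_crit

# Overall alpha = .05 spread over five categories: .01 per test
per_test, z_crit = bonferroni_z_crit(0.05, 5)
# per_test = .01, z_crit is about 2.58
```

Each follow-up test then compares its z statistic against z_crit rather than the unadjusted ±1.96.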
Another way to determine which cells are statistically different in observed to expected proportions is to examine the standardized residuals, which can be computed as follows:

R = (O − E)/√E

Standardized residuals that are greater (in absolute value terms) than 1.96 (when α = .05) or 2.58 (when α = .01) have different observed to expected frequencies and are contributing to the statistically significant chi-square statistic. The sign of the residual provides information on whether the observed frequency is greater than the expected frequency (i.e., positive value) or less than the expected frequency (i.e., negative value).
Let us return to the example and conduct the chi-square goodness-of-fit test. The test statistic is computed as follows:

χ² = n ∑_{j=1}^{J} (pj − πj)²/πj
   = 100[(.25 − .20)²/.20 + (.50 − .40)²/.40 + (.10 − .10)²/.10 + (.15 − .30)²/.30]
   = 100(.0125 + .0250 + .0000 + .0750) = 100(.1125) = 11.25

The test statistic is compared to the critical value, from Table A.3, of .05χ²3 = 7.8147. Because the test statistic is larger than the critical value, we reject the null hypothesis and conclude that the sample proportions from ICU are different from the expected proportions at the national level. Follow-up tests to determine which cells are statistically different in their observed to expected proportions involve examining the standardized residuals. In this example, the standardized residuals are computed as follows:
R = (O − E)/√E

R_Education = (25 − 20)/√20 = 1.118
R_Arts and sciences = (50 − 40)/√40 = 1.581
R_Communication = (10 − 10)/√10 = 0
R_Business = (15 − 30)/√30 = −2.739

The standardized residual for business is greater (in absolute value terms) than 1.96 (as α = .05) and thus suggests that there are different observed to expected frequencies for students majoring in business at ICU compared to national estimates, and that this category is the one which is contributing most to the statistically significant chi-square statistic.
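The goodness-of-fit statistic and standardized residuals computed above can be reproduced with a short Python sketch (ours, not the authors'):

```python
import math

def chi_square_gof(observed_p, expected_p, n):
    """Chi-square goodness-of-fit statistic from observed and expected
    proportions, plus standardized residuals R = (O - E)/sqrt(E)
    computed from the implied frequencies."""
    chi2 = n * sum((p - pi) ** 2 / pi
                   for p, pi in zip(observed_p, expected_p))
    residuals = [(n * p - n * pi) / math.sqrt(n * pi)
                 for p, pi in zip(observed_p, expected_p)]
    return chi2, residuals

# ICU majors example: observed vs. expected proportions, n = 100
chi2, res = chi_square_gof([.25, .50, .10, .15], [.20, .40, .10, .30], 100)
# chi2 = 11.25; residuals are about 1.118, 1.581, 0, -2.739
```

Comparing chi2 against the tabled critical value (7.8147 for ν = 3 at α = .05) and scanning the residuals against ±1.96 reproduces the decisions made by hand above.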
8.2.2.1 Effect Size
An effect size for the chi-square goodness-of-fit test can be computed by hand as follows, where N is the total sample size and J is the number of categories in the variable:

Effect size = χ²/[N(J − 1)]

This effect size statistic can range from 0 to +1, where 0 indicates no difference between the sample and hypothesized proportions (and thus no effect). Positive one indicates the maximum difference between the sample and hypothesized proportions (and thus a large effect). Given the range of this value (0 to +1.0) and similarity to a correlation coefficient, it is reasonable to apply Cohen's interpretations for correlations as a rule of thumb. These include the following: small effect size = .10; medium effect size = .30; and large effect size = .50. For the previous example, the effect size would be calculated as follows and would be interpreted as a small effect:
Effect size = χ²/[N(J − 1)] = 11.25/[100(4 − 1)] = 11.25/300 = .0375
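As a sketch of the effect size computation above (the function name is ours):

```python
def gof_effect_size(chi2, n_total, j):
    """Effect size for the chi-square goodness-of-fit test as defined in
    the text: chi-square divided by N(J - 1); ranges from 0 to +1."""
    return chi2 / (n_total * (j - 1))

# ICU example: chi-square of 11.25, N = 100, J = 4 categories
effect = gof_effect_size(11.25, 100, 4)
# effect = .0375, a small effect by Cohen's rule of thumb
```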
8.2.2.2 Assumptions
Two assumptions are made for the chi-square goodness-of-fit test: (1) observations are independent (which is met when a random sample of the population is selected), and (2) the expected frequency is at least 5 per cell (and in the case of the chi-square goodness-of-fit test, this translates to an expected frequency of at least 5 per category, as there is only one variable included in the analysis). When the expected frequency is less than 5, that particular cell (i.e., category) has undue influence on the chi-square statistic. In other words, the chi-square goodness-of-fit test becomes too sensitive when the expected values are less than 5.
8.2.3 Chi-Square Test of Association
The second test to consider is the chi-square test of association. This test is equivalent to the chi-square test of independence and the chi-square test of homogeneity, which are not further discussed; the chi-square test of association incorporates both of these tests (e.g., Glass & Hopkins, 1996). The chi-square test of association is used to determine whether there is an association or relationship between two or more categorical (i.e., nominal or ordinal) variables. Our discussion is, for the most part, restricted to the two-variable situation where each variable has two or more categories. The chi-square test of association is the logical extension of the chi-square goodness-of-fit test, which is concerned with one categorical variable. Unlike the chi-square goodness-of-fit test, where the expected proportions are known a priori, for the chi-square test of association, the expected proportions are not known a priori but must be estimated from the sample data.

For example, suppose a researcher is interested in whether there is an association between level of education and stance on a proposed amendment to legalize gambling. Thus, one categorical variable is level of education, with the categories being as follows: (1) less than a high school education, (2) high school graduate, (3) undergraduate degree, and (4) graduate school degree. The other categorical variable is stance on the gambling amendment, with the following categories: (1) in favor of the gambling bill and (2) opposed to the gambling bill. The null hypothesis is that there is no association between level of education and stance on gambling, whereas the alternative hypothesis is that there is some association between level of education and stance on gambling. The alternative would be supported if individuals at one level of education felt differently about the bill than individuals at another level of education.
222 An Introduction to Statistical Concepts

The data are shown in Table 8.3, known as a contingency table (or crosstab table). As there are two categorical variables, we have a two-way or two-dimensional contingency table. Each combination of the two variables is known as a cell. For example, the cell for row 1, favor bill, and column 2, high school graduate, is denoted as cell 12, the first value (i.e., 1) referring to the row and the second value (i.e., 2) to the column. Thus, the first subscript indicates the particular row r, and the second subscript indicates the particular column c. The row subscript ranges from r = 1,…, R, and the column subscript ranges from c = 1,…, C, where R is the last row and C is the last column. This example contains a total of eight cells, two rows times four columns, denoted by R × C = 2 × 4 = 8.
Each cell in the table contains two pieces of information, the number (or count or frequencies) of observations in that cell and the observed proportion in that cell. For cell 12, there are 13 observations denoted by n12 = 13 and an observed proportion of .65 denoted by p12 = .65. The observed proportion is computed by taking the number of observations in the cell and dividing by the number of observations in the column. Thus, for cell 12, 13 of the 20 high school graduates favor the bill, or 13/20 = .65. The column information is given at the bottom of each column, known as the column marginals. Here we are given the number of observations in a column, denoted by n.c, where the "." indicates we have summed across rows and c indicates the particular column. For column 2 (reflecting high school graduates), there are 20 observations denoted by n.2 = 20.
There is also row information contained at the end of each row, known as the row marginals. Two values are listed in the row marginals. First, the number of observations in a row is denoted by nr., where r indicates the particular row and the "." indicates we have summed across the columns. Second, the expected proportion for a specific row is denoted by πr., where again r indicates the particular row and the "." indicates we have summed across the columns. The expected proportion for a particular row is computed by taking the number of observations in that row, nr., and dividing by the total number of observations, n... Note that the total number of observations is given in the lower right-hand portion of the figure and denoted as n.. = 80. Thus, for the first row, the expected proportion is computed as π1. = n1./n.. = 44/80 = .55.
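As a cross-check of these hand computations, the marginals and proportions of Table 8.3 can be rebuilt in a few lines of NumPy. This is our sketch, not part of the book's SPSS workflow; the array layout (rows = stance, columns = education level) follows the table above.

```python
import numpy as np

# Observed counts n_rc from Table 8.3: rows = stance (favor, opposed),
# columns = education (< HS, HS graduate, undergraduate, graduate).
counts = np.array([[16, 13, 10, 5],
                   [ 4,  7, 10, 15]])

n_dot_c = counts.sum(axis=0)   # column marginals n.c (summed across rows)
n_r_dot = counts.sum(axis=1)   # row marginals nr. (summed across columns)
n = counts.sum()               # total number of observations n.. = 80

# Observed proportion in each cell: p_rc = n_rc / n.c (divide column-wise).
p = counts / n_dot_c

# Expected proportion for each row: pi_r. = nr. / n..
pi = n_r_dot / n

print(p[0, 1])   # p12 = 13/20 = 0.65
print(pi[0])     # pi1. = 44/80 = 0.55
```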
The null and alternative hypotheses can be written as follows:

H0: (prc − πr.) = 0 for all cells

H1: (prc − πr.) ≠ 0 for all cells
Table 8.3
Contingency Table for Gambling Example

Stance on                        Level of Education
Gambling      Less than High School  High School  Undergraduate  Graduate   Row Marginals
"Favor"            n11 = 16           n12 = 13      n13 = 10      n14 = 5     n1. = 44
                   p11 = .80          p12 = .65     p13 = .50     p14 = .25   π1. = .55
"Opposed"          n21 = 4            n22 = 7       n23 = 10      n24 = 15    n2. = 36
                   p21 = .20          p22 = .35     p23 = .50     p24 = .75   π2. = .45
Column marginals   n.1 = 20           n.2 = 20      n.3 = 20      n.4 = 20    n.. = 80
223Inferences About Proportions
The test statistic is a chi-square and is computed by

χ² = Σ(r=1 to R) Σ(c=1 to C) [n.c (prc − πr.)² / πr.]
The test statistic is compared to a critical value from the chi-square table (Table A.3), αχ²ν, where ν = (R − 1)(C − 1). That is, the degrees of freedom are 1 less than the number of rows times 1 less than the number of columns.
If the test statistic is larger than the critical value, then the null hypothesis is rejected in favor of the alternative. This would indicate that the observed and expected proportions were not equal across cells such that the two categorical variables have some association. The larger the differences between the observed and expected proportions, the larger the value of the test statistic and the more likely it is to reject the null hypothesis. Otherwise, we would fail to reject the null hypothesis, indicating that the observed and expected proportions were approximately equal, such that the two categorical variables have no association.
If the null hypothesis is rejected, then one may wish to determine for which combination of categories the sample proportions are different from their respective expected proportions. Here we recommend you construct 2 × 2 contingency tables as subsets of the larger table and conduct chi-square tests of association. If you would like to control the experimentwise Type I error rate across the set of tests, then the Bonferroni method is recommended, where the α level is divided up among the number of tests conducted. For example, with α = .05 and five 2 × 2 tables, one would conduct five tests each at the .01 level of significance. As with the chi-square goodness-of-fit test, it is also possible to examine the standardized residuals (which can be requested in SPSS) to determine the cells that have significantly different observed to expected proportions. Cells where the standardized residuals are greater (in absolute value terms) than 1.96 (when α = .05) or 2.58 (when α = .01) are significantly different in observed to expected frequencies.
Finally, it should be noted that we have only considered two-way contingency tables here. Multiway contingency tables can also be constructed and the chi-square test of association utilized to determine whether there is an association among several categorical variables.
Let us complete the analysis of the example data. The test statistic is computed as

χ² = Σ(r=1 to R) Σ(c=1 to C) [n.c (prc − πr.)² / πr.]

   = 20(.80 − .55)²/.55 + 20(.20 − .45)²/.45 + 20(.65 − .55)²/.55 + 20(.35 − .45)²/.45
   + 20(.50 − .55)²/.55 + 20(.50 − .45)²/.45 + 20(.25 − .55)²/.55 + 20(.75 − .45)²/.45

   = 2.2727 + 2.7778 + .3636 + .4444 + .0909 + .1111 + 3.2727 + 4.0000 = 13.3332
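For readers who want to verify this hand computation outside of SPSS, SciPy's `chi2_contingency` reproduces the same Pearson chi-square from the raw counts. This is our sketch, assuming SciPy is available; it is not part of the book's workflow.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from Table 8.3.
observed = np.array([[16, 13, 10, 5],
                     [ 4,  7, 10, 15]])

# Returns the Pearson chi-square, its p value, the degrees of freedom,
# and the table of expected counts under no association.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(round(chi2, 4))     # 13.3333, matching the 13.33 computed by hand
print(dof)                # (R - 1)(C - 1) = (1)(3) = 3
print(round(p_value, 3))  # 0.004
print(expected.min())     # minimum expected count is 9.0, so the >= 5 assumption is met
```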
The test statistic is compared to the critical value, from Table A.3, of .05χ²3 = 7.8147. Because the test statistic is larger than the critical value, we reject the null hypothesis and conclude that there is an association between level of education and stance on the gambling bill. In other words, stance on gambling is not the same for all levels of education. The cells with the largest contribution to the test statistic give some indication as to where the observed and expected proportions are most different. Here the first and fourth columns have the largest contributions to the test statistic and have the greatest differences between the observed and expected proportions; these would be of interest in a 2 × 2 follow-up test.
8.2.3.1 Effect Size
Several measures of effect size, such as correlation coefficients and measures of association, can be requested in SPSS and are commonly reported effect size indices for results from the chi-square test of association. Which effect size value is selected depends in part on the measurement scale of the variable. For example, researchers working with nominal data can select a contingency coefficient, phi (for 2 × 2 tables), Cramer's V (for tables larger than 2 × 2), lambda, or an uncertainty coefficient. Correlation options available for ordinal data include gamma, Somers' d, Kendall's tau-b, and Kendall's tau-c. From the contingency coefficient, C, we can compute Cohen's w as follows:

w = √[C² / (1 − C²)]

Cohen's recommended subjective standard for interpreting w (as well as the other correlation coefficients presented) is as follows: small effect size, w = .10; medium effect size, w = .30; and large effect size, w = .50. See Cohen (1988) for further details.
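The conversion from C to w, together with Cohen's benchmarks, can be wrapped in two small helper functions. This is our sketch, not the book's; the binning of w into labels is our reading of Cohen's thresholds, which Cohen himself presents only as rough guides.

```python
import math

def cohens_w(C):
    """Cohen's w from a contingency coefficient C: w = sqrt(C^2 / (1 - C^2))."""
    return math.sqrt(C**2 / (1 - C**2))

def interpret_w(w):
    # Our binning of Cohen's (1988) benchmarks: .10 small, .30 medium, .50 large.
    if w < 0.10:
        return "negligible"
    if w < 0.30:
        return "small"
    if w < 0.50:
        return "medium"
    return "large"

# For the gambling example in this chapter, SPSS reports C = .378:
w = cohens_w(0.378)
print(round(w, 3))     # 0.408
print(interpret_w(w))  # medium
```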
8.2.3.2 Assumptions
The� same� two� assumptions� that� apply� to� the� chi-square� goodness-of-fit� test� also� apply�
to� the� chi-square� test� of� association,� as� follows:� (1)� observations� are� independent� (which�
is� met� when� a� random� sample� of� the� population� is� selected),� and� (2)� expected� frequency�
is�at�least�5�per�cell��When�the�expected�frequency�is�less�than�5,�that�particular�cell�has�
undue�influence�on�the�chi-square�statistic��In�other�words,�the�chi-square�test�of�association�
becomes�too�sensitive�when�the�expected�values�are�less�than�5�
8.3 SPSS
Once� again,� we� consider� the� use� of� SPSS� for� the� example� datasets�� While� SPSS� does� not�
have�any�of�the�z�procedures�described�in�the�first�part�of�this�chapter,�it�is�capable�of�con-
ducting�both�of�the�chi-square�procedures�covered�in�the�second�part�of�this�chapter,�as�
described�in�the�following�
Chi-Square Goodness-of-Fit Test

Step 1: To conduct the chi-square goodness-of-fit test, you need one variable that is either nominal or ordinal in scale (e.g., major). To conduct the chi-square goodness-of-fit test, go to "Analyze" in the top pulldown menu, then select "Nonparametric Tests," followed by "Legacy Dialogs," and then "Chi-Square." Following the screenshot (step 1) as follows produces the "Chi-Square Goodness-of-Fit" dialog box.
[Screenshot: Chi-square goodness-of-fit: Step 1]
Step 2: Next, from the main "Chi-Square Goodness-of-Fit" dialog box, click the variable (e.g., major) and move it into the "Test Variable List" box by clicking on the arrow button. In the lower right-hand portion of the screen is a section labeled "Expected Values." The default is to conduct the analysis with the expected values equal for each category (you will see that the radio button for "All categories equal" is preselected). Much of the time, you will want to use different expected values. To define different expected values, click on the "Values" radio button. Enter each expected value in the box below "Values," in the same order as the categories (e.g., first enter the expected value for category 1 and then the expected value for category 2), and then click "Add" to define the value in the box. This sets up an expected value for each category. Repeat this process for every category of your variable.
[Screenshots: Chi-square goodness-of-fit: Steps 2a and 2b. Enter the expected value for the category that corresponds to the first numeric value (e.g., 1), click on "Add" to define the value expected in each group, and repeat this for each category. The expected values will appear in rank order from the first category to the last category.]
Then click on "OK" to run the analysis. The output is shown in Table 8.4.
Table 8.4
SPSS Results for Undergraduate Majors Example

College Major
                      Observed N   Expected N   Residual
Education                 25          20.0         5.0
Arts and sciences         50          40.0        10.0
Communications            10          10.0          .0
Business                  15          30.0       −15.0
Total                    100

Test Statistics
                College Major
Chi-square         11.250a
df                  3
Asymp. sig.        .010
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 10.0.

Notes on the output:
Observed N reflects the observed frequencies from your sample. Expected N reflects the expected values that were input by the researcher. The residual is simply the difference between the observed and expected frequencies.
"Chi-square" is the test statistic value and is calculated as χ² = n Σ(j=1 to J) [(pj − πj)²/πj] = 100[(.25 − .20)²/.20 + (.50 − .40)²/.40 + (.10 − .10)²/.10 + (.15 − .30)²/.30] = 11.25.
df are the degrees of freedom. For the chi-square goodness-of-fit test, they are calculated as J − 1 (i.e., one less than the number of categories).
"Asymp. sig." is the observed p value for the chi-square goodness-of-fit test. It is interpreted as: there is about a 1% probability of the sample proportions occurring by chance if the null hypothesis is really true (i.e., if the population proportions are .20, .40, .10, and .30).
Interpreting the output: The top table provides the frequencies observed in the sample ("Observed N") and the expected frequencies given the values defined by the researcher ("Expected N"). The "Residual" is simply the difference between the two Ns. The chi-square test statistic value is 11.25, and the associated p value is .01. Since p is less than α, we reject the null hypothesis. Let us translate this back to the purpose of our null hypothesis statistical test. There is evidence to suggest that the sample proportions observed differ from the proportions of college majors nationally. Follow-up tests to determine which cells are statistically different in the observed to expected proportions can be conducted by examining the standardized residuals. In this example, the standardized residuals were computed previously as follows:
R = (O − E)/√E

R(Education) = (25 − 20)/√20 = 1.118

R(Arts and sciences) = (50 − 40)/√40 = 1.581

R(Communications) = (10 − 10)/√10 = 0

R(Business) = (15 − 30)/√30 = −2.739
The standardized residual for business is greater (in absolute value terms) than 1.96 (given α = .05) and thus suggests that there are different observed to expected frequencies for students majoring in business at ICU compared to national estimates. This category is the one contributing most to the statistically significant chi-square statistic.
The effect size can be calculated as follows:

Effect size = χ²/[N(J − 1)] = 11.25/[100(4 − 1)] = 11.25/300 = .0375
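The goodness-of-fit results above can be reproduced outside of SPSS. The following sketch (ours, assuming NumPy and SciPy are available) recomputes the test statistic, p value, standardized residuals, and effect size from the observed and expected frequencies.

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([25, 50, 10, 15])   # observed N for the four majors
expected = np.array([20, 40, 10, 30])   # expected N = n * pi_j

# Pearson goodness-of-fit chi-square and its p value.
chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 2), round(p_value, 3))   # 11.25 0.01

# Standardized residuals R = (O - E) / sqrt(E); |R| > 1.96 flags a cell at alpha = .05.
resid = (observed - expected) / np.sqrt(expected)
print(np.round(resid, 3))                  # [ 1.118  1.581  0.    -2.739]

# Effect size chi^2 / [N(J - 1)].
n, J = observed.sum(), len(observed)
effect = chi2 / (n * (J - 1))
print(round(effect, 4))                    # 0.0375
```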
Chi-Square Test of Association

Step 1: To conduct a chi-square test of association, you need two categorical variables (nominal and/or ordinal) whose frequencies you wish to associate (e.g., education level and gambling stance). To compute the chi-square test of association, go to "Analyze" in the top pulldown, then select "Descriptive Statistics," and then select the "Crosstabs" procedure.
[Screenshot: Chi-square test of association: Step 1]
Step 2: Select the dependent variable and move it into the "Row(s)" box by clicking on the arrow key [e.g., here we used stance on gambling as the dependent variable (1 = support; 0 = not support)]. Then select the independent variable and move it into the "Column(s)" box [in this example, level of education is the independent variable (1 = less than high school; 2 = high school; 3 = undergraduate; 4 = graduate)].
[Screenshot: Chi-square test of association: Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the boxes on the right; the dependent variable should be displayed in the row(s) and the independent variable in the column(s). Clicking on "Statistics" will allow you to select various statistics to generate, including the chi-square test statistic value and various correlation coefficients (see step 3). Clicking on "Cells" provides options for what statistics to display in the cells (see step 4). Clicking on "Format" allows the option of displaying the categories in ascending or descending order (see step 5).]
Step 3: In the top right corner of the "Crosstabs" dialog box (see screenshot step 2), click on the button labeled "Statistics." From here, placing a checkmark in the box for "Chi-square" will produce the chi-square test statistic value and resulting null hypothesis statistical test results (including degrees of freedom and p value). Also from "Statistics," you can select various measures of association that can serve as an effect size (i.e., correlation coefficient values). Which correlation is selected should depend on the measurement scales of your variables. We are working with two nominal variables; thus, for purposes of this illustration, we will select both "Phi and Cramer's V" and "Contingency coefficient" just to illustrate two different effect size indices (although it is standard practice to use and report only one effect size). We will use the contingency coefficient to compute Cohen's w. Click on "Continue" to return to the main "Crosstabs" dialog box.
[Screenshot: Chi-square test of association: Step 3]
Step 4: In the top right corner of the "Crosstabs" dialog box (see screenshot step 2), click on the button labeled "Cells." From the "Cells" dialog box, options are available for selecting counts and percentages. We have requested "Observed" and "Expected" counts, "Column" percentages, and "Standardized" residuals. We will review the expected counts to determine if the assumption of five expected frequencies per cell is met. We will use the standardized residuals post hoc, if the results of the test are statistically significant, to determine which cell(s) is most influencing the chi-square value. Click "Continue" to return to the main "Crosstabs" dialog box.
[Screenshot: Chi-square test of association: Step 4]
Step 5: In the top right corner of the "Crosstabs" dialog box (see screenshot step 2), click on the button labeled "Format." From the "Format" dialog box, options are available for determining which order, "Ascending" or "Descending," you want the row values presented in the contingency table (we asked for descending in this example, such that row 1 was gambling = 1 and row 2 was gambling = 0). Click "Continue" to return to the main "Crosstabs" dialog box. Then click on "OK" to run the analysis.
[Screenshot: Chi-square test of association: Step 5]
Interpreting the Output: The output appears in Table 8.5, where the top box ("Case Processing Summary") provides information on the sample size and frequency of missing data (if any). The "Crosstabulation" table is next and provides the contingency table (i.e., counts, percentages, and standardized residuals). The "Chi-Square Tests" box gives the results of the procedure (including the chi-square test statistic value labeled "Pearson Chi-Square," degrees of freedom, and p value labeled as "Asymp. Sig."). The likelihood ratio chi-square uses a different mathematical formula than the Pearson chi-square; however, for large sample sizes, the values for the likelihood ratio chi-square and the Pearson chi-square should be similar (and rarely should the two statistics suggest different conclusions in terms of rejecting or failing to reject the null hypothesis). The linear-by-linear association statistic, also known as the Mantel-Haenszel chi-square, is based on the Pearson correlation and tests whether there is a linear association between the two variables (and thus should not be used for nominal variables).
For the contingency coefficient, C, of .378, we compute Cohen's w effect size as follows:

w = √[C²/(1 − C²)] = √[.378²/(1 − .378²)] = √[.143/(1 − .143)] = √.167 = .408
Cohen's w of .408 would be interpreted as a moderate to large effect. Cramer's V, as seen in the output, is .408 and would be interpreted similarly—a moderate to large effect.
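Because each of these symmetric measures is a simple function of the chi-square statistic and the sample size, they can be recomputed by hand. The following sketch (ours, verifying the SPSS values rather than producing them) uses the standard formulas for phi, Cramer's V, and the contingency coefficient.

```python
import math

chi2, n = 13.333, 80   # Pearson chi-square and sample size for the gambling table
R, C_cols = 2, 4       # number of rows and columns

phi = math.sqrt(chi2 / n)
cramers_v = math.sqrt(chi2 / (n * min(R - 1, C_cols - 1)))
contingency_C = math.sqrt(chi2 / (chi2 + n))
w = math.sqrt(contingency_C**2 / (1 - contingency_C**2))

print(round(phi, 3))            # 0.408
print(round(cramers_v, 3))      # 0.408 (equals phi whenever min(R - 1, C - 1) = 1)
print(round(contingency_C, 3))  # 0.378
print(round(w, 3))              # 0.408
```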
8.4 G*Power

A priori power can be determined using specialized software (e.g., Power and Precision, Ex-Sample, G*Power) or power tables (e.g., Cohen, 1988), as previously described. However, since SPSS does not provide power information for the results of the chi-square test of association just conducted, let us use G*Power to compute the post hoc power of our test.
Table 8.5
SPSS Results for Gambling Example

Case Processing Summary
                                            Cases
                              Valid           Missing          Total
                            N   Percent     N   Percent     N   Percent
Stance on gambling *
Level of education         80   100.0%      0     .0%      80   100.0%

Stance on Gambling * Level of Education Crosstabulation

                                            Level of Education
                                 Less Than
                                 High School  High School  Undergraduate  Graduate    Total
Stance on   Support
gambling      Count                  16           13            10            5         44
              Expected count        11.0         11.0          11.0         11.0       44.0
              % within level
                of education       80.0%        65.0%         50.0%        25.0%      55.0%
              Std. residual          1.5           .6           −.3         −1.8
            Do not support
              Count                   4            7            10           15         36
              Expected count         9.0          9.0           9.0          9.0       36.0
              % within level
                of education       20.0%        35.0%         50.0%        75.0%      45.0%
              Std. residual         −1.7          −.7            .3          2.0
Total         Count                  20           20            20           20         80
              Expected count        20.0         20.0          20.0         20.0       80.0
              % within level
                of education      100.0%       100.0%        100.0%       100.0%     100.0%

Chi-Square Tests
                                  Value    df   Asymp. Sig. (2-Sided)
Pearson chi-square               13.333a    3          .004
Likelihood ratio                 13.969     3          .003
Linear-by-linear association     12.927     1          .000
N of valid cases                     80
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.00.

Symmetric Measures
                                                Value    Approx. Sig.
Nominal by nominal    Phi                        .408        .004
                      Cramer's V                 .408        .004
                      Contingency coefficient    .378        .004
N of valid cases                                  80

Notes on the output:
Zero cells have expected counts less than five; thus, we have met this assumption of the chi-square test of association.
"Pearson chi-square" is the test statistic value and is calculated as (see Section 8.2.3 for the full computation): χ² = Σ(r=1 to R) Σ(c=1 to C) [n.c (prc − πr.)²/πr.].
Degrees of freedom are computed as (Rows − 1)(Columns − 1) = (2 − 1)(4 − 1) = 3.
The probability is less than 1% (see "Asymp. sig.") that we would see these proportions by random chance if the proportions were all equal (i.e., if the null hypothesis were really true).
When analyzing the percentages in the crosstab table, compare the categories of the dependent variable (rows) across the columns of the independent variable (columns). For example, of respondents with a high school diploma, 65% support gambling.
Review the standardized residuals to determine which cell(s) are contributing to the statistically significant chi-square value. Standardized residuals greater than an absolute value of 1.96 (the critical value when alpha = .05) indicate that the cell is contributing to the association between the variables. In this case, only one cell, graduate/do not support, has a standardized residual of 2.0 and thus is contributing to the relationship.
We have a 2 × 4 table; thus, Cramer's V is appropriate. It is statistically significant and, using Cohen's interpretations, reflects a moderate to large effect size.
The contingency coefficient can be used to compute Cohen's w, a measure of effect size, as follows: w = √[C²/(1 − C²)] = √[.378²/(1 − .378²)] = .408.
Post Hoc Power for the Chi-Square Test of Association Using G*Power

The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a chi-square test of association; therefore, the toggle button must be used to change the test family to χ². Next, we need to select the appropriate statistical test. We toggle to "Goodness-of-fit tests: Contingency tables." The "Type of power analysis" desired then needs to be selected. To compute post hoc power, we need to select "Post hoc: Compute achieved power—given α, sample size, and effect size."

The "Input Parameters" must then be specified. The first parameter is specification of the effect size w (this was computed by hand from the contingency coefficient and w = .408). The alpha level we tested at was .05, the sample size was 80, and the degrees of freedom were 3. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.

The "Output Parameters" provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of .408, an alpha level of .05, and a sample size of 80. Based on those criteria, the post hoc power was .88. In other words, with a sample size of 80, testing at an alpha level of .05 and observing a moderate to large effect of .408, the power of our test was .88; that is, the probability of rejecting the null hypothesis when it is really false was 88%, which is very high power. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed effect size and alpha level).
[Screenshot: Chi-square test of association in G*Power. The "Input Parameters" for computing post hoc power must be specified, including: (1) observed effect size w, (2) alpha level, (3) total sample size, and (4) degrees of freedom. Once the parameters are specified, click on "Calculate."]
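G*Power's post hoc computation can be approximated outside the program: under the alternative hypothesis, the test statistic follows a noncentral chi-square distribution with noncentrality parameter w²n, so achieved power is the probability that this distribution exceeds the α-level critical value. The following is our sketch of that calculation, assuming SciPy is available.

```python
from scipy.stats import chi2, ncx2

w, n, df, alpha = 0.408, 80, 3, 0.05

# Noncentrality parameter for the chi-square test of association.
ncp = w**2 * n                       # about 13.32

# Critical value of the central chi-square at alpha = .05, df = 3.
critical = chi2.ppf(1 - alpha, df)   # 7.8147

# Power = P(noncentral chi-square > critical value).
power = ncx2.sf(critical, df, ncp)

print(round(critical, 4))
print(power)   # close to the .88 reported by G*Power for these inputs
```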
8.5 Template and APA-Style Write-Up

We finish the chapter by presenting templates and APA-style write-ups for our examples. First we present an example paragraph detailing the results of the chi-square goodness-of-fit test and then follow this with the chi-square test of association.

Chi-Square Goodness-of-Fit Test

Recall that our graduate research assistant, Marie, was working with Tami, a staff member in the Undergraduate Services Office at ICU, to assist in analyzing the proportions of students enrolled in undergraduate majors. Her task was to assist Tami with writing her research question (Are the sample proportions of undergraduate student college majors at ICU in the same proportions as those nationally?) and generating the statistical test of inference to answer her question. Marie suggested a chi-square goodness-of-fit test as the test of inference. A template for writing a research question for a chi-square goodness-of-fit test is presented as follows:
Are the sample proportions of [units in categories] in the same pro-
portions as those [identify the source to which the comparison is
being made]?
It may be helpful to include in the results of the chi-square goodness-of-fit test information on an examination of the extent to which the assumptions were met (recall there are two assumptions: independence and expected frequency of at least 5 per cell). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
A chi-square goodness-of-fit test was conducted to determine if the
sample proportions of undergraduate student college majors at ICU
were in the same proportions as those reported nationally. The test
was conducted using an alpha of .05. The null hypothesis was that the
proportions would be as follows: .20 education, .40 arts and sciences,
.10 communications, and .30 business. The assumption of an expected
frequency of at least 5 per cell was met. The assumption of indepen-
dence was met via random selection.
As shown in Table 8.4, there was a statistically significant differ-
ence between the proportion of undergraduate majors at ICU and those
reported nationally (χ2 = 11.250, df = 3, p = .010). Thus, the null
hypothesis that the proportions of undergraduate majors at ICU par-
allel those expected at the national level was rejected at the .05
level of significance. The effect size (χ2/[N(J − 1)]) was .0375, and
interpreted using Cohen’s guide (1988) as a very small effect.
Follow-up tests were conducted by examining the standardized residu-
als. The standardized residual for business was −2.739 and thus sug-
gests that there are different observed to expected frequencies for
students majoring in business at ICU compared to national estimates.
Therefore, business is the college major that is contributing most
to the statistically significant chi-square statistic.
Chi-Square Test of Association
Marie, our graduate research assistant, was also working with Matthew, a lobbyist interested in examining the association between education level and stance on gambling. Marie was tasked with assisting Matthew in writing his research question (Is there an association between level of education and stance on gambling?) and generating the test of inference to answer his question. Marie suggested a chi-square test of association as the test of inference. A template for writing a research question for a chi-square test of association is presented as follows:

Is there an association between [independent variable] and [dependent
variable]?

It may be helpful to include in the results of the chi-square test of association information on the extent to which the assumptions were met (recall there are two assumptions: independence and expected frequency of at least 5 per cell). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference. It is also desirable to include a measure of effect size. Given the contingency coefficient, C, of .378, we computed Cohen's w effect size to be .408, which would be interpreted as a moderate to large effect.
A chi-square test of association was conducted to determine if there
was a relationship between level of education and stance on gambling.
The test was conducted using an alpha of .01. It was hypothesized
that there was an association between the two variables. The assump-
tion of an expected frequency of at least 5 per cell was met. The
assumption of independence was not met since the respondents were
not randomly selected; thus, there is an increased probability of a
Type I error.
From Table 8.5, we can see from the row marginals that 55% of the
individuals overall support gambling. However, lower levels of edu-
cation have a much higher percentage of support, while the highest
level of education has a much lower percentage of support. Thus,
there appears to be an association or relationship between gambling
stance and level of education. This is subsequently supported sta-
tistically from the chi-square test (χ2 = 13.333, df = 3, p = .004).
Thus, the null hypothesis that there is no association between
stance on gambling and level of education was rejected at the .01
level of significance. Examination of the standardized residuals
suggests that respondents who hold a graduate degree are signifi-
cantly more likely not to support gambling (standardized residual =
2.0) as compared to all other respondents. The effect size, Cohen’s
w, was computed to be .408, which is interpreted to be a moderate
to large effect (Cohen, 1988).
8.6 Summary

In this chapter, we described a third inferential testing situation: testing hypotheses about proportions. Several inferential tests and new concepts were discussed. The new concepts introduced were proportions, sampling distribution and standard error of a proportion, contingency table, chi-square distribution, and observed versus expected frequencies. The inferential tests described involving the unit normal distribution were tests of a single proportion, of two independent proportions, and of two dependent proportions. These tests are parallel to the tests of one or two means previously discussed in Chapters 6 and 7. The inferential tests described involving the chi-square distribution consisted of the chi-square goodness-of-fit test and the chi-square test of association. In addition, examples were presented for each of these tests. Box 8.1 summarizes the tests reviewed in this chapter and the key points related to each (including the distribution involved and recommendations for when to use the test).
Stop and Think Box 8.1
Characteristics and Recommendations for Inferences About Proportions

Test                               Distribution     When to Use
Inferences about a single          Unit normal, z   • To determine if the sample proportion
proportion (akin to                                   differs from a hypothesized proportion
one-sample mean test)                               • One variable, nominal or ordinal in scale

Inferences about two               Unit normal, z   • To determine if the population proportion
independent proportions                               for one group differs from the population
(akin to the independent t test)                      proportion for a second independent group
                                                    • Two variables, both nominal and ordinal
                                                      in scale

Inferences about two               Unit normal, z   • To determine if the population proportion
dependent proportions                                 for one group is different than the
(akin to the dependent t test)                        population proportion for a second
                                                      dependent group
                                                    • Two variables of the same measure, both
                                                      nominal and ordinal in scale

Chi-square goodness-of-fit test    Chi-square       • To determine if observed proportions differ
                                                      from what would be expected a priori
                                                    • One variable, nominal or ordinal in scale

Chi-square test of association     Chi-square       • To determine association/relationship
                                                      between two variables based on observed
                                                      proportions
                                                    • Two variables, both nominal and ordinal
                                                      in scale
At this point, you should have met the following objectives: (a) be able to understand the basic concepts underlying tests of proportions, (b) be able to select the appropriate test, and (c) be able to determine and interpret the results from the appropriate test. In Chapter 9, we discuss inferential tests involving variances.
237 Inferences About Proportions
Problems

Conceptual problems

8.1 How many degrees of freedom are there in a 5 × 7 contingency table when the chi-square test of association is used?
 a. 12
 b. 24
 c. 30
 d. 35
8.2 The more that two independent sample proportions differ, all else being equal, the smaller the z test statistic. True or false?
8.3 The null hypothesis is a numerical statement about an unknown parameter. True or false?
8.4 In testing the null hypothesis that the proportion is .50, the critical value of z increases as degrees of freedom increase. True or false?
8.5 A consultant found a sample proportion of individuals favoring the legalization of drugs to be −.50. I assert that a test of whether that sample proportion is different from 0 would be rejected. Am I correct?
8.6 Suppose we wish to test the following hypotheses at the .10 level of significance:

H0: π = .60
H1: π > .60

A sample proportion of .15 is observed. I assert if I conduct the z test that it is possible to reject the null hypothesis. Am I correct?
8.7 When the chi-square test statistic for a test of association is less than the corresponding critical value, I assert that I should reject the null hypothesis. Am I correct?
8.8 Other things being equal, the larger the sample size, the smaller the value of sp. True or false?
8.9 In the chi-square test of association, as the difference between the observed and expected proportions increases,
 a. The critical value for chi-square increases.
 b. The critical value for chi-square decreases.
 c. The likelihood of rejecting the null hypothesis decreases.
 d. The likelihood of rejecting the null hypothesis increases.
8.10 When the hypothesized value of the population proportion lies outside of the CI around a single sample proportion, I assert that the researcher should reject the null hypothesis. Am I correct?
238 An Introduction to Statistical Concepts
8.11 Statisticians at a theme park want to know if the same proportions of visitors select the Jungle Safari as their favorite ride as compared to the Mountain Rollercoaster. They sample 150 visitors and collect data on one variable: favorite ride (two categories: Jungle Safari and Mountain Rollercoaster). Which statistical procedure is most appropriate to use to test the hypothesis?
 a. Chi-square goodness-of-fit test
 b. Chi-square test of association
8.12 Sophie is a reading teacher. She is researching the following question: is there a relationship between a child's favorite genre of book and their socioeconomic status category? She collects data from 35 children on two variables: (a) favorite genre of book (two categories: fiction, nonfiction) and (b) socioeconomic status category (three categories: low, middle, high). Which statistical procedure is most appropriate to use to test the hypothesis?
 a. Chi-square goodness-of-fit test
 b. Chi-square test of association
Computational problems

8.1 For a random sample of 40 widgets produced by the Acme Widget Company, 30 successes and 10 failures are observed. Test the following hypotheses at the .05 level of significance:

H0: π = .60
H1: π ≠ .60
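As a hedged sketch of how a single-proportion z statistic of this kind could be computed (this reuses problem 8.1's numbers purely for illustration and is not offered as the worked solution):

```python
from math import sqrt
from scipy.stats import norm

# Single-proportion z test sketch (data as in problem 8.1: 30 successes in 40)
n, successes, pi0 = 40, 30, 0.60
p = successes / n                      # sample proportion = .75
se = sqrt(pi0 * (1 - pi0) / n)         # standard error under H0
z = (p - pi0) / se
p_value = 2 * norm.sf(abs(z))          # two-tailed p-value
print(f"z = {z:.4f}, p = {p_value:.4f}")
```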
8.2 The following data are calculated for two independent random samples of male and female teenagers, respectively, on whether they expect to attend graduate school: n1 = 48, p1 = 18/48, n2 = 52, p2 = 33/52. Test the following hypotheses at the .05 level of significance:

H0: π1 − π2 = 0
H1: π1 − π2 ≠ 0
8.3 The following frequencies of successes and failures are obtained for two dependent random samples measured at the pretest and posttest of a weight training program:

              Pretest
Posttest   Success   Failure
Failure    18        30
Success    33        19

Test the following hypotheses at the .05 level of significance:

H0: π1 − π2 = 0
H1: π1 − π2 ≠ 0
8.4 A chi-square goodness-of-fit test is to be conducted with six categories of professions to determine whether the sample proportions of those supporting the current government differ from a priori national proportions. The chi-square test statistic is equal to 16.00. Determine the result of this test by looking up the critical value and making a statistical decision, using α = .01.
8.5 A chi-square goodness-of-fit test is to be conducted to determine whether the sample proportions of families in Florida who select various schooling options (five categories including home school, public school, public charter school, private school, and other) differ from the proportions reported nationally. The chi-square test statistic is equal to 9.00. Determine the result of this test by looking up the critical value and making a statistical decision, using α = .05.
8.6 A random sample of 30 voters was classified according to their general political beliefs (liberal vs. conservative) and also according to whether they voted for or against the incumbent representative in their town. The results were placed into the following contingency table:

        Liberal   Conservative
Yes     10        5
No      5         10

Use the chi-square test of association to determine whether political belief is independent of voting behavior at the .05 level of significance.
8.7 A random sample of 40 kindergarten children was classified according to whether they attended at least 1 year of preschool prior to entering kindergarten and also according to gender. The results were placed into the following contingency table:

               Boy   Girl
Preschool      12    10
No preschool   8     10

Use the chi-square test of association to determine whether enrollment in preschool is independent of gender at the .10 level of significance.
Interpretive problem

There are numerous ways to use the survey 1 dataset from the website as there are several categorical variables. Here are some examples for the tests described in this chapter:
 a. Conduct a test of a single proportion: Is the sample proportion of females equal to .50?
 b. Conduct a test of two independent proportions: Is there a difference between the sample proportion of females who are right-handed and the sample proportion of males who are right-handed?
 c. Conduct a test of two dependent proportions: Is there a difference between the sample proportion of students' mothers who are right-handed and the sample proportion of students' fathers who are right-handed?
 d. Conduct a chi-square goodness-of-fit test: Do the sample proportions for the political view categories differ from their expected proportions (very liberal = .10, liberal = .15, middle of the road = .50, conservative = .15, very conservative = .10)? Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 e. Conduct a chi-square goodness-of-fit test to determine if there are similar proportions of respondents who can (vs. cannot) tell the difference between Pepsi and Coke. Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 f. Conduct a chi-square test of association: Is there an association between political view and gender? Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 g. Compute a chi-square test of association to examine the relationship between if a person smokes and their political view. Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
9
Inferences About Variances
Chapter Outline

9.1 New Concepts
9.2 Inferences About a Single Variance
9.3 Inferences About Two Dependent Variances
9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests)
 9.4.1 Traditional Tests
 9.4.2 Brown–Forsythe Procedure
 9.4.3 O'Brien Procedure
9.5 SPSS
9.6 Template and APA-Style Write-Up

Key Concepts

1. Sampling distributions of the variance
2. The F distribution
3. Homogeneity of variance tests
In Chapters 6 through 8, we looked at testing inferences about means (Chapters 6 and 7) and about proportions (Chapter 8). In this chapter, we examine inferential tests involving variances. Tests of variances are useful in two applications: (a) as an inferential test by itself and (b) as a test of the homogeneity of variance assumption for another procedure (e.g., t test, analysis of variance [ANOVA]). First, a researcher may want to perform inferential tests on variances for their own sake, in the same fashion that we described for the one- and two-sample t tests on means. For example, we may want to assess whether the variance of undergraduates at Ivy-Covered University (ICU) on an intelligence measure is the same as the theoretically derived variance of 225 (from when the test was developed and normed). In other words, is the variance at a particular university greater than or less than 225? As another example, we may want to determine whether the variances on an intelligence measure are consistent across two or more groups; for example, is the variance of the intelligence measure at ICU different from that at Podunk University?
Second, for some procedures such as the independent t test (Chapter 7) and the ANOVA (Chapter 11), it is assumed that the variances for two or more independent samples are equal (known as the homogeneity of variance assumption). Thus, we may want to use an inferential test of variances to assess whether this assumption has been violated or not. The following inferential tests of variances are covered in this chapter: (a) testing whether a single variance is different from a hypothesized value, (b) testing whether two dependent variances are different, and (c) testing whether two or more independent variances are different. We utilize many of the foundational concepts previously covered in Chapters 6 through 8. New concepts to be discussed include the following: the sampling distributions of the variance, the F distribution, and homogeneity of variance tests. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying tests of variances, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.
9.1 New Concepts

As you remember, Marie is a graduate student working on a degree in educational research. She has been building her statistical skills and is becoming quite adept at applying her skills as she completes tasks assigned to her by her faculty advisor. We revisit Marie again.

Another call has been fielded by Marie's advisor for assistance with statistical analysis. This time, it is Jessica, an elementary teacher within the community. Having built quite a reputation for success in statistical consultations, Marie's advisor requests that Marie work with Jessica.

Jessica shares with Marie that she is conducting a teacher research project related to achievement of first-grade students at her school. Jessica wants to determine if the variances of the achievement scores differ when children begin school in the fall as compared to when they end school in the spring. Marie suggests the following research question: Are the variances of achievement scores for first-grade children the same in the fall as compared to the spring? Marie suggests a test of variance as the test of inference. Her task is then to assist Jessica in generating the test of inference to answer her research question.
This section deals with concepts for testing inferences about variances, in particular, the sampling distributions underlying such tests. Subsequent sections deal with several inferential tests of variances. Although the sampling distribution of the mean is a normal distribution (Chapters 6 and 7), and the sampling distribution of a proportion is either a normal or chi-square distribution (Chapter 8), the sampling distribution of a variance is either a chi-square distribution for a single variance, a t distribution for two dependent variances, or an F distribution for two or more independent variances. Although we have already discussed the t distribution in Chapter 6 and the chi-square distribution in Chapter 8, we need to discuss the F distribution (named in honor of the famous statistician Sir Ronald A. Fisher) in some detail here.
Like the normal, t, and chi-square distributions, the F distribution is really a family of distributions. Also, like the t and chi-square distributions, the F distribution family members depend on the number of degrees of freedom represented. Unlike any previously discussed distribution, the F distribution family members actually depend on a combination of two different degrees of freedom, one for the numerator and one for the denominator. The reason is that the F distribution is a ratio of two chi-square variables. To be more precise, F with ν1 degrees of freedom for the numerator and ν2 degrees of freedom for the denominator is actually a ratio of the following chi-square variables:

Fν1,ν2 = (χ²ν1 / ν1) / (χ²ν2 / ν2)

For example, the F distribution for 1 degree of freedom numerator and 10 degrees of freedom denominator is denoted by F1,10. The F distribution is generally positively skewed and leptokurtic in shape (like the chi-square distribution) and has a mean of ν2/(ν2 − 2) when ν2 > 2 (where ν2 represents the denominator degrees of freedom). A few examples of the F distribution are shown in Figure 9.1 for the following pairs of degrees of freedom (i.e., numerator, denominator): F10,10; F20,20; and F40,40.
Critical values for several levels of α of the F distribution at various combinations of degrees of freedom are given in Table A.4. The numerator degrees of freedom are given in the columns of the table (ν1), and the denominator degrees of freedom are shown in the rows of the table (ν2). Only the upper-tail critical values are given in the table (e.g., percentiles of .90, .95, .99 for α = .10, .05, .01, respectively). The reason is that most inferential tests involving the F distribution are one-tailed tests using the upper-tail critical region. Thus, to find the upper-tail critical value for .05F1,10, we look on the second page of the table (α = .05), in the first column of values on that page for ν1 = 1, and where it intersects with the 10th row of values for ν2 = 10. There you should find .05F1,10 = 4.96.
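The same upper-tail critical values can be obtained without a printed table; as an illustrative sketch (not part of the text), SciPy's F distribution object gives the percentile directly:

```python
from scipy.stats import f

# Upper-tail critical value for alpha = .05 with 1 and 10 degrees of freedom,
# i.e., the 95th percentile of F(1, 10)
crit = f.ppf(0.95, dfn=1, dfd=10)
print(f"critical value: {crit:.2f}")  # matches the tabled value 4.96

# The mean of an F distribution is v2/(v2 - 2) for v2 > 2
print(f"mean of F(10, 10): {f.mean(10, 10):.2f}")  # 10/8 = 1.25
```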
FIGURE 9.1 Several members of the family of F distributions (relative frequency curves for F10,10, F20,20, and F40,40).
9.2 Inferences About a Single Variance

In our initial inferential testing situation for variances, the researcher would like to know whether the population variance is equal to some hypothesized variance or not. First, the hypotheses to be evaluated for detecting whether a population variance differs from a hypothesized variance are as follows. The null hypothesis H0 is that there is no difference between the population variance σ² and the hypothesized variance σ0², which we denote as

H0: σ² = σ0²

Here there is no difference or a "null" difference between the population variance and the hypothesized variance. For example, if we are seeking to determine whether the variance on an intelligence measure at ICU is different from the overall adult population, then a reasonable hypothesized value would be 225, as this is the theoretically derived variance for the adult population.

The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population variance σ² and the hypothesized variance σ0², which we denote as

H1: σ² ≠ σ0²

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population variance is different from the hypothesized variance. As we have not specified a direction on H1, we are willing to reject either if σ² is greater than σ0² or if σ² is less than σ0². This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that σ² is greater than σ0² or that σ² is less than σ0². In either case, the more the resulting sample variance differs from the hypothesized variance, the more likely we are to reject the null hypothesis.

It is assumed that the sample is randomly drawn from the population (i.e., the assumption of independence) and that the population of scores is normally distributed. Because we are testing a variance, a condition of the test is that the variable must be interval or ratio in scale.
The next step is to compute the test statistic χ² as

χ² = νs² / σ0²

where
s² is the sample variance
ν = n − 1

The test statistic χ² is then compared to a critical value(s) from the chi-square distribution. For a two-tailed test, the critical values are denoted as α/2χ²ν and 1−α/2χ²ν and are found in Table A.3 (recall that unlike z and t critical values, two unique χ² critical values must be found from the table, as the χ² distribution is not symmetric like z or t). If the test statistic χ² falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as αχ²ν for the alternative hypothesis H1: σ² < σ0² and as 1−αχ²ν for the alternative hypothesis H1: σ² > σ0². If the test statistic χ² falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It has been noted by statisticians such as Wilcox (1996) that the chi-square distribution does not perform adequately when sampling from a nonnormal distribution, as the actual Type I error rate can differ greatly from the nominal α level (the level set by the researcher). However, Wilcox stated "it appears that a completely satisfactory solution does not yet exist, although many attempts have been made to find one" (p. 85).

For the two-tailed test, a (1 − α)% confidence interval (CI) can also be examined and is formed as follows. The lower limit of the CI is

νs² / 1−α/2χ²ν

whereas the upper limit of the CI is

νs² / α/2χ²ν

If the CI contains the hypothesized value σ0², then the conclusion is to fail to reject H0; otherwise, we reject H0.
Now consider an example to illustrate use of the test of a single variance. We follow the basic steps for hypothesis testing that we applied in previous chapters. These steps include the following:

 1. State the null and alternative hypotheses.
 2. Select the level of significance (i.e., alpha, α).
 3. Calculate the test statistic value.
 4. Make a statistical decision (reject or fail to reject H0).

A researcher at the esteemed ICU is interested in determining whether the population variance in intelligence at the university is different from the norm-developed hypothesized variance of 225. Thus, a nondirectional, two-tailed alternative hypothesis is utilized. If the null hypothesis is rejected, this would indicate that the intelligence level at ICU is more or less diverse or variable than the norm. If the null hypothesis is not rejected, this would indicate that the intelligence level at ICU is as equally diverse or variable as the norm.
The researcher takes a random sample of 101 undergraduates from throughout the university and computes a sample variance of 149. The test statistic χ² is computed as follows:

χ² = νs² / σ0² = 100(149) / 225 = 66.2222

From Table A.3, and using an α level of .05, we determine the critical values to be .025χ²100 = 74.2219 and .975χ²100 = 129.561. As the test statistic does exceed one of the critical values by falling into the lower-tail critical region (i.e., 66.2222 < 74.2219), our decision is to reject H0. Our conclusion then is that the variance of the undergraduates at ICU is different from the hypothesized value of 225.
The 95% CI for the example is computed as follows. The lower limit of the CI is

νs² / 1−α/2χ²ν = 100(149) / 129.561 = 115.0037

and the upper limit of the CI is

νs² / α/2χ²ν = 100(149) / 74.2219 = 200.7494

As the limits of the CI (i.e., 115.0037, 200.7494) do not contain the hypothesized variance of 225, the conclusion is to reject H0. As always, the CI procedure leads us to the same conclusion as the hypothesis testing procedure for the same α level.
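The ICU example can be reproduced numerically. The following sketch (an illustration, not part of the text) uses SciPy's chi-square distribution in place of Table A.3:

```python
from scipy.stats import chi2

# Single-variance test from the ICU example: n = 101, s^2 = 149, sigma0^2 = 225
n, s2, sigma0_sq, alpha = 101, 149, 225, 0.05
nu = n - 1

stat = nu * s2 / sigma0_sq             # test statistic = 66.2222
lo_crit = chi2.ppf(alpha / 2, nu)      # .025 percentile = 74.2219
hi_crit = chi2.ppf(1 - alpha / 2, nu)  # .975 percentile = 129.561
reject = stat < lo_crit or stat > hi_crit

# 95% CI for sigma^2: (nu * s2 / hi_crit, nu * s2 / lo_crit)
ci = (nu * s2 / hi_crit, nu * s2 / lo_crit)
print(f"chi2 = {stat:.4f}, reject H0: {reject}, CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```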
9.3 Inferences About Two Dependent Variances

In our second inferential testing situation for variances, the researcher would like to know whether the population variance for one group is different from the population variance for a second dependent group. This is comparable to the dependent t test described in Chapter 7 where one population mean was compared to a second dependent population mean. Once again, we have two dependently drawn samples.

First, the hypotheses to be evaluated for detecting whether two dependent population variances differ are as follows. The null hypothesis H0 is that there is no difference between the two population variances σ1² and σ2², which we denote as

H0: σ1² − σ2² = 0

Here there is no difference or a "null" difference between the two population variances. For example, we may be seeking to determine whether the variance of income of husbands is equal to the variance of their wives' incomes. Thus, the husband and wife samples are drawn as couples in pairs or dependently, rather than individually or independently.

The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population variances σ1² and σ2², which we denote as

H1: σ1² − σ2² ≠ 0

The null hypothesis H0 is rejected here in favor of the alternative hypothesis H1 if the population variances are different. As we have not specified a direction on H1, we are willing to reject either if σ1² is greater than σ2² or if σ1² is less than σ2². This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that σ1² is greater than σ2² or that σ1² is less than σ2². In either case, the more the resulting sample variances differ from one another, the more likely we are to reject the null hypothesis.

It is assumed that the two samples are dependently and randomly drawn from their respective populations, that both populations are normal in shape, and that the t distribution is the appropriate sampling distribution. Since we are testing variances, a condition of the test is that the variable must be interval or ratio in scale.
The next step is to compute the test statistic t as follows:

t = (s1² − s2²) / (2 s1 s2 √((1 − r12²)/ν))

where
s1² and s2² are the sample variances for samples 1 and 2, respectively
s1 and s2 are the sample standard deviations for samples 1 and 2, respectively
r12 is the correlation between the scores from sample 1 and sample 2 (which is then squared)
ν is the number of degrees of freedom, ν = n − 2, with n being the number of paired observations (not the number of total observations)
Although correlations are not formally discussed until Chapter 10, conceptually the correlation is a measure of the relationship between two variables. This test statistic is conceptually somewhat similar to the test statistic for the dependent t test.

The test statistic t is then compared to a critical value(s) from the t distribution. For a two-tailed test, the critical values are denoted as ±α/2tν and are found in Table A.2. If the test statistic t falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +α1tν for the alternative hypothesis H1: σ1² − σ2² > 0 and as −α1tν for the alternative hypothesis H1: σ1² − σ2² < 0. If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It is thought that this test is not particularly robust to nonnormality (Wilcox, 1987). As a result, other procedures have been developed that are thought to be more robust. However, little in the way of empirical results is known at this time. Some of the new procedures can also be used for testing inferences involving the equality of two or more dependent variances. In addition, note that acceptable CI procedures are not currently available.
Let us consider an example to illustrate use of the test of two dependent variances. The same basic steps for hypothesis testing that we applied in previous chapters will be applied here as well. These steps include the following:

 1. State the null and alternative hypotheses.
 2. Select the level of significance (i.e., alpha, α).
 3. Calculate the test statistic value.
 4. Make a statistical decision (reject or fail to reject H0).

A researcher is interested in whether there is greater variation in achievement test scores at the end of the first grade as compared to the beginning of the first grade. Thus, a directional, one-tailed alternative hypothesis is utilized. If the null hypothesis is rejected, this would indicate that first graders' achievement test scores are more variable at the end of the year than at the beginning of the year. If the null hypothesis is not rejected, this would indicate that first graders' achievement test scores have approximately the same variance at both the end of the year and at the beginning of the year.
A random sample of 62 first-grade children is selected and given the same achievement test at the beginning of the school year (September) and at the end of the school year (April). Thus, the same students are tested twice with the same instrument, thereby resulting in dependent samples at time 1 and time 2. The level of significance is set at α = .01. The test statistic t is computed as follows. We determine that n = 62, ν = 60, s1² = 100, s1 = 10, s2² = 169, s2 = 13, and r12 = .80. We compute the test statistic t to be as follows:

t = (s1² − s2²) / (2 s1 s2 √((1 − r12²)/ν)) = (100 − 169) / (2(10)(13)√((1 − .64)/60)) = −3.4261

The test statistic t is then compared to the critical value from the t distribution. As this is a one-tailed test, the critical value is denoted as −α1tν and is determined from Table A.2 to be −.01t60 = −2.390. The test statistic t falls into the lower-tail critical region, as it is less than the critical value (i.e., −3.4261 < −2.390), so we reject H0 and conclude that the variance in achievement test scores increases from September to April.
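As a hedged sketch (illustration only), the dependent-variances t statistic from the first-grade example can be checked numerically, with SciPy supplying the one-tailed critical value in place of Table A.2:

```python
from math import sqrt
from scipy.stats import t as t_dist

# Dependent-variances test from the first-grade example
n = 62
s1_sq, s2_sq = 100, 169       # sample variances (September, April)
s1, s2 = sqrt(s1_sq), sqrt(s2_sq)
r12 = 0.80                    # correlation between the paired scores
nu = n - 2                    # degrees of freedom for paired observations

t_stat = (s1_sq - s2_sq) / (2 * s1 * s2 * sqrt((1 - r12**2) / nu))
crit = t_dist.ppf(0.01, nu)   # lower-tail critical value at alpha = .01
print(f"t = {t_stat:.4f}, critical value = {crit:.3f}")  # -3.4261 vs. -2.390
```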
9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests)

In our third and final inferential testing situation for variances, the researcher would like to know whether the population variance for one group is different from the population variance for one or more other independent groups. In this section, we first describe the somewhat cloudy situation that exists for the traditional tests. Then we provide details on two recommended tests, the Brown–Forsythe procedure and the O'Brien procedure.

9.4.1 Traditional Tests

One of the more heavily studied inferential testing situations in recent years has been for testing whether differences exist among two or more independent group variances. These tests are often referred to as homogeneity of variance tests. Here we briefly discuss the more traditional tests and their associated problems. In the sections that follow, we then recommend two of the "better" tests. As was noted in the previous procedures, the variable for which the variance(s) is computed must be interval or ratio in scale.
Several tests have traditionally been used to test for the equality of independent variances. An early simple test for two independent variances is to form a ratio of the two sample variances, which yields the following F test statistic:

F = s1² / s2²

This F ratio test assumes that the two populations are normally distributed. However, it is known that the F ratio test is not very robust to violation of the normality assumption, except for when the sample sizes are equal (i.e., n1 = n2). In addition, the F ratio test can only be used for the two-group situation.
Subsequently, more general tests were developed to cover the multiple-group situation. One such popular test is Hartley's Fmax test (developed in 1950), which is simply a more general version of the F ratio test just described. The test statistic for Hartley's Fmax test is the following:

Fmax = s²largest / s²smallest

where
s²largest is the largest variance in the set of variances
s²smallest is the smallest variance in the set

Hartley's Fmax test assumes normal population distributions and requires equal sample sizes. We also know that Hartley's Fmax test is not very robust to violation of the normality assumption. Cochran's C test (developed in 1941) is also an F test statistic and is computed by taking the ratio of the largest variance to the sum of all of the variances. Cochran's C test also assumes normality, requires equal sample sizes, and has been found to be even less robust to nonnormality than Hartley's Fmax test. As we see in Chapter 11 for the ANOVA, it is when we have unequal sample sizes that unequal variances are a problem; for these reasons, none of these tests can be recommended, which is the same situation we encountered with the independent t test.
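Neither the F ratio nor Hartley's Fmax requires special software. As a hedged sketch (hypothetical data, not from the text), both statistics can be computed directly from sample variances:

```python
import statistics

# Hypothetical scores for three equal-sized groups
groups = [
    [12, 15, 14, 10, 9, 13],
    [22, 14, 15, 30, 8, 17],
    [11, 12, 13, 12, 11, 13],
]
variances = [statistics.variance(g) for g in groups]  # unbiased sample variances

# Two-group F ratio (first two groups only)
f_ratio = variances[0] / variances[1]

# Hartley's Fmax across all groups: largest variance over smallest variance
f_max = max(variances) / min(variances)
print(f"F ratio = {f_ratio:.3f}, Fmax = {f_max:.3f}")
```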
Bartlett's χ² test (developed in 1937) does not have the stringent requirement of equal sample sizes; however, it does still assume normality. Bartlett's test is very sensitive to nonnormality and is therefore not recommended either. Since 1950, the development of homogeneity tests has proliferated, with the goal being to find a test that is fairly robust to nonnormality. Seemingly as each new test was developed, later research would show that the test was not very robust. Today there are well over 60 such tests available for examining homogeneity of variance (e.g., a bootstrap method developed by Wilcox [2002]). Rather than engage in a protracted discussion of these tests and their associated limitations, we simply present two tests that have been shown to be most robust to nonnormality in several recent studies. These are the Brown–Forsythe procedure and the O'Brien procedure. Unfortunately, neither of these tests is available in the major statistical packages (e.g., SPSS), which only include some of the problematic tests previously described.
9.4.2 Brown–Forsythe Procedure

The Brown–Forsythe procedure is a variation of Levene's test, developed in 1960. Levene's test is essentially an ANOVA on the transformed variable:

Zij = |Yij − Ȳ.j|

where
ij designates the ith observation in group j
Zij is computed for each individual by taking their score Yij, subtracting from it the group mean Ȳ.j (the "." indicating we have averaged across all i observations in group j), and then taking the absolute value (i.e., by removing the sign)

Unfortunately, Levene's test is not very robust to nonnormality, except when sample sizes are equal.
250 An Introduction to Statistical Concepts
Developed in 1974, the Brown–Forsythe procedure has been shown to be quite robust to nonnormality in numerous studies (e.g., Olejnik & Algina, 1987; Ramsey, 1994). Based on this and other research, the Brown–Forsythe procedure is recommended for leptokurtic distributions (i.e., those with sharp peaks), as it is robust to nonnormality and provides adequate Type I error protection and excellent power. In the next section, we describe the O'Brien procedure, which is recommended for other distributions (i.e., mesokurtic and platykurtic distributions). In cases where you are unsure of which procedure to use, Algina, Blair, and Combs (1995) recommend using a maximum procedure, where both tests are conducted and the procedure with the maximum test statistic is selected.
Let us now examine in detail the Brown–Forsythe procedure. The null hypothesis is H0: σ1² = σ2² = … = σJ², and the alternative hypothesis is that not all of the population group variances are the same. The Brown–Forsythe procedure is essentially an ANOVA on the transformed variable

Z_{ij} = |Y_{ij} - Mdn_{.j}|

which is computed for each individual by taking their score Y_{ij}, subtracting from it the group median Mdn.j, and then taking the absolute value (i.e., removing the sign). The test statistic is an F and is computed by

F = \frac{\sum_{j=1}^{J} n_j (\bar{Z}_{.j} - \bar{Z}_{..})^2 / (J - 1)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (Z_{ij} - \bar{Z}_{.j})^2 / (N - J)}
where
n_j designates the number of observations in group j
J is the number of groups (where j ranges from 1 to J)
Z̄.j is the mean for group j (computed by taking the sum of the observations in group j and dividing by the number of observations in group j, which is n_j)
Z̄.. is the overall mean regardless of group membership (computed by taking the sum of all of the observations across all groups and dividing by the total number of observations N)

The test statistic F is compared against a critical value from the F table (Table A.4) with J − 1 degrees of freedom in the numerator and N − J degrees of freedom in the denominator, denoted by αF_{J−1, N−J}. If the test statistic is greater than the critical value, we reject H0; otherwise, we fail to reject H0.
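In practice, the critical value need not come from Table A.4; most statistical environments can compute it directly. A minimal sketch in Python, assuming SciPy is installed (the degrees of freedom match the example that follows, J − 1 = 2 and N − J = 9):

```python
from scipy.stats import f

# Upper-tail critical value of F at alpha = .05 (i.e., the .95 quantile)
# with 2 numerator and 9 denominator degrees of freedom
critical_value = f.ppf(0.95, dfn=2, dfd=9)
print(round(critical_value, 2))  # 4.26
```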
An example using the Brown–Forsythe procedure is certainly in order now. Three different groups of children (below-average, average, and above-average readers) play a computer game. The scores on the dependent variable Y are their total points from the game. We are interested in whether the variances for the three student groups are equal or not. The example data and computations are given in Table 9.1. First we compute the median for each group, and then compute the deviation from the median for each individual to obtain the transformed Z values. Then the transformed Z values are used to compute the F test statistic.

The test statistic F = 1.6388 is compared against the critical value for α = .05 of .05F2,9 = 4.26. As the test statistic is smaller than the critical value (i.e., 1.6388 < 4.26), we fail to reject the null hypothesis and conclude that the three student groups do not have different variances.
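The hand computations in Table 9.1 can be verified with a short program. Below is a sketch of the Brown–Forsythe procedure in pure Python (the function name brown_forsythe is ours, not a library routine); it simply performs a one-way ANOVA on the absolute deviations from the group medians:

```python
from statistics import median

def brown_forsythe(*groups):
    """Brown-Forsythe F: one-way ANOVA on Z_ij = |Y_ij - Mdn_j|."""
    z = [[abs(y - median(g)) for y in g] for g in groups]
    J = len(z)
    N = sum(len(g) for g in z)
    zbar_j = [sum(g) / len(g) for g in z]            # group means of Z
    zbar = sum(sum(g) for g in z) / N                # overall mean of Z
    ms_between = sum(len(g) * (zj - zbar) ** 2
                     for g, zj in zip(z, zbar_j)) / (J - 1)
    ms_within = sum((zij - zj) ** 2
                    for g, zj in zip(z, zbar_j) for zij in g) / (N - J)
    return ms_between / ms_within

# Table 9.1 data: below-average, average, and above-average readers
F = brown_forsythe([6, 8, 12, 13], [9, 12, 14, 17], [10, 16, 20, 30])
print(round(F, 4))  # 1.6388, matching the hand computation
```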
251Inferences About Variances
9.4.3 O’brien procedure
The� final� test� to� consider� in� this� chapter� is� the� O’Brien� procedure�� While� the� Brown–
Forsythe�procedure�is�recommended�for�leptokurtic�distributions,�the�O’Brien�procedure�
is�recommended�for�other�distributions�(i�e�,�mesokurtic�and�platykurtic�distributions)��
Let�us�now�examine�in�detail�the�O’Brien�procedure��The�null�hypothesis�is�again�that�
H0:� σ1
2 � =� σ2
2� =� …� =� σJ
2,� and� the� alternative� hypothesis� is� that� not� all� of� the� population�
group�variances�are�the�same�
Table 9.1
Example Using the Brown–Forsythe and O'Brien Procedures

        Group 1                 Group 2                Group 3
  Y    Z         r         Y    Z       r         Y     Z        r
  6    4    124.2499       9    4     143        10     8      704
  8    2     14.2499      12    1      −7        16     2      −16
 12    2     34.2499      14    1      −7        20     2      −96
 13    3     89.2499      17    4     143        30    12     1104

Group 1: Mdn = 10, Z̄ = 2.75, r̄ = 65.4999
Group 2: Mdn = 13, Z̄ = 2.50, r̄ = 68
Group 3: Mdn = 18, Z̄ = 6, r̄ = 424
Overall: Z̄ = 3.75, r̄ = 185.8333
Computations for the Brown–Forsythe procedure:

F = \frac{\sum_{j=1}^{J} n_j (\bar{Z}_{.j} - \bar{Z}_{..})^2 / (J - 1)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (Z_{ij} - \bar{Z}_{.j})^2 / (N - J)}
  = \frac{[4(2.75 - 3.75)^2 + 4(2.50 - 3.75)^2 + 4(6 - 3.75)^2]/2}{[(4 - 2.75)^2 + (2 - 2.75)^2 + \cdots + (12 - 6)^2]/9} = 1.6388
Computations for the O'Brien procedure:

Sample means: Ȳ1 = 9.75, Ȳ2 = 13.0, Ȳ3 = 19.0
Sample variances: s1² = 10.9167, s2² = 11.3333, s3² = 70.6667

Example computation for r_{ij} (because every group here has n_j = 4, the denominator (n_j − 1)(n_j − 2) = 6 is the same constant for every observation and cancels out of the F ratio, so the tabled r values carry only the numerator of the transformation):

r_{11} = (4 - 1.5)(4)(6 - 9.75)^2 - .5(10.9167)(4 - 1) = 124.2499
Test statistic:

F = \frac{\sum_{j=1}^{J} n_j (\bar{r}_{.j} - \bar{r}_{..})^2 / (J - 1)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (r_{ij} - \bar{r}_{.j})^2 / (N - J)}
  = \frac{[4(65.4999 - 185.8333)^2 + 4(68 - 185.8333)^2 + 4(424 - 185.8333)^2]/2}{[(124.2499 - 65.4999)^2 + (14.2499 - 65.4999)^2 + \cdots + (1104 - 424)^2]/9} = 1.4799
The O'Brien procedure is an ANOVA on a different transformed variable:

r_{ij} = \frac{(n_j - 1.5)\, n_j (Y_{ij} - \bar{Y}_{.j})^2 - .5\, s_j^2 (n_j - 1)}{(n_j - 1)(n_j - 2)}

which is computed for each individual, where
n_j is the size of group j
Ȳ.j is the mean for group j
s_j² is the sample variance for group j
The test statistic is an F and is computed by

F = \frac{\sum_{j=1}^{J} n_j (\bar{r}_{.j} - \bar{r}_{..})^2 / (J - 1)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (r_{ij} - \bar{r}_{.j})^2 / (N - J)}
where
n_j designates the number of observations in group j
J is the number of groups (where j ranges from 1 to J)
r̄.j is the mean for group j (computed by taking the sum of the observations in group j and dividing by the number of observations in group j, which is n_j)
r̄.. is the overall mean regardless of group membership (computed by taking the sum of all of the observations across all groups and dividing by the total number of observations N)

The test statistic F is compared against a critical value from the F table (Table A.4) with J − 1 degrees of freedom in the numerator and N − J degrees of freedom in the denominator, denoted by αF_{J−1, N−J}. If the test statistic is greater than the critical value, then we reject H0; otherwise, we fail to reject H0.
Let us return to the example in Table 9.1 and consider the results of the O'Brien procedure. From the computations shown in the table, the test statistic F = 1.4799 is compared against the critical value for α = .05 of .05F2,9 = 4.26. As the test statistic is smaller than the critical value (i.e., 1.4799 < 4.26), we fail to reject the null hypothesis and conclude that the three student groups do not have different variances.
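The O'Brien computation can be verified the same way. This sketch (again, the function names are ours) applies the full transformation, including the (n_j − 1)(n_j − 2) denominator; with equal group sizes that denominator is a common constant, so the resulting F agrees with the Table 9.1 value:

```python
from statistics import mean, variance

def obrien_transform(group):
    """O'Brien transformation of one group's raw scores."""
    n, ybar, s2 = len(group), mean(group), variance(group)  # sample variance
    return [((n - 1.5) * n * (y - ybar) ** 2 - 0.5 * s2 * (n - 1))
            / ((n - 1) * (n - 2)) for y in group]

def anova_f(groups):
    """Ordinary one-way ANOVA F statistic."""
    J = len(groups)
    N = sum(len(g) for g in groups)
    gbar = [mean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / N
    msb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, gbar)) / (J - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, gbar) for x in g) / (N - J)
    return msb / msw

groups = [[6, 8, 12, 13], [9, 12, 14, 17], [10, 16, 20, 30]]  # Table 9.1
F = anova_f([obrien_transform(g) for g in groups])
print(round(F, 4))  # 1.4799
```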
9.5 SPSS

Unfortunately, there is not much to report on tests of variances for SPSS. There are no tests available for inferences about a single variance or for inferences about two dependent variances. For inferences about independent variances, SPSS does provide Levene's test as part of the "Independent t Test" procedure (previously discussed in Chapter 7), and as part of the "One-Way ANOVA" and "Univariate ANOVA" procedures (to be discussed in Chapter 11). Given our previous concerns with Levene's test, use it with caution. There is also little information published in the literature on power and effect sizes for tests of variances.
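Outside of SPSS, these checks are scriptable. For example, SciPy's levene function (assuming SciPy is available) implements Levene's test, and its center='median' option is precisely the Brown–Forsythe variant recommended earlier; a sketch with the Table 9.1 data:

```python
from scipy.stats import levene

groups = [[6, 8, 12, 13], [9, 12, 14, 17], [10, 16, 20, 30]]  # Table 9.1

# center='median' yields the Brown-Forsythe variant; the default
# center='mean' is the original (less robust) Levene test
stat, p = levene(*groups, center='median')
print(round(stat, 4))  # 1.6388
```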
9.6 Template and APA-Style Write-Up

Consider an example paragraph for one of the tests described in this chapter, more specifically, testing inferences about two dependent variances. As you may remember, our graduate research assistant, Marie, was working with Jessica, a classroom teacher, to assist in analyzing the variances of first-grade students. Her task was to assist Jessica with writing her research question (Are the variances of achievement scores for first-grade children the same in the fall as compared to the spring?) and generating the test of inference to answer her question. Marie suggested a dependent variances test as the test of inference. A template for writing a research question for dependent variances is presented as follows:

Are the Variances of [Variable] the Same in [Time 1] as Compared to [Time 2]?

An example write-up is presented as follows:
A dependent variances test was conducted to determine if variances
of achievement scores for first-grade children were the same in the
fall as compared to the spring. The test was conducted using an alpha
of .05. The null hypothesis was that the variances would be the same.
There was a statistically significant difference in variances of
achievement scores of first-grade children in the fall as compared to
the spring (t = −3.4261, df = 60, p < .05). Thus, the null hypothesis
that the variances would be equal at the beginning and end of the
first grade was rejected. The variances of achievement test scores
significantly increased from September to April.
9.7 Summary

In this chapter, we described testing hypotheses about variances. Several inferential tests and new concepts were discussed. The new concepts introduced were the sampling distribution of the variance, the F distribution, and homogeneity of variance tests. The first inferential test discussed was the test of a single variance, followed by a test of two dependent variances. Next we examined several tests of two or more independent variances. Here we considered the following traditional procedures: the F ratio test, Hartley's Fmax test, Cochran's C test, Bartlett's χ² test, and Levene's test. Unfortunately, these tests are not very robust to violation of the normality assumption. We then discussed two newer procedures that are relatively robust to nonnormality, the Brown–Forsythe procedure and the O'Brien procedure. Examples were presented for each of the recommended tests. At this point, you should have met the following objectives: (a) be able to understand the basic concepts underlying tests of variances, (b) be able to select the appropriate test, and (c) be able to determine and interpret the results from the appropriate test. In Chapter 10, we discuss correlation coefficients, as well as inferential tests involving correlations.
Problems

Conceptual Problems

9.1 Which of the following tests of homogeneity of variance is most robust to assumption violations?
  a. F ratio test
  b. Bartlett's chi-square test
  c. The O'Brien procedure
  d. Hartley's Fmax test

9.2 Cochran's C test requires equal sample sizes. True or false?

9.3 I assert that if two dependent sample variances are identical, I would not be able to reject the null hypothesis. Am I correct?
9.4 Suppose that I wish to test the following hypotheses at the .01 level of significance:

H0: σ² = 250
H1: σ² > 250

A sample variance of 233 is observed. I assert that if I compute the χ² test statistic and compare it to the χ² table, it is possible that I could reject the null hypothesis. Am I correct?
9.5 Suppose that I wish to test the following hypotheses at the .05 level of significance:

H0: σ² = 16
H1: σ² > 16

A sample variance of 18 is observed. I assert that if I compute the χ² test statistic and compare it to the χ² table, it is possible that I could reject the null hypothesis. Am I correct?
9.6 If the 90% CI for a single variance extends from 25.7 to 33.6, I assert that the null hypothesis would definitely be rejected at the .10 level. Am I correct?

9.7 If the 95% CI for a single variance ranges from 82.0 to 93.5, I assert that the null hypothesis would definitely be rejected at the .05 level. Am I correct?

9.8 If the mean of the sampling distribution of the difference between two variances equals 0, I assert that both samples probably represent a single population. Am I correct?
9.9 Which of the following is an example of two dependent samples?
  a. Pretest scores of males in one course and posttest scores of females in another course
  b. Husbands and their wives in your neighborhood
  c. Softball players at your school and football players at your school
  d. Professors in education and professors in psychology

9.10 The mean of the F distribution increases as the denominator degrees of freedom (ν2) increase. True or false?
Computational Problems

9.1 The following random sample of scores on a preschool ability test is obtained from a normally distributed population of 4-year-olds:

20 22 24 30 18 22 29 27
25 21 19 22 38 26 17 25

  a. Test the following hypotheses at the .10 level of significance:

H0: σ² = 75
H1: σ² ≠ 75

  b. Construct a 90% CI.
9.2 The following two independent random samples of number of CDs owned are obtained from two populations of undergraduate (sample 1) and graduate students (sample 2), respectively:

Sample 1 Data       Sample 2 Data
42 36 47 35 46      45 50 57 58 43
37 52 44 47 51      52 43 60 41 49
56 54 55 50 40      44 51 49 55 56
40 46 41

Test the following hypotheses at the .05 level of significance using the Brown–Forsythe and O'Brien procedures:

H0: σ1² − σ2² = 0
H1: σ1² − σ2² ≠ 0
9.3 The following summary statistics are available for two dependent random samples of brothers and sisters, respectively, on their allowance for the past month: s1² = 49, s2² = 25, n = 32, r12 = .60. Test the following hypotheses at the .05 level of significance:

H0: σ1² − σ2² = 0
H1: σ1² − σ2² ≠ 0
9.4 The following summary statistics are available for two dependent random samples of first-semester college students who were measured on their high school and first-semester college GPAs, respectively: s1² = 1.56, s2² = 4.42, n = 62, r12 = .72. Test the following hypotheses at the .05 level of significance:

H0: σ1² − σ2² = 0
H1: σ1² − σ2² ≠ 0
9.5 A random sample of 21 statistics exam scores is collected with a sample mean of 50 and a sample variance of 10. Test the following hypotheses at the .05 level of significance:

H0: σ² = 25
H1: σ² ≠ 25
9.6 A random sample of 30 graduate entrance exam scores is collected with a sample mean of 525 and a sample variance of 16,900. Test the following hypotheses at the .05 level of significance:

H0: σ² = 10,000
H1: σ² ≠ 10,000
9.7 A pretest was given at the beginning of a history course and a posttest at the end of the course. The pretest variance is 36, the posttest variance is 64, the sample size is 31, and the pretest–posttest correlation is .80. Test the null hypothesis that the two dependent variances are equal against a nondirectional alternative at the .01 level of significance.
Interpretive Problems

9.1 Use the survey 1 dataset from the website to determine if there are gender differences among the variances for any items of interest that are at least interval or ratio in scale. Some example items might include the following:
  a. Item #1: height in inches
  b. Item #6: amount spent at last hair appointment
  c. Item #7: number of compact disks owned
  d. Item #9: current GPA
  e. Item #10: amount of exercise per week
  f. Item #15: number of alcoholic drinks per week
  g. Item #21: number of hours studied per week
9.2 Use the survey 1 dataset from the website to determine if there are differences between the variances for left- versus right-handed individuals on any items of interest that are at least interval or ratio in scale. Some example items might include the following:
  a. Item #1: height in inches
  b. Item #6: amount spent at last hair appointment
  c. Item #7: number of compact disks owned
  d. Item #9: current GPA
  e. Item #10: amount of exercise per week
  f. Item #15: number of alcoholic drinks per week
  g. Item #21: number of hours studied per week
10
Bivariate Measures of Association
Chapter Outline
10.1 Scatterplot
10.2 Covariance
10.3 Pearson Product–Moment Correlation Coefficient
10.4 Inferences About the Pearson Product–Moment Correlation Coefficient
  10.4.1 Inferences for a Single Sample
  10.4.2 Inferences for Two Independent Samples
10.5 Assumptions and Issues Regarding Correlations
  10.5.1 Assumptions
  10.5.2 Correlation and Causality
  10.5.3 Restriction of Range
10.6 Other Measures of Association
  10.6.1 Spearman's Rho
  10.6.2 Kendall's Tau
  10.6.3 Phi
  10.6.4 Cramer's Phi
  10.6.5 Other Correlations
10.7 SPSS
10.8 G*Power
10.9 Template and APA-Style Write-Up
Key Concepts
1. Scatterplot
2. Strength and direction
3. Covariance
4. Correlation coefficient
5. Fisher's Z transformation
6. Linearity assumption, causation, and restriction of range issues
We have considered various inferential tests in the last four chapters, specifically those that deal with tests of means, proportions, and variances. In this chapter, we examine measures of association as well as inferences involving measures of association. Methods for directly determining the relationship between two variables are known as bivariate analysis, in contrast to univariate analysis, which is concerned with only a single variable. The indices used to directly describe the relationship between two variables are known as correlation coefficients (in the old days, known as co-relation) or as measures of association.

These measures of association allow us to determine how two variables are related to one another and can be useful in two applications, (a) as a descriptive statistic by itself and (b) as an inferential test. First, a researcher may want to compute a correlation coefficient for its own sake, simply to tell the researcher precisely how two variables are related or associated. For example, we may want to determine whether there is a relationship between the GRE-Quantitative (GRE-Q) subtest and performance on a statistics exam. Do students who score relatively high on the GRE-Q perform higher on a statistics exam than do students who score relatively low on the GRE-Q? In other words, as scores increase on the GRE-Q, do scores on a statistics exam also correspondingly increase?

Second, we may want to use an inferential test to assess whether (a) a correlation is significantly different from 0 or (b) two correlations are significantly different from one another. For example, is the correlation between GRE-Q and statistics exam performance significantly different from 0? As a second example, is the correlation between GRE-Q and statistics exam performance the same for younger students as it is for older students?

The following topics are covered in this chapter: scatterplot, covariance, Pearson product–moment correlation coefficient, inferences about the Pearson product–moment correlation coefficient, some issues regarding correlations, other measures of association, SPSS, and power. We utilize some of the basic concepts previously covered in Chapters 6 through 9. New concepts to be discussed include the following: scatterplot; strength and direction; covariance; correlation coefficient; Fisher's Z transformation; and linearity assumption, causation, and restriction of range issues. Our objectives are that by the end of this chapter, you will be able to (a) understand the concepts underlying the correlation coefficient and correlation inferential tests, (b) select the appropriate type of correlation, and (c) determine and interpret the appropriate correlation and inferential test.
10.1 Scatterplot

Marie, the graduate student pursuing a degree in educational research, continues to work diligently on her coursework. Additionally, as we will once again see in this chapter, Marie continues to assist her faculty advisor with various research tasks.

Marie's faculty advisor received a telephone call from Matthew, the director of marketing for the local animal shelter. Based on the donor list, it appears that the donors who contribute the largest donations also have children and pets. In an effort to attract more donors to the animal shelter, Matthew is targeting select groups, one of which he believes may be families that have children at home and who also have pets. Matthew believes that if there is a relationship between these variables, he can more easily reach the intended audience with his marketing materials, which will then translate into increased donations to the animal shelter. However, Matthew wants to base his decision on solid evidence and not just a hunch. Having built a good knowledge base with previous consulting work, Marie's faculty advisor puts Matthew in touch with Marie. After consulting with Matthew, Marie suggests a Pearson correlation as the test of inference to test his research question: Is there a correlation between the number of children in a family and the number of pets? Marie's task is then to assist in generating the test of inference to answer Matthew's research question.
This section deals with an important concept underlying the relationship between two variables, the scatterplot. Later sections move us into ways of measuring the relationship between two variables. First, however, we need to set up the situation where we have data on two different variables for each of N individuals in the population. Table 10.1 displays such a situation. The first column is simply an index of the individuals in the population, from i = 1,…, N, where N is the total number of individuals in the population. The second column denotes the values obtained for the first variable X. Thus, X1 = 10 means that the first individual had a score of 10 on variable X. The third column provides the values for the second variable Y. Thus, Y1 = 20 indicates that the first individual had a score of 20 on variable Y. In an actual data table, only the scores would be shown, not the Xi and Yi notation. Thus, we have a tabular method for depicting the data of a two-variable situation in Table 10.1.

A graphical method for depicting the relationship between two variables is to plot the pair of scores on X and Y for each individual on a two-dimensional figure known as a scatterplot (or scattergram). Each individual has two scores in a two-dimensional coordinate system, denoted by (X, Y). For example, our individual 1 has the paired scores of (10, 20). An example scatterplot is shown in Figure 10.1. The X axis (the horizontal
Table 10.1
Layout for Correlational Data

Individual    X          Y
1             X1 = 10    Y1 = 20
2             X2 = 12    Y2 = 28
3             X3 = 20    Y3 = 33
…             …          …
N             XN = 44    YN = 65
FIGURE 10.1 Scatterplot.
axis or abscissa) represents the values for variable X, and the Y axis (the vertical axis or ordinate) represents the values for variable Y. Each point on the scatterplot represents a pair of scores (X, Y) for a particular individual. Thus, individual 1 has a point at X = 10 and Y = 20 (the circled point in Figure 10.1). Points for other individuals are also shown. In essence, the scatterplot is actually a bivariate frequency distribution. When there is a moderate degree of relationship, the points may take the shape of an ellipse (i.e., a football shape, where the direction of the relationship, positive or negative, determines which way the football points; the positive relation depicted in this figure points up to the right), as in Figure 10.1.

The scatterplot allows the researcher to evaluate both the direction and the strength of the relationship between X and Y. The direction of the relationship has to do with whether the relationship is positive or negative. A positive relationship occurs when, as scores on variable X increase (from left to right), scores on variable Y also increase (from bottom to top). Thus, Figure 10.1 indicates a positive relationship between X and Y. Examples of different scatterplots are shown in Figure 10.2. Figure 10.2a and d display positive relationships. A negative relationship, sometimes called an inverse relationship, occurs when, as scores on variable X increase (from left to right), scores on variable Y decrease (from top to bottom). Figure 10.2b and e show examples of negative relationships. There is no relationship between X and Y when, for a large value of X, a large or a small value of Y can occur, and for a small value of X, a large or a small value of Y can also occur. In other words, X and Y are not related, as shown in Figure 10.2c.
The strength of the relationship between X and Y is determined by the scatter of the points (hence the name scatterplot). First, we draw a straight line through the points which cuts the bivariate distribution in half, as shown in Figures 10.1 and 10.2. In Chapter 17, we note that this line is known as the regression line. If the scatter is such that the points tend to fall close to the line, then this is indicative of a strong relationship between X and Y. Figure 10.2a and b denote strong relationships. If the scatter is such that the points are widely scattered around the line, then this is indicative of a weak relationship between
FIGURE 10.2 Examples of possible scatterplots (panels a through e).
X and Y. Figure 10.2d and e denote weak relationships. To summarize Figure 10.2, part (a) represents a strong positive relationship, part (b) a strong negative relationship, part (c) no relationship, part (d) a weak positive relationship, and part (e) a weak negative relationship. Thus, the scatterplot is useful for providing a quick visual indication of the nature of the relationship between variables X and Y.
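Scatterplots such as those in Figures 10.1 and 10.2 are straightforward to generate in software. A minimal sketch, assuming matplotlib is installed (the data values here are hypothetical, chosen to show a positive relationship):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Hypothetical (X, Y) pairs with a positive relationship
X = [10, 12, 20, 25, 31, 38, 44]
Y = [20, 28, 33, 35, 44, 52, 65]

fig, ax = plt.subplots()
ax.scatter(X, Y)
ax.set_xlabel("X")
ax.set_ylabel("Y")
fig.savefig("scatterplot.png")
```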
10.2 Covariance

The remainder of this chapter deals with statistical methods for measuring the relationship between variables X and Y. The first such method is known as the covariance. The covariance conceptually is the shared variance (or co-variance) between X and Y. The covariance and correlation share commonalities, as the correlation is simply the standardized covariance. The population covariance is denoted by σXY, and the conceptual formula is given as follows:

\sigma_{XY} = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{N}

where
X_i and Y_i are the scores for individual i on variables X and Y, respectively
μX and μY are the population means for variables X and Y, respectively
N is the population size
This equation looks similar to the conceptual formula for the variance presented in Chapter 3, where deviation scores from the mean are computed for each individual. The conceptual formula for the covariance is essentially an average of the paired deviation score products. If variables X and Y are positively related, then the deviation scores will tend to be of the same sign, their products will tend to be positive, and the covariance will be a positive value (i.e., σXY > 0). If variables X and Y are negatively related, then the deviation scores will tend to be of opposite signs, their products will tend to be negative, and the covariance will be a negative value (i.e., σXY < 0). Finally, if variables X and Y are not related, then the deviation scores will consist of both the same and opposite signs, their products will be both positive and negative and will sum to 0, and the covariance will be a zero value (i.e., σXY = 0).
The sample covariance is denoted by sXY, and the conceptual formula becomes as follows:

s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}

where
X̄ and Ȳ are the sample means for variables X and Y, respectively
n is the sample size
Note that the denominator becomes n − 1 so as to yield an unbiased sample estimate of the population covariance (i.e., similar to what we did in the sample variance situation).

The conceptual formula is unwieldy and error prone for anything other than small samples. Thus, a computational formula for the population covariance has been developed, as seen here:

\sigma_{XY} = \frac{N \sum_{i=1}^{N} X_i Y_i - \left(\sum_{i=1}^{N} X_i\right)\left(\sum_{i=1}^{N} Y_i\right)}{N^2}

where the first summation involves the cross product of X multiplied by Y for each individual summed across all N individuals, and the other terms should be familiar. The computational formula for the sample covariance is the following:
s_{XY} = \frac{n \sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n(n - 1)}

where the denominator is n(n − 1) so as to yield an unbiased sample estimate of the population covariance.
Table 10.2 gives an example of a population situation where a strong positive relationship is expected, because as X (number of children in a family) increases, Y (number of pets in a family) also increases. Here σXY is computed as follows:

\sigma_{XY} = \frac{N \sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{N^2} = \frac{5(108) - (15)(30)}{25} = 3.6000
The sign indicates that the relationship between X and Y is indeed positive. That is, the more children a family has, the more pets they tend to have. However, like the variance,
Table 10.2
Example Correlational Data (X = # Children, Y = # Pets)

Individual   X    Y    XY   X²    Y²   Rank X   Rank Y   (Rank X − Rank Y)²
1            1    2     2    1     4      1        1         0
2            2    6    12    4    36      2        3         1
3            3    4    12    9    16      3        2         1
4            4    8    32   16    64      4        4         0
5            5   10    50   25   100      5        5         0
Sums        15   30   108   55   220                         2
the value of the covariance depends on the scales of the variables involved. Thus, interpretation of the magnitude of a single covariance is difficult, as it can take on literally any value. We see shortly that the correlation coefficient takes care of this problem. For this reason, you are only likely to see the covariance utilized in the analysis of covariance (Chapter 14) and in advanced techniques such as structural equation modeling and multilevel modeling (beyond the scope of this text).
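Both forms of the covariance formula are easy to check in a few lines of code. A sketch using the Table 10.2 data and the population formulas above:

```python
X = [1, 2, 3, 4, 5]    # number of children (Table 10.2)
Y = [2, 6, 4, 8, 10]   # number of pets
N = len(X)

mu_x, mu_y = sum(X) / N, sum(Y) / N
# conceptual formula: average of the paired deviation-score products
conceptual = sum((x - mu_x) * (y - mu_y) for x, y in zip(X, Y)) / N
# computational formula: (N * sum(XY) - sum(X) * sum(Y)) / N^2
computational = (N * sum(x * y for x, y in zip(X, Y))
                 - sum(X) * sum(Y)) / N ** 2

print(conceptual, computational)  # 3.6 3.6
```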
10.3 Pearson Product–Moment Correlation Coefficient

Other methods for measuring the relationship between X and Y have been developed that are easier to interpret than the covariance. We refer to these measures as correlation coefficients. The first correlation coefficient we consider is the Pearson product–moment correlation coefficient, developed by the famous statistician Karl Pearson, and simply referred to as the Pearson here. The Pearson can be considered in several different forms, where the population value is denoted by ρXY (rho) and the sample value by rXY. One conceptual form of the Pearson is a product of standardized z scores (previously described in Chapter 4). This formula for the Pearson is given as follows:

\rho_{XY} = \frac{\sum_{i=1}^{N} (z_X z_Y)}{N}

where zX and zY are the z scores for variables X and Y, respectively, whose product is taken for each individual and then summed across all N individuals.
Because�z�scores�are�standardized�versions�of�raw�scores,�then�the�Pearson�correla-
tion�is�simply�a�standardized�version�of�the�covariance��The�sign�of�the�Pearson�denotes�
the�direction�of�the�relationship�(e�g�,�positive�or�negative),�and�the�value�of�the�Pearson�
denotes� the� strength� of� the� relationship�� The� Pearson� falls� on� a� scale� from� −1�00� to�
+1�00,� where� −1�00� indicates� a� perfect� negative� relationship,� 0� indicates� no� relation-
ship,� and� +1�00� indicates� a� perfect� positive� relationship�� Values� near� �50� or� −�50� are�
considered�as�moderate�relationships,�values�near�0�as�weak�relationships,�and�values�
near�+1�00�or�−1�00�as�strong�relationships�(although�these�are�subjective�terms)��Cohen�
(1988)� also� offers� rules� of� thumb,� which� are� presented� later� in� this� chapter,� for� inter-
preting� the� value� of� the� correlation�� As� you� may� see� as� you� read� more� statistics� and�
research� methods� textbooks,� there� are� other� guidelines� offered� for� interpreting� the�
value�of�the�correlation�
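The z-score form can be verified directly. This short Python sketch (our own illustration, not from the text) standardizes both Table 10.2 variables with population standard deviations and averages the products:

```python
import math

# Pearson correlation as the mean product of population z scores,
# using the Table 10.2 data.
X = [1, 2, 3, 4, 5]
Y = [2, 6, 4, 8, 10]
N = len(X)

def z_scores(values):
    mean = sum(values) / N
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / N)  # population SD
    return [(v - mean) / sd for v in values]

zx, zy = z_scores(X), z_scores(Y)
rho = sum(a * b for a, b in zip(zx, zy)) / N
print(round(rho, 4))  # 0.9
```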
There are other forms of the Pearson. A second conceptual form of the Pearson is in terms of the covariance and the standard deviations and is given as follows:
\[
\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}
\]
An Introduction to Statistical Concepts
This form is useful when the covariance and standard deviations are already known. A final form of the Pearson is the computational formula, written as follows:
\[
\rho_{XY} = \frac{N\sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{\sqrt{\left[N\sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2\right]\left[N\sum_{i=1}^{N} Y_i^2 - \left(\sum_{i=1}^{N} Y_i\right)^2\right]}}
\]
where all terms should be familiar from the computational formulas of the variance and covariance. This is the formula to use for hand computations, as it is less prone to rounding error than the other formulas given previously.
For the example children-pet data given in Table 10.2, we see that the Pearson correlation is computed as follows:
\[
\rho_{XY} = \frac{5(108) - (15)(30)}{\sqrt{[5(55) - (15)^2][5(220) - (30)^2]}} = \frac{90}{100} = .9000
\]
Thus, there is a very strong positive relationship between variables X (the number of children) and Y (the number of pets).
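The same .9000 can be reproduced from the computational formula in a few lines; the following Python sketch (ours, not the text's) uses the Table 10.2 sums:

```python
import math

# Pearson correlation via the computational (raw score) formula,
# using the Table 10.2 data: sum XY = 108, sum X = 15, sum Y = 30,
# sum X^2 = 55, sum Y^2 = 220.
X = [1, 2, 3, 4, 5]
Y = [2, 6, 4, 8, 10]
N = len(X)

num = N * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)   # 90
den = math.sqrt((N * sum(x * x for x in X) - sum(X) ** 2)
                * (N * sum(y * y for y in Y) - sum(Y) ** 2))   # 100
print(num / den)  # 0.9
```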
The sample correlation is denoted by rXY. The formulas are essentially the same for the sample correlation rXY and the population correlation ρXY, except that n is substituted for N. For example, the computational formula for the sample correlation is noted here:
\[
r_{XY} = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}}
\]
Unlike the sample variance and covariance, the sample correlation has no correction for bias.
10.4 Inferences About the Pearson Product–Moment Correlation Coefficient
Once a researcher has determined one or more Pearson correlation coefficients, it is often useful to know whether the sample correlations are significantly different from 0. Thus, we need to visit the world of inferential statistics again. In this section, we consider two
different inferential tests: first for testing whether a single sample correlation is significantly different from 0 and second for testing whether two independent sample correlations are significantly different from each other.
10.4.1 Inferences for a Single Sample
Our first inferential test is appropriate when you are interested in determining whether the correlation between variables X and Y for a single sample is significantly different from 0. For example, is the correlation between the number of years of education and current income significantly different from 0? The test of inference for the Pearson correlation is conducted following the same steps as those in previous chapters. The null hypothesis is written as
\[
H_0\colon \rho = 0
\]
A nondirectional alternative hypothesis, where we are willing to reject the null if the sample correlation is either significantly greater than or less than 0, is nearly always utilized. Unfortunately, the sampling distribution of the sample Pearson r is too complex to be of much value to the applied researcher. For testing whether the correlation is different from 0 (i.e., where the alternative hypothesis is specified as H1: ρ ≠ 0), a transformation of r can be used to generate a t-distributed test statistic. The test statistic is
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
\]
which is distributed as t with ν = n − 2 degrees of freedom, assuming that both X and Y are normally distributed (although even if one variable is normal and the other is not, the t distribution may still apply; see Hogg & Craig, 1970).
There are two assumptions with the Pearson correlation. First, the Pearson correlation is appropriate only when a linear relationship is assumed between the variables (given that both variables are at least interval in scale). In other words, when a curvilinear or some type of polynomial relationship is present, the Pearson correlation should not be computed. Testing for linearity can be done by simply graphing a bivariate scatterplot and reviewing it for a general linear display of points. Also, as we have seen with the other inferential procedures discussed in previous chapters, we need to again assume that the scores of the individuals are independent of one another. For the Pearson correlation, the assumption of independence is met when a random sample of units has been selected from the population.
It should be noted for inferential tests of correlations that sample size plays a role in determining statistical significance. For instance, this particular test is based on n − 2 degrees of freedom. If sample size is small (e.g., 10), then it is difficult to reject the null hypothesis except for very strong correlations. If sample size is large (e.g., 200), then it is easy to reject the null hypothesis for all but very weak correlations. Thus, the statistical significance of a correlation is definitely a function of sample size, both for tests of a single correlation and for tests of two correlations.
Effect size and power are always important, particularly here where sample size plays such a large role. Cohen (1988) proposed using r as a measure of effect size, using the subjective standard (ignoring the sign of the correlation) of r = .1 as a weak effect, r = .3
as a moderate effect, and r = .5 as a strong effect. These standards were developed for the behavioral sciences, but other standards may be used in other areas of inquiry. Cohen also has a nice series of power tables in his Chapter 3 for determining power and sample size when planning a correlational study. As for confidence intervals (CIs), Wilcox (1996) notes that “many methods have been proposed for computing CIs for ρ, but it seems that a satisfactory method for applied work has yet to be derived” (p. 303). Thus, a CI procedure is not recommended, even for large samples.
From the example children-pet data, we want to determine whether the sample Pearson correlation is significantly different from 0, with a nondirectional alternative hypothesis and at the .05 level of significance. The test statistic is computed as follows:
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.9000\sqrt{5-2}}{\sqrt{1-.8100}} = 3.5762
\]
The critical values from Table A.2 are ±3.182 (the two-tailed critical t values at α = .05 with ν = 3). Thus, we would reject the null hypothesis, as the test statistic exceeds the critical value, and conclude that the correlation between variables X and Y is significantly different from 0. In summary, there is a strong, positive, statistically significant correlation between the number of children and the number of pets.
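As a quick check on the hand computation, the t statistic can be scripted in a couple of lines (a Python sketch of ours; the critical value 3.182 still comes from Table A.2):

```python
import math

# t test of H0: rho = 0 for the children-pets data (r = .9000, n = 5).
r, n = 0.9000, 5
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 4))  # 3.5762

# Two-tailed decision at alpha = .05 with nu = n - 2 = 3 degrees of
# freedom; the critical value 3.182 is taken from Table A.2.
print(abs(t) > 3.182)  # True
```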
10.4.2 Inferences for Two Independent Samples
In a second situation, the researcher may have collected data from two different independent samples. It can be determined whether the correlations between variables X and Y are equal for these two independent samples of observations. For example, is the correlation between height and weight the same for children and adults? Here the null and alternative hypotheses are written as
\[
H_0\colon \rho_1 - \rho_2 = 0 \qquad\qquad H_1\colon \rho_1 - \rho_2 \neq 0
\]
where ρ1 is the correlation between X and Y for sample 1 and ρ2 is the correlation between X and Y for sample 2. However, because sample correlations are not normally distributed for every value of ρ, a transformation is necessary. This transformation is known as Fisher's Z transformation, named after the famous statistician Sir Ronald A. Fisher, and it is approximately normally distributed regardless of the value of ρ. Table A.5 is used to convert a sample correlation r to a Fisher's Z transformed value. Note that Fisher's Z is a totally different statistic from any z score or z statistic previously covered.
The test statistic for this situation is
\[
z = \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}
\]
where n1 and n2 are the sizes of the two samples, and Z1 and Z2 are the Fisher's Z transformed values for the two samples.
The test statistic is then compared to critical values from the z distribution in Table A.1. For a nondirectional alternative hypothesis where the two correlations may be different in either direction, the critical values are ±zα/2. Directional alternative hypotheses where the correlations are different in a particular direction can also be tested by looking in the appropriate tail of the z distribution (i.e., either +zα or −zα).
Cohen (1988) proposed a measure of effect size for the difference between two independent correlations as q = Z1 − Z2. The subjective standards proposed (ignoring the sign) are q = .1 as a weak effect, q = .3 as a moderate effect, and q = .5 as a strong effect (these are the standards for the behavioral sciences, although standards vary across disciplines). A nice set of power tables for planning purposes is contained in Chapter 4 of Cohen. Once again, while CI procedures have been developed, none of these have been viewed as acceptable (Marascuilo & Serlin, 1988; Wilcox, 2003).
Consider the following example. Two samples have been independently drawn of 28 children (sample 1) and 28 adults (sample 2). For each sample, the correlation between height and weight was computed to be rchildren = .8 and radults = .4. A nondirectional alternative hypothesis is utilized where the level of significance is set at .05. From Table A.5, we first determine the Fisher's Z transformed values to be Zchildren = 1.099 and Zadults = .4236. Then the test statistic z is computed as follows:
\[
z = \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} = \frac{1.099 - .4236}{\sqrt{\dfrac{1}{25} + \dfrac{1}{25}}} = 2.3878
\]
From Table A.1, the critical values are ±zα/2 = ±1.96. Our decision then is to reject the null hypothesis and conclude that height and weight do not have the same correlation for children and adults. In other words, there is a statistically significant difference in the height–weight correlation between children and adults, with a strong effect size (q = .6754). This inferential test assumes both variables are normally distributed for each population and that scores are independent across individuals; however, the procedure is not very robust to nonnormality, as the Z transformation assumes normality (Duncan & Layard, 1973; Wilcox, 2003; Yu & Dunn, 1982). Thus, caution should be exercised in using the z test when data are nonnormal (e.g., Yu & Dunn recommend the use of Kendall's τ, as discussed later in this chapter).
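The whole procedure is easy to script. Fisher's Z is simply the inverse hyperbolic tangent of r, so Table A.5 can be replaced by `math.atanh`. The Python sketch below (ours) reproduces the example; it lands at about 2.386 rather than the text's 2.3878 only because the text rounds the transformed values to table precision:

```python
import math

# z test comparing two independent correlations via Fisher's Z = atanh(r).
# Values from the text's example: r_children = .8, r_adults = .4, n = 28 each.
r1, n1 = 0.8, 28
r2, n2 = 0.4, 28

Z1, Z2 = math.atanh(r1), math.atanh(r2)   # about 1.0986 and 0.4236
z = (Z1 - Z2) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
q = Z1 - Z2                               # Cohen's effect size q

print(round(z, 4), round(q, 4))  # approx. 2.3864 and 0.675
```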
10.5 Assumptions and Issues Regarding Correlations
There are several issues about the Pearson and other types of correlations that you should be aware of. These issues are concerned with the assumption of linearity, correlation and causation, and restriction of range.
10.5.1 Assumptions
First, as mentioned previously, the Pearson correlation assumes that the relationship between X and Y is a linear relationship. In fact, the Pearson correlation, as a measure of relationship, is really a linear measure of relationship. Recall from earlier in the chapter
the scatterplots to which we fit a straight line. The linearity assumption means that a straight line provides a reasonable fit to the data. If the relationship is not a linear one, then the linearity assumption is violated. However, these correlational methods can still be computed, fitting a straight line to the data, albeit inappropriately. The result of such a violation is that the strength of the relationship will be reduced. In other words, the linear correlation will be much closer to 0 than the true nonlinear relationship.
For example, there is a perfect curvilinear relationship shown by the data in Figure 10.3, where all of the points fall precisely on the curved line. Something like this might occur if you correlate age with time in the mile run, as younger and older folks would take longer to run this distance than others. If these data are fit by a straight line, then the correlation will be severely reduced, in this case, to a value of 0 (i.e., the horizontal straight line that runs through the curved line). This is another good reason to always examine your data. The computer may determine that the Pearson correlation between variables X and Y is small or around 0. However, on examination of the data, you might find that the relationship is indeed nonlinear; thus, you should get to know your data. We return to the assessment of nonlinear relationships in Chapter 17.
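A tiny Python example (ours) makes the point concrete: a perfect U-shaped relationship yields a Pearson correlation of exactly 0.

```python
import math

# Perfect curvilinear (quadratic) relationship: Y is completely
# determined by X, yet the linear (Pearson) correlation is 0.
X = [-2, -1, 0, 1, 2]
Y = [x ** 2 for x in X]
N = len(X)

num = N * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
den = math.sqrt((N * sum(x * x for x in X) - sum(X) ** 2)
                * (N * sum(y * y for y in Y) - sum(Y) ** 2))
print(num / den)  # 0.0
```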
Second, the assumption of independence applies to correlations. This assumption is met when units or cases are randomly sampled from the population.
10.5.2 Correlation and Causality
A second matter to consider is an often-made misinterpretation of a correlation. Many individuals (e.g., researchers, the public, and the media) often infer a causal relationship from a strong correlation. However, a correlation by itself should never be used to infer causation. In particular, a high correlation between variables X and Y does not imply that one variable is causing the other; it simply means that these two variables are related in some fashion. There are many reasons why variables X and Y might be highly correlated. A high correlation could be the result of (a) X causing Y, (b) Y causing X, (c) a third variable Z causing both X and Y, or (d) even many more variables being involved. The only methods that can strictly be used to infer cause are experimental methods that employ random assignment, where one variable is manipulated by the researcher (the cause), a second variable is subsequently observed (the effect), and all other variables are controlled. [There are, however, some excellent quasi-experimental methods, propensity score analysis and regression discontinuity, that can be used in some situations and that mimic random assignment and increase the likelihood of speaking to causal inference (Shadish, Cook, & Campbell, 2002).]
Figure 10.3: Nonlinear relationship.
10.5.3 Restriction of Range
A final issue to consider is the effect of restriction of the range of scores on one or both variables. For example, suppose that we are interested in the relationship between GRE scores and graduate grade point average (GGPA). In the entire population of students, the relationship might be depicted by the scatterplot shown in Figure 10.4. Say the Pearson correlation is found to be .60, as depicted by the entire sample in the full scatterplot. Now we take a more restricted population of students, those students at highly selective Ivy-Covered University (ICU). ICU only admits students whose GRE scores are above the cutoff score shown in Figure 10.4. Because of restriction of range in the scores of the GRE variable, the strength of the relationship between GRE and GGPA at ICU is reduced to a Pearson correlation of .20, where only the subsample portion of the plot to the right of the cutoff score is involved. Thus, when scores on one or both variables are restricted due to the nature of the sample or population, the magnitude of the correlation will usually be reduced (although see an exception in Figure 6.3 from Wilcox, 2003).
It is difficult for two variables to be highly related when one or both variables have little variability. This is due to the nature of the formula. Recall that one version of the Pearson formula consisted of standard deviations in the denominator. Remember that the standard deviation measures the distance of the sample scores from the mean. When there is restriction of range, the distance of the individual scores from the mean is minimized. In other words, there is less variation or variability around the mean. This translates to smaller correlations (and smaller covariances). If the size of the standard deviation for one variable is reduced, everything else being equal, then the size of correlations with other variables will also be reduced. In other words, we need sufficient variation for a relationship to be evidenced through the correlation coefficient value. Otherwise the correlation is likely to be reduced in magnitude, and you may miss an important correlation. If you must use a restrictive subsample, we suggest you choose measures of greater variability for correlational purposes.
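The restriction-of-range effect is easy to demonstrate by simulation. The sketch below is a hypothetical Python simulation of ours, not the text's: the slope and noise are chosen so the full-range correlation is about .60, mirroring the GRE-GGPA example, and then only cases above a cutoff are kept.

```python
import math
import random

random.seed(1)  # reproducible draw

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# GRE-like scores and a linearly related GGPA-like outcome.
gre = [random.gauss(500, 100) for _ in range(2000)]
ggpa = [0.003 * g + random.gauss(0, 0.4) for g in gre]

r_full = pearson(gre, ggpa)
admitted = [(g, p) for g, p in zip(gre, ggpa) if g > 600]  # the "ICU" cutoff
r_restricted = pearson(*zip(*admitted))

print(round(r_full, 2), round(r_restricted, 2))  # restricted r is smaller
```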
Outliers, observations that are different from the bulk of the observations, also reduce the magnitude of correlations. If one observation is quite different from the rest, such that it falls outside of the ellipse, then the correlation would be smaller in magnitude (e.g., closer to 0) than the correlation without the outlier. We discuss outliers in this context in Chapter 17.
Figure 10.4: Restriction of range example (GGPA plotted against GRE, with a GRE cutoff).
10.6 Other Measures of Association
Thus far, we have considered one type of correlation, the Pearson product–moment correlation coefficient. The Pearson is most appropriate when both variables are at least interval level. That is, both variables X and Y are interval- and/or ratio-level variables. The Pearson is considered a parametric procedure given the distributional assumptions associated with it. If both variables are not at least interval level, then other measures of association, considered nonparametric procedures, should be considered, as they do not have distributional assumptions associated with them. In this section, we examine in detail the Spearman's rho and phi types of correlation coefficients and briefly mention several other types. While a distributional assumption for these correlations is not necessary, the assumption of independence still applies (and thus a random sample from the population is assumed).
10.6.1 Spearman’s Rho
Spearman’s�rho�rank�correlation�coefficient�is�appropriate�when�both�variables�are�ordinal�
level�� This� type� of� correlation� was� developed� by� Charles� Spearman,� the� famous� quanti-
tative� psychologist�� Recall� from� Chapter� 1� that� ordinal� data� are� where� individuals� have�
been�rank-ordered,�such�as�class�rank��Thus,�for�both�variables,�either�the�data�are�already�
available�in�ranks,�or�the�researcher�(or�computer)�converts�the�raw�data�to�ranks�prior�to�
the�analysis�
The equation for computing Spearman's rho correlation is
\[
\rho_S = 1 - \frac{6\sum_{i=1}^{N}(X_i - Y_i)^2}{N(N^2 - 1)}
\]
where ρS denotes the population Spearman's rho correlation, and (Xi − Yi) represents the difference between the ranks on variables X and Y for individual i.
The sample Spearman's rho correlation is denoted by rS, where n replaces N, but otherwise the equation remains the same. In case you were wondering where the “6” in the equation comes from, you will find an interesting article by Lamb (1984). Unfortunately, this particular computational formula is only appropriate when there are no ties among the ranks for either variable. An example of a tie in rank would be if two cases scored the same value on either X or Y. With ties, the formula given is only approximate, depending on the number of ties. In the case of ties, particularly when there are more than a few, many researchers recommend using Kendall's τ (tau) as an alternative correlation (e.g., Wilcox, 1996).
As with the Pearson correlation, Spearman's rho ranges from −1.0 to +1.0. The rules of thumb that we used for interpreting the Pearson correlation (e.g., Cohen, 1988) can be applied to Spearman's rho correlation values as well. The sign of the coefficient can be interpreted as with the Pearson. A negative sign indicates that as the values for one variable increase, the values for the other variable decrease. A positive sign indicates that as one variable increases in value, the value of the second variable also increases.
As an example, consider the children-pets data again in Table 10.2. To the right of the table, you see the last three columns labeled as rank X, rank Y, and (rank X − rank Y)². The raw scores were converted to ranks, where the lowest raw score received a rank of 1. The last column lists the squared rank differences. As there were no ties, the computations are as follows:
\[
\rho_S = 1 - \frac{6\sum_{i=1}^{N}(X_i - Y_i)^2}{N(N^2 - 1)} = 1 - \frac{6(2)}{5(5^2 - 1)} = .9000
\]
Thus, again there is a strong positive relationship between variables X and Y. It is a coincidence that ρ = ρS for this dataset, but that is not so for computational problem 1 at the end of this chapter.
To test whether a sample Spearman's rho correlation is significantly different from 0, we examine the following null hypothesis (the alternative hypothesis would be stated as H1: ρS ≠ 0):
\[
H_0\colon \rho_S = 0
\]
The test statistic is given as
\[
t = \frac{r_S\sqrt{n-2}}{\sqrt{1-r_S^2}}
\]
which is approximately distributed as a t distribution with ν = n − 2 degrees of freedom (Ramsey, 1989). The approximation works best when n is at least 10. A nondirectional alternative hypothesis, where we are willing to reject the null if the sample correlation is either significantly greater than or less than 0, is nearly always utilized. From the example, we want to determine whether the sample Spearman's rho correlation is significantly different from 0 at the .05 level of significance. For a nondirectional alternative hypothesis, the test statistic is computed as
\[
t = \frac{r_S\sqrt{n-2}}{\sqrt{1-r_S^2}} = \frac{.9000\sqrt{5-2}}{\sqrt{1-.81}} = 3.5762
\]
where the critical values from Table A.2 are ±3.182 (the two-tailed critical t values at α = .05 with ν = 3). Thus, we would reject the null hypothesis and conclude that the correlation is significantly different from 0, strong in magnitude (suggested by the value of the correlation coefficient; using Cohen's guidelines for interpretation as an effect size, this would be considered a large effect), and positive in direction (evidenced by the sign of the correlation coefficient). The exact sampling distribution for 3 ≤ n ≤ 18 is given by Ramsey.
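Both the coefficient and its t test are short computations; here is a Python sketch of ours using the ranks from Table 10.2:

```python
import math

# Spearman's rho from squared rank differences (no tied ranks),
# using the rank columns of Table 10.2, plus its t test.
rank_x = [1, 2, 3, 4, 5]
rank_y = [1, 3, 2, 4, 5]
N = len(rank_x)

d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))  # 2
r_s = 1 - (6 * d2) / (N * (N ** 2 - 1))
t = r_s * math.sqrt(N - 2) / math.sqrt(1 - r_s ** 2)
print(round(r_s, 4), round(t, 4))  # 0.9 3.5762
```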
10.6.2 Kendall's Tau
Another correlation that can be computed with ordinal data is Kendall's tau, which also uses ranks of data to calculate the correlation coefficient (and has an adjustment for tied ranks). The ranking for Kendall's tau differs from Spearman's rho in the following way.
With Kendall's tau, the values for one variable are rank-ordered, and then the order of the second variable is examined to see how many pairs of values are out of order. A perfect positive correlation (+1.0) is achieved with Kendall's tau when no scores are out of order, and a perfect negative correlation (−1.0) is obtained when all scores are out of order. Values for Kendall's tau range from −1.0 to +1.0. The rules of thumb that we used for interpreting the Pearson correlation (e.g., Cohen, 1988) can be applied to Kendall's tau correlation values as well. The sign of the coefficient can be interpreted as with the Pearson: A negative sign indicates that as the values for one variable increase, the values for the second variable decrease. A positive sign indicates that as one variable increases in value, the value of the second variable also increases. While similar in some respects, Spearman's rho and Kendall's tau are based on different calculations, and, thus, finding different results is not uncommon. While both are appropriate when ordinal data are being correlated, it has been suggested that Kendall's tau provides a better estimation of the population correlation coefficient value given the sample data (Howell, 1997), especially with smaller sample sizes (e.g., n ≤ 10).
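The "pairs out of order" idea can be implemented directly. The sketch below (ours, in Python) counts concordant and discordant pairs for the Table 10.2 data, giving the simple untied version of Kendall's tau; note that it differs from Spearman's rho (.90) on the same data, which is exactly the kind of divergence described above.

```python
from itertools import combinations

# Kendall's tau (the simple form for untied data) by counting
# concordant and discordant pairs, using the Table 10.2 data.
X = [1, 2, 3, 4, 5]
Y = [2, 6, 4, 8, 10]

pairs = list(combinations(range(len(X)), 2))
concordant = sum((X[i] - X[j]) * (Y[i] - Y[j]) > 0 for i, j in pairs)
discordant = sum((X[i] - X[j]) * (Y[i] - Y[j]) < 0 for i, j in pairs)
tau = (concordant - discordant) / len(pairs)
print(tau)  # 0.8
```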
10.6.3 Phi
The phi coefficient ϕ is appropriate when both variables are dichotomous in nature (and is statistically equivalent to the Pearson). Recall from Chapter 1 that a dichotomous variable is one consisting of only two categories (i.e., binary), such as gender, pass/fail, or enrolled/dropped out. Thus, the variables being correlated would be either nominal or ordinal in scale. When correlating two dichotomous variables, one can think of a 2 × 2 contingency table as previously discussed in Chapter 8. For instance, to determine if there is a relationship between gender and whether students are still enrolled since their freshman year, a contingency table like Table 10.3 can be constructed. Here the columns correspond to the two levels of the enrollment status variable, enrolled (coded 1) or dropped out (0), and the rows correspond to the two levels of the gender variable, female (1) or male (0). The cells indicate the frequencies for the particular combinations of the levels of the two variables. If the frequencies in the cells are denoted by letters, then a represents females who dropped out, b represents females who are enrolled, c indicates males who dropped out, and d indicates males who are enrolled.
The equation for computing the phi coefficient is
\[
\rho_\phi = \frac{bc - ad}{\sqrt{(a+c)(b+d)(a+b)(c+d)}}
\]
where ρϕ denotes the population phi coefficient (for consistency's sake, although typically written as ϕ), and rϕ denotes the sample phi coefficient using the same equation. Note that
Table 10.3
Contingency Table for Phi Correlation

                     Enrollment Status
Student Gender       Dropped Out (0)    Enrolled (1)     Row Totals
Female (1)           a = 5              b = 20           a + b = 25
Male (0)             c = 15             d = 10           c + d = 25
Column totals        a + c = 20         b + d = 30       a + b + c + d = 50
the bc product involves the consistent cells, where both values are the same, either both 0 or both 1, and the ad product involves the inconsistent cells, where the two values are different.
Using the example data from Table 10.3, we compute the phi coefficient to be the following:
\[
\rho_\phi = \frac{bc - ad}{\sqrt{(a+c)(b+d)(a+b)(c+d)}} = \frac{300 - 50}{\sqrt{(20)(30)(25)(25)}} = .4082
\]
Thus, there is a moderate, positive relationship between gender and enrollment status. We see from the table that a larger proportion of females than males are still enrolled.
To test whether a sample phi correlation is significantly different from 0, we test the following null hypothesis (the alternative hypothesis would be stated as H1: ρϕ ≠ 0):
\[
H_0\colon \rho_\phi = 0
\]
The test statistic is given as
\[
\chi^2 = nr_\phi^2
\]
which is distributed as a χ² distribution with one degree of freedom. From the example, we want to determine whether the sample phi correlation is significantly different from 0 at the .05 level of significance. The test statistic is computed as
\[
\chi^2 = nr_\phi^2 = 50(.4082)^2 = 8.3314
\]
and the critical value from Table A.3 (χ² with one degree of freedom at α = .05) is 3.84. Thus, we would reject the null hypothesis and conclude that the correlation between gender and enrollment status is significantly different from 0.
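Both the coefficient and the chi-square test are one-liners from the cell counts. A Python sketch of ours with the Table 10.3 frequencies follows; it yields 8.3333 rather than the text's 8.3314 only because the text rounds r to .4082 before squaring:

```python
import math

# Phi coefficient and its chi-square test from the Table 10.3 counts:
# a = females dropped out, b = females enrolled,
# c = males dropped out,   d = males enrolled.
a, b, c, d = 5, 20, 15, 10
n = a + b + c + d

r_phi = (b * c - a * d) / math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
chi2 = n * r_phi ** 2
print(round(r_phi, 4), round(chi2, 4))  # 0.4082 8.3333
```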
10.6.4 Cramer's Phi
When the variables being correlated have more than two categories, Cramer's phi (Cramer's V in SPSS) can be computed. Thus, Cramer's phi is appropriate when both variables are nominal (and at least one variable has more than two categories) or when one variable is nominal and the other variable is ordinal (and at least one variable has more than two categories). Unlike the other correlation coefficients that we have discussed, Cramer's phi ranges from 0 to +1.0, as it carries no sign to indicate direction. Cohen's (1988) guidelines for interpreting the correlation in terms of effect size can otherwise be applied to Cramer's phi correlations, as they can with any other correlation examined.
10.6.5 Other Correlations
Other types of correlations have been developed for different combinations of types of variables, but these are rarely used in practice and are unavailable in most statistical packages (e.g., rank biserial and point biserial). Table 10.4 provides suggestions for when different types of correlations are most appropriate. We mention briefly the two other types of correlations in the table: the rank biserial correlation is appropriate when one variable is dichotomous and the other variable is ordinal, whereas the point biserial correlation is appropriate when one variable is dichotomous and the other variable is interval or ratio (statistically equivalent to the Pearson; thus, the Pearson correlation can be computed in this situation).
10.7 SPSS
Next let us see what SPSS has to offer in terms of measures of association using the children-pets example dataset. There are two programs for obtaining measures of association in SPSS, depending on the measurement scale of your variables: the Bivariate Correlations program (for computing the Pearson, Spearman's rho, and Kendall's tau) and the Crosstabs program (for computing the Pearson, Spearman's rho, Kendall's tau, phi, Cramer's phi, and several other types of measures of association).
Bivariate Correlations
Step 1: To locate the Bivariate Correlations program, we go to “Analyze” in the top pulldown menu, then select “Correlate,” and then “Bivariate.” Following the screenshot (step 1) produces the “Bivariate Correlations” dialog box.
[Screenshot: Bivariate Correlations, Step 1 (the Analyze > Correlate > Bivariate menu path).]
Table 10.4
Different Types of Correlation Coefficients

Nominal X with nominal Y: phi (when both variables are dichotomous) or Cramer's V (when one or both variables have more than two categories).
Nominal X with ordinal Y (or ordinal X with nominal Y): rank biserial or Cramer's V.
Nominal X with interval/ratio Y (or vice versa): point biserial (Pearson in lieu of point biserial).
Ordinal X with ordinal Y: Spearman's rho or Kendall's tau.
Ordinal X with interval/ratio Y (or vice versa): Spearman's rho, Kendall's tau, or Pearson.
Interval/ratio X with interval/ratio Y: Pearson.
Step 2: Next, from the main “Bivariate Correlations” dialog box, click the variables to correlate (e.g., number of children and pets) and move them into the “Variables” box by clicking on the arrow button. In the bottom half of this dialog box, options are available for selecting the type of correlation, a one- or two-tailed test (i.e., directional or nondirectional test), and whether to flag statistically significant correlations. For illustrative purposes, we will place a checkmark to generate the “Pearson” and “Spearman’s rho” correlation coefficients. We will also select the radio button for a “Two-tailed” test of significance, and at the very bottom, we will check “Flag significant correlations” (which simply means an asterisk will be placed next to significant correlations in the output).
[Screenshot: Bivariate Correlations, Step 2. Callouts in the screenshot note the following: select the variables of interest from the list on the left and use the arrow to move them to the “Variables” box on the right; clicking on “Options” allows you to obtain the means, standard deviations, and/or covariances; place a checkmark in the box corresponding to the type of correlation to generate, a decision based on the measurement scale of your variables; the “Test of significance” selected is based on a nondirectional (two-tailed) or directional (one-tailed) test; “Flag significant correlations” generates asterisks in the output for statistically significant correlations.]
Step 3 (optional): To obtain means, standard deviations, and/or covariances, as well as options for dealing with missing data (listwise or pairwise deletion), click on the “Options” button located in the top right corner of the main dialog box.

[Screenshot: Bivariate correlations, Step 3]
An Introduction to Statistical Concepts
From the main dialog box, click on “Ok” to run the analysis and to generate the output.
Interpreting the output: The output for generation of the Pearson and Spearman’s rho bivariate correlations between number of children and number of pets appears in Table 10.5. For illustrative purposes, we asked for both the Pearson and Spearman’s rho correlations (although the Pearson is the appropriate correlation given the measurement scales of our variables, we have also generated the Spearman’s rho so that the output can be reviewed). Thus, the top Correlations box gives the Pearson results and the bottom Correlations box the Spearman’s rho results. In both cases, the output presents the correlation, sample size (N in SPSS language, although usually denoted as n by everyone else), observed level of significance, and asterisks denoting statistically significant correlations. In reviewing Table 10.5, we see that SPSS does not provide any output in terms of CIs, power, or effect size. Later in the chapter, we illustrate the use of G*Power for computing power. Effect size is easily interpreted from the correlation coefficient value utilizing Cohen’s (1988) subjective standards previously described, and we have not recommended any CI procedures for correlations.
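Although the chapter works through the SPSS menus, the same two coefficients can be reproduced in a few lines of Python with scipy.stats. This is a minimal sketch; the five (children, pets) pairs below are hypothetical stand-ins, since the book’s raw dataset is not listed here, and only mimic its strong positive pattern.

```python
# Sketch (not SPSS): Pearson and Spearman's rho with scipy.stats.
# The data values are hypothetical, not the book's actual child-pet data.
from scipy import stats

children = [1, 2, 3, 4, 5]   # number of children (hypothetical)
pets     = [1, 3, 2, 5, 6]   # number of pets (hypothetical)

r, p_r = stats.pearsonr(children, pets)       # Pearson r and two-tailed p
rho, p_rho = stats.spearmanr(children, pets)  # Spearman's rho and two-tailed p

print(f"Pearson r = {r:.3f}, p = {p_r:.3f}")
print(f"Spearman's rho = {rho:.3f}, p = {p_rho:.3f}")
```

Both functions return the coefficient together with its two-tailed observed significance level, paralleling the “Sig. (two-tailed)” column SPSS reports.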
Table 10.5
SPSS Results for Child–Pet Data

Pearson correlation between Children and Pets: .900* (Sig. two-tailed = .037, N = 5)
Spearman’s rho correlation coefficient between Children and Pets: .900* (Sig. two-tailed = .037, N = 5)
* Correlation is significant at the 0.05 level (two-tailed).

The bivariate Pearson correlations are presented in the top row of the first Correlations table; a value of “1” indicates the Pearson correlation of a variable with itself. The correlation of interest (the relationship of number of children to number of pets) is .900, and the asterisk indicates the correlation is statistically significant at an alpha of .05. The probability is less than 4% (see “Sig. (two-tailed)”) that we would see this relationship by random chance if the relationship between the variables were zero (i.e., if the null hypothesis were really true). N represents the total sample size. The bottom half of each table presents the same information as that presented in the top half. The results for the same data computed with Spearman’s rho are presented in the second Correlations table and are interpreted similarly.
Using Scatterplots to Examine Linearity
for Bivariate Correlations

Step 1: As alluded to earlier in the chapter, understanding the extent to which linearity is a reasonable assumption is an important first step prior to computing a Pearson correlation coefficient. To generate a scatterplot, go to “Graphs” in the top pulldown menu. From there, select “Legacy Dialogs,” then “Scatter/Dot” (see screenshot for “Scatterplots: Step 1”).
[Screenshot: Scatterplots, Step 1]
Step 2: This will bring up the “Scatter/Dot” dialog box (see screenshot for “Scatterplots: Step 2”). The default selection is “Simple Scatter,” and this is the option we will use. Then click “Define.”
[Screenshot: Scatterplots, Step 2]
Step 3: This will bring up the “Simple Scatterplot” dialog box (see screenshot for “Scatterplots: Step 3”). Click the dependent variable (e.g., number of pets) and move it into the “Y Axis” box by clicking on the arrow. Click the independent variable (e.g., number of children) and move it into the “X Axis” box by clicking on the arrow. Then click “Ok.”
[Screenshot: Scatterplots, Step 3]
Interpreting linearity evidence: Scatterplots are also often examined to determine visual evidence of linearity prior to computing Pearson correlations. Scatterplots are graphs that depict coordinate values of X and Y. Linearity is suggested by points that fall in a straight line. This line may suggest a positive relation (as scores on X increase, scores on Y increase, and vice versa), a negative relation (as scores on X increase, scores on Y decrease, and vice versa), little or no relation (a relatively random display of points), or a polynomial relation (e.g., curvilinear). In this example, our scatterplot suggests evidence of linearity and, more specifically, a positive relationship between number of children and number of pets. Thus, proceeding to compute a bivariate Pearson correlation coefficient is reasonable.
[Scatterplot: number of pets (Y axis, 2.00–10.00) plotted against number of children (X axis, 1.00–5.00)]
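The same linearity check can be sketched outside SPSS with matplotlib; the (children, pets) values below are hypothetical stand-ins for the book’s data, and the file name is arbitrary.

```python
# Sketch of a linearity check with matplotlib (not SPSS); hypothetical data.
import matplotlib
matplotlib.use("Agg")  # render off-screen; unnecessary in interactive use
import matplotlib.pyplot as plt

children = [1, 2, 3, 4, 5]  # independent variable (X axis)
pets     = [2, 4, 5, 8, 9]  # dependent variable (Y axis)

fig, ax = plt.subplots()
ax.scatter(children, pets)      # one point per (X, Y) pair
ax.set_xlabel("Number of children")
ax.set_ylabel("Number of pets")
fig.savefig("scatterplot.png")  # inspect the saved plot for a linear pattern
```

As in the SPSS graph, points rising roughly along a straight line support the linearity assumption for the Pearson correlation.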
Using Crosstabs to Compute Correlations

The Crosstabs program has already been discussed in Chapter 8, but it can also be used for obtaining many measures of association (specifically Spearman’s rho, Kendall’s tau, Pearson, phi, and Cramer’s phi). We will illustrate the use of Crosstabs for two nominal variables, thus generating phi and Cramer’s phi.

Step 1: To compute phi or Cramer’s phi correlations, go to “Analyze” in the top pulldown menu, then select “Descriptive Statistics,” and then select the “Crosstabs” procedure.
[Screenshot: Phi and Cramer’s phi, Step 1]
Step 2: Select the dependent variable (if applicable; many times, there are not dependent and independent variables, per se, with bivariate correlations, and in those cases, determining which variable is X and which variable is Y is largely irrelevant) and move it into the “Row(s)” box by clicking on the arrow key [e.g., here we used enrollment status as the dependent variable (1 = enrolled; 0 = not enrolled)]. Then select the independent variable and move it into the “Column(s)” box [in this example, gender is the independent variable (0 = male; 1 = female)].
[Screenshot: Phi and Cramer’s phi, Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the boxes on the right. If applicable, the dependent variable should be displayed in the row(s) and the independent variable in the column(s). Clicking on “Statistics” will allow you to select various statistics to generate (including various measures of association).]
Step 3: In the top right corner of the “Crosstabs” dialog box (see screenshot for step 2), click on the button labeled “Statistics.” From here, you can select various measures of association (i.e., types of correlation coefficients). Which correlation is selected should depend on the measurement scales of your variables. With two nominal variables, the appropriate correlation to select is “Phi and Cramer’s V.” Click on “Continue” to return to the main “Crosstabs” dialog box.
[Screenshot: Phi and Cramer’s phi, Step 3. Clicking on “Correlations” will generate Pearson, Spearman’s rho, and Kendall’s tau correlations.]
From the main dialog box, click on “Ok” to run the analysis and generate the output.
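For a sense of what Crosstabs computes here, phi and Cramer’s V can be derived from the chi-square statistic of the contingency table. A minimal sketch in Python using scipy, with hypothetical gender-by-enrollment counts (the book’s actual dataset is not reproduced here):

```python
# Sketch: phi / Cramer's V from a contingency table (not the SPSS output).
# The 2 x 2 counts below are hypothetical.
import math
from scipy.stats import chi2_contingency

# rows: enrollment status (0 = not enrolled, 1 = enrolled)
# columns: gender (0 = male, 1 = female)
table = [[20, 10],
         [15, 25]]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
n = sum(sum(row) for row in table)            # total sample size
k = min(len(table), len(table[0]))            # smaller table dimension
cramers_v = math.sqrt(chi2 / (n * (k - 1)))   # equals |phi| for a 2 x 2 table

print(f"chi-square = {chi2:.3f}, p = {p:.3f}, Cramer's V = {cramers_v:.3f}")
```

For a 2 × 2 table Cramer’s V reduces to the absolute value of phi, which is why SPSS offers the two together as “Phi and Cramer’s V.”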
10.8 G*Power
A priori and post hoc power could again be determined using the specialized software described previously in this text (e.g., G*Power), or you can consult a priori power tables (e.g., Cohen, 1988). As an illustration, we use G*Power to compute the post hoc power of our test.
Post Hoc Power for the Pearson Bivariate
Correlation Using G*Power
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a Pearson correlation. To find the Pearson, we will select “Tests” in the top pulldown menu, then “Correlations and regression,” and then “Correlations: Bivariate normal model.” Once that selection is made, the “Test family” automatically changes to “Exact.”
[Screenshot: G*Power, Step 1]
The “Type of power analysis” desired then needs to be selected. To compute post hoc power, select “Post hoc: Compute achieved power—given α, sample size, and effect size.”
[Screenshot: G*Power, Step 2. The default selection for “Test family” is “t tests,” and the default selection for “Statistical test” is “Correlation: Point biserial model.” Following the procedures presented in Step 1 will automatically change these to “Exact” and “Correlation: Bivariate normal model,” respectively.]
The “Input Parameters” must then be specified. The first parameter is specification of the number of tail(s). For a directional hypothesis, “One” is selected, and for a nondirectional hypothesis, “Two” is selected. In our example, we chose a nondirectional hypothesis and thus will select “Two” tails. We then input the observed correlation coefficient value in the box for “Correlation ρ H1.” In this example, our Pearson correlation coefficient value was .90. The alpha level we tested at was .05, the total sample size was 5, and the “Correlation ρ H0” will remain as the default 0 (this is the correlation value expected if the null hypothesis is true; in other words, there is zero correlation between variables given the null hypothesis). Once the parameters are specified, simply click on “Calculate” to generate the power results.
[Screenshot: G*Power input. The “Input Parameters” for computing post hoc power must be specified for (1) a one- or two-tailed test, (2) the observed correlation coefficient value, (3) the alpha level, (4) the total sample size, and (5) the hypothesized correlation coefficient value. Once the parameters are specified, click on “Calculate.”]
The “Output Parameters” provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power for a Pearson correlation given a two-tailed test, with a computed correlation value of .90, an alpha level of .05, a total sample size of 5, and a null hypothesis correlation value of 0.

Based on those criteria, the post hoc power was .67. In other words, with a two-tailed test, an observed Pearson correlation of .90, an alpha level of .05, a sample size of 5, and a null hypothesis correlation value of 0, the power of our test was .67—the probability of rejecting the null hypothesis when it is really false (in this case, the probability that there is not a zero correlation between our variables) was 67%, which is slightly less than what would usually be considered sufficient power (sufficient power is often .80 or above). Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters).
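As a rough cross-check on G*Power, post hoc power for a correlation test can be approximated in Python with Fisher’s z transformation. Note the hedging: this is only an approximation, and it will disagree noticeably with G*Power’s exact bivariate normal computation (which gave .67 here) at a sample size as small as 5. The function below is a hypothetical helper, not part of any SPSS or G*Power interface.

```python
# Rough sketch of post hoc power for H0: rho = 0 via Fisher's z.
# This approximation differs from G*Power's exact calculation,
# especially for very small n; the function name is hypothetical.
import math
from scipy.stats import norm

def approx_power(r, n, alpha=0.05):
    """Approximate two-tailed power of the test of a zero correlation."""
    z_r = math.atanh(r)               # Fisher z transform of the observed r
    se = 1 / math.sqrt(n - 3)         # standard error of z (requires n > 3)
    z_crit = norm.ppf(1 - alpha / 2)  # two-tailed critical value (1.96 at .05)
    # probability of landing beyond the critical value in either direction
    return (1 - norm.cdf(z_crit - z_r / se)) + norm.cdf(-z_crit - z_r / se)

print(f"approximate power = {approx_power(0.90, 5):.2f}")
```

A larger observed correlation or a larger sample size increases the computed power, which is the same behavior G*Power displays as its input parameters change.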
10.9 Template and APA-Style Write-Up
Finally, we conclude the chapter with a template and an APA-style paragraph detailing the results from an example dataset.
Pearson Correlation Test
As you may recall, our graduate research assistant, Marie, was working with the marketing director of the local animal shelter, Matthew. Marie’s task was to assist Matthew in generating the test of inference to answer his research question, “Is there a relationship between the number of children in a family and the number of pets?” A Pearson correlation was the test of inference suggested by Marie. A template for writing a research question for a correlation (regardless of which type of correlation coefficient is computed) is presented in the following:
Is There a Correlation Between [Variable 1] and [Variable 2]?
It may be helpful to include in the results information on the extent to which the assumptions were met (recall there are two assumptions: independence and linearity). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference. Recall that the assumption of independence is met when the cases in our sample have been randomly selected from the population. One or two sentences are usually sufficient to indicate if the assumptions are met. It is also important to address effect size in the write-up. Correlations are unique in that they are already effect size measures, so computing an effect size in addition to the correlation value is not needed. However, it is desirable to interpret the correlation value as an effect size. Effect size is easily interpreted from the correlation coefficient value utilizing Cohen’s (1988) subjective standards previously described. Here is an APA-style example paragraph of results for the correlation between number of children and number of pets.
A Pearson correlation coefficient was computed to determine if there is a relationship between the number of children in a family and the number of pets in the family. The test was conducted using an alpha of .05. The null hypothesis was that the relationship would be 0. The assumption of independence was met via random selection. The assumption of linearity was reasonable given a review of a scatterplot of the variables.

The Pearson correlation between children and pets is .90, which is positive, is interpreted as a large effect size (Cohen, 1988), and is statistically different from 0 (r = .90, n = 5, p = .037). Thus, the null hypothesis that the correlation is 0 was rejected at the .05 level of significance. There is a strong, positive correlation between the number of children in a family and the number of pets in the family.
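The effect size interpretation step can be automated. Here is a small hypothetical helper that applies Cohen’s (1988) commonly cited benchmarks for correlations (roughly .10 small, .30 medium, .50 large); the function name and cutoffs-as-code are illustrative, not part of SPSS.

```python
# Hypothetical helper: label a correlation's effect size using Cohen's (1988)
# commonly cited benchmarks for r (.10 small, .30 medium, .50 large).
def cohen_label(r):
    """Return a conventional effect size label for a correlation coefficient."""
    size = abs(r)  # direction does not affect effect size magnitude
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"

print(cohen_label(0.90))  # the child-pet correlation is labeled "large"
```

This mirrors the APA paragraph above, where r = .90 is reported as a large effect.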
10.10 Summary
In this chapter, we described various measures of the association or correlation between two variables. Several new concepts and descriptive and inferential statistics were discussed. The new concepts covered were as follows: scatterplot; strength and direction; covariance; correlation coefficient; Fisher’s Z transformation; and linearity assumption, causation, and restriction of range issues. We began by introducing the scatterplot as a graphical method for visually depicting the association between two variables. Next we examined the covariance as an unstandardized measure of association. Then we considered the Pearson product–moment correlation coefficient, first as a descriptive statistic and then as a method for making inferences when there are either one or two samples of observations. Some important issues about the correlational measures were also discussed. Finally, a few other measures of association were introduced, in particular, the Spearman’s rho and Kendall’s tau rank-order correlation coefficients and the phi and Cramer’s phi coefficients. At this point, you should have met the following objectives: (a) be able to understand the concepts underlying the correlation coefficient and correlation inferential tests, (b) be able to select the appropriate type of correlation, and (c) be able to determine and interpret the appropriate correlation and correlation inferential test. In Chapter 11, we discuss the one-factor analysis of variance, the logical extension of the independent t test, for assessing mean differences among two or more groups.
Problems
Conceptual problems
10.1 The variance of X is 9, the variance of Y is 4, and the covariance between X and Y is 2. What is rXY?
 a. .039
 b. .056
 c. .233
 d. .333
10.2 The standard deviation of X is 20, the standard deviation of Y is 50, and the covariance between X and Y is 30. What is rXY?
 a. .030
 b. .080
 c. .150
 d. .200
10.3 Which of the following correlation coefficients, each obtained from a sample of 1000 children, indicates the weakest relationship?
 a. −.90
 b. −.30
 c. +.20
 d. +.80
10.4 Which of the following correlation coefficients, each obtained from a sample of 1000 children, indicates the strongest relationship?
 a. −.90
 b. −.30
 c. +.20
 d. +.80
10.5 If the relationship between two variables is linear, which of the following is necessarily true?
 a. The relation can be most accurately represented by a straight line.
 b. All the points will fall on a curved line.
 c. The relationship is best represented by a curved line.
 d. All the points must fall exactly on a straight line.
10.6 In testing the null hypothesis that a correlation is equal to 0, the critical value decreases as α decreases. True or false?
10.7 If the variances of X and Y are increased, but their covariance remains constant, the value of rXY will be unchanged. True or false?
10.8 We compute rXY = .50 for a sample of students on variables X and Y. I assert that if the low-scoring students on variable X are removed, then the new value of rXY would most likely be less than .50. Am I correct?
10.9 Two variables are linearly related such that there is a perfect relationship between X and Y. I assert that rXY must be equal to either +1.00 or −1.00. Am I correct?
10.10 If the number of credit cards owned and the number of cars owned are strongly positively correlated, then those with more credit cards tend to own more cars. True or false?
10.11 If the number of credit cards owned and the number of cars owned are strongly negatively correlated, then those with more credit cards tend to own more cars. True or false?
10.12 A statistical consultant at a rival university found the correlation between GRE-Q scores and statistics grades to be +2.0. I assert that the administration should be advised to congratulate the students and faculty on their great work in the classroom. Am I correct?
10.13 If X correlates significantly with Y, then X is necessarily a cause of Y. True or false?
10.14 A researcher wishes to correlate the grade students earned from a pass/fail course (i.e., pass or fail) with their cumulative GPA. Which is the most appropriate correlation coefficient to examine this relationship?
 a. Pearson
 b. Spearman’s rho or Kendall’s tau
 c. Phi
 d. None of the above
10.15 If both X and Y are ordinal variables, then the most appropriate measure of association is the Pearson. True or false?
Computational problems
10.1 You are given the following pairs of sample scores on X (number of credit cards in your possession) and Y (number of those credit cards with balances):
X Y
5 4
6 1
4 3
8 7
2 2
 a. Graph a scatterplot of the data.
 b. Compute the covariance.
 c. Determine the Pearson product–moment correlation coefficient.
 d. Determine the Spearman’s rho correlation coefficient.
10.2 If rXY = .17 for a random sample of size 84, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .05 level of significance).
10.3 If rXY = .60 for a random sample of size 30, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .05 level of significance).
10.4 The correlation between vocabulary size and mother’s age is .50 for 12 rural children and .85 for 17 inner-city children. Does the correlation for rural children differ from that of the inner-city children at the .05 level of significance?
10.5 You are given the following pairs of sample scores on X (number of coins in possession) and Y (number of bills in possession):
X Y
2 1
3 3
4 5
5 5
6 3
7 1
 a. Graph a scatterplot of the data.
 b. Describe the relationship between X and Y.
 c. What do you think the Pearson correlation will be?
10.6 Six adults were assessed on the number of minutes it took to read a government report (X) and the number of items correct on a test of the content of that report (Y). Use the following data to determine the Pearson correlation and the effect size.
X Y
10 17
8 17
15 13
12 16
14 15
16 12
10.7 Ten kindergarten children were observed on the number of letters written in proper form (given 26 letters) (X) and the number of words that the child could read (given 50 words) (Y). Use the following data to determine the Pearson correlation and the effect size.
X Y
10 5
16 8
22 40
8 15
12 28
20 37
17 29
21 30
15 18
9 4
Interpretive problems
10.1 Select two interval/ratio variables from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.
10.2 Select two ordinal variables from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.
10.3 Select one ordinal variable and one interval/ratio variable from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.
10.4 Select one dichotomous variable and one interval/ratio variable from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.
11
One-Factor Analysis of Variance:
Fixed-Effects Model
Chapter Outline
11.1 Characteristics of the One-Factor ANOVA Model
11.2 Layout of Data
11.3 ANOVA Theory
 11.3.1 General Theory and Logic
 11.3.2 Partitioning the Sums of Squares
 11.3.3 ANOVA Summary Table
11.4 ANOVA Model
 11.4.1 Model
 11.4.2 Estimation of the Parameters of the Model
 11.4.3 Effect Size Measures, Confidence Intervals, and Power
 11.4.4 Example
 11.4.5 Expected Mean Squares
11.5 Assumptions and Violation of Assumptions
 11.5.1 Independence
 11.5.2 Homogeneity of Variance
 11.5.3 Normality
11.6 Unequal n’s or Unbalanced Design
11.7 Alternative ANOVA Procedures
 11.7.1 Kruskal–Wallis Test
 11.7.2 Welch, Brown–Forsythe, and James Procedures
11.8 SPSS and G*Power
11.9 Template and APA-Style Write-Up
Key Concepts
 1. Between- and within-groups variability
 2. Sources of variation
 3. Partitioning the sums of squares
 4. The ANOVA model
 5. Expected mean squares
In the last five chapters, our discussion has dealt with various inferential statistics, including inferences about means. The next six chapters are concerned with different analysis of variance (ANOVA) models. In this chapter, we consider the most basic ANOVA model, known as the one-factor ANOVA model. Recall the independent t test from Chapter 7, where the means from two independent samples were compared. What if you wish to compare more than two means? The answer is to use the analysis of variance. At this point, you may be wondering why the procedure is called the analysis of variance rather than the analysis of means, because the intent is to study possible mean differences. One way of comparing a set of means is to think in terms of the variability among those means. If the sample means are all the same, then the variability of those means would be 0. If the sample means are not all the same, then the variability of those means would be somewhat greater than 0. In general, the greater the mean differences are, the greater is the variability of the means. Thus, mean differences are studied by looking at the variability of the means; hence, the term analysis of variance is appropriate rather than analysis of means (further discussed in this chapter).

We use X to denote our single independent variable, which we typically refer to as a factor, and Y to denote our dependent (or criterion) variable. Thus, the one-factor ANOVA is a bivariate, or two-variable, procedure. Our interest here is in determining whether mean differences exist on the dependent variable. Stated another way, the researcher is interested in the influence of the independent variable on the dependent variable. For example, a researcher may want to determine the influence that method of instruction has on statistics achievement. The independent variable, or factor, would be method of instruction, and the dependent variable would be statistics achievement. Three different methods of instruction that might be compared are large lecture hall instruction, small-group instruction, and computer-assisted instruction. Students would be randomly assigned to one of the three methods of instruction and at the end of the semester evaluated as to their level of achievement in statistics. These results would be of interest to a statistics instructor in determining the most effective method of instruction (where “effective” is measured by student performance in statistics). Thus, the instructor may opt for the method of instruction that yields the highest mean achievement.

There are a number of new concepts introduced in this chapter as well as a refresher of concepts that have been covered in previous chapters. The concepts addressed in this chapter include the following: independent and dependent variables; between- and within-groups variability; fixed and random effects; the linear model; partitioning of the sums of squares; degrees of freedom, mean square terms, and F ratios; the ANOVA summary table; expected mean squares; balanced and unbalanced models; and alternative ANOVA procedures. Our objectives are that by the end of this chapter, you will be able to (a) understand the characteristics and concepts underlying a one-factor ANOVA, (b) generate and interpret the results of a one-factor ANOVA, and (c) understand and evaluate the assumptions of the one-factor ANOVA.
11.1 Characteristics of One-Factor ANOVA Model
We have been following Marie, our very capable educational research graduate student, as she develops her statistical skills. As we will see, Marie is embarking on a very exciting research adventure of her own.
Marie is enrolled in an independent study class. As part of the course requirement, she has to complete a research study. In collaboration with the statistics faculty in her program, Marie designs an experimental study to determine if there is a mean difference in student attendance in the statistics lab based on the attractiveness of the statistics lab instructor. Marie’s research question is: Is there a mean difference in the number of statistics labs attended by students based on the attractiveness of the lab instructor? Marie determined that a one-way ANOVA was the best statistical procedure to use to answer her question. Her next task is to collect and analyze the data to address her research question.
This section describes the distinguishing characteristics of the one-factor ANOVA model. Suppose you are interested in comparing the means of two independent samples. Here the independent t test would be the method of choice (or perhaps the Welch t′ test). What if your interest is in comparing the means of more than two independent samples? One possibility is to conduct multiple independent t tests on each pair of means. For example, if you wished to determine whether the means from five independent samples are the same, you could do all possible pairwise t tests. In this case, the following null hypotheses could be evaluated: μ1 = μ2, μ1 = μ3, μ1 = μ4, μ1 = μ5, μ2 = μ3, μ2 = μ4, μ2 = μ5, μ3 = μ4, μ3 = μ5, and μ4 = μ5. Thus, we would have to carry out 10 different independent t tests. The number of possible pairwise t tests that could be done for J means is equal to ½[J(J − 1)].
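The pairwise count ½[J(J − 1)] is easy to verify numerically; a one-line sketch (the function name is illustrative):

```python
# Sketch: number of pairwise t tests among J group means, 1/2 * J * (J - 1).
def n_pairwise_tests(J):
    return J * (J - 1) // 2  # integer division; the product is always even

print(n_pairwise_tests(5))  # the five-sample example requires 10 tests
```

For J = 5 this returns 10, matching the ten null hypotheses listed above.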
Is there a problem in conducting so many t tests? Yes; the problem has to do with the probability of making a Type I error (i.e., α), where the researcher incorrectly rejects a true null hypothesis. Although the α level for each t test can be controlled at a specified nominal α level that is set by the researcher, say .05, what happens to the overall α level for the entire set of tests? The overall α level for the entire set of tests (i.e., αtotal), often called the experimentwise Type I error rate, is larger than the α level for each of the individual t tests.

In our example, we are interested in comparing the means for 10 pairs of groups (again, these would be μ1 = μ2, μ1 = μ3, μ1 = μ4, μ1 = μ5, μ2 = μ3, μ2 = μ4, μ2 = μ5, μ3 = μ4, μ3 = μ5, and μ4 = μ5). A t test is conducted for each of the 10 pairs of groups at α = .05. Although each test controls the α level at .05, the overall α level will be larger because the risk of a Type I error accumulates across the tests. For each test, we are taking a risk; the more tests we do, the more risks we are taking. This can be explained by considering the risk you take each day you drive your car to school or work. The risk of an accident is small for any 1 day; however, over the period of a year, the risk of an accident is much larger.
For C independent (or orthogonal) tests, the experimentwise error rate is as follows:

αtotal = 1 − (1 − α)^C
Assume for the moment that our 10 tests are independent (although they are not, because within those 10 tests, each group is actually being compared to another group in four different instances). If we go ahead with our 10 t tests at α = .05, then the experimentwise error rate is

αtotal = 1 − (1 − .05)^10 = 1 − .60 = .40
Although we are seemingly controlling our α level at the .05 level, the probability of making a Type I error across all 10 tests is .40. In other words, in the long run, if we conduct 10 independent t tests, 4 times out of 10, we will make a Type I error. For this reason, we do not want to do
all possible t tests. Before we move on, the experimentwise error rate for C dependent tests αtotal (which would be the case when doing all possible pairwise t tests, as in our example) is more difficult to determine, so let us just say that

α ≤ αtotal ≤ Cα
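The independent-tests formula, αtotal = 1 − (1 − α)^C, can be checked numerically; a minimal sketch (the function name is illustrative):

```python
# Sketch: experimentwise Type I error rate for C independent tests,
# alpha_total = 1 - (1 - alpha)^C, as in the formula above.
def experimentwise_alpha(alpha, C):
    return 1 - (1 - alpha) ** C

print(round(experimentwise_alpha(0.05, 10), 2))  # 10 tests at .05 -> 0.4
```

With α = .05 and C = 10 this gives about .40, the value used in the example, and the rate climbs quickly as more tests are added.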
Are there other options available to us where we can maintain better control over our experimentwise error rate? The optimal solution, in terms of maintaining control over our overall α level as well as maximizing power, is to conduct one overall test, often called an omnibus test. Recall that power has to do with the probability of correctly rejecting a false null hypothesis. The omnibus test can assess the equality of all of the means simultaneously and is the one used in ANOVA. The one-factor ANOVA thus represents an extension of the independent t test to two or more independent sample means, where the experimentwise error rate is controlled.
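For illustration, the omnibus F test can be sketched from first principles. The following Python code (our own sketch, using made-up scores for three hypothetical instruction groups) partitions variability into between-groups and within-groups sums of squares and forms the F ratio, as the one-factor ANOVA does:

```python
# Hypothetical scores for three instruction groups (invented data)
groups = [
    [1, 2, 3],  # e.g., large lecture hall
    [2, 3, 4],  # e.g., small-group
    [3, 4, 5],  # e.g., computer-assisted
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Between-groups sum of squares: group size times squared mean deviation
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within-groups sum of squares: squared deviations from each group's own mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1               # J - 1 groups
df_within = len(all_scores) - len(groups)  # N - J subjects

f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f_ratio)  # 3.0 for this data: one omnibus test of all means at once
```

The point is that a single F ratio tests the equality of all the group means simultaneously, so the α level is spent once rather than 10 times.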
In addition, the one-factor ANOVA has only one independent variable or factor with two or more levels. The independent variable is a discrete or grouping variable, where each subject responds to only one level. The levels represent the different samples or groups or treatments whose means are to be compared. In our example, method of instruction is the independent variable with three levels: large lecture hall, small-group, and computer-assisted. There are two ways of conceptually thinking about the selection of levels. In the fixed-effects model, all levels that the researcher is interested in are included in the design and analysis for the study. As a result, generalizations can only be made about those particular levels of the independent variable that are actually selected. For instance, if a researcher is only interested in these three methods of instruction—large lecture hall, small-group, and computer-assisted—then only those levels are incorporated into the study. Generalizations about other methods of instruction cannot be made because no other methods were considered for selection. Other examples of fixed-effects independent variables might be SES, gender, specific types of drug treatment, age group, weight, or marital status.
In the random-effects model, the researcher randomly samples some levels of the independent variable from the population of levels. As a result, generalizations can be made about all of the levels in the population, even those not actually sampled. For instance, a researcher interested in teacher effectiveness may have randomly sampled history teachers (i.e., the independent variable) from the population of history teachers in a particular school district. Generalizations can then be made about other history teachers in that school district not actually sampled. The random selection of levels is much the same as the random selection of individuals or objects in the random sampling process. This is the nature of inferential statistics, where inferences are made about a population (of individuals, objects, or levels) from a sample. Other examples of random-effects independent variables might be randomly selected classrooms, types of medication, animals, or time (e.g., hours, days). The remainder of this chapter is concerned with the fixed-effects model. Chapter 15 discusses the random-effects model in more detail.
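The random sampling of levels can be mimicked in a few lines of Python. This is merely an illustrative sketch (the district size and teacher labels are invented), showing levels of the independent variable being drawn at random from a population of levels, exactly as individuals would be drawn in ordinary random sampling:

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# A hypothetical population of history teachers (the population of levels)
population = ["teacher_" + str(i) for i in range(1, 41)]

# Randomly sample 5 teachers to serve as the levels of the independent variable
sampled_levels = random.sample(population, k=5)
print(sampled_levels)
```

Whatever five teachers are drawn, inferences under the random-effects model would extend to all 40 teachers in the population, not just the 5 sampled.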
In the fixed-effects model, once the levels of the independent variable are selected, subjects (i.e., persons or objects) are randomly assigned to the levels of the independent variable. In certain situations, the researcher does not have control over which level a subject is assigned to. The groups may already be in place when the researcher arrives on the scene. For instance, students may be assigned to their classes at the beginning of the year by the school administration. Researchers typically have little input regarding class assignments.
In another situation, it may be theoretically impossible to assign subjects to groups. For example, as much as we might like, researchers cannot randomly assign individuals to an age level. Thus, a distinction needs to be made about whether or not the researcher can control the assignment of subjects to groups. Although the analysis will not be altered, the interpretation of the results will be. When researchers have control over group assignments, the extent to which they can generalize their findings is greater than for those researchers who do not have such control. For further information on the differences between true experimental designs (i.e., with random assignment) and quasi-experimental designs (i.e., without random assignment), take a look at Campbell and Stanley (1966), Cook and Campbell (1979), and Shadish, Cook, and Campbell (2002).
Moreover, in the model being considered here, each subject