SPSS questions

Please answer these statistical questions based on SPSS knowledge.

Richard G. Lomax
The Ohio State University
Debbie L. Hahs-Vaughn
University of Central Florida


Routledge
Taylor & Francis Group
711 Third Avenue
New York, NY 10017
Routledge
Taylor & Francis Group
27 Church Road
Hove, East Sussex BN3 2FA
© 2012 by Taylor & Francis Group, LLC
Routledge is an imprint of Taylor & Francis Group, an Informa business
Printed in the United States of America on acid-free paper
Version Date: 20111003
International Standard Book Number: 978-0-415-88005-3 (Hardback)
For permission to photocopy or use material electronically from this work, please access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Lomax, Richard G.
An introduction to statistical concepts / Richard G. Lomax, Debbie L. Hahs-Vaughn. — 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-415-88005-3
1. Statistics. 2. Mathematical statistics. I. Hahs-Vaughn, Debbie L. II. Title.
QA276.12.L67 2012
519.5–dc23 2011035052
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the Psychology Press Web site at
http://www.psypress.com


This book is dedicated to our families
and to all of our former students.

Contents

Preface
Acknowledgments

1. Introduction
   1.1 What Is the Value of Statistics?
   1.2 Brief Introduction to History of Statistics
   1.3 General Statistical Definitions
   1.4 Types of Variables
   1.5 Scales of Measurement
   1.6 Summary
   Problems

2. Data Representation
   2.1 Tabular Display of Distributions
   2.2 Graphical Display of Distributions
   2.3 Percentiles
   2.4 SPSS
   2.5 Templates for Research Questions and APA-Style Paragraph
   2.6 Summary
   Problems

3. Univariate Population Parameters and Sample Statistics
   3.1 Summation Notation
   3.2 Measures of Central Tendency
   3.3 Measures of Dispersion
   3.4 SPSS
   3.5 Templates for Research Questions and APA-Style Paragraph
   3.6 Summary
   Problems

4. Normal Distribution and Standard Scores
   4.1 Normal Distribution
   4.2 Standard Scores
   4.3 Skewness and Kurtosis Statistics
   4.4 SPSS
   4.5 Templates for Research Questions and APA-Style Paragraph
   4.6 Summary
   Problems

5. Introduction to Probability and Sample Statistics
   5.1 Brief Introduction to Probability
   5.2 Sampling and Estimation
   5.3 Summary
   Appendix: Probability That at Least Two Individuals Have the Same Birthday
   Problems

6. Introduction to Hypothesis Testing: Inferences About a Single Mean
   6.1 Types of Hypotheses
   6.2 Types of Decision Errors
   6.3 Level of Significance (α)
   6.4 Overview of Steps in Decision-Making Process
   6.5 Inferences About μ When σ Is Known
   6.6 Type II Error (β) and Power (1 − β)
   6.7 Statistical Versus Practical Significance
   6.8 Inferences About μ When σ Is Unknown
   6.9 SPSS
   6.10 G*Power
   6.11 Template and APA-Style Write-Up
   6.12 Summary
   Problems

7. Inferences About the Difference Between Two Means
   7.1 New Concepts
   7.2 Inferences About Two Independent Means
   7.3 Inferences About Two Dependent Means
   7.4 SPSS
   7.5 G*Power
   7.6 Template and APA-Style Write-Up
   7.7 Summary
   Problems

8. Inferences About Proportions
   8.1 Inferences About Proportions Involving Normal Distribution
   8.2 Inferences About Proportions Involving Chi-Square Distribution
   8.3 SPSS
   8.4 G*Power
   8.5 Template and APA-Style Write-Up
   8.6 Summary
   Problems

9. Inferences About Variances
   9.1 New Concepts
   9.2 Inferences About Single Variance
   9.3 Inferences About Two Dependent Variances
   9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests)
   9.5 SPSS
   9.6 Template and APA-Style Write-Up
   9.7 Summary
   Problems

10. Bivariate Measures of Association
   10.1 Scatterplot
   10.2 Covariance
   10.3 Pearson Product–Moment Correlation Coefficient
   10.4 Inferences About Pearson Product–Moment Correlation Coefficient
   10.5 Assumptions and Issues Regarding Correlations
   10.6 Other Measures of Association
   10.7 SPSS
   10.8 G*Power
   10.9 Template and APA-Style Write-Up
   10.10 Summary
   Problems

11. One-Factor Analysis of Variance: Fixed-Effects Model
   11.1 Characteristics of One-Factor ANOVA Model
   11.2 Layout of Data
   11.3 ANOVA Theory
   11.4 ANOVA Model
   11.5 Assumptions and Violation of Assumptions
   11.6 Unequal n's or Unbalanced Procedure
   11.7 Alternative ANOVA Procedures
   11.8 SPSS and G*Power
   11.9 Template and APA-Style Write-Up
   11.10 Summary
   Problems

12. Multiple Comparison Procedures
   12.1 Concepts of Multiple Comparison Procedures
   12.2 Selected Multiple Comparison Procedures
   12.3 SPSS
   12.4 Template and APA-Style Write-Up
   12.5 Summary
   Problems

13. Factorial Analysis of Variance: Fixed-Effects Model
   13.1 Two-Factor ANOVA Model
   13.2 Three-Factor and Higher-Order ANOVA
   13.3 Factorial ANOVA With Unequal n's
   13.4 SPSS and G*Power
   13.5 Template and APA-Style Write-Up
   13.6 Summary
   Problems

14. Introduction to Analysis of Covariance: One-Factor Fixed-Effects Model With Single Covariate
   14.1 Characteristics of the Model
   14.2 Layout of Data
   14.3 ANCOVA Model
   14.4 ANCOVA Summary Table
   14.5 Partitioning the Sums of Squares
   14.6 Adjusted Means and Related Procedures
   14.7 Assumptions and Violation of Assumptions
   14.8 Example
   14.9 ANCOVA Without Randomization
   14.10 More Complex ANCOVA Models
   14.11 Nonparametric ANCOVA Procedures
   14.12 SPSS and G*Power
   14.13 Template and APA-Style Paragraph
   14.14 Summary
   Problems

15. Random- and Mixed-Effects Analysis of Variance Models
   15.1 One-Factor Random-Effects Model
   15.2 Two-Factor Random-Effects Model
   15.3 Two-Factor Mixed-Effects Model
   15.4 One-Factor Repeated Measures Design
   15.5 Two-Factor Split-Plot or Mixed Design
   15.6 SPSS and G*Power
   15.7 Template and APA-Style Write-Up
   15.8 Summary
   Problems

16. Hierarchical and Randomized Block Analysis of Variance Models
   16.1 Two-Factor Hierarchical Model
   16.2 Two-Factor Randomized Block Design for n = 1
   16.3 Two-Factor Randomized Block Design for n > 1
   16.4 Friedman Test
   16.5 Comparison of Various ANOVA Models
   16.6 SPSS
   16.7 Template and APA-Style Write-Up
   16.8 Summary
   Problems

17. Simple Linear Regression
   17.1 Concepts of Simple Linear Regression
   17.2 Population Simple Linear Regression Model
   17.3 Sample Simple Linear Regression Model
   17.4 SPSS
   17.5 G*Power
   17.6 Template and APA-Style Write-Up
   17.7 Summary
   Problems

18. Multiple Regression
   18.1 Partial and Semipartial Correlations
   18.2 Multiple Linear Regression
   18.3 Methods of Entering Predictors
   18.4 Nonlinear Relationships
   18.5 Interactions
   18.6 Categorical Predictors
   18.7 SPSS
   18.8 G*Power
   18.9 Template and APA-Style Write-Up
   18.10 Summary
   Problems

19. Logistic Regression
   19.1 How Logistic Regression Works
   19.2 Logistic Regression Equation
   19.3 Estimation and Model Fit
   19.4 Significance Tests
   19.5 Assumptions and Conditions
   19.6 Effect Size
   19.7 Methods of Predictor Entry
   19.8 SPSS
   19.9 G*Power
   19.10 Template and APA-Style Write-Up
   19.11 What Is Next?
   19.12 Summary
   Problems

Appendix: Tables
References
Odd-Numbered Answers to Problems
Author Index
Subject Index

xiii
Preface
Approach
We�know,�we�know!�We’ve�heard�it�a�million�times�before��When�you�hear�someone�at�a�
party�mention�the�word�statistics�or�statistician,�you�probably�say�“I�hate�statistics”�and�turn�
the�other�cheek��In�the�many�years�that�we�have�been�in�the�field�of�statistics,�it�is�extremely�
rare� when� someone� did� not� have� that� reaction�� Enough� is� enough�� With� the� help� of� this�
text,�we�hope�that�“statistics�hating”�will�become�a�distant�figment�of�your�imagination�
As�the�title�suggests,�this�text�is�designed�for�a�course�in�statistics�for�students�in�educa-
tion� and� the� behavioral� sciences�� We� begin� with� the� most� basic� introduction� to�statistics�
in� the� first� chapter� and� proceed� through� intermediate� statistics�� The� text� is� designed� for�
you�to�become�a�better�prepared�researcher�and�a�more�intelligent�consumer�of�research��
We�do�not�assume�that�you�have�extensive�or�recent�training�in�mathematics��Many�of�you�
have�only�had�algebra,�perhaps�some�time�ago��We�also�do�not�assume�that�you�have�ever�
had�a�statistics�course��Rest�assured;�you�will�do�fine�
We�believe�that�a�text�should�serve�as�an�effective�instructional�tool��You�should�find�this�
text�to�be�more�than�a�reference�book;�you�might�actually�use�it�to�learn�statistics��(What�an�
oxymoron�that�a�statistics�book�can�actually�teach�you�something�)�This�text�is�not�a�theo-
retical�statistics�book,�nor�is�it�a�cookbook�on�computing�statistics�or�a�statistical�software�
manual�� Recipes� have� to� be� memorized;� consequently,� you� tend� not� to� understand� how�
or�why�you�obtain�the�desired�product��As�well,�knowing�how�to�run�a�statistics�package�
without� understanding� the� concepts� or� the� output� is� not� particularly� useful�� Thus,� con-
cepts�drive�the�field�of�statistics�
Goals and Content Coverage
Our�goals�for�this�text�are�lofty,�but�the�effort�and�its�effects�will�be�worthwhile��First,�the�
text�provides�a�comprehensive�coverage�of�topics�that�could�be�included�in�an�undergradu-
ate�or�graduate�one-�or�two-course�sequence�in�statistics��The�text�is�flexible�enough�so�that�
instructors�can�select�those�topics�that�they�desire�to�cover�as�they�deem�relevant�in�their�
particular�discipline��In�other�words,�chapters�and�sections�of�chapters�from�this�text�can�
be�included�in�a�statistics�course�as�the�instructor�sees�fit��Most�of�the�popular�as�well�as�
many�of�the�lesser-known�procedures�and�models�are�described�in�the�text��A�particular�
feature�is�a�thorough�and�up-to-date�discussion�of�assumptions,�the�effects�of�their�viola-
tion,�and�how�to�deal�with�their�violation�
The�first�five�chapters�of�the�text�cover�basic�descriptive�statistics,�including�ways�of�repre-
senting�data�graphically,�statistical�measures�which�describe�a�set�of�data,�the�normal�distri-
bution�and�other�types�of�standard�scores,�and�an�introduction�to�probability�and�sampling��

xiv Preface
The�remainder�of�the�text�covers�different�inferential�statistics��In�Chapters�6�through�10,�we�
deal�with�different�inferential�tests�involving�means�(e�g�,�t�tests),�proportions,�variances,�and�
correlations��In�Chapters�11�through�16,�all�of�the�basic�analysis�of�variance�(ANOVA)�models�
are�considered��Finally,�in�Chapters�17�through�19�we�examine�various�regression�models�
Second,�the�text�communicates�a�conceptual,�intuitive�understanding�of�statistics,�which�
requires� only� a� rudimentary� knowledge� of� basic� algebra� and� emphasizes� the� important�
concepts�in�statistics��The�most�effective�way�to�learn�statistics�is�through�the�conceptual�
approach��Statistical�concepts�tend�to�be�easy�to�learn�because�(a)�concepts�can�be�simply�
stated,�(b)�concepts�can�be�made�relevant�through�the�use�of�real-life�examples,�(c)�the�same�
concepts�are�shared�by�many�procedures,�and�(d)�concepts�can�be�related�to�one�another�
This�text�will�help�you�to�reach�these�goals��The�following�indicators�will�provide�some�
feedback�as�to�how�you�are�doing��First,�there�will�be�a�noticeable�change�in�your�attitude�
toward� statistics�� Thus,� one� outcome� is� for� you� to� feel� that� “statistics� is� not� half� bad,”� or�
“this� stuff� is� OK�”� Second,� you� will� feel� comfortable� using� statistics� in� your� own� work��
Finally,�you�will�begin�to�“see�the�light�”�You�will�know�when�you�have�reached�this�high-
est�stage�of�statistics�development�when�suddenly,�in�the�middle�of�the�night,�you�wake�up�
from�a�dream�and�say,�“now�I�get�it!”�In�other�words,�you�will�begin�to�think�statistics�rather�
than�think�of�ways�to�get�out�of�doing�statistics�
Pedagogical Tools
The text contains several important pedagogical features to allow you to attain these goals. First, each chapter begins with an outline (so you can anticipate what will be covered) and a list of key concepts (which you will need in order to understand what you are doing). Second, realistic examples from education and the behavioral sciences are used to illustrate the concepts and procedures covered in each chapter. Each of these examples includes an initial vignette, an examination of the relevant procedures and necessary assumptions, how to run SPSS and develop an APA style write-up, as well as tables, figures, and annotated SPSS output to assist you. Third, the text is based on the conceptual approach. That is, material is covered so that you obtain a good understanding of statistical concepts. If you know the concepts, then you know statistics. Finally, each chapter ends with three sets of problems: computational, conceptual, and interpretive. Pay particular attention to the conceptual problems as they provide the best assessment of your understanding of the concepts in the chapter. We strongly suggest using the example data sets and the computational and interpretive problems for additional practice through available statistics software. This will serve to reinforce the concepts covered. Answers to the odd-numbered problems are given at the end of the text.
New to This Edition
A number of changes have been made in the third edition based on the suggestions of reviewers, instructors, teaching assistants, and students. These improvements have been made in order to better achieve the goals of the text. You will note the addition of a coauthor to this edition, Debbie Hahs-Vaughn, who has contributed greatly to
the further development of this text. The changes include the following: (a) additional end-of-chapter problems have been included; (b) more information on power has been added, particularly use of the G*Power software with screenshots; (c) content has been updated and numerous additional references have been provided; (d) the final chapter on logistic regression has been added for a more complete presentation of regression models; (e) numerous SPSS (version 19) screenshots on statistical techniques and their assumptions have been included to assist in the generation and interpretation of output; (f) more information on SPSS has been added to most chapters; (g) research vignettes and templates have been added to the beginning and end of each chapter, respectively; (h) a discussion of expected mean squares has been folded into the analysis of variance chapters to provide a rationale for the formation of proper F ratios; and (i) a website for the text has been created that provides students and instructors with access to detailed solutions to the book’s odd-numbered problems; chapter outlines; lists of key terms for each chapter; and SPSS datasets that correspond to the chapter examples and end-of-chapter problems and that can be used in SPSS and other packages such as SAS, HLM, STATA, and LISREL. Only instructors are granted access to the PowerPoint slides for each chapter that include examples and APA style write-ups, chapter outlines, and key terms; multiple-choice (approximately 25 for each chapter) and short-answer (approximately 5 for each chapter) test questions; and answers to the even-numbered problems. This material is available at http://www.psypress.com/an-introduction-to-statistical-concepts-9780415880053
Acknowledgments
There are many individuals whose assistance enabled the completion of this book. We would like to thank the following individuals with whom we studied in school: Jamie Algina, Lloyd Bond, Amy Broeseker, Jim Carlson, Bill Cooley, Judy Giesen, Brian Gray, Harry Hsu, Mary Nell McNeese, Camille Ogden, Lou Pingel, Rod Roth, Charles Stegman, and Neil Timm. Next, numerous colleagues have played an important role in our personal and professional lives as statisticians. Rather than include an admittedly incomplete listing, we just say “thank you” to all of you. You know who you are.

Thanks also to all of the wonderful people at Lawrence Erlbaum Associates (LEA), in particular, to Ray O’Connell for inspiring this project back in 1986, and to Debra Riegert (formerly at LEA and now at Routledge) for supporting the development of subsequent texts and editions. We are most appreciative of the insightful suggestions provided by the reviewers of this text over the years, and in particular the reviewers of this edition: Robert P. Conti, Sr. (Mount Saint Mary College), Feifei Ye (University of Pittsburgh), Nan Thornton (Capella University), and one anonymous reviewer. A special thank you to all of the terrific students that we have had the pleasure of teaching at the University of Pittsburgh, the University of Illinois–Chicago, Louisiana State University, Boston College, Northern Illinois University, the University of Alabama, The Ohio State University, and the University of Central Florida. For all of your efforts, and the many lights that you have seen and shared with us, this book is for you. We are most grateful to our families, in particular to Lea and Kristen, and to Mark and Malani. It is because of your love and understanding that we were able to cope with such a major project. Thank you one and all.
Richard G. Lomax
Debbie L. Hahs-Vaughn

1
Introduction
Chapter Outline
1.1 What Is the Value of Statistics?
1.2 Brief Introduction to History of Statistics
1.3 General Statistical Definitions
1.4 Types of Variables
1.5 Scales of Measurement
1.5.1 Nominal Measurement Scale
1.5.2 Ordinal Measurement Scale
1.5.3 Interval Measurement Scale
1.5.4 Ratio Measurement Scale
Key Concepts
1. General statistical concepts
Population
Parameter
Sample
Statistic
Descriptive statistics
Inferential statistics
2. Variable-related concepts
Variable
Constant
Categorical
Dichotomous variables
Numerical
Discrete variables
Continuous variables

2 An Introduction to Statistical Concepts
3. Measurement scale concepts
Measurement
Nominal
Ordinal
Interval
Ratio
We want to welcome you to the wonderful world of statistics. More than ever, statistics are everywhere. Listen to the weather report and you hear about the measurement of variables such as temperature, rainfall, barometric pressure, and humidity. Watch a sporting event and you hear about batting averages, percentage of free throws completed, and total rushing yardage. Read the financial page and you can track the Dow Jones average, the gross national product, and bank interest rates. Turn to the entertainment section to see movie ratings, movie revenue, or the top 10 best-selling novels. These are just a few examples of statistics that surround you in every aspect of your life.

Although you may be thinking that statistics is not the most enjoyable subject on the planet, by the end of this text, you will (a) have a more positive attitude about statistics, (b) feel more comfortable using statistics, and thus be more likely to perform your own quantitative data analyses, and (c) certainly know much more about statistics than you do now. In other words, our goal is to equip you with the skills you need to be both a better consumer and producer of research. But be forewarned: the road to statistical independence is not easy. However, we will serve as your guides along the way. When the going gets tough, we will be there to help you with advice and numerous examples and problems. Using the powers of logic, mathematical reasoning, and statistical concept knowledge, we will help you arrive at an appropriate solution to the statistical problem at hand.
Some students begin their first statistics class with some anxiety. This could be caused by not having had a quantitative course for some time, apprehension built up by delaying taking statistics, a poor past instructor or course, or less than adequate past success. Let us offer a few suggestions along these lines. First, this is not a math class or text. If you want one of those, then you need to walk over to the math department. This is a course and text on the application of statistics to education and the behavioral sciences. Second, the philosophy of the text is on the understanding of concepts rather than on the derivation of statistical formulas. It is more important to understand concepts than to derive or memorize various and sundry formulas. If you understand the concepts, you can always look up the formulas if need be. If you do not understand the concepts, then knowing the formulas will only allow you to operate in a cookbook mode without really understanding what you are doing. Third, the calculator and computer are your friends. These devices are tools that allow you to complete the necessary computations and obtain the results of interest. If you are performing hand computations, find a calculator that you are comfortable with; it need not have 800 functions, as the four basic operations and sum and square root functions are sufficient (one of our personal calculators is one of those little credit card calculators, although we often use the calculator on our computers). If you are using a statistical software program, find one that you are comfortable with (most instructors will have you using a program such as SPSS, SAS, or Statistica). In this text, we use SPSS to illustrate statistical applications. Finally, this text will take you from raw data to results using realistic examples. These can then be followed up using the problems at the end of each chapter. Thus, you will not be on your own but will have the text, a computer/calculator, as well as your course and instructor, to help guide you.

The intent and philosophy of this text is to be conceptual and intuitive in nature. Thus, the text does not require a high level of mathematics but rather emphasizes the important concepts in statistics. Most statistical concepts really are fairly easy to learn because (a) concepts can be simply stated, (b) concepts can be related to real-life examples, (c) many of the same concepts run through much of statistics, and therefore, (d) many concepts can be related.

In this introductory chapter, we describe the most basic statistical concepts. We begin with the question, “What is the value of statistics?” We then look at a brief history of statistics by mentioning a few of the more important and interesting statisticians. Then we consider the concepts of population, parameter, sample and statistic, descriptive and inferential statistics, types of variables, and scales of measurement. Our objectives are that by the end of this chapter, you will (a) have a better sense of why statistics are necessary, (b) see that statisticians are an interesting group of people, and (c) have an understanding of several basic statistical concepts.
1.1 What Is the Value of Statistics?
Let us start off with a reasonable rhetorical question: why do we need statistics? In other words, what is the value of statistics, either in your research or in your everyday life? As a way of thinking about these questions, consider the following headlines, which have probably appeared in your local newspaper.
Cigarette Smoking Causes Cancer—Tobacco Industry Denies Charges
A study conducted at Ivy-Covered University Medical School, recently published in the New England Journal of Medicine, has definitively shown that cigarette smoking causes cancer. In interviews with 100 randomly selected smokers and nonsmokers over 50 years of age, 30% of the smokers have developed some form of cancer, while only 10% of the nonsmokers have cancer. “The higher percentage of smokers with cancer in our study clearly indicates that cigarettes cause cancer,” said Dr. Jason P. Smythe. On the contrary, “this study doesn’t even suggest that cigarettes cause cancer,” said tobacco lobbyist Cecil B. Hacker. “Who knows how these folks got cancer; maybe it is caused by the aging process or by the method in which individuals were selected for the interviews,” Mr. Hacker went on to say.
North Carolina Congressional Districts
Gerrymandered—African-Americans Slighted
A study conducted at the National Center for Legal Research indicates that congressional districts in the state of North Carolina have been gerrymandered to minimize the impact of the African-American vote. “From our research, it is clear that the districts are apportioned in a racially biased fashion. Otherwise, how could there be no single district in the entire state which has a majority of African-American citizens when over 50% of the state’s population is African-American? The districting system absolutely has to be changed,” said Dr. I. M. Researcher. A spokesman for The American Bar Association countered with the statement “according to a decision rendered by the United States Supreme Court in 1999 (No. 98-85), intent or motive must be shown for racial bias to be shown in the creation of congressional districts. The decision states a

‘facially neutral law … warrants strict scrutiny only if it can be proved that the law was motivated by a racial purpose or object.’ The data in this study do not show intent or motive. To imply that these data indicate racial bias is preposterous.”
Global Warming—Myth According to the President
Research conducted at the National Center for Global Warming (NCGW) has shown the negative consequences of global warming on the planet Earth. As summarized by Dr. Noble Pryze, “our studies at NCGW clearly demonstrate that if global warming is not halted in the next 20 years, the effects on all aspects of our environment and climatology will be catastrophic.” A different view is held by U.S. President Harold W. Tree. He stated in a recent address that “the scientific community has not convinced him that global warming even exists. Why should our administration spend millions of dollars on an issue that has not been shown to be a real concern?”
How is one to make sense of the studies described by these headlines? How is one to decide which side of the issue these data support, so as to take an intellectual stand? In other words, do the interview data clearly indicate that cigarette smoking causes cancer? Do the congressional district percentages of African-Americans necessarily imply that there is racial bias? Have scientists convinced us that global warming is a problem? These studies are examples of situations where the appropriate use of statistics is clearly necessary. Statistics will provide us with an intellectually acceptable method for making decisions in such matters. For instance, a certain type of research, statistical analysis, and set of results are all necessary to make causal inferences about cigarette smoking. Another type of research, statistical analysis, and set of results are all necessary to lead one to confidently state that the districting system is racially biased or not, or that global warming needs to be dealt with. The bottom line is that the purpose of statistics, and thus of this text, is to provide you with the tools to make important decisions in an appropriate and confident manner. You will not have to trust a statement made by some so-called expert on an issue, which may or may not have any empirical basis or validity; you can make your own judgments based on the statistical analyses of data. For you, the value of statistics can include (a) the ability to read and critique articles in both professional journals and the popular press and (b) the ability to conduct statistical analyses for your own research (e.g., thesis or dissertation).
1.2 Brief Introduction to History of Statistics
As a way of getting to know the topic of statistics, we want to briefly introduce you to a few famous statisticians. The purpose of this section is not to provide a comprehensive history of statistics, as those already exist (e.g., Heyde, Seneta, Crepel, Fienberg, & Gani, 2001; Pearson, 1978; Stigler, 1986). Rather, the purpose of this section is to show that famous statisticians not only are interesting but are human beings just like you and me.
One of the fathers of probability (see Chapter 5) is acknowledged to be Blaise Pascal, who worked in the mid-1600s. One of Pascal’s contributions was that he worked out the probabilities for each dice roll in the game of craps, enabling his friend, a member of royalty, to become a consistent winner. He also developed Pascal’s triangle, which you may remember

from your early mathematics education. The statistical development of the normal or bell-shaped curve (see Chapter 4) is interesting. For many years, this development was attributed to Karl Friedrich Gauss (early 1800s) and was actually known for some time as the Gaussian curve. Later historians found that Abraham DeMoivre actually developed the normal curve in the 1730s. As statistics was not thought of as a true academic discipline until the late 1800s, people like Pascal and DeMoivre were consulted by the wealthy on odds about games of chance and by insurance underwriters to determine mortality rates.
Karl Pearson is one of the most famous statisticians to date (late 1800s to early 1900s). Among his many accomplishments is the Pearson product–moment correlation coefficient, still in use today (see Chapter 10). You may know of Florence Nightingale (1820–1910) as an important figure in the field of nursing. However, you may not know of her importance in the field of statistics. Nightingale believed that statistics and theology were linked and that by studying statistics we might come to understand God’s laws.
A quite interesting statistical personality is William Sealy Gosset, who was employed by the Guinness Brewery in Ireland. The brewery wanted to select a sample of people from Dublin in 1906 for purposes of taste testing. Gosset was asked how large a sample was needed in order to make an accurate inference about the entire population (see next section). The brewery would not let Gosset publish any of his findings under his own name, so he used the pseudonym of Student. Today, the t distribution is still known as Student’s t distribution. Sir Ronald A. Fisher is another of the most famous statisticians of all time. Working in the early 1900s, Fisher introduced the analysis of variance (ANOVA) (see Chapters 11 through 16) and Fisher’s z transformation for correlations (see Chapter 10). In fact, the major statistic in the ANOVA is referred to as the F ratio in honor of Fisher. These individuals represent only a fraction of the many famous and interesting statisticians over the years. For further information about these and other statisticians, we suggest you consult references such as Pearson (1978), Stigler (1986), and Heyde et al. (2001), which consist of many interesting stories about statisticians.
1.3 General Statistical Definitions
In this section, we define some of the most basic concepts in statistics. Included here are definitions and examples of the following concepts: population, parameter, sample, statistic, descriptive statistics, and inferential statistics.
The first four concepts are tied together, so we discuss them together. A population is defined as consisting of all members of a well-defined group. A population may be large in scope, such as when a population is defined as all of the employees of IBM worldwide. A population may be small in scope, such as when a population is defined as all of the IBM employees at the building on Main Street in Atlanta. Thus, a population could be large or small in scope. The key is that the population is well defined such that one could determine specifically who all of the members of the group are, and then information or data could be collected from all such members. Thus, if our population is defined as all members working in a particular office building, then our study would consist of collecting data from all employees in that building. It is also important to remember that you, the researcher, define the population.
A parameter is defined as a characteristic of a population. For instance, parameters of our office building example might be the number of individuals who work in that

building (e.g., 154), the average salary of those individuals (e.g., $49,569), and the range of ages of those individuals (e.g., 21–68 years of age). When we think about characteristics of a population, we are thinking about population parameters. Those two terms are often linked together.
A sample is defined as consisting of a subset of a population. A sample may be large in scope, such as when a population is defined as all of the employees of IBM worldwide and 20% of those individuals are included in the sample. A sample may be small in scope, such as when a population is defined as all of the IBM employees at the building on Main Street in Atlanta and 10% of those individuals are included in the sample. Thus, a sample could be large or small in scope and consist of any portion of the population. The key is that the sample consists of some, but not all, of the members of the population; that is, anywhere from one individual to all but one individual from the population is included in the sample. Thus, if our population is defined as all members working in the IBM building on Main Street in Atlanta, then our study would consist of collecting data from a sample of some of the employees in that building. It follows that if we, the researcher, define the population, then we also determine what the sample will be.
A statistic is defined as a characteristic of a sample. For instance, statistics of our office building example might be the number of individuals who work in the building that we sampled (e.g., 77), the average salary of those individuals (e.g., $54,090), and the range of ages of those individuals (e.g., 25–62 years of age). Notice that the statistics of a sample need not be equal to the parameters of a population (more about this in Chapter 5). When we think about characteristics of a sample, we are thinking about sample statistics. Those two terms are often linked together. Thus, we have population parameters and sample statistics, but no other combinations of those terms exist. The field has become known as statistics simply because we are almost always dealing with sample statistics, as population data are rarely obtained.
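The parameter/statistic distinction can be sketched in a few lines of code. The salary figures and employee counts below are hypothetical stand-ins echoing the office building example, not data from the text:

```python
import random

# Hypothetical population: annual salaries for all 154 employees in the
# office building (dollar values drawn from an assumed distribution).
random.seed(42)
population = [random.gauss(49569, 8000) for _ in range(154)]

# A parameter is a characteristic of the population: here, the mean salary.
parameter_mean = sum(population) / len(population)

# A statistic is a characteristic of a sample, a subset of the population.
sample = random.sample(population, 77)
statistic_mean = sum(sample) / len(sample)

# The statistic estimates the parameter but need not equal it.
print(f"Population mean (parameter): ${parameter_mean:,.2f}")
print(f"Sample mean (statistic):     ${statistic_mean:,.2f}")
```

Running the sketch shows the sample mean landing near, but not exactly on, the population mean, which is the point taken up again in Chapter 5.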
The final two concepts are also tied together and thus considered together. The field of statistics is generally divided into two types of statistics: descriptive statistics and inferential statistics. Descriptive statistics are defined as techniques which allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. In other words, the purpose of descriptive statistics is to allow us to talk about (or describe) a collection of data without having to look at the entire collection. For example, say we have just collected a set of data from 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). If you were to ask us about the data, we could do one of two things. On the one hand, we could carry around the entire collection of data everywhere we go, and when someone asks us about the data, simply say, “Here are the data; take a look at them yourself.” On the other hand, we could summarize the data in an abbreviated fashion, and when someone asks us about the data, simply say, “Here are a table and a graph about the data; they summarize the entire collection.” So, rather than viewing 100,000 sheets of paper, perhaps we would only have to view two sheets of paper. Since statistics is largely a system of communicating information, descriptive statistics are considerably more useful to a consumer than an entire collection of data. Descriptive statistics are discussed in Chapters 2 through 4.
Inferential statistics are defined as techniques which allow us to employ inductive reasoning to infer the properties of an entire group or collection of individuals, a population, from a small number of those individuals, a sample. In other words, the purpose of inferential statistics is to allow us to collect data from a sample of individuals and then infer the properties of that sample back to the population of individuals. In case you have forgotten about logic, inductive reasoning is where you infer from the specific (here the sample) to

the general (here the population). For example, say we have just collected a set of sample data from 5,000 of the population of 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). If you were to ask us about the data, we could compute various sample statistics and then infer with some confidence that these would be similar to the population parameters. In other words, this allows us to collect data from a subset of the population yet still make inferential statements about the population without collecting data from the entire population. So, rather than collecting data from all 100,000 graduate students in the population, we could collect data on a sample of, say, 5,000 students.
As another example, Gosset (aka Student) was asked to conduct a taste test of Guinness beer for a sample of Dublin residents. Because the brewery could not afford to do this with the entire population of Dublin, Gosset collected data from a sample of Dublin residents and was able to make an inference from these sample results back to the population. A discussion of inferential statistics begins in Chapter 5. In summary, the field of statistics is roughly divided into descriptive statistics and inferential statistics. Note, however, that many further distinctions are made among the types of statistics, but more about that later.
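The descriptive/inferential split can be illustrated with a small sketch. The scores, the sample size, and the normal-approximation confidence interval below are illustrative assumptions, not material from the chapter:

```python
import random
import statistics

# Hypothetical aptitude test scores for a population of 100,000 graduate
# students (assumed roughly normal; the numbers are illustrative).
random.seed(7)
population = [random.gauss(500, 100) for _ in range(100_000)]

# Descriptive statistics: summarize the 5,000 sampled scores in brief.
sample = random.sample(population, 5_000)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Inferential statistics: reason from the sample back to the population,
# here via an approximate 95% confidence interval for the population mean.
margin = 1.96 * sd / len(sample) ** 0.5
interval = (mean - margin, mean + margin)
print(f"Sample mean {mean:.1f}, SD {sd:.1f}; "
      f"95% CI for population mean: ({interval[0]:.1f}, {interval[1]:.1f})")
```

The two sample numbers describe the 5,000 scores; the interval is the inferential step, a statement about all 100,000 students made without measuring them all.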
1.4 Types of Variables
There are several terms we need to define about variables. First, it might be useful to define the term variable. A variable is defined as any characteristic of persons or things that is observed to take on different values. In other words, the values for a particular characteristic vary across the individuals observed. For example, the annual salary of the families in your neighborhood varies because not every family earns the same annual salary. One family might earn $50,000 while the family right next door might earn $65,000. Thus, annual family salary is a variable because it varies across families.
In contrast, a constant is defined as any characteristic of persons or things that is observed to take on only a single value. In other words, the values for a particular characteristic are the same for all individuals observed. For example, say every family in your neighborhood has a lawn. Although the nature of the lawns may vary, everyone has a lawn. Thus, whether a family has a lawn in your neighborhood is a constant and therefore would not be a very interesting characteristic to study. When designing a study, you (i.e., the researcher) can determine what is a constant. This is part of the process of delimiting, or narrowing the scope of, your study. As an example, you may be interested in studying career paths of girls who complete AP science courses. In designing your study, you are only interested in girls, and thus, gender would be a constant. This is not to say that the researcher wholly determines when a characteristic is a constant. It is sometimes the case that we find that a characteristic is a constant after we conduct the study. In other words, one of the measures has no variation—everyone or everything scored or remained the same on that particular characteristic.
There are different typologies for describing variables. One typology is categorical (or qualitative) versus numerical (or quantitative), and, within numerical, discrete and continuous. A categorical variable is a qualitative variable that describes categories of a characteristic or attribute. Examples of categorical variables include political party affiliation (Republican = 1, Democrat = 2, Independent = 3), religious affiliation (e.g., Methodist = 1, Baptist = 2, Roman Catholic = 3), and course letter grade (A = 4, B = 3, C = 2, D = 1, F = 0).

A dichotomous variable is a special, restricted type of categorical variable and is defined as a variable that can take on only one of two values. For example, biologically determined gender is a variable that can only take on the values of male or female and is often coded numerically as 0 (e.g., for males) or 1 (e.g., for females). Other dichotomous variables include pass/fail, true/false, living/dead, and smoker/nonsmoker. Dichotomous variables will take on special importance as we study binary logistic regression (Chapter 19).
A numerical variable is a quantitative variable. Numerical variables can further be classified as either discrete or continuous. A discrete variable is defined as a variable that can only take on certain values. For example, the number of children in a family can only take on certain values. Many values are not possible, such as negative values (e.g., the Joneses cannot have −2 children) or decimal values (e.g., the Smiths cannot have 2.2 children). In contrast, a continuous variable is defined as a variable that can take on any value within a certain range given a precise enough measurement instrument. For example, the distance between two cities can be measured in miles, with miles estimated in whole numbers. However, given a more precise instrument with which to measure, distance can even be measured down to the inch or millimeter. When considering the difference between a discrete and continuous variable, keep in mind that discrete variables arise from the counting process and continuous variables arise from the measuring process. For example, the number of students enrolled in your statistics class is a discrete variable. If we were to measure (i.e., count) the number of students in the class, it would not matter if we counted first names alphabetically from A to Z or if we counted beginning with who sat in the front row to the last person in the back row—either way, we would arrive at the same value. In other words, how we “measure” (again, count) the students in the class does not matter—we will always arrive at the same result. In comparison, the value of a continuous variable is dependent on how precise the measuring instrument is. Weighing yourself on a scale that rounds to whole numbers will give us one measure of weight. However, weighing on another, more precise, scale that rounds to three decimal places will provide a more precise measure of weight.
Here are a few additional examples of discrete and continuous variables. Other discrete variables include the number of CDs owned, number of credit hours enrolled, and number of teachers employed at a school. Other continuous variables include salary (from zero to billions in dollars and cents), age (from zero up, in millisecond increments), height (from zero up, in increments of fractions of millimeters), weight (from zero up, in increments of fractions of ounces), and time (from zero up, in millisecond increments). Variable type is often important in terms of selecting an appropriate statistic, as shown later.
1.5 Scales of Measurement

Another concept useful for selecting an appropriate statistic is the scale of measurement of the variables. First, however, we define measurement as the assignment of numerical values to persons or things according to explicit rules. For example, how do we measure a person's weight? Well, there are rules that individuals commonly follow. Currently, weight is measured on some sort of balance or scale in pounds or grams. In the old days, weight was measured by different rules, such as the number of stones or gold coins. These explicit rules were developed so that there was a standardized and generally agreed-upon method of measuring weight. Thus, if you weighed 10 stones in Coventry, England, then that meant the same as 10 stones in Liverpool, England.

Introduction
In 1951, the psychologist S. S. Stevens developed four types of measurement scales that could be used for assigning these numerical values. In other words, the type of rule used was related to the measurement scale. The four types of measurement scales are the nominal, ordinal, interval, and ratio scales. They are presented in order of increasing complexity and of increasing information (remembering the acronym NOIR might be helpful). It is worth restating the importance of understanding the measurement scales of variables, as the measurement scales will dictate what statistical procedures can be performed with the data.
1.5.1 Nominal Measurement Scale

The simplest scale of measurement is the nominal scale. Here individuals or objects are classified into categories so that all of those in a single category are equivalent with respect to the characteristic being measured. For example, the country of birth of an individual is a nominally scaled variable. Everyone born in France is equivalent with respect to this variable, whereas two people born in different countries (e.g., France and Australia) are not equivalent with respect to this variable. The categories are truly qualitative in nature, not quantitative. Categories are typically given names or numbers. For our example, the country name would be an obvious choice for categories, although numbers could also be assigned to each country (e.g., Brazil = 5, India = 34). The numbers do not represent the amount of the attribute possessed. An individual born in India does not possess any more of the "country of birth" attribute than an individual born in Brazil (which would not make sense anyway). The numbers merely identify to which category an individual or object belongs. The categories are also mutually exclusive. That is, an individual can belong to one and only one category, such as a person being born in only one country.
The statistics of a nominal scale variable are quite simple, as they can only be based on the frequencies that occur within each of the categories. For example, we may be studying characteristics of various countries in the world. A nominally scaled variable could be the hemisphere in which the country is located (northern, southern, eastern, or western). While it is possible to count the number of countries that belong to each hemisphere, that is all that we can do. The only mathematical property that the nominal scale possesses is that of equality versus inequality. In other words, two individuals or objects are either in the same category (equal) or in different categories (unequal). For the hemisphere variable, we can either use the hemisphere name or assign numerical values to each hemisphere. We might perhaps assign each hemisphere a number alphabetically from 1 to 4. Countries that are in the same hemisphere are equal with respect to this characteristic. Countries that are in different hemispheres are unequal with respect to this characteristic. Again, these particular numerical values are meaningless and could arbitrarily be any values. The numerical values assigned only serve to keep the categories distinct from one another. Many other numerical values could be assigned for the hemispheres and still maintain the equality versus inequality property. For example, the northern hemisphere could easily be categorized as 1000 and the southern hemisphere as 2000 with no change in information. Other examples of nominal scale variables include hair color, eye color, neighborhood, gender, ethnic background, religious affiliation, political party affiliation, type of life insurance owned (e.g., term, whole life), blood type, psychological clinical diagnosis, Social Security number, and type of headache medication prescribed. The term nominal is derived from "giving a name." Nominal variables are considered categorical or qualitative.

1.5.2 Ordinal Measurement Scale

The next most complex scale of measurement is the ordinal scale. Ordinal measurement is determined by the relative size or position of individuals or objects with respect to the characteristic being measured. That is, the individuals or objects are rank-ordered according to the amount of the characteristic that they possess. For example, say a high school graduating class had 250 students. Students could then be assigned class ranks according to their academic performance (e.g., grade point average) in high school. The student ranked 1 in the class had the highest relative performance, and the student ranked 250 had the lowest relative performance.

However, equal differences between the ranks do not imply equal distance in terms of the characteristic being measured. For example, the students ranked 1 and 2 in the class may have a different distance in terms of actual academic performance than the students ranked 249 and 250, even though both pairs of students differ by a rank of 1. In other words, here a rank difference of 1 does not imply the same actual performance distance. The pairs of students may be very, very close or quite distant from one another. Because equal differences do not imply equal distances, the statistics that we can use are limited by these unequal intervals. The ordinal scale then consists of two mathematical properties: equality versus inequality again; and, if two individuals or objects are unequal, then we can determine greater than or less than. That is, if two individuals have different class ranks, then we can determine which student had a greater or lesser class rank. Although the greater than or less than property is evident, an ordinal scale cannot tell us how much greater than or less than because of the unequal intervals. Thus, the student ranked 250 could be farther away from the student ranked 249 than the student ranked 2 is from the student ranked 1.
When we have untied ranks, as shown on the left side of Table 1.1, assigning ranks is straightforward. What do we do if there are tied ranks? For example, suppose there are two students with the same grade point average of 3.8, as given on the right side of Table 1.1. How do we assign them class ranks? It is clear that they have to be assigned the same rank, as that would be the only fair method. However, there are at least two methods for dealing with tied ranks. One method would be to assign each of them a rank of 2, as that is the next available rank. However, there are two problems with that method. First, the sum of the ranks for the same number of scores would be different depending on whether there
Table 1.1
Untied Ranks and Tied Ranks for Ordinal Data

Untied Ranks                 Tied Ranks
Grade Point Average   Rank   Grade Point Average   Rank
4.0                   1      4.0                   1
3.9                   2      3.8                   2.5
3.8                   3      3.8                   2.5
3.6                   4      3.6                   4
3.2                   5      3.0                   6
3.0                   6      3.0                   6
2.7                   7      3.0                   6
Sum = 28                     Sum = 28

were ties or not. Statistically, this is not a satisfactory solution. Second, what rank would the next student, having the 3.6 grade point average, be given: a rank of 3 or 4?

The second and preferred method is to take the average of the available ranks and assign that value to each of the tied individuals. Thus, the two persons tied at a grade point average of 3.8 have as available ranks 2 and 3. Both would then be assigned the average rank of 2.5. Also, the three persons tied at a grade point average of 3.0 have as available ranks 5, 6, and 7. These all would be assigned the average rank of 6. You also see in the table that with this method the sum of the ranks for 7 scores is always equal to 28, regardless of the number of ties. Statistically, this is a satisfactory solution and the one we prefer, whether we are using a statistical software package or hand computations. Other examples of ordinal scale variables include course letter grades, order of finish in the Boston Marathon, socioeconomic status, hardness of minerals (1 = softest to 10 = hardest), faculty rank (assistant, associate, and full professor), student class (freshman, sophomore, junior, senior, graduate student), ranking on a personality trait (e.g., extreme intrinsic to extreme extrinsic motivation), and military rank. The term ordinal is derived from "ordering" individuals or objects. Ordinal variables are most often considered categorical or qualitative.
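The averaging rule for tied ranks is easy to automate. As a minimal sketch (our own illustration, not from the text; the function name `average_ranks` is hypothetical, though statistical packages apply this same rule when ranking), the following Python function assigns rank 1 to the largest value and gives tied values the average of the ranks they jointly occupy, reproducing the right side of Table 1.1:

```python
def average_ranks(values):
    """Rank values with 1 = largest; tied values share the
    average of the ranks they would jointly occupy."""
    # Indices of the values, sorted from largest to smallest value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(values):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(values) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end occupy ranks pos+1..end+1; average them.
        avg = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

# Tied grade point averages from the right side of Table 1.1
print(average_ranks([4.0, 3.8, 3.8, 3.6, 3.0, 3.0, 3.0]))
# [1.0, 2.5, 2.5, 4.0, 6.0, 6.0, 6.0] -- and the ranks still sum to 28
```

Note that the sum of the ranks is 28 whether or not there are ties, exactly as shown in Table 1.1. (SciPy users can get the same tie handling from scipy.stats.rankdata with method='average', applied to the negated values for largest-first ranking.)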
1.5.3 Interval Measurement Scale

The next most complex scale of measurement is the interval scale. An interval scale is one where individuals or objects can be ordered, and equal differences between the values do imply equal distance in terms of the characteristic being measured. That is, order and distance relationships are meaningful. However, there is no absolute zero point. Absolute zero, if it exists, implies the total absence of the property being measured. The zero point of an interval scale, if it exists, is arbitrary and does not reflect the total absence of the property being measured. Here the zero point merely serves as a placeholder. For example, suppose that we gave you the final exam in advanced statistics right now. If you were so unlucky as to obtain a score of 0, this score does not imply a total lack of knowledge of statistics. It would merely reflect the fact that your statistics knowledge is not that advanced yet (or perhaps the questions posed on the exam just did not capture those concepts that you do understand). You do have some knowledge of statistics, but just at an introductory level in terms of the topics covered so far.

Take as an example the Fahrenheit temperature scale, which has a freezing point of 32 degrees. A temperature of zero is not the total absence of heat, just a point slightly colder than 1 degree and slightly warmer than −1 degree. In terms of the equal distance notion, consider the following example. Say that we have two pairs of Fahrenheit temperatures, the first pair being 55 and 60 degrees and the second pair being 25 and 30 degrees. The difference of 5 degrees is the same for both pairs and is also the same everywhere along the Fahrenheit scale. Thus, every 5-degree interval is an equal interval. However, we cannot say that 60 degrees is twice as warm as 30 degrees, as there is no absolute zero. In other words, we cannot form true ratios of values (i.e., 60/30 = 2). This property only exists for the ratio scale of measurement. The interval scale has as mathematical properties equality versus inequality, greater than or less than if unequal, and equal intervals. Other examples of interval scale variables include the Centigrade temperature scale, calendar time, restaurant ratings by the health department (on a 100-point scale), year (since 1 AD), and, arguably, many educational and psychological assessment devices (although statisticians have been debating this one for many years;
e.g., on occasion there is a fine line between whether an assessment is measured along the ordinal or the interval scale). Interval variables are considered numerical and primarily continuous.
1.5.4 Ratio Measurement Scale

The most complex scale of measurement is the ratio scale. A ratio scale has all of the properties of the interval scale, plus an absolute zero point exists. Here a measurement of 0 indicates a total absence of the property being measured. Because an absolute zero point exists, true ratios of values can be formed which actually reflect ratios in the amounts of the characteristic being measured. Thus, if concepts such as "one-half as big" or "twice as large" make sense, then that may be a good indication that the variable is ratio in scale.

For example, the height of individuals is a ratio scale variable. There is an absolute zero point of zero height. We can also form ratios, such that 6′0″ Sam is twice as tall as his 3′0″ daughter Samantha. The ratio scale of measurement is not observed frequently in education and the behavioral sciences, with certain exceptions. Motor performance variables (e.g., speed in the 100-meter dash, distance driven in 24 hours), elapsed time, calorie consumption, and physiological characteristics (e.g., weight, height, age, pulse rate, blood pressure) are ratio scale measures (and are all also examples of continuous variables). Discrete variables, those that arise from the counting process, are also examples of ratio variables, since zero indicates an absence of what is measured (e.g., the number of children in a family or the number of trees in a park). A summary of the measurement scales, their characteristics, and some examples is given in Table 1.2. Ratio variables are considered numerical and can be either discrete or continuous.
Table 1.2
Summary of the Scales of Measurement

Nominal
  Characteristics: Classify into categories; categories are given names or numbers, but the numbers are arbitrary. Mathematical property: (1) equal versus unequal.
  Examples: Hair or eye color, ethnic background, neighborhood, gender, country of birth, Social Security number, type of life insurance, religious or political affiliation, blood type, clinical diagnosis.

Ordinal
  Characteristics: Rank-ordered according to relative size or position. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than.
  Examples: Letter grades, order of finish in race, class rank, SES, hardness of minerals, faculty rank, student class, military rank, rank on personality trait.

Interval
  Characteristics: Rank-ordered, and equal differences between values imply equal distances in the attribute. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than; (3) equal intervals.
  Examples: Temperature, calendar time, most assessment devices, year, restaurant ratings.

Ratio
  Characteristics: Rank-ordered, equal intervals, and an absolute zero allows ratios to be formed. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than; (3) equal intervals; (4) absolute zero.
  Examples: Speed in 100-meter dash, height, weight, age, distance driven, elapsed time, pulse rate, blood pressure, calorie consumption.

1.6 Summary

In this chapter, an introduction to statistics was given. First, we discussed the value of and need for knowledge about statistics and how it assists in decision making. Next, a few of the more colorful and interesting statisticians of the past were mentioned. Then, we defined the following general statistical terms: population, parameter, sample, statistic, descriptive statistics, and inferential statistics. We then defined variable-related terms, including variables, constants, categorical variables, and continuous variables. For a summary of these definitions, see Box 1.1. Finally, we examined the four classic types of measurement scales: nominal, ordinal, interval, and ratio. By now, you should have met the following objectives: (a) have a better sense of why statistics are necessary; (b) see that statisticians are an interesting group of people; and (c) have an understanding of the basic statistical concepts of population, parameter, sample, and statistic; descriptive and inferential statistics; types of variables; and scales of measurement. The next chapter begins to address some of the details of descriptive statistics when we consider how to represent data in terms of tables and graphs. In other words, rather than carrying our data around with us everywhere we go, we examine ways to display data in tabular and graphical forms to foster communication.
Stop and Think Box 1.1
Summary of Definitions

Population: All members of a well-defined group. Example: All employees of IBM Atlanta.
Parameter: A characteristic of a population. Example: Average salary of a population.
Sample: A subset of a population. Example: Some employees of IBM Atlanta.
Statistic: A characteristic of a sample. Example: Average salary of a sample.
Descriptive statistics: Techniques which allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. Example: Table or graph summarizing data.
Inferential statistics: Techniques which allow us to employ inductive reasoning to infer the properties of a population from a sample. Example: Taste test statistics from a sample of Dublin residents.
Variable: Any characteristic of persons or things that is observed to take on different values. Example: Salary of the families in your neighborhood.
Constant: Any characteristic of persons or things that is observed to take on only a single value. Example: Every family has a lawn in your neighborhood.
Categorical variable: A qualitative variable. Example: Political party affiliation.
Dichotomous variable: A categorical variable that can take on only one of two values. Example: Biologically determined gender.
Numerical variable: A quantitative variable that is either discrete or continuous. Examples: Number of children in a family; the distance between two cities.
Discrete variable: A numerical variable that arises from the counting process and can take on only certain values. Example: Number of children in a family.
Continuous variable: A numerical variable that can take on any value within a certain range given a precise enough measurement instrument. Example: Distance between two cities.

Problems

Conceptual Problems

1.1 A mental health counselor is conducting a research study on the satisfaction that married couples have with their marriage. "Marital status" (e.g., single, married, divorced, widowed), in this scenario, is which one of the following?
  a. Constant
  b. Variable

1.2 Belle randomly samples 100 library patrons and gathers data on the genre of the "first book" that they checked out from the library. She finds that 85 library patrons checked out a fiction book and 15 library patrons checked out a nonfiction book. Which of the following best characterizes the type of "first book" checked out in this study?
  a. Constant
  b. Variable

1.3 For interval level variables, which of the following properties does not apply?
  a. Jim is two units greater than Sally.
  b. Jim is greater than Sally.
  c. Jim is twice as good as Sally.
  d. Jim differs from Sally.

1.4 Which of the following properties is appropriate for ordinal but not for nominal variables?
  a. Sue differs from John.
  b. Sue is greater than John.
  c. Sue is 10 units greater than John.
  d. Sue is twice as good as John.

1.5 Which scale of measurement is implied by the following statement: "Jill's score is three times greater than Eric's score"?
  a. Nominal
  b. Ordinal
  c. Interval
  d. Ratio

1.6 Which scale of measurement is implied by the following statement: "Bubba had the highest score"?
  a. Nominal
  b. Ordinal
  c. Interval
  d. Ratio

1.7 A band director collects data on the number of years in which students in the band have played a musical instrument. Which scale of measurement is implied by this scenario?

  a. Nominal
  b. Ordinal
  c. Interval
  d. Ratio

1.8 Kristen has an IQ of 120. I assert that Kristen is 20% more intelligent than the average person having an IQ of 100. Am I correct?

1.9 Population is to parameter as sample is to statistic. True or false?

1.10 Every characteristic of a sample of 100 persons constitutes a variable. True or false?

1.11 A dichotomous variable is also a categorical variable. True or false?

1.12 The amount of time spent studying in 1 week for a population of students is an inferential statistic. True or false?

1.13 For ordinal level variables, which of the following properties does not apply?
  a. IBM differs from Apple.
  b. IBM is greater than Apple.
  c. IBM is two units greater than Apple.
  d. All of the aforementioned properties apply.

1.14 A sample of 50 students take an exam, and the instructor decides to give the top 5 scores a bonus of 5 points. Compared to the original set of scores (no bonus), I assert that the ranks of the new set of scores (including bonus) will be exactly the same. Am I correct?

1.15 Johnny and Buffy have class ranks of 5 and 6. Ingrid and Toomas have class ranks of 55 and 56. I assert that the GPAs of Johnny and Buffy are the same distance apart as are the GPAs of Ingrid and Toomas. Am I correct?
Computational Problems

1.1 Rank the following values of the number of CDs owned, assigning rank 1 to the largest value:
10 15 12 8 20 17 5 21 3 19

1.2 Rank the following values of the number of credits earned, assigning rank 1 to the largest value:
10 16 10 8 19 16 5 21 3 19

1.3 Rank the following values of the number of pairs of shoes owned, assigning rank 1 to the largest value:
8 6 3 12 19 7 10 25 4 42

Interpretive Problems

Consider the following class survey:

1.1 What is your gender?
1.2 What is your height in inches?
1.3 What is your shoe size (length)?

1.4 Do you smoke?
1.5 Are you left- or right-handed? Your mother? Your father?
1.6 How much did you spend at your last hair appointment (including tip)?
1.7 How many CDs do you own?
1.8 What was your quantitative GRE score?
1.9 What is your current GPA?
1.10 On average, how much exercise do you get per week (in hours)?
1.11 On a 5-point scale, what is your political view (1 = very liberal, 3 = moderate, 5 = very conservative)?
1.12 On average, how many hours of TV do you watch per week?
1.13 How many cups of coffee did you drink yesterday?
1.14 How many hours did you sleep last night?
1.15 On average, how many alcoholic drinks do you have per week?
1.16 Can you tell the difference between Pepsi and Coke?
1.17 What is the natural color of your hair (black, blonde, brown, red, other)?
1.18 What is the natural color of your eyes (black, blue, brown, green, other)?
1.19 How far do you live from this campus (in miles)?
1.20 On average, how many books do you read for pleasure each month?
1.21 On average, how many hours do you study per week?
1.22 Which question on this survey is the most interesting to you? The least interesting?

Possible Activities

1. For each item, determine the most likely scale of measurement (nominal, ordinal, interval, or ratio) and the type of variable [categorical or numerical (if numerical, discrete or continuous)].

2. Create scenarios in which one or more of the variables in this survey would be a constant, given the delimitations that you define for your study. For example, we are designing a study to measure study habits (as measured by Question 1.21) for students who do not exercise (Question 1.10). In this sample study, our constant is the number of hours per week that a student exercises (in this case, we are delimiting that to be zero—and thus, Question 1.10 will be a constant; all students in our study will have answered Question 1.10 as "zero").

3. Collect data from a sample of individuals. In subsequent chapters, you will be asked to analyze these data for different procedures.

NOTE: An actual sample dataset using this survey is contained on the website (SPSS file: survey1) and is utilized in later chapters.

2
Data Representation
Chapter Outline
2.1 Tabular Display of Distributions
  2.1.1 Frequency Distributions
  2.1.2 Cumulative Frequency Distributions
  2.1.3 Relative Frequency Distributions
  2.1.4 Cumulative Relative Frequency Distributions
2.2 Graphical Display of Distributions
  2.2.1 Bar Graph
  2.2.2 Histogram
  2.2.3 Frequency Polygon
  2.2.4 Cumulative Frequency Polygon
  2.2.5 Shapes of Frequency Distributions
  2.2.6 Stem-and-Leaf Display
2.3 Percentiles
  2.3.1 Percentiles
  2.3.2 Quartiles
  2.3.3 Percentile Ranks
  2.3.4 Box-and-Whisker Plot
2.4 SPSS
2.5 Templates for Research Questions and APA-Style Paragraph

Key Concepts
1. Frequencies, cumulative frequencies, relative frequencies, and cumulative relative frequencies
2. Ungrouped and grouped frequency distributions
3. Sample size
4. Real limits and intervals
5. Frequency polygons
6. Normal, symmetric, and skewed frequency distributions
7. Percentiles, quartiles, and percentile ranks

In Chapter 1, we introduced the wonderful world of statistics. There, we discussed the value of statistics, met a few of the more interesting statisticians, and defined several basic statistical concepts. The concepts included population, parameter, sample and statistic, descriptive and inferential statistics, types of variables, and scales of measurement. In this chapter, we begin our examination of descriptive statistics, which we previously defined as techniques that allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. We used the example of collecting data from 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). Rather than having to carry around the entire collection of data in order to respond to questions, we mentioned that you could summarize the data in an abbreviated fashion through the use of tables and graphs. This way, we could communicate features of the data through a few tables or figures without having to carry around the entire dataset.

This chapter deals with the details of the construction of tables and figures for purposes of describing data. Specifically, we first consider the following types of tables: frequency distributions (ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions. Next we look at the following types of figures: bar graph, histogram, frequency polygon, cumulative frequency polygon, and stem-and-leaf display. We also discuss common shapes of frequency distributions. Then we examine the use of percentiles, quartiles, percentile ranks, and box-and-whisker plots. Finally, we look at the use of SPSS and develop an APA-style paragraph of results. Concepts to be discussed include frequencies, cumulative frequencies, relative frequencies, and cumulative relative frequencies; ungrouped and grouped frequency distributions; sample size; real limits and intervals; frequency polygons; normal, symmetric, and skewed frequency distributions; and percentiles, quartiles, and percentile ranks. Our objectives are that by the end of this chapter, you will be able to (1) construct and interpret statistical tables, (2) construct and interpret statistical graphs, and (3) determine and interpret percentile-related information.

2.1 Tabular Display of Distributions

Consider the following research scenario:

Marie, a graduate student pursuing a master's degree in educational research, has been assigned her first task as a research assistant. Her faculty mentor has given Marie quiz data collected from 25 students enrolled in an introductory statistics course and has asked Marie to summarize the data. In addition to the data, the faculty mentor has shared the following research questions that should guide Marie in her analysis of the data: How can the quiz scores of students enrolled in an introductory statistics class be represented in a table? In a figure? What is the distributional shape of the statistics quiz scores? What is the 50th percentile of the quiz scores?

In this section, we consider ways in which data can be represented in the form of tables. More specifically, we are interested in how the data for a single variable can be represented (the representation of data for multiple variables is covered in later chapters). The methods described here include frequency distributions (both ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions.

2.1.1 Frequency Distributions

Let us use an example set of data in this chapter to illustrate ways in which data can be represented. We have selected a small dataset for purposes of simplicity, although datasets are typically larger in size. Note that there is a larger dataset (based on the survey from the Chapter 1 interpretive problem) utilized in the end-of-chapter problems and available on our website as "survey1." As shown in Table 2.1, the smaller dataset consists of a sample of 25 student scores on a statistics quiz, where the maximum score is 20 points. If a colleague asked a question about these data, again a response could be, "take a look at the data yourself." This would not be very satisfactory to the colleague, as the person would have to eyeball the data to answer his or her question. Alternatively, one could present the data in the form of a table so that questions could be more easily answered. One question might be: which score occurred most frequently? In other words, what score occurred more than any other score? Other questions might be: which scores were the highest and lowest scores in the class? and where do most of the scores tend to fall? In other words, how well did the students tend to do as a class? These and other questions can be easily answered by looking at a frequency distribution.

Let us first look at how an ungrouped frequency distribution can be constructed for these and other data. By following these steps, we develop the ungrouped frequency distribution shown in Table 2.2. The first step is to arrange the unique scores on a list from the lowest score to the highest score. The lowest score is 9 and the highest score is 20. Even though scores such as 15 were observed more than once, the value of 15 is only entered in this column once. This is what we mean by unique. Note that if the score of 15 was not observed, it could still be entered as a value in the table to serve as a placeholder within
Table 2.1
Statistics Quiz Data

 9  11  20  15  19  10  19  18  14  12  17  11  13
16  17  19  18  17  13  17  15  18  17  19  15
Table 2.2
Ungrouped Frequency Distribution of Statistics Quiz Data

 X    f    cf    rf                 crf
 9    1     1    f/n = 1/25 = .04   .04
10    1     2    .04                .08
11    2     4    .08                .16
12    1     5    .04                .20
13    2     7    .08                .28
14    1     8    .04                .32
15    3    11    .12                .44
16    1    12    .04                .48
17    5    17    .20                .68
18    3    20    .12                .80
19    4    24    .16                .96
20    1    25    .04               1.00
     n = 25      1.00

the distribution of scores observed. We label this column as "raw score" or "X," as shown by the first column in the table. Raw scores are a set of scores in their original form; that is, the scores have not been altered or transformed in any way. X is often used in statistics to denote a variable, so you see X quite a bit in this text. (As a side note, whenever upper- or lowercase letters are used to denote statistical notation, the letter is always italicized.)
The second step is to determine for each unique score the number of times it was observed. We label this second column as "frequency" or by the abbreviation "f." The frequency column tells us how many times or how frequently each unique score was observed. For instance, the score of 20 was only observed one time, whereas the score of 17 was observed five times. Now we have some information with which to answer the questions of our colleague. The most frequently observed score is 17, the lowest score is 9, and the highest score is 20. We can also see that scores tended to be closer to 20 (the highest score) than to 9 (the lowest score).
Two other concepts need to be introduced that are included in Table 2.2. The first concept is sample size. At the bottom of the second column, you see n = 25. From now on, n will be used to denote sample size, that is, the total number of scores obtained for the sample. Thus, because 25 scores were obtained here, n = 25.
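The first two columns of Table 2.2 can be reproduced mechanically. The following sketch (in Python rather than SPSS, purely for illustration) tallies the frequency of each unique score in the Table 2.1 quiz data:

```python
from collections import Counter

# Statistics quiz scores from Table 2.1
scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

n = len(scores)        # sample size, n = 25
f = Counter(scores)    # frequency of each unique score
for x in sorted(f):    # unique scores, arranged lowest to highest
    print(x, f[x])
```

Note that `sorted(f)` lists each unique score only once, even for a score such as 15 that was observed several times.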
The second concept is related to real limits and intervals. Although the scores obtained for this dataset happened to be whole numbers, not fractions or decimals, we still need a system that will cover that possibility. For example, what would we do if a student obtained a score of 18.25? One option would be to list that as another unique score, which would probably be more confusing than useful. A second option would be to include it with one of the other unique scores somehow; this is our option of choice. The system that all researchers use to cover the possibility of any score being obtained is through the concepts of real limits and intervals. Each value of X in Table 2.2 can be thought of as being the midpoint of an interval. Each interval has an upper and a lower real limit. The upper real limit of an interval is halfway between the midpoint of the interval under consideration and the midpoint of the next larger interval. For example, the value of 18 represents the midpoint of an interval. The next larger interval has a midpoint of 19. Therefore, the upper real limit of the interval containing 18 would be 18.5, halfway between 18 and 19. The lower real limit of an interval is halfway between the midpoint of the interval under consideration and the midpoint of the next smaller interval. Following the example interval of 18 again, the next smaller interval has a midpoint of 17. Therefore, the lower real limit of the interval containing 18 would be 17.5, halfway between 18 and 17. Thus, the interval of 18 has 18.5 as an upper real limit and 17.5 as a lower real limit. Other intervals have their upper and lower real limits as well.
Notice that adjacent intervals (i.e., those next to one another) touch at their respective real limits. For example, the 18 interval has 18.5 as its upper real limit, and the 19 interval has 18.5 as its lower real limit. This implies that any possible score that occurs can be placed into some interval and no score can fall between two intervals. If someone obtains a score of 18.25, that will be covered in the 18 interval. The only limitation to this procedure is that, because adjacent intervals must touch in order to deal with every possible score, what do we do when a score falls precisely where two intervals touch at their real limits (e.g., at 18.5)? There are two possible solutions. The first solution is to assign the score to one interval or another based on some rule. For instance, we could randomly assign such scores to one interval or the other by flipping a coin. Alternatively, we could arbitrarily assign such scores always into either the larger or smaller of the two intervals. The second solution is to construct intervals such that the number of values falling at the real limits is minimized. For example, say that most of the scores occur at .5 (e.g., 15.5, 16.5, 17.5). We could construct the intervals with .5 as the midpoint and .0 as the real limits. Thus, the 15.5

21 Data Representation
interval would have 15.5 as the midpoint, 16.0 as the upper real limit, and 15.0 as the lower real limit. It should also be noted that, strictly speaking, real limits are only appropriate for continuous variables, not for discrete variables. That is, since discrete variables can only have limited values, we probably don't need to worry about real limits (e.g., there is not really an interval for two children).
Finally, the width of an interval is defined as the difference between the upper and lower real limits of an interval. We can denote this as w = URL − LRL, where w is interval width, and URL and LRL are the upper and lower real limits, respectively. In the case of our example interval again, we see that w = URL − LRL = 18.5 − 17.5 = 1.0. For Table 2.2, then, all intervals have the same interval width of 1.0. For each interval, we have a midpoint, a lower real limit that is one-half unit below the midpoint, and an upper real limit that is one-half unit above the midpoint. In general, we want all of the intervals to have the same width for consistency as well as for equal interval reasons. The only exception might be if the largest or smallest intervals were above a certain value (e.g., greater than 20) or below a certain value (e.g., less than 9), respectively.
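Because each real limit sits half an interval width from its midpoint, the limits can be computed directly. A minimal sketch (the helper name `real_limits` is ours, introduced only for illustration):

```python
def real_limits(midpoint, w=1.0):
    """Lower and upper real limits of the interval centered on `midpoint`."""
    return (midpoint - w / 2.0, midpoint + w / 2.0)

lrl, url = real_limits(18)    # (17.5, 18.5), as in the example interval above
print(lrl, url, url - lrl)   # the difference URL - LRL recovers w = 1.0
```

For the .5-midpoint scheme described above, `real_limits(15.5)` gives limits of 15.0 and 16.0, matching the text.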
A frequency distribution with an interval width of 1.0 is often referred to as an ungrouped frequency distribution, as the intervals have not been grouped together. Does the interval width always have to be equal to 1.0? The answer, of course, is no. We could group intervals together and form what is often referred to as a grouped frequency distribution. For our example data, we can construct a grouped frequency distribution with an interval width of 2.0, as shown in Table 2.3. The largest interval now contains the scores of 19 and 20, the second largest interval the scores of 17 and 18, and so on down to the smallest interval with the scores of 9 and 10. Correspondingly, the largest interval contains a frequency of 5, the second largest interval a frequency of 8, and the smallest interval a frequency of 2. All we have really done is collapse the intervals from Table 2.2, where interval width was 1.0, into the intervals of width 2.0, as shown in Table 2.3. If we take, for example, the interval containing the scores of 17 and 18, then the midpoint of the interval is 17.5, the URL is 18.5, the LRL is 16.5, and thus w = 2.0. The interval width could actually be any value, including .20 or 100, depending on what best suits the data.
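The collapse from width-1.0 intervals into the width-2.0 intervals of Table 2.3 can be sketched as follows (plain Python, not SPSS; the grouping rule assumes the lowest observed score starts the first interval):

```python
from collections import Counter

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

w = 2                     # interval width
start = min(scores)       # 9, the lowest score, begins the first interval
# Map each score to the lower score of its interval (9-10 -> 9, 11-12 -> 11, ...)
grouped = Counter(start + w * ((s - start) // w) for s in scores)
for lo in sorted(grouped):
    print("%d-%d: %d" % (lo, lo + w - 1, grouped[lo]))
```

The printed counts (2, 3, 3, 4, 8, 5) reproduce the f column of Table 2.3.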
How does one determine what the proper interval width should be? If there are many frequencies for each score and fewer than 15 or 20 intervals, then an ungrouped frequency distribution with an interval width of 1 is appropriate (and this is the default in SPSS for determining frequency distributions). If there are either minimal frequencies per score (say 1 or 2) or a large number of unique scores (say more than 20), then a grouped frequency distribution with some other interval width is appropriate. For a first example, say
Table 2.3
Grouped Frequency Distribution of Statistics Quiz Data
X        f
9–10     2
11–12    3
13–14    3
15–16    4
17–18    8
19–20    5
n = 25

that there are 100 unique scores ranging from 0 to 200. An ungrouped frequency distribution would not really summarize the data very well, as the table would be quite large. The reader would have to eyeball the table and actually do some quick grouping in his or her head so as to gain any information about the data. An interval width of perhaps 10–15 would be more useful. In a second example, say that there are only 20 unique scores ranging from 0 to 30, but each score occurs only once or twice. An ungrouped frequency distribution would not be very useful here either, as the reader would again have to collapse intervals in his or her head. Here an interval width of perhaps 2–5 would be appropriate.
Ultimately, deciding on the interval width, and thus the number of intervals, becomes a trade-off between good communication of the data and the amount of information contained in the table. As interval width increases, more and more information is lost from the original data. For the example where scores range from 0 to 200 and using an interval width of 10, some precision in the 15 scores contained in the 30–39 interval is lost. In other words, the reader would not know from the frequency distribution where in that interval the 15 scores actually fall. If you want that information (you may not), you would need to return to the original data. At the same time, an ungrouped frequency distribution for those data would not have much of a message for the reader. Ultimately, the decisive factor is the adequacy with which information is communicated to the reader. The nature of the interval grouping comes down to whatever form best represents the data. With today's powerful statistical software, it is easy for the researcher to try several different interval widths before deciding which one works best for a particular set of data. Note also that the frequency distribution can be used with variables of any measurement scale, from nominal (e.g., the frequencies for eye color of a group of children) to ratio (e.g., the frequencies for the height of a group of adults).
2.1.2 Cumulative Frequency Distributions
A second type of frequency distribution is known as the cumulative frequency distribution. For the example data, this is depicted in the third column of Table 2.2 and labeled as "cf." To put it simply, the cumulative frequency for a particular interval is the number of scores contained in that interval and all of the smaller intervals. Thus, the 9 interval contains one frequency, and there are no frequencies smaller than that interval, so the cumulative frequency is simply 1. The 10 interval contains one frequency, and there is one frequency in a smaller interval, so the cumulative frequency is 2. The 11 interval contains two frequencies, and there are two frequencies in smaller intervals; thus, the cumulative frequency is 4. That is, four people had scores in the 11 interval and smaller intervals. One way to think about determining the cumulative frequency column is to take the frequency column and accumulate downward (i.e., from the top down, yielding 1; 1 + 1 = 2; 1 + 1 + 2 = 4; etc.). Just as a check, the cf in the largest interval (i.e., the interval largest in value) should be equal to n, the number of scores in the sample, 25 in this case. Note also that the cumulative frequency distribution can be used with variables of measurement scales from ordinal (e.g., the number of students receiving a B or less) to ratio (e.g., the number of adults who are 5′7″ or less), but cannot be used with nominal data, as there is not at least rank order to nominal data (and thus accumulating information from one nominal category to another does not make sense).
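The "accumulate downward" idea maps directly onto a running sum over the frequency column, as in this illustrative sketch:

```python
from collections import Counter
from itertools import accumulate

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

f = Counter(scores)
values = sorted(f)    # intervals arranged from smallest to largest
# Running sum of the frequency column, top down
cf = dict(zip(values, accumulate(f[v] for v in values)))
print(cf[11])   # 4: four scores in the 11 interval and smaller
print(cf[20])   # 25: the check described above -- cf in the largest interval equals n
```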
2.1.3 Relative Frequency Distributions
A third type of frequency distribution is known as the relative frequency distribution. For the example data, this is shown in the fourth column of Table 2.2 and labeled as "rf." Relative frequency is simply the proportion of scores contained in an interval. Computationally,

rf = f/n. For example, the proportion of scores occurring in the 17 interval is computed as rf = 5/25 = .20. Relative frequencies take sample size into account, allowing us to make statements about the number of individuals in an interval relative to the total sample. Thus, rather than stating that 5 individuals had scores in the 17 interval, we could say that 20% of the scores were in that interval. In the popular press, relative frequencies (which they call percentages) are quite often reported in tables without the frequencies. Note that the sum of the relative frequencies should be 1.00 (or 100%) within rounding error. Also note that the relative frequency distribution can be used with variables of any measurement scale, from nominal (e.g., the percent of children with blue eye color) to ratio (e.g., the percent of adults who are 5′7″).
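The rf column is just the frequency column divided by n, as this sketch shows:

```python
from collections import Counter

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

f = Counter(scores)
n = len(scores)
rf = {v: f[v] / n for v in sorted(f)}   # rf = f/n for each interval
print(rf[17])                           # 0.2, i.e., 20% of the scores
print(sum(rf.values()))                 # should equal 1.0 within rounding error
```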
2.1.4 Cumulative Relative Frequency Distributions
A fourth and final type of frequency distribution is known as the cumulative relative frequency distribution. For the example data, this is depicted in the fifth column of Table 2.2 and labeled as "crf." The cumulative relative frequency for a particular interval is the proportion of scores in that interval and smaller. Thus, the 9 interval has a relative frequency of .04, and there are no relative frequencies smaller than that interval, so the cumulative relative frequency is simply .04. The 10 interval has a relative frequency of .04, and the relative frequencies less than that interval total .04, so the cumulative relative frequency is .08. The 11 interval has a relative frequency of .08, and the relative frequencies less than that interval total .08, so the cumulative relative frequency is .16. Thus, 16% of the people had scores in the 11 interval and smaller. In other words, 16% of people scored 11 or less. One way to think about determining the cumulative relative frequency column is to take the relative frequency column and accumulate downward (i.e., from the top down, yielding .04; .04 + .04 = .08; .04 + .04 + .08 = .16; etc.). Just as a check, the crf in the largest interval should be equal to 1.0, within rounding error, just as the sum of the relative frequencies is equal to 1.0. Also note that the cumulative relative frequency distribution can be used with variables of measurement scales from ordinal (e.g., the percent of students receiving a B or less) to ratio (e.g., the percent of adults who are 5′7″ or less). As with cumulative frequency distributions, cumulative relative frequency distributions cannot be used with nominal data.
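The crf column can be generated by accumulating the rf column downward, as in this sketch:

```python
from collections import Counter
from itertools import accumulate

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

f = Counter(scores)
n = len(scores)
values = sorted(f)
# Running sum of the relative frequency column, top down
crf = dict(zip(values, accumulate(f[v] / n for v in values)))
print(round(crf[11], 2))   # 0.16: 16% of people scored 11 or less
print(round(crf[20], 2))   # 1.0: the check in the largest interval
```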
2.2 Graphical Display of Distributions
In this section, we consider several types of graphs for viewing a distribution of scores. Again, we are still interested in how the data for a single variable can be represented, but now in a graphical display rather than a tabular display. The methods described here include the bar graph; histogram; frequency, relative frequency, cumulative frequency, and cumulative relative frequency polygons; and stem-and-leaf display. Common shapes of distributions will also be discussed.
2.2.1 Bar Graph
A popular method used for displaying nominal scale data in graphical form is the bar graph. As an example, say that we have data on the eye color of a sample of 20 children. Ten children are blue eyed, six are brown eyed, three are green eyed, and one

is black eyed. A bar graph for these data is shown in Figure 2.1 (SPSS generated). The horizontal axis, going from left to right on the page, is often referred to in statistics as the X axis (for variable X; in this example our variable is eye color). On the X axis of Figure 2.1, we have labeled the different eye colors that were observed from individuals in our sample. The order of the colors is not relevant (remember, these are nominal data, so order or rank is irrelevant). The vertical axis, going from bottom to top on the page, is often referred to in statistics as the Y axis (the Y label will be more relevant in later chapters when we have a second variable Y). On the Y axis of Figure 2.1, we have labeled the frequencies. Finally, a bar is drawn for each eye color where the height of the bar denotes the frequency for that particular eye color (i.e., the number of times that particular eye color was observed in our sample). For example, the height of the bar for the blue-eyed category is 10 frequencies. Thus, we see in the graph which eye color is most popular in this sample (i.e., blue) and which eye color occurs least (i.e., black).
Note that the bars are separated by some space and do not touch one another, reflecting the nature of nominal data. As there are no intervals or real limits here, we do not want the bars to touch one another. One could also plot relative frequencies on the Y axis to reflect the percentage of children in the sample who belong to each category of eye color. Here we would see that 50% of the children had blue eyes, 30% brown eyes, 15% green eyes, and 5% black eyes. Another method for displaying nominal data graphically is the pie chart, where the pie is divided into slices whose sizes correspond to the frequencies or relative frequencies of each category. However, for numerous reasons (e.g., it contains little information when there are few categories; it is unreadable when there are many categories; visually assessing the sizes of each slice is difficult at best), the pie chart is statistically problematic, such that Tufte (1992) states, "the only worse design than a pie chart is several of them" (p. 178). The bar graph is the recommended graphic for nominal data.
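The relative frequencies quoted above come straight from the category counts; a minimal sketch (counts taken from the eye-color example):

```python
# Eye-color frequencies for the sample of 20 children
eye_color = {"blue": 10, "brown": 6, "green": 3, "black": 1}

n = sum(eye_color.values())                       # 20 children
percent = {c: 100 * k / n for c, k in eye_color.items()}
for color in eye_color:                           # order is irrelevant for nominal data
    print(color, percent[color])                  # blue 50.0, brown 30.0, ...
```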
Figure 2.1
Bar graph of eye-color data (X axis: eye color, in the order black, blue, brown, green; Y axis: frequency).

2.2.2 Histogram
A method somewhat similar to the bar graph that is appropriate for data that are at least ordinal (i.e., ordinal, interval, or ratio) is the histogram. Because the data are at least theoretically continuous (even though they may be measured in whole numbers), the main difference in the histogram (as compared to the bar graph) is that the bars touch one another, much like intervals touching one another at their real limits. An example of a histogram for the statistics quiz data is shown in Figure 2.2 (SPSS generated). As you can see, along the X axis we plot the values of the variable X and along the Y axis the frequencies for each interval. The height of the bar again corresponds to the number of frequencies for a particular value of X. This figure represents an ungrouped histogram, as the interval size is 1. That is, along the X axis, the midpoint of each bar is the midpoint of the interval, the bar begins on the left at the lower real limit of the interval, the bar ends on the right at the upper real limit, and the bar is one unit wide. If we wanted to use an interval size of 2, for example, using the grouped frequency distribution in Table 2.3, then we could construct a grouped histogram in the same way; the differences would be that the bars would be two units wide, and the height of the bars would obviously change. Try this one on your own for practice.
One could also plot relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval. In reality, all that we have to change is the scale of the Y axis. The height of the bars would remain the same. For this particular dataset, each frequency corresponds to a relative frequency of .04.
2.2.3 Frequency Polygon
Another graphical method appropriate for data that have at least some rank order (i.e., ordinal, interval, or ratio) is the frequency polygon (line graph in SPSS terminology). A polygon is defined simply as a many-sided figure. The frequency polygon is set up in a fashion
Figure 2.2
Histogram of statistics quiz data (X axis: quiz score, 9–20; Y axis: frequency, 0–5).

similar to the histogram. However, rather than plotting a bar for each interval, points are plotted for each interval and then connected together, as shown in Figure 2.3 (SPSS generated). The axes are the same as with the histogram. A point is plotted at the intersection (or coordinates) of the midpoint of each interval along the X axis and the frequency for that interval along the Y axis. Thus, for the 15 interval, a point is plotted at the midpoint of the interval, 15.0, and at three frequencies. Once the points are plotted for each interval, we "connect the dots."
One could also plot relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval. This is known as the relative frequency polygon. As with the histogram, all we have to change is the scale of the Y axis. The position of the polygon would remain the same. For this particular dataset, each frequency corresponds to a relative frequency of .04.
Note also that because the histogram and frequency polygon each contain the exact same information, Figures 2.2 and 2.3 can be superimposed on one another. If you did this, you would see that the points of the frequency polygon are plotted at the top of each bar of the histogram. There is no advantage of the histogram or frequency polygon over the other; however, the histogram is more frequently used due to its availability in all statistical software.
2.2.4 Cumulative Frequency Polygon
Cumulative frequencies of data that have at least some rank order (i.e., ordinal, interval, or ratio) can be displayed as a cumulative frequency polygon (sometimes referred to as the ogive curve). As shown in Figure 2.4 (SPSS generated), the differences between the frequency polygon and the cumulative frequency polygon are that (a) the cumulative frequency polygon involves plotting cumulative frequencies along the Y axis, (b) the points should be plotted at the upper real limit of each interval (although SPSS plots the points at the interval midpoints by default), and (c) the polygon cannot be closed on the right-hand side.
Figure 2.3
Frequency polygon of statistics quiz data (X axis: quiz score, 9–20; Y axis: frequency, 0–5; markers/lines show count).

Let us discuss each of these differences. First, the Y axis represents the cumulative frequencies from the cumulative frequency distribution. The X axis is the usual set of raw scores. Second, to reflect the cumulative nature of this type of frequency, the points must be plotted at the upper real limit of each interval. For example, the cumulative frequency for the 16 interval is 12, indicating that there are 12 scores in that interval and smaller. Finally, the polygon cannot be closed on the right-hand side. Notice that as you move from left to right in the cumulative frequency polygon, the height of the points always increases or stays the same. Because of the nature of accumulating information, there will never be a decrease in the accumulation of the frequencies. For example, there is an increase in cumulative frequency from the 16 to the 17 interval, as five new frequencies are included. Beyond the 20 interval, the number of cumulative frequencies remains at 25, as no new frequencies are included.
One could also plot cumulative relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval and smaller. This is known as the cumulative relative frequency polygon. All we have to change is the scale of the Y axis to cumulative relative frequency. The position of the polygon would remain the same. For this particular dataset, each cumulative frequency corresponds to a cumulative relative frequency of .04. Thus, a cumulative relative frequency polygon of the example data would look exactly like Figure 2.4, except on the Y axis we plot cumulative relative frequencies ranging from 0 to 1.
2.2.5 Shapes of Frequency Distributions
There are several common shapes of frequency distributions that you are likely to encounter, as shown in Figure 2.5. These are briefly described here and more fully in later chapters. Figure 2.5a is a normal distribution (or bell-shaped curve) where most of the scores are in the center of the distribution, with fewer higher and lower scores. The normal distribution plays a large role in statistics, both for descriptive statistics (as we show beginning in Chapter 4) and particularly as an assumption for many inferential statistics (as we show beginning in Chapter 6). This distribution is also known as symmetric because if we divide the distribution into two equal halves vertically, the left half is a mirror image of the right half (see Chapter 4). Figure 2.5b is a positively skewed distribution where most of the scores are fairly low and there are a few higher scores (see Chapter 4). Figure 2.5c is
Figure 2.4
Cumulative frequency polygon of statistics quiz data (X axis: quiz score, 9–20; Y axis: cumulative frequency, 0–25).

a negatively skewed distribution where most of the scores are fairly high and there are a few lower scores (see Chapter 4). Skewed distributions are not symmetric, as the left half is not a mirror image of the right half.
2.2.6 Stem-and-Leaf Display
A refined form of the grouped frequency distribution is the stem-and-leaf display, developed by John Tukey (1977). This is shown in Figure 2.6 (SPSS generated) for the example statistics quiz data. The stem-and-leaf display was originally developed to be constructed on a typewriter using lines and numbers in a minimal amount of space. In a way, the
Figure 2.5
Common shapes of frequency distributions: (a) normal, (b) positively skewed, and (c) negatively skewed (each panel plots frequency f against X).
Figure 2.6
Stem-and-leaf display of statistics quiz data.

Quiz Stem-and-Leaf Plot
Frequency    Stem and Leaf
 1.00        0 . 9
 7.00        1 . 0112334
16.00        1 . 5556777778889999
 1.00        2 . 0
Stem width: 10.0
Each leaf: 1 case(s)

stem-and-leaf display looks like a grouped type of histogram on its side. The vertical value on the left is the stem and, in this example, represents all but the last digit (i.e., the tens digit). The leaf represents, in this example, the remaining digit of each score (i.e., the units digit). Note that SPSS has grouped values in increments of five. For example, the second line ("1 . 0112334") indicates that there are 7 scores from 10 to 14; thus, the "0" leaf on the "1" stem means that there is one frequency for the score of 10. The fact that the leaf "1" occurs twice in that stem indicates that the score of 11 occurred twice. Interpreting the rest of this stem, we see that 12 occurred once (i.e., there is only one 2 in the stem), 13 occurred twice (i.e., there are two 3s in the stem), and 14 occurred once (i.e., only one 4 in the stem). From the stem-and-leaf display, one can determine every one of the raw scores; this is not possible with a typical grouped frequency distribution (i.e., no information is lost in a stem-and-leaf display). However, with a large sample, the display can become rather unwieldy. Consider what a stem-and-leaf display would look like for 100,000 GRE scores!
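The layout of Figure 2.6, with each stem split into two half-stems of five leaf values, can be approximated with a short routine (an illustrative sketch, not SPSS's own algorithm; the function name is ours):

```python
def stem_and_leaf(scores, stem_width=10, split=5):
    """Build stem-and-leaf rows, splitting each stem into groups of `split` leaves."""
    rows = {}
    for s in sorted(scores):
        stem, leaf = divmod(s, stem_width)
        key = (stem, leaf // split)          # e.g., stem 1 splits into 10-14 and 15-19
        rows.setdefault(key, []).append(str(leaf))
    return ["%d . %s" % (stem, "".join(leaves))
            for (stem, _), leaves in sorted(rows.items())]

quiz = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
        16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]
for line in stem_and_leaf(quiz):
    print(line)
```

For the quiz data, the printed rows reproduce the stems and leaves shown in Figure 2.6.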
In summary, this section included the most basic types of statistical graphics, although more advanced graphics are described in later chapters. Note, however, that there are a number of publications on how to properly display graphics, that is, "how to do graphics right." While a detailed discussion of statistical graphics is beyond the scope of this text, the following publications are recommended: Chambers, Cleveland, Kleiner, and Tukey (1983), Schmid (1983), Wainer (e.g., 1984, 1992, 2000), Tufte (1992), Cleveland (1993), Wallgren, Wallgren, Persson, Jorner, and Haaland (1996), Robbins (2004), and Wilkinson (2005).
2.3 Percentiles
In this section, we consider several concepts and the necessary computations for the area of percentiles, including percentiles, quartiles, percentile ranks, and the box-and-whisker plot. For instance, you might be interested in determining what percentage of the distribution of the GRE-Quantitative subtest fell below a score of 600, or in what score divides the distribution of the GRE-Quantitative subtest into two equal halves.

2.3.1 Percentiles
Let us define a percentile as that score below which a certain percentage of the distribution lies. For instance, you may be interested in that score below which 50% of the distribution of the GRE-Quantitative subscale lies. Say that this score is computed as 480; then this would mean that 50% of the scores fell below a score of 480. Because percentiles are scores, they are continuous values and can take on any value of those possible. The 30th percentile could be, for example, the score of 387.6750. For notational purposes, a percentile will be known as Pi, where the i subscript denotes the particular percentile of interest, between 0 and 100. Thus, the 30th percentile for the previous example would be denoted as P30 = 387.6750.
Let us now consider how percentiles are computed. The formula for computing the Pi percentile is

    Pi = LRL + [(i%(n) − cf)/f](w)    (2.1)

where
LRL is the lower real limit of the interval containing Pi,
i% is the percentile desired (expressed as a proportion from 0 to 1),
n is the sample size,
cf is the cumulative frequency less than but not including the interval containing Pi (known as cf below),
f is the frequency of the interval containing Pi, and
w is the interval width.

As an example, consider computing the 25th percentile of our statistics quiz data. This would correspond to that score below which 25% of the distribution falls. For the example data in the form presented in Table 2.2, using Equation 2.1, we compute P25 as follows:

    P25 = LRL + [(i%(n) − cf)/f](w) = 12.5 + [(25%(25) − 5)/2](1) = 12.5 + 0.625 = 13.125
Conceptually, let us discuss how the equation works. First, we have to determine what interval contains the percentile of interest. This is easily done by looking in the crf column of the frequency distribution for the interval that contains a crf of .25 somewhere within the interval. We see that for the 13 interval the crf = .28, which means that the interval spans a crf of .20 (the URL of the 12 interval) up to .28 (the URL of the 13 interval) and thus contains .25. The next largest interval of 14 takes us from a crf of .28 up to a crf of .32 and thus is too large for this particular percentile. The next smallest interval of 12 takes us from a crf of .16 up to a crf of .20 and thus is too small. The LRL of 12.5 indicates that P25 is at least 12.5. The rest of the equation adds some positive amount to the LRL.
Next we have to determine how far into that interval we need to go in order to reach the desired percentile. We take i percent of n, or in this case 25% of the sample size of 25, which is 6.25. So we need to go one-fourth of the way into the distribution, or 6.25 scores, to reach the 25th percentile. Another way to think about this is, because the scores have been rank-ordered from lowest or smallest (top of the frequency distribution) to highest or largest (bottom of the frequency distribution), we need to go 25%, or 6.25 scores, into the distribution from the top (or smallest value) to reach the 25th percentile. We then subtract out all cumulative frequencies smaller than (or below) the interval we are looking in, where cf below = 5. Again, we just want to determine how far into this interval we need to go, and thus we subtract out all of the frequencies smaller than this interval, or cf below. The numerator then becomes 6.25 − 5 = 1.25. Then we divide by the number of frequencies in the interval containing the percentile we are looking for. This forms the ratio of how far into the interval we go. In this case, we needed to go 1.25 scores into the interval and the interval contains 2 scores; thus, the ratio is 1.25/2 = .625. In other words, we need to go .625 unit into the interval to reach the desired percentile. Now that we know how far into the interval to go, we need to weight this by the width of the interval. Here we need to go 1.25 scores into an interval containing 2 scores that is 1 unit wide, and thus we go .625 unit into the interval [(1.25/2)(1) = .625]. If the interval width was instead 10, then 1.25 scores into the interval would be equal to 6.25 units.
Consider two more worked examples to try on your own, either through statistical software or by hand. The 50th percentile, P50, is

	P50 = 16.5 + [(50%(25) − 12)/5] (1) = 16.5 + 0.100 = 16.600

while the 75th percentile, P75, is

	P75 = 17.5 + [(75%(25) − 17)/3] (1) = 17.5 + 0.583 = 18.083
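These hand computations are easy to verify in code. Below is a minimal Python sketch of Equation 2.1, using only the quantities from the worked examples (LRL, cf below, f, n, and w); the function name and argument names are ours, not from the text:

```python
def percentile(p, lrl, cf_below, f, n, w=1):
    """Equation 2.1: P_i = LRL + [(i% of n - cf below) / f] * w."""
    return lrl + ((p / 100) * n - cf_below) / f * w

# Worked examples from the statistics quiz data (n = 25):
p25 = percentile(25, lrl=12.5, cf_below=5, f=2, n=25)   # 13.125
p50 = percentile(50, lrl=16.5, cf_below=12, f=5, n=25)  # 16.600
p75 = percentile(75, lrl=17.5, cf_below=17, f=3, n=25)  # about 18.083
```

Any other percentile from the same grouped frequency distribution follows by changing p and the interval quantities.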
We have only examined a few example percentiles of the many possibilities that exist. For example, we could also have determined P55.5 or even P99.5. Thus, we could determine any percentile, in whole numbers or decimals, between 0 and 100. Next we examine three particular percentiles that are often of interest, the quartiles.
2.3.2 Quartiles

One common way of dividing a distribution of scores into equal groups of scores is known as quartiles. This is done by dividing a distribution into fourths, or quartiles, where there are four equal groups, each containing 25% of the scores. In the previous examples, we determined P25, P50, and P75, which divided the distribution into four equal groups, from 0 to 25, from 25 to 50, from 50 to 75, and from 75 to 100. Thus, the quartiles are special cases of percentiles. A different notation, however, is often used for these particular percentiles, where we denote P25 as Q1, P50 as Q2, and P75 as Q3. Thus, the Qs represent the quartiles.
An interesting aspect of quartiles is that they can be used to determine whether a distribution of scores is positively or negatively skewed. This is done by comparing the values of the quartiles as follows. If (Q3 − Q2) > (Q2 − Q1), then the distribution of scores is positively skewed, as the scores are more spread out at the high end of the distribution and more bunched up at the low end of the distribution (remember the shapes of the distributions from Figure 2.5). If (Q3 − Q2) < (Q2 − Q1), then the distribution of scores is negatively skewed, as the scores are more spread out at the low end of the distribution and more bunched up at the high end of the distribution. If (Q3 − Q2) = (Q2 − Q1), then the distribution of scores is not skewed, but is symmetric (see Chapter 4). For the example statistics quiz data, (Q3 − Q2) = 1.4833 and (Q2 − Q1) = 3.4750; thus, (Q3 − Q2) < (Q2 − Q1) and we know that the distribution is negatively skewed. This should already have been evident from examining the frequency distribution in Figure 2.3, as scores are more spread out at the low end of the distribution and more bunched up at the high end. Examining the quartiles is a simple method for getting a general sense of the skewness of a distribution of scores.

2.3.3 Percentile Ranks

Let us define a percentile rank as the percentage of a distribution of scores that falls below (or is less than) a certain score. For instance, you may be interested in the percentage of scores on the GRE-Quantitative subscale that falls below the score of 480. Say that the percentile rank for the score of 480 is computed to be 50; then this would mean that 50% of the scores fell below a score of 480. If this sounds familiar, it should: the 50th percentile was previously stated to be 480. Thus, we have logically determined that the percentile rank of 480 is 50. This is because percentile and percentile rank are actually opposite sides of the same coin. Many are confused by this and equate percentiles and percentile ranks; however, they are related but different concepts. Recall earlier we said that percentiles were scores. Percentile ranks are percentages, as they are continuous values and can take on any value from 0 to 100. The score of 400 can have a percentile rank of 42.6750. For notational purposes, a percentile rank will be known as PR(Pi), where Pi is the particular score whose percentile rank, PR, you wish to determine. Thus, the percentile rank of the score 400 would be denoted as PR(400) = 42.6750. In other words, about 43% of the distribution falls below the score of 400.

Let us now consider how percentile ranks are computed. The formula for computing the PR(Pi) percentile rank is

	PR(Pi) = {[cf + ((Pi − LRL)/w) f] / n} × 100%	(2.2)

where
PR(Pi) indicates that we are looking for the percentile rank PR of the score Pi
cf is the cumulative frequency up to but not including the interval containing PR(Pi) (again known as cf below)
f is the frequency of the interval containing PR(Pi)
LRL is the lower real limit of the interval containing PR(Pi)
w is the interval width
n is the sample size, and finally we multiply by 100% to place the percentile rank on a scale from 0 to 100 (and also to remind us that the percentile rank is a percentage)

As an example, consider computing the percentile rank for the score of 17. This would correspond to the percentage of the distribution that falls below a score of 17. For the example data again, using Equation 2.2, we compute PR(17) as follows:

	PR(17) = {[12 + ((17 − 16.5)/1) 5] / 25} × 100% = [(12 + 2.5)/25] × 100% = 58.00%

Conceptually, let us discuss how the equation works. First, we have to determine what interval contains the percentile rank of interest. This is easily done because we already know the score is 17 and we simply look in the interval containing 17. The cf below the 17 interval is 12 and n is 25. Thus, we know that we need to go at least 12/25, or 48%, of the way into the distribution to obtain the desired percentile rank. We know that Pi = 17 and the LRL of that interval is 16.5. There are 5 frequencies in that interval, so we need to go 2.5 scores into the interval to obtain the proper percentile rank. In other words, because 17 is the midpoint of an interval with width of 1, we need to go halfway, or 2.5/5 of the way, into the interval to obtain the percentile rank. In the end, we need to go 14.5/25 (or .58) of the way into the distribution to obtain our percentile rank, which translates to 58%.

As another example, we have already determined that P50 = 16.6000. Therefore, you should be able to determine on your own that PR(16.6000) = 50%. This verifies that percentiles and percentile ranks are two sides of the same coin: the computation of a percentile begins with a percentage and identifies a specific score, whereas the computation of a percentile rank begins with a score and determines its percentage. You can further verify this by determining that PR(13.1250) = 25.00% and PR(18.0833) = 75.00%. Next we consider the box-and-whisker plot, where quartiles and percentiles are used graphically to depict a distribution of scores.

2.3.4 Box-and-Whisker Plot

A simplified form of the frequency distribution is the box-and-whisker plot (often referred to simply as a "box plot"), developed by John Tukey (1977). This is shown in Figure 2.7 (SPSS generated) for the example data. The box-and-whisker plot was originally developed to be constructed on a typewriter using lines in a minimal amount of space. The box in the center of the figure displays the middle 50% of the distribution of scores. The left-hand edge or hinge of the box represents the 25th percentile (or Q1). The right-hand edge or hinge of the box represents the 75th percentile (or Q3). The middle vertical line in the box represents the 50th percentile (or Q2). The lines extending from the box are known as the whiskers. The purpose of the whiskers is to display data outside of the middle 50%. The left-hand whisker can extend down to the lowest score (as is the case with SPSS), or to the 5th or the 10th percentile (by other means), to display more extreme low scores, and the right-hand whisker correspondingly can extend up to the highest score (SPSS), or to the 95th or 90th percentile (elsewhere), to display more extreme high scores. The choice of where to extend the whiskers is the preference of the researcher and/or the software. Scores that fall beyond the end of the whiskers, known as outliers due to their extremeness relative to the bulk of the distribution, are often displayed by dots and/or asterisks. Box-and-whisker plots can be used to examine such things as skewness (through the quartiles), outliers, and where most of the scores tend to fall.

(FIGURE 2.7. Box-and-whisker plot of statistics quiz data.)

2.4 SPSS

The purpose of this section is to briefly consider applications of SPSS for the topics covered in this chapter (including important screenshots). The following SPSS procedures will be illustrated: "Frequencies" and "Graphs."

Frequencies

Frequencies: Step 1. For the types of tables discussed in this chapter, in SPSS go to "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then select "Frequencies." Following the screenshot for "Frequencies: Step 1" will produce the "Frequencies" dialog box. (Screenshot "Frequencies: Step 1." Note that stem-and-leaf plots, and many other statistics, can be generated using the "Explore" program.)

Frequencies: Step 2. The "Frequencies" dialog box will open (see screenshot for "Frequencies: Step 2"). From this main "Frequencies" dialog box, click the variable of interest from the list on the left (e.g., quiz) and move it into the "Variables" box by clicking on the arrow button. By default, there is a checkmark in the box for "Display frequency tables," and we will keep this checked. This (i.e., selecting "Display frequency tables") will generate a table of frequencies, relative frequencies, and cumulative relative frequencies. There are three buttons on the right side of the "Frequencies" dialog box ("Statistics," "Charts," and "Format"). Let us first cover the options available through "Statistics." (Screenshot "Frequencies: Step 2": select the variable of interest from the list on the left and use the arrow to move it to the "Variables" box on the right; "Display frequency tables" is checked by default and will produce a frequency distribution table in the output; clicking on the "Statistics," "Charts," and "Format" options will allow you to select various statistics and graphs.)

Frequencies: Step 3a. If you click on the "Statistics" button from the main "Frequencies" dialog box (see "Frequencies: Step 2"), a new box labeled "Frequencies: Statistics" will appear (see screenshot for "Frequencies: Step 3a"). From here, you can obtain quartiles and selected percentiles as well as numerous other descriptive statistics simply by placing a checkmark in the boxes for the statistics that you want to generate. For better accuracy when generating the median, quartiles, and percentiles, check the box for "Values are group midpoints." However, it should be noted that these values are not always as precise as those from the formula given earlier in this chapter. (Screenshot "Frequencies: Step 3a": options available when clicking on "Statistics" from the main "Frequencies" dialog box; check "Values are group midpoints" for better accuracy with the median, quartiles, and percentiles.)
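Because SPSS's grouped-data values are not always as precise as the formulas, Equation 2.2 is worth checking directly. A minimal Python sketch, using only the quantities from the worked percentile rank example (the function and argument names are ours, not SPSS's):

```python
def percentile_rank(score, lrl, cf_below, f, n, w=1):
    """Equation 2.2: PR(Pi) = [cf below + f * (Pi - LRL) / w] / n * 100%."""
    return (cf_below + f * (score - lrl) / w) / n * 100

# Worked example from the quiz data (n = 25): the score 17 falls in the
# interval with LRL = 16.5, f = 5, and cf below = 12.
pr17 = percentile_rank(17, lrl=16.5, cf_below=12, f=5, n=25)  # 58.00 (up to float rounding)

# Percentiles and percentile ranks are two sides of the same coin:
pr_p50 = percentile_rank(16.6, lrl=16.5, cf_below=12, f=5, n=25)  # about 50
```

Plugging P25 = 13.125 or P75 = 18.0833 back in recovers 25% and 75% in the same way.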
Placing a checkmark in the "Frequencies: Statistics" dialog box will generate the respective statistic in the output (see screenshot for "Frequencies: Step 3a").

Frequencies: Step 3b. If you click on the "Charts" button from the main "Frequencies" dialog box (see screenshot for "Frequencies: Step 2"), a new box labeled "Frequencies: Charts" will appear (see screenshot for "Frequencies: Step 3b"). From here, you can select options to generate bar graphs, pie charts, or histograms. If you select bar graphs or pie charts, you can plot either frequencies or percentages (relative frequencies). Thus, the "Frequencies" program enables you to do much of what this chapter has covered. In addition, stem-and-leaf plots are available in the "Explore" program (see "Frequencies: Step 1" for a screenshot of where the "Explore" program can be accessed). (Screenshot "Frequencies: Step 3b": options available when clicking on "Charts" from the main "Frequencies" dialog box.)

Graphs

There are multiple graphs that can be generated in SPSS. We will examine how to generate histograms, boxplots, bar graphs, and more using the "Graphs" procedure in SPSS.

Histograms

Histograms: Step 1. For other ways to generate the types of graphical displays covered in this chapter, go to "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Histogram" (see screenshot for "Graphs: Step 1"). Another option for creating a histogram, although not shown here, starts again from the "Graphs" option in the top pulldown menu, where you select "Graphboard Template Chooser" and then "Histogram." (Screenshot "Graphs: Step 1": options available when clicking on "Legacy Dialogs" from the main "Graphs" pulldown menu.)
Histograms: Step 2. This will bring up the "Histogram" dialog box (see screenshot for "Histograms: Step 2"). Click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box by clicking on the arrow. Place a checkmark in "Display normal curve," and then click "OK." This will generate the same histogram as was produced through the "Frequencies" program already mentioned.

Boxplots

Boxplots: Step 1. To produce a boxplot for individual variables, click on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Boxplot" (see "Graphs: Step 1" for a screenshot of this step). Another option for creating a boxplot (although not shown here) starts again from the "Graphs" option in the top pulldown menu, where you select "Graphboard Template Chooser," then "Boxplots."

Boxplots: Step 2. This will bring up the "Boxplot" dialog box (see screenshot for "Boxplots: Step 2"). Select the "Simple" option (by default, this will already be selected). To generate a separate boxplot for individual variables, click on the "Summaries of separate variables" radio button. Then click "Define."

Boxplots: Step 3. This will bring up the "Define Simple Boxplot: Summaries of Separate Variables" dialog box (see screenshot for "Boxplots: Step 3"). Move the variable of interest (e.g., quiz) into the "Variable(s)" box. Then click "OK." This will generate a boxplot.

Bar Graphs

Bar Graphs: Step 1. To produce a bar graph for individual variables, click on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Bar" (see "Graphs: Step 1" for a screenshot of this step).

Bar Graphs: Step 2. From the main "Bar Chart" dialog box, select "Simple" (which will be selected by default) and click on the "Summaries for groups of cases" radio button (see screenshot for "Bar Graphs: Step 2").

Bar Graphs: Step 3. A new box labeled "Define Simple Bar: Summaries for Groups of Cases" will appear. Click the variable of interest (e.g., eye color) and move it into the "Variable" box by clicking the arrow button. Then a decision must be made for how the bars will be displayed. Several types of displays for bar graph data are available, including "N of cases" for frequencies, "cum. N" for cumulative frequencies, "% of cases" for relative frequencies, and "cum. %" for cumulative relative frequencies (see screenshot for "Bar Graphs: Step 3"). Additionally, other statistics can be selected through the "Other statistic (e.g., mean)" option. The most common bar graph is one which simply displays the frequencies (i.e., selecting the radio button for "N of cases"). Once your selections are made, click "OK." This will generate a bar graph. (Screenshot "Bar Graphs: Step 3": when "Other statistic (e.g., mean)" is selected, a dialog box labeled "Statistic" will appear, listing all other statistics that can be represented by the bars in the graph; clicking on a radio button selects the statistic; once the selection is made, click on "Continue" to return to the "Define Simple Bar: Summaries for Groups of Cases" dialog box.)
Frequency Polygons

Frequency Polygons: Step 1. Frequency polygons can be generated by clicking on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Line" (see "Graphs: Step 1" for a screenshot of this step).

Frequency Polygons: Step 2. From the main "Line Charts" dialog box, select "Simple" (which will be selected by default) and click on the "Summaries for groups of cases" radio button (which will also be selected by default; see screenshot for "Frequency Polygons: Step 2").

Frequency Polygons: Step 3. A new box labeled "Define Simple Line: Summaries for Groups of Cases" will appear. Click the variable of interest (e.g., quiz) and move it into the "Variable" box by clicking the arrow button. Then a decision must be made for how the lines will be displayed. Several types of displays for line graph (i.e., frequency polygon) data are available, including "N of cases" for frequencies, "cum. N" for cumulative frequencies, "% of cases" for relative frequencies, and "cum. %" for cumulative relative frequencies (see screenshot for "Frequency Polygons: Step 3"). Additionally, other statistics can be selected through the "Other statistic (e.g., mean)" option. The most common frequency polygon is one which simply displays the frequencies (i.e., selecting the radio button for "N of cases"). Once your selections are made, click "OK." This will generate a frequency polygon. (Screenshot "Frequency Polygons: Step 3": when "Other statistic (e.g., mean)" is selected, a dialog box labeled "Statistic" will appear, listing all other statistics that can be represented in the graph; clicking on a radio button selects the statistic; once the selection is made, click on "Continue" to return to the "Define Simple Line: Summaries for Groups of Cases" dialog box.)
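The display choices offered for bar graphs and frequency polygons ("N of cases," "cum. N," "% of cases," "cum. %") correspond to the frequency, cumulative frequency, relative frequency, and cumulative relative frequency columns from earlier in the chapter. As a cross-check on such output, these quantities can be computed directly; a minimal pure-Python sketch using a small hypothetical set of scores (not the quiz data):

```python
from collections import Counter

scores = [3, 1, 2, 2, 3, 3, 1, 2, 3, 3]  # hypothetical data
n = len(scores)

freq = Counter(scores)      # frequency of each value ("N of cases")
table, cum_f = [], 0
for value in sorted(freq):
    f = freq[value]
    cum_f += f              # cumulative frequency ("cum. N")
    table.append({
        "value": value,
        "f": f,
        "cum_f": cum_f,
        "rf": f / n,        # relative frequency ("% of cases", as a proportion)
        "crf": cum_f / n,   # cumulative relative frequency ("cum. %")
    })
```

The last row's crf is always 1.0 (100%), a quick sanity check on any hand-built frequency table.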
Editing Graphs

Once a graph or table is created, double clicking on the table or graph produced in the output will allow the user to make changes, such as changing the X and/or Y axis, colors, and more. An illustration of the options available in the Chart Editor is presented here. (Chart Editor screenshot: histogram of the quiz scores with frequency on the Y axis and quiz score on the X axis; Mean = 15.56, Std. Dev. = 3.163, N = 25.)

2.5 Templates for Research Questions and APA-Style Paragraph

Depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions, and thus no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary. At this time, let us revisit our graduate research assistant, Marie, who was introduced at the beginning of the chapter. As you may recall, her task was to summarize data from 25 students enrolled in a statistics course. The questions that Marie's faculty mentor shared with her were as follows: How can the quiz scores of students enrolled in an introductory statistics class be graphically represented in a table? In a figure? What is the distributional shape of the statistics quiz scores?
What is the 50th percentile of the quiz scores? A template for writing descriptive research questions for summarizing data may be as follows. Please note that these are just a few examples; given the multitude of descriptive statistics that can be generated, these are not meant to be exhaustive.

How can [variable] be graphically represented in a table? In a figure?
What is the distributional shape of the [variable]?
What is the 50th percentile of [variable]?

Next, we present an APA-like paragraph summarizing the results of the statistics quiz data example.

As shown in Table 2.2 and Figure 2.2, scores ranged from 9 to 20, with more students achieving a score of 17 than any other score (20%). From Figure 2.2, we also know that the distribution of scores was negatively skewed, with the bulk of the scores being at the high end of the distribution. Skewness was also evident as the quartiles were not equally spaced, as shown in Figure 2.7. Thus, overall the sample of students tended to do rather well on this particular quiz (must have been the awesome teaching), although a few low scores should be troubling (as 20% did not pass the quiz and need some remediation).
2.6 Summary

In this chapter, we considered both tabular and graphical methods for representing data. First, we discussed the tabular display of distributions in terms of frequency distributions (ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions. Next, we examined various methods for depicting data graphically, including bar graphs, histograms (ungrouped and grouped), frequency polygons, cumulative frequency polygons, shapes of distributions, and stem-and-leaf displays. Then, concepts and procedures related to percentiles were covered, including percentiles, quartiles, percentile ranks, and box-and-whisker plots. Finally, an overview of SPSS for these procedures was included, as well as a summary APA-style paragraph of the quiz dataset. We include Box 2.1 as a summary of which data representation techniques are most appropriate for each type of measurement scale. At this point, you should have met the following objectives: (a) be able to construct and interpret statistical tables, (b) be able to construct and interpret statistical graphs, and (c) be able to determine and interpret percentile-related information. In the next chapter, we address the major population parameters and sample statistics useful for looking at a single variable. In particular, we are concerned with measures of central tendency and measures of dispersion.

STOP AND THINK BOX 2.1
Appropriate Data Representation Techniques

Nominal measurement scale
  Tables: frequency distribution; relative frequency distribution
  Figures: bar graph

Ordinal, interval, or ratio measurement scale
  Tables: frequency distribution; cumulative frequency distribution; relative frequency distribution; cumulative relative frequency distribution
  Figures: histogram; frequency polygon; relative frequency polygon; cumulative frequency polygon; cumulative relative frequency polygon; stem-and-leaf display; box-and-whisker plot

Problems

Conceptual problems

2.1 For a distribution where the 50th percentile is 100, what is the percentile rank of 100?
  a. 0
  b. .50
  c. 50
  d. 100

2.2 Which of the following frequency distributions will generate the same relative frequency distribution?

  X    f      Y    f      Z    f
  100  2      100  6      100  8
  99   5      99   15     99   18
  98   8      98   24     98   28
  97   5      97   15     97   18
  96   2      96   6      96   8

  a. X and Y only
  b. X and Z only
  c. Y and Z only
  d. X, Y, and Z
  e. None of the above

2.3 Which of the following frequency distributions will generate the same cumulative relative frequency distribution?

  X    f      Y    f      Z    f
  100  2      100  6      100  8
  99   5      99   15     99   18
  98   8      98   24     98   28
  97   5      97   15     97   18
  96   2      96   6      96   8

  a. X and Y only
  b. X and Z only
  c. Y and Z only
  d. X, Y, and Z
  e. None of the above

2.4 In a histogram, 48% of the area lies below the score whose percentile rank is 52. True or false?

2.5 Among the following, the preferred method of graphing data pertaining to the ethnicity of a sample would be
  a. A histogram
  b. A frequency polygon
  c. A cumulative frequency polygon
  d. A bar graph

2.6 The proportion of scores between Q1 and Q3 may be less than .50. True or false?

2.7 The values of Q1, Q2, and Q3 in a positively skewed population distribution are calculated. What is the expected relationship between (Q2 − Q1) and (Q3 − Q2)?
  a. (Q2 − Q1) is greater than (Q3 − Q2).
  b. (Q2 − Q1) is equal to (Q3 − Q2).
  c. (Q2 − Q1) is less than (Q3 − Q2).
  d. Cannot be determined without examining the data.

2.8 If the percentile rank of a score of 72 is 65, we may say that 35% of the scores exceed 72. True or false?

2.9 In a negatively skewed distribution, the proportion of scores between Q1 and Q2 is less than .25. True or false?
2.10 A group of 200 sixth-grade students was given a standardized test and obtained scores ranging from 42 to 88. If the scores tended to "bunch up" in the low 80s, the shape of the distribution would be which one of the following?
  a. Symmetrical
  b. Positively skewed
  c. Negatively skewed
  d. Normal

2.11 The preferred method of graphing data on the eye color of a sample is which one of the following?
  a. Bar graph
  b. Frequency polygon
  c. Cumulative frequency polygon
  d. Relative frequency polygon

2.12 If Q2 = 60, then what is P50?
  a. 50
  b. 60
  c. 95
  d. Cannot be determined with the information provided

2.13 With the same data and using an interval width of 1, the frequency polygon and histogram will display the same information. True or false?

2.14 A researcher develops a histogram based on an interval width of 2. Can she reconstruct the raw scores using only this histogram? Yes or no?

2.15 Q2 = 50 for a positively skewed variable, and Q2 = 50 for a negatively skewed variable. I assert that Q1 will not necessarily be the same for both variables. Am I correct? True or false?

2.16 Which of the following statements is correct for a continuous variable?
  a. The proportion of the distribution below the 25th percentile is 75%.
  b. The proportion of the distribution below the 50th percentile is 25%.
  c. The proportion of the distribution above the third quartile is 25%.
  d. The proportion of the distribution between the 25th and 75th percentiles is 25%.

2.17 For a dataset with four unique values (55, 70, 80, and 90), the relative frequency for the value 55 is 20%, the relative frequency for 70 is 30%, the relative frequency for 80 is 20%, and the relative frequency for 90 is 30%. What is the cumulative relative frequency for the value 70?
  a. 20%
  b. 30%
  c. 50%
  d. 100%

2.18 In examining data collected over the past 10 years, researchers at a theme park find the following for 5000 first-time guests: 2250 visited during the summer months; 675 visited during the fall; 1300 visited during the winter; and 775 visited during the spring. What is the relative frequency for guests who visited during the spring?
  a. .135
  b. .155
  c. .260
  d. .450

Computational problems

2.1 The following scores were obtained from a statistics exam:

  47 50 47 49 46 41 47 46 48 44
  46 47 45 48 45 46 50 47 43 48
  47 45 43 46 47 47 43 46 42 47
  49 44 44 50 41 45 47 44 46 45
  42 47 44 48 49 43 45 49 49 46

Using an interval size of 1, construct or compute each of the following:
  a. Frequency distribution
  b. Cumulative frequency distribution
  c. Relative frequency distribution
  d. Cumulative relative frequency distribution
  e. Histogram and frequency polygon
  f. Cumulative frequency polygon
  g. Quartiles
  h. P10 and P90
  i. PR(41) and PR(49.5)
  j. Box-and-whisker plot
  k. Stem-and-leaf display

2.2 The following data were obtained from classroom observations and reflect the number of incidences that preschool children shared during an 8-hour period.

  4 8 10 5 12 10 14 5 10 14 12 14
  8 5 0 8 12 8 12 5 4 10 8 5

Using an interval size of 1, construct or compute each of the following:
  a. Frequency distribution
  b. Cumulative frequency distribution
  c. Relative frequency distribution
  d. Cumulative relative frequency distribution
  e. Histogram and frequency polygon
  f. Cumulative frequency polygon
  g. Quartiles
  h. P10 and P90
  i. PR(10)
  j. Box-and-whisker plot
  k. Stem-and-leaf display

2.3 A sample distribution of variable X is as follows:

  X    f
  2    1
  3    2
  4    5
  5    8
  6    4
  7    3
  8    4
  9    1
  10   2

Calculate or draw each of the following for the sample distribution of X:
  a. Q1
  b. Q2
  c. Q3
d�� P44�5 � e�� PR(7�0) � f�� Box-and-whisker�plot � g�� Histogram�(ungrouped) 2.4� A�sample�distribution�of�classroom�test�scores�is�as�follows: X f 70 1 75 2 77 3 79 2 80 6 82 5 85 4 90 4 96 3 Calculate�or�draw�each�of�the�following�for�the�sample�distribution�of�X: � a�� Q1 � b�� Q2 � c�� Q3 � d�� P44�5 � e�� PR(82) � f�� Box-and-whisker�plot � g�� Histogram�(ungrouped) 48 An Introduction to Statistical Concepts Interpretive problems Select�two�variables�from�the�survey1�dataset�on�the�website,�one�that�is�nominal�and�one� that�is�not� 2.1� �Write� research� questions� that� will� be� answered� from� these� data� using� descriptive� statistics�(you�may�want�to�review�the�research�question�template�in�this�chapter)� 2.2� �Construct�the�relevant�tables�and�figures�to�answer�the�questions�you�posed� 2.3� �Write�a�paragraph�which�summarizes�the�findings�for�each�variable�(you�may�want� to�review�the�writing�template�in�this�chapter)� 49 3 Univariate Population Parameters and Sample Statistics Chapter Outline 3�1� Summation�Notation 3�2� Measures�of�Central�Tendency 3�2�1� Mode 3�2�2� Median 3�2�3� Mean 3�2�4� Summary�of�Measures�of�Central�Tendency 3�3� Measures�of�Dispersion 3�3�1� Range 3�3�2� H�Spread 3�3�3� Deviational�Measures 3�3�4� Summary�of�Measures�of�Dispersion 3�4� SPSS 3�5� Templates�for�Research�Questions�and�APA-Style�Paragraph Key Concepts � 1�� Summation � 2�� Central�tendency � 3�� Outliers � 4�� Dispersion � 5�� Exclusive�versus�inclusive�range � 6�� Deviation�scores � 7�� Bias In�the�second�chapter,�we�began�our�discussion�of�descriptive�statistics�previously�defined�as� techniques�which�allow�us�to�tabulate,�summarize,�and�depict�a�collection�of�data�in�an�abbre- viated�fashion��There�we�considered�various�methods�for�representing�data�for�purposes�of� communicating�something�to�the�reader�or�audience��In�particular,�we�were�concerned�with� ways�of�representing�data�in�an�abbreviated�fashion�through�both�tables�and�figures� 50 An Introduction to 
Statistical Concepts In� this� chapter,� we� delve� more� into� the� field� of� descriptive� statistics� in� terms� of� three� general� topics�� First,� we� examine� summation� notation,� which� is� important� for� much� of� the� chapter� and,� to� some� extent,� the� remainder� of� the� text�� Second,� measures� of� central� tendency�allow�us�to�boil�down�a�set�of�scores�into�a�single�value,�a�point�estimate,�which� somehow� represents� the� entire� set�� The� most� commonly� used� measures� of� central� ten- dency�are�the�mode,�median,�and�mean��Finally,�measures�of�dispersion�provide�us�with� information�about� the�extent�to� which� the� set�of�scores� varies—in� other�words,� whether� the�scores�are�spread�out�quite�a�bit�or�are�pretty�much�the�same��The�most�commonly�used� measures�of�dispersion�are�the�range�(exclusive�and�inclusive�ranges),�H�spread,�and�vari- ance�and�standard�deviation��In�summary,�concepts�to�be�discussed�in�this�chapter�include� summation,�central�tendency,�and�dispersion��Within�this�discussion,�we�also�address�out- liers�and�bias��Our�objectives�are�that�by�the�end�of�this�chapter,�you�will�be�able�to�do� the�following:�(a)�understand�and�utilize�summation�notation,�(b)�determine�and�interpret� the�three�commonly�used�measures�of�central�tendency,�and�(c)�determine�and�interpret�dif- ferent�measures�of�dispersion� 3.1 Summation Notation We�were�introduced�to�the�following�research�scenario�in�Chapter�2�and�revisit�Marie�in� this�chapter� Marie,� a� graduate� student� pursuing� a� master’s� degree� in� educational� research,� has� been� assigned� to� her� first� task� as� a� research� assistant�� Her� faculty� mentor� has� given� Marie�quiz�data�collected�from�25�students�enrolled�in�an�introductory�statistics�course� and� has� asked� Marie� to� summarize� the� data�� The� faculty� member� was� pleased� with� the�descriptive�analysis�and�presentation�of�results�previously�shared,�and�has�asked� 
Marie to conduct additional analysis related to the following research questions: How can quiz scores of students enrolled in an introductory statistics class be summarized using measures of central tendency? Measures of dispersion?

Many areas of statistics, including many methods of descriptive and inferential statistics, require the use of summation notation. Say we have collected heart rate scores from 100 students. Many statistics require us to develop "sums" or "totals" in different ways. For example, what is the simple sum or total of all 100 heart rate scores? Summation (i.e., addition) is not only quite tedious to do computationally by hand, but we also need a system of notation to communicate how we have conducted this summation process. This section describes such a notational system.

For simplicity, let us utilize a small set of scores, keeping in mind that this system can be used for a set of numerical values of any size. In other words, while we speak in terms of "scores," this could just as easily be a set of heights, distances, ages, or other measures. Specifically in this example, we have a set of five ages: 7, 11, 18, 20, and 24. Recall from Chapter 2 the use of X to denote a variable. Here we define Xi as the score for variable X (in this example, age) for a particular individual or object i. The subscript i serves to identify one individual or object from another. These scores would then be denoted as follows: X1 = 7, X2 = 11, X3 = 18, X4 = 20, and X5 = 24. To interpret, X1 = 7 means that for variable X and individual 1, the value of the variable age is 7. In other words, individual 1 is 7 years of age. With five individuals measured on age, then i = 1, 2, 3, 4, 5. However, with a large set of values, this notation can become quite unwieldy, so as shorthand we abbreviate this as
i = 1, …, 5, meaning that i ranges or goes from i = 1 to i = 5.

Next we need a system of notation to denote the summation or total of a set of scores. The standard notation used is

    Σ_{i=a}^{b} Xi

where Σ is the Greek capital letter sigma and merely means "the sum of," Xi is the variable we are summing across for each of the i individuals, i = a indicates that a is the lower limit (or beginning) of the summation (i.e., the first value with which we begin our addition), and b indicates the upper limit (or end) of the summation (i.e., the last value added). For our example set of ages, the sum of all of the ages would be denoted as Σ_{i=1}^{5} Xi in shorthand version and as Σ_{i=1}^{5} Xi = X1 + X2 + X3 + X4 + X5 in longhand version. For the example data, the sum of all of the ages is computed as follows:

    Σ_{i=1}^{5} Xi = X1 + X2 + X3 + X4 + X5 = 7 + 11 + 18 + 20 + 24 = 80

Thus, the sum of the age variable across all five individuals is 80.

For large sets of values, the longhand version is rather tedious, and, thus, the shorthand version is almost exclusively used. A general form of the longhand version is as follows:

    Σ_{i=a}^{b} Xi = Xa + Xa+1 + … + Xb−1 + Xb

The ellipsis notation (i.e., …) indicates that there are as many values in between the two values on either side of the ellipsis as are necessary. The ellipsis notation is then just shorthand for "there are some values in between here." The most frequently used values for a and b with sample data are a = 1 and b = n (as you may recall, n is the notation used to represent our sample size). Thus, the most frequently used summation notation for sample data is Σ_{i=1}^{n} Xi.
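The summation notation above maps directly onto code. A minimal sketch using the five ages from the running example (the variable names are ours, not the text's):

```python
# The five ages from the example: X1 = 7, X2 = 11, X3 = 18, X4 = 20, X5 = 24.
ages = [7, 11, 18, 20, 24]

# sum(ages) computes the same quantity as the shorthand summation
# with lower limit a = 1 and upper limit b = n (here n = 5).
total = sum(ages)
print(total)  # 80
```

The loop hidden inside `sum` is exactly the longhand version: start at the first score, add each score in turn, stop at the last.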
3.2 Measures of Central Tendency

One method for summarizing a set of scores is to construct a single index or value that can somehow be used to represent the entire collection of scores. In this section, we consider the three most popular indices, known as measures of central tendency. Although other indices exist, the most popular ones are the mode, the median, and the mean.

3.2.1 Mode

The simplest method to use for measuring central tendency is the mode. The mode is defined as that value in a distribution of scores that occurs most frequently. Consider the example frequency distributions of the number of hours of TV watched per week, as shown in Table 3.1. In distribution (a), the mode is easy to determine, as the interval for value 8 contains the most scores, 3 (i.e., the mode number of hours of TV watched is 8). In distribution (b), the mode is a bit more complicated, as two adjacent intervals each contain the most scores; that is, the 8 and 9 hour intervals each contain three scores. Strictly speaking, this distribution is bimodal, that is, containing two modes, one at 8 and one at 9. This is our personal preference for reporting this particular situation. However, because the two modes are in adjacent intervals, some individuals make an arbitrary decision to average these intervals and report the mode as 8.5.

Distribution (c) is also bimodal; however, here the two modes at 7 and 11 hours are not in adjacent intervals. Thus, one cannot justify taking the average of these intervals, as the average of 9 hours [i.e., (7 + 11)/2] is not representative of the most frequently occurring score. The score of 9 occurs less often than any other score observed. We recommend reporting both modes here as well. Obviously, there are other possible situations for the mode
(e.g., a trimodal distribution), but these examples cover the basics. As one further example, the example data on the statistics quiz from Chapter 2 are shown in Table 3.2 and are used to illustrate the methods in this chapter. The mode is equal to 17 because that interval contains more scores (5) than any other interval. Note also that the mode is determined in precisely the same way whether we are talking about the population mode (i.e., the population parameter) or the sample mode (i.e., the sample statistic).

Table 3.1
Example Frequency Distributions

    X   f(a)   f(b)   f(c)
    6    1      1      2
    7    2      2      3
    8    3      3      2
    9    2      3      1
   10    1      2      2
   11    0      1      3
   12    0      0      2

Table 3.2
Frequency Distribution of Statistics Quiz Data

    X    f    cf    rf     crf
    9    1     1    .04    .04
   10    1     2    .04    .08
   11    2     4    .08    .16
   12    1     5    .04    .20
   13    2     7    .08    .28
   14    1     8    .04    .32
   15    3    11    .12    .44
   16    1    12    .04    .48
   17    5    17    .20    .68
   18    3    20    .12    .80
   19    4    24    .16    .96
   20    1    25    .04   1.00
        n = 25          1.00

Let us turn to a discussion of the general characteristics of the mode, as well as whether a particular characteristic is an advantage or a disadvantage in a statistical sense. The first characteristic of the mode is that it is simple to obtain. The mode is often used as a quick-and-dirty method for reporting central tendency. This is an obvious advantage. The second characteristic is that the mode does not always have a unique value. We saw this in distributions (b) and (c) of Table 3.1. This is generally a disadvantage, as we initially stated we wanted a single index that could be used to represent the collection of scores. The mode cannot guarantee a single index.

Third, the mode is not a function of all of the scores in the distribution, and this is generally a disadvantage. The mode is strictly determined by which score or interval contains the most frequencies. In distribution (a), as long as the other intervals have fewer frequencies than the
interval for value 8, then the mode will always be 8. That is, if the interval for value 8 contains three scores and all of the other intervals contain fewer than three scores, then the mode will be 8. The number of frequencies for the remaining intervals is not relevant as long as it is less than 3. Also, the location or value of the other scores is not taken into account.

The fourth characteristic of the mode is that it is difficult to deal with mathematically. For example, the mode is not very stable from one sample to another, especially with small samples. We could have two nearly identical samples except for one score, which can alter the mode. For example, in distribution (a), if a second similar sample contains the same scores except that an 8 is replaced with a 7, then the mode is changed from 8 to 7. Thus, changing a single score can change the mode, and this is considered to be a disadvantage. A fifth and final characteristic is that the mode can be used with any type of measurement scale, from nominal to ratio, and is the only measure of central tendency appropriate for nominal data.

3.2.2 Median

A second measure of central tendency represents a concept that you are already familiar with. The median is that score which divides a distribution of scores into two equal parts. In other words, one-half of the scores fall below the median, and one-half of the scores fall above the median. We already know this from Chapter 2 as the 50th percentile or Q2. In other words, the 50th percentile, or Q2, represents the median value. The formula for computing the median is

    Median = LRL + [(50%(n) − cf) / f](w)    (3.1)

where the notation is the same as previously described in Chapter 2. Just as a reminder, LRL is the lower real limit of the interval containing the median, 50% is the percentile desired, n is the sample size, cf is the cumulative frequency of all intervals less than but
not including the interval containing the median (cf below), f is the frequency of the interval containing the median, and w is the interval width. For the example quiz data, the median is computed as follows:

    Median = 16.5 + [(50%(25) − 12) / 5](1) = 16.5 + 0.1000 = 16.6000

Occasionally, you will run into simple distributions of scores where the median is easy to identify. If you have an odd number of untied scores, then the median is the middle-ranked score. For example, say we have measured individuals on the number of CDs owned and find values of 1, 3, 7, 11, and 21. For these data, the median is 7 (e.g., 7 CDs is the middle-ranked value or score). If you have an even number of untied scores, then the median is the average of the two middle-ranked scores. For example, a different sample reveals the following number of CDs owned: 1, 3, 5, 11, 21, and 32. The two middle scores are 5 and 11, and, thus, the median is 8 CDs owned [i.e., (5 + 11)/2]. In most other situations where there are tied scores, the median is not as simple to locate, and Equation 3.1 is necessary. Note also that the median is computed in precisely the same way whether we are talking about the population median (i.e., the population parameter) or the sample median (i.e., the sample statistic).

The general characteristics of the median are as follows. First, the median is not influenced by extreme scores (scores far away from the middle of the distribution are known as outliers). Because the median is defined conceptually as the middle score, the actual size of an extreme score is not relevant. For the example statistics quiz data, imagine that the extreme score of 9 was somehow actually 0 (e.g., incorrectly scored). The median would still be 16.6, as half of the scores are still above this value and half below. Because the extreme score under consideration
here still remained below the 50th percentile, the median was not altered. This characteristic is an advantage, particularly when extreme scores are observed. As another example using salary data, say that all but one of the individual salaries are below $100,000 and the median is $50,000. The remaining extreme observation has a salary of $5,000,000. The median is not affected by this millionaire—the extreme individual is simply treated as every other observation above the median, no more or no less than, say, the salary of $65,000.

A second characteristic is that the median is not a function of all of the scores. Because we already know that the median is not influenced by extreme scores, we know that the median does not take such scores into account. Another way to think about this is to examine Equation 3.1 for the median. The equation only deals with information for the interval containing the median. The specific information for the remaining intervals is not relevant so long as we are looking in the interval containing the median. We could, for instance, take the top 25% of the scores and make them even more extreme (say we add 10 bonus points to the top quiz scores). The median would remain unchanged. As you probably surmised, this characteristic is generally thought to be a disadvantage. If you really think about the first two characteristics, no measure could possibly possess both. That is, if a measure is a function of all of the scores, then extreme scores must also be taken into account. If a measure does not take extreme scores into account, like the median, then it cannot be a function of all of the scores.

A third characteristic is that the median is difficult to deal with mathematically, a disadvantage as with the mode. The median is somewhat unstable from sample to sample, especially with small samples. As a fourth characteristic, the median always has a unique
value, another advantage. This is unlike the mode, which does not always have a unique value. Finally, the fifth characteristic of the median is that it can be used with all types of measurement scales except the nominal. Nominal data cannot be ranked, and, thus, percentiles and the median are inappropriate.

3.2.3 Mean

The final measure of central tendency to be considered is the mean, sometimes known as the arithmetic mean or "average" (although the term average is used rather loosely by laypeople). Statistically, we define the mean as the sum of all of the scores divided by the number of scores. Thought of in those terms, you may have been computing the mean for many years, and may not have even known it.

The population mean is denoted by μ (Greek letter mu) and computed as follows:

    μ = (Σ_{i=1}^{N} Xi) / N

For sample data, the sample mean is denoted by X̄ (read "X bar") and computed as follows:

    X̄ = (Σ_{i=1}^{n} Xi) / n

For the example quiz data, the sample mean is computed as follows:

    X̄ = (Σ_{i=1}^{n} Xi) / n = 389/25 = 15.5600
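The quiz-data computation above can be checked in a few lines. A minimal sketch, where the score list is rebuilt from the frequencies in Table 3.2 (the dictionary and variable names are ours):

```python
# Quiz scores reconstructed from the frequency table (Table 3.2): value -> frequency.
freqs = {9: 1, 10: 1, 11: 2, 12: 1, 13: 2, 14: 1,
         15: 3, 16: 1, 17: 5, 18: 3, 19: 4, 20: 1}
scores = [x for x, f in freqs.items() for _ in range(f)]

n = len(scores)          # sample size, 25
total = sum(scores)      # sum of all scores, 389
mean = total / n         # sample mean, 389/25
print(n, total, mean)    # 25 389 15.56
```

Note that with a frequency table, the sum can equivalently be computed as Σ(x · f) without expanding the list; expanding it simply mirrors the definitional formula.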
Here are the general characteristics of the mean. First, the mean is a function of every score, a definite advantage in terms of a measure of central tendency representing all of the data. If you look at the numerator of the mean, you see that all of the scores are clearly taken into account in the sum. The second characteristic of the mean is that it is influenced by extreme scores. Because the numerator sum takes all of the scores into account, it also includes the extreme scores, which is a disadvantage. Let us return for a moment to a previous example of salary data where all but one of the individuals have an annual salary under $100,000, and the one outlier is making $5,000,000. Because this one outlying value is so extreme, the mean will be greatly influenced. In fact, the mean could easily fall somewhere between the second highest salary and the millionaire, which does not represent well the collection of scores.

Third, the mean always has a unique value, another advantage. Fourth, the mean is easy to deal with mathematically. The mean is the most stable measure of central tendency from sample to sample, and because of that is the measure most often used in inferential statistics (as we show in later chapters). Finally, the fifth characteristic of the mean is that it is only appropriate for interval and ratio measurement scales. This is because the mean implicitly assumes equal intervals, which of course the nominal and ordinal scales do not possess.

3.2.4 Summary of Measures of Central Tendency

To summarize the measures of central tendency then,
 1. The mode is the only appropriate measure for nominal data.
 2. The median and mode are both appropriate for ordinal data (and conceptually the median fits the ordinal scale as both deal with ranked scores).
 3. All three measures are appropriate for interval and ratio data.

A summary of the advantages and disadvantages of each measure is presented in Box 3.1.

Stop and Think Box 3.1
Advantages and Disadvantages of Measures of Central Tendency

Mode
  Advantages:
   • Quick and easy method for reporting central tendency
   • Can be used with any measurement scale of variable
  Disadvantages:
   • Does not always have a unique value
   • Not a function of all scores in the distribution
   • Difficult to deal with mathematically due to its instability

Median
  Advantages:
   • Not influenced by extreme scores
   • Has a unique value
   • Can be used with ordinal, interval, and ratio measurement scales of variables
  Disadvantages:
   • Not a function of all scores in the distribution
   • Difficult to deal with mathematically due to its instability
   • Cannot be used with nominal data

Mean
  Advantages:
   • Function of all scores in the distribution
   • Has a unique value
   • Easy to deal with mathematically
   • Can be used with interval and ratio measurement scales of variables
  Disadvantages:
   • Influenced by extreme scores
   • Cannot be used with nominal or ordinal variables

3.3 Measures of Dispersion

In the previous section, we discussed one method for summarizing a collection of scores, the measures of central tendency. Central tendency measures are useful for describing a collection of scores in terms of a single index or value (with one exception: the mode for distributions that are not unimodal). However, what do they tell us about the distribution of scores? Consider the following example. If we know that a sample has a mean of 50, what do we know about the distribution of scores? Can we infer from the mean what the distribution looks like? Are most of the scores fairly close to the mean of 50, or are they spread out quite a bit? Perhaps most of the scores are within two points of the mean. Perhaps most are within 10 points of the mean. Perhaps most are within 50 points of the mean. Do we know? The answer, of course, is that the mean provides us with no information about what
the distribution of scores looks like, and any of the possibilities mentioned, and many others, can occur. The same goes if we only know the mode or the median.

Another method for summarizing a set of scores is to construct an index or value that can be used to describe the amount of spread among the collection of scores. In other words, we need measures that can be used to determine whether the scores fall fairly close to the central tendency measure, are fairly well spread out, or are somewhere in between. In this section, we consider the four most popular such indices, which are known as measures of dispersion (i.e., the extent to which the scores are dispersed or spread out). Although other indices exist, the most popular ones are the range (exclusive and inclusive), H spread, the variance, and the standard deviation.

3.3.1 Range

The simplest measure of dispersion is the range. The term range is one that is in common use outside of statistical circles, so you have some familiarity with it already. For instance, you are at the mall shopping for a new pair of shoes. You find six stores have the same pair of shoes that you really like, but the prices vary somewhat. At this point, you might actually make the statement "the price for these shoes ranges from $59 to $75." In a way, you are talking about the range.

Let us be more specific as to how the range is measured. In fact, there are actually two different definitions of the range, exclusive and inclusive, which we consider now. The exclusive range is defined as the difference between the largest and smallest scores in a collection of scores. For notational purposes, the exclusive range (ER) is shown as ER = Xmax − Xmin, where Xmax is the largest or maximum score obtained, and Xmin is the smallest or minimum score obtained. For the shoe example then, ER =
Xmax − Xmin = 75 − 59 = 16. In other words, the actual exclusive range of the scores is 16 because the price varies from 59 to 75 (in dollar units).

A limitation of the exclusive range is that it fails to account for the width of the intervals being used. For example, if we use an interval width of 1 dollar, then the 59 interval really has 59.5 as the upper real limit and 58.5 as the lower real limit. If the least expensive shoe is $58.95, then the exclusive range covering from $59 to $75 actually excludes the least expensive shoe. Hence the term exclusive range means that scores can be excluded from this range. The same would go for a shoe priced at $75.25, as it would fall outside of the exclusive range at the high end of the distribution.

Because of this limitation, a second definition of the range was developed, known as the inclusive range. As you might surmise, the inclusive range takes into account the interval width so that all scores are included in the range. The inclusive range is defined as the difference between the upper real limit of the interval containing the largest score and the lower real limit of the interval containing the smallest score in a collection of scores. For notational purposes, the inclusive range (IR) is shown as IR = URL of Xmax − LRL of Xmin. If you think about it, what we are actually doing is extending the range by one-half of an interval width at each extreme: one-half an interval width at the maximum value, and one-half an interval width at the minimum value. In notational form, IR = ER + w. For the shoe example, using an interval width of 1, then IR = URL of Xmax − LRL of Xmin = 75.5 − 58.5 = 17. In other words, the actual inclusive range of the scores is 17 (in dollar units). If the interval width was instead 2, then we would add 1 unit to each extreme rather than the .5 unit that we previously added to each extreme. The inclusive range would instead be 18. For the example quiz data (presented
in Table 3.2), note that the exclusive range is 11 and the inclusive range is 12 (as the interval width is 1).

Finally, we need to examine the general characteristics of the range (they are the same for both definitions of the range). First, the range is simple to compute, which is a definite advantage. One can look at a collection of data and almost immediately, even without a computer or calculator, determine the range.

The second characteristic is that the range is influenced by extreme scores, a disadvantage. Because the range is computed from the two most extreme scores, this characteristic is quite obvious. This might be a problem, for instance, if all of the salary data range from $10,000 to $95,000 except for one individual with a salary of $5,000,000. Without this outlier, the exclusive range is $85,000. With the outlier, the exclusive range is $4,990,000. Thus, the millionaire's salary has a drastic impact on the range.

Third, the range is only a function of two scores, another disadvantage. Obviously, the range is computed from the largest and smallest scores and thus is only a function of those two scores. The spread of the distribution of scores between those two extreme scores is not at all taken into account. In other words, for the same maximum ($5,000,000) and minimum ($10,000) salaries, the range is the same whether the salaries are mostly near the maximum salary, mostly near the minimum salary, or spread out evenly. The fourth characteristic is that the range is unstable from sample to sample, another disadvantage. Say a second sample of salary data yielded the exact same data except for the maximum salary now being a less extreme $100,000. The range is now dramatically different. Also, in statistics we tend to worry about measures that are not stable from sample to sample, as that implies the results
are not very reliable. Finally, the range is appropriate for data that are ordinal, interval, or ratio in measurement scale.

3.3.2 H Spread

The next measure of dispersion is H spread, a variation on the range measure with one major exception. Although the range relies upon the two extreme scores, resulting in certain disadvantages, H spread relies upon the difference between the third and first quartiles. To be more specific, H spread is defined as Q3 − Q1, the simple difference between the third and first quartiles. The term H spread was developed by Tukey (1977), H being short for hinge from the box-and-whisker plot, and is also known as the interquartile range.

For the example statistics quiz data (presented in Table 3.2), we already determined in Chapter 2 that Q3 = 18.0833 and Q1 = 13.1250. Therefore, H = Q3 − Q1 = 18.0833 − 13.1250 = 4.9583. H measures the range of the middle 50% of the distribution. The larger the value, the greater is the spread in the middle of the distribution. The size or magnitude of any of the range measures takes on more meaning when making comparisons across samples. For example, you might find with salary data that the range of salaries for middle management is smaller than the range of salaries for upper management. As another example, we might expect the salary range to increase over time.

What are the characteristics of H spread? The first characteristic is that H is unaffected by extreme scores, an advantage. Because we are looking at the difference between the third and first quartiles, extreme observations will be outside of this range. Second, H is not a function of every score, a disadvantage. The precise placement of where scores fall above Q3, below Q1, and between Q3 and Q1 is not relevant. All that matters is that 25% of the scores fall above Q3, 25% fall below Q1, and 50% fall between Q3 and Q1. Thus, H is not a
function of very many of the scores at all, just those around Q3 and Q1. Third, H is not very stable from sample to sample, another disadvantage, especially in terms of inferential statistics and one's ability to be confident about a sample estimate of a population parameter. Finally, H is appropriate for all scales of measurement except for nominal.

3.3.3 Deviational Measures

In this section, we examine deviation scores, population variance and standard deviation, and sample variance and standard deviation, all methods that deal with deviations from the mean.

3.3.3.1 Deviation Scores

In the last category of measures of dispersion are those that utilize deviations from the mean. Let us define a deviation score as the difference between a particular raw score and the mean of the collection of scores (population or sample; either will work). For population data, we define a deviation as di = Xi − μ. In other words, we can compute the deviation from the mean for each individual or object. Consider the credit card dataset as shown in Table 3.3. To make matters simple, we only have a small population of data, five values to be exact. The first column lists the raw scores, which are in this example the number of credit cards owned by five individuals, and, at the bottom of the first column, indicates the sum (Σ = 30), population size (N = 5), and population mean (μ = 6.0). The second column provides the deviation scores for each observation from the population mean and, at the bottom of the second column, indicates the sum of the deviation scores, denoted by

    Σ_{i=1}^{N} (Xi − μ)

From the second column, we see that two of the observations have positive deviation scores as their raw score is above the mean, one observation has a zero deviation score as that raw score is at the mean, and two other observations have negative deviation scores as their raw score is below the
mean�� However,� when� we� sum� the� deviation� scores,� we� obtain� a� value�of�zero��This�will�always�be�the�case�as�follows: ( )Xi i N − = = ∑ µ 0 1 The� positive� deviation� scores� will� exactly� offset� the� negative� deviation� scores�� Thus� any� measure�involving�simple�deviation�scores�will�be�useless�in�that�the�sum�of�the�deviation� scores�will�always�be�zero,�regardless�of�the�spread�of�the�scores� What�other�alternatives�are�there�for�developing�a�deviational�measure�that�will�yield�a� sum�other�than�zero?�One�alternative�is�to�take�the�absolute�value�of�the�deviation�scores� (i�e�,�where�the�sign�is�ignored)��Unfortunately,�however,�this�is�not�very�useful�mathematically� in� terms� of�deriving� other�statistics,�such� as�inferential� statistics��As�a�result,� this� devia- tional�measure�is�rarely�used�in�statistics� 3.3.3.2   Population Variance and Standard Deviation So�far,�we�found�the�sum�of�the�deviations�and�the�sum�of�the�absolute�deviations�not�to�be� very�useful�in�describing�the�spread�of�the�scores�from�the�mean��What�other�alternative� Table 3.3 Credit�Card�Data X X − μ (X − μ)2 1 −5 25 5 −1 1 6 0 0 8 2 4 10 4 16 =∑ 30 =∑ 0 =∑ 46 N�=�5 μ�=�6 60 An Introduction to Statistical Concepts might�be�useful?�As�shown�in�the�third�column�of�Table�3�3,�one�could�square�the�devia- tion�scores�to�remove�the�sign�problem��The�sum�of�the�squared�deviations�is�shown�at�the� bottom�of�the�column�as�Σ�=�46�and�denoted�as ( )Xi i N − = ∑ µ 2 1 As�you�might�suspect,�with�more�scores,�the�sum�of�the�squared�deviations�will�increase�� So�we�have�to�weigh�the�sum�by�the�number�of�observations�in�the�population��This�yields� a�deviational�measure�known�as�the�population variance,�which�is�denoted�as�σ2�(lower- case�Greek�letter�sigma)�and�computed�by�the�following�formula: σ µ 2 2 1= − = ∑( )X N i i N For�the�credit�card�example,�the�population�variance�σ2�=�(46/5)�=�9�2��We�refer�to�this�par- ticular�formula�for�the�population�variance�as�the�definitional 
formula, as conceptually that is how we define the variance. Conceptually, the variance is a measure of the area of a distribution. That is, the more spread out the scores, the more area or space the distribution takes up and the larger is the variance. The variance may also be thought of as an average squared distance from the mean. The variance has nice mathematical properties and is useful for deriving other statistics, such as inferential statistics.

The computational formula for the population variance is

    σ² = [N(Σ_{i=1}^{N} Xi²) − (Σ_{i=1}^{N} Xi)²] / N²

This method is computationally easier to deal with than the definitional formula. Imagine if you had a population of 100 scores. Using hand computations, the definitional formula would take considerably more time than the computational formula. With the computer, this is a moot point, obviously. But if you do have to compute the population variance by hand, then the easiest formula to use is the computational one.

Exactly how does this formula work? For the first summation in the numerator, we square each score first, then sum all the squared scores. This value is then multiplied by the population size. For the second summation in the numerator, we sum all the scores first, then square the summed scores. After subtracting the values computed in the numerator, we divide by the squared population size. The two quantities derived by the summation operations in the numerator are computed in much different ways and generally yield different values.

Let us return to the credit card dataset and see if the computational formula actually yields the same value for σ² as the definitional formula did earlier (σ² = 9.2). The computational formula shows σ² to be as follows:

    σ² = [N(Σ_{i=1}^{N} Xi²) − (Σ_{i=1}^{N} Xi)²] / N² = [5(226) − (30)²] / 5² = (1130 − 900)/25 = 9.2000

which is precisely the value we computed previously.

A few individuals (none of us, of course) are a bit bothered about the variance for the following reason. Say you are measuring the height of children in inches. The raw scores are measured in terms of inches, the mean is measured in terms of inches, but the variance is measured in terms of inches squared. Squaring the scale is bothersome to some, as the scale is no longer in the original units of measure, but rather a squared unit of measure—making interpretation a bit difficult. To generate a deviational measure in the original scale of inches, we can take the square root of the variance. This is known as the standard deviation and is the final measure of dispersion we discuss. The population standard deviation is defined as the positive square root of the population variance and is denoted by σ (i.e., σ = +√σ²). The standard deviation, then, is measured in the original scale of inches. For the credit card data, the standard deviation is computed as follows:

    σ = +√σ² = +√9.2 = 3.0332
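The definitional and computational formulas can be checked against each other in a few lines. A minimal sketch using the credit card data from Table 3.3 (the variable names are ours):

```python
import math

# Credit card data from Table 3.3: a small population, N = 5.
X = [1, 5, 6, 8, 10]
N = len(X)
mu = sum(X) / N  # population mean, 30/5 = 6.0

# Definitional formula: sigma^2 = sum of squared deviations / N
var_def = sum((x - mu) ** 2 for x in X) / N

# Computational formula: sigma^2 = [N * sum(X_i^2) - (sum(X_i))^2] / N^2
var_comp = (N * sum(x * x for x in X) - sum(X) ** 2) / N ** 2

sigma = math.sqrt(var_def)  # population standard deviation
print(var_def, var_comp, round(sigma, 4))  # 9.2 9.2 3.0332
```

Both formulas give 9.2, matching the hand computation; the square root returns the result to the original units.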
What are the major characteristics of the population variance and standard deviation? First, the variance and standard deviation are a function of every score, an advantage. An examination of either the definitional or computational formula for the variance (and standard deviation as well) indicates that all of the scores are taken into account, unlike the range or H spread. Second, therefore, the variance and standard deviation are affected by extreme scores, a disadvantage. As we said earlier, if a measure takes all of the scores into account, then it must take into account the extreme scores as well. Thus, a child much taller than all of the rest of the children will dramatically increase the variance, as the area or size of the distribution will be much more spread out. Another way to think about this is that the deviation score for such an outlier will be large, and it will then be squared and summed with the rest of the squared deviation scores. Thus, an outlier can really increase the variance. Also, it goes without saying that it is always a good idea when using the computer to verify your data. A data entry error can cause an outlier and therefore a larger variance (e.g., that child coded as 700 inches tall instead of 70 will surely inflate your variance).

Third, the variance and standard deviation are only appropriate for interval and ratio measurement scales. Like the mean, this is due to the implicit requirement of equal intervals. A fourth and final characteristic of the variance and standard deviation is that they are quite useful for deriving other statistics, particularly in inferential statistics, another advantage. In fact, Chapter 9 is all about making inferences about variances, and many other inferential statistics make assumptions about the variance. Thus, the variance is quite important as a measure of dispersion. It is also interesting to compare the measures of central
tendency with the measures of dispersion, as they do share some important characteristics. The mode and the range share certain characteristics. Both only take some of the data into account, are simple to compute, and are unstable from sample to sample. The median shares certain characteristics with H spread. These are not influenced by extreme scores, are not a function of every score, are difficult to deal with mathematically due to their instability from sample to sample, and can be used with all measurement scales except the nominal scale. The mean shares many characteristics with the variance and standard deviation. These all are a function of every score, are influenced by extreme scores, are useful for deriving other statistics, and are only appropriate for interval and ratio measurement scales.

To complete this section of the chapter, we take a look at the sample variance and standard deviation and how they are computed for large samples of data (i.e., larger than our credit card dataset).

3.3.3.3 Sample Variance and Standard Deviation

Most of the time, we are interested in computing the sample variance and standard deviation; we also often have large samples of data with multiple frequencies for many of the scores. Here we consider these last aspects of the measures of dispersion. Recall when we computed the sample statistics of central tendency. The computations were exactly the same as with the population parameters (although the notation for the population and sample means was different). There are also no differences between the sample and population values for the range or H spread. However, there is a difference between the sample and population values for the variance and standard deviation, as we see next.

Recall the definitional formula for the population variance as follows:

$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$$

Why not just take this equation
and convert everything to sample statistics? In other words, we could simply change N to n and μ to $\bar{X}$. What could be wrong with that? The answer is that there is a problem which prevents us from simply changing the notation in the formula from population notation to sample notation.

Here is the problem. First, the sample mean, $\bar{X}$, may not be exactly equal to the population mean, μ. In fact, for most samples, the sample mean will be somewhat different from the population mean. Second, we cannot use the population mean anyway as it is unknown (in most instances anyway). Instead, we have to substitute the sample mean into the equation (i.e., the sample mean, $\bar{X}$, is the sample estimate for the population mean, μ). Because the sample mean is different from the population mean, the deviations will all be affected. Also, the sample variance that would be obtained in this fashion would be a biased estimate of the population variance. In statistics, bias means that something is systematically off. In this case, the sample variance obtained in this manner would be systematically too small.

In order to obtain an unbiased sample estimate of the population variance, the following adjustments have to be made in the definitional and computational formulas, respectively:

$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$$

$$s^2 = \frac{n\sum_{i=1}^{n}X_i^2 - \left(\sum_{i=1}^{n}X_i\right)^2}{n(n-1)}$$

In terms of the notation,

- s² is the sample variance
- n has been substituted for N
- $\bar{X}$ has been substituted for μ

These changes are relatively minor and expected. The major change is in the denominator, where instead of N for the definitional formula we have n − 1, and instead of N² for the computational formula we have n(n − 1). This turns out to be the correction that early statisticians
discovered was necessary to obtain an unbiased estimate of the population variance.

It should be noted that (a) when sample size is relatively large (e.g., n = 1000), the correction will be quite small, and (b) when sample size is relatively small (e.g., n = 5), the correction will be quite a bit larger. One suggestion is that when computing the variance on a calculator or computer, you might want to be aware of whether the sample or population variance is being computed, as it can make a difference (typically the sample variance is computed). The sample standard deviation is denoted by s and computed as the positive square root of the sample variance s² (i.e., $s = +\sqrt{s^2}$).

For our example statistics quiz data (presented in Table 3.2), we have multiple frequencies for many of the raw scores which need to be taken into account. A simple procedure for dealing with this situation when using hand computations is shown in Table 3.4. Here we see that in the third and fifth columns, the scores and squared scores are multiplied by their respective frequencies. This allows us to take into account, for example, that the score of 19 occurred four times. Note for the fifth column that the frequencies are not squared; only the scores are squared. At the bottom of the third and fifth columns are the sums we need to compute the statistics of interest.

Table 3.4
Sums for Statistics Quiz Data

  X      f     fX     X²     fX²
  9      1      9     81      81
 10      1     10    100     100
 11      2     22    121     242
 12      1     12    144     144
 13      2     26    169     338
 14      1     14    196     196
 15      3     45    225     675
 16      1     16    256     256
 17      5     85    289    1445
 18      3     54    324     972
 19      4     76    361    1444
 20      1     20    400     400
 n = 25      Σ = 389             Σ = 6293

The computations are as follows. We compute the sample mean to be

$$\bar{X} = \frac{\sum_{i=1}^{n} fX_i}{n} = \frac{389}{25} = 15.5600$$

The sample variance is computed to be as follows:

$$s^2 = \frac{n\sum_{i=1}^{n} fX_i^2 - \left(\sum_{i=1}^{n} fX_i\right)^2}{n(n-1)} = \frac{25(6{,}293) - (389)^2}{25(24)} = \frac{157{,}325 - 151{,}321}{600} = \frac{6{,}004}{600} = 10.0067$$
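The bookkeeping of Table 3.4 can be mirrored in a short Python sketch (our illustration; the text itself does these computations by hand and in SPSS), using the frequency table from the quiz data:

```python
import math

# Frequency table from Table 3.4: score -> frequency.
freq = {9: 1, 10: 1, 11: 2, 12: 1, 13: 2, 14: 1,
        15: 3, 16: 1, 17: 5, 18: 3, 19: 4, 20: 1}

n = sum(freq.values())                               # 25
sum_fx = sum(f * x for x, f in freq.items())         # 389
sum_fx2 = sum(f * x ** 2 for x, f in freq.items())   # 6293 (scores squared, not frequencies)

mean = sum_fx / n                                    # 15.56
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))     # unbiased sample variance
s = math.sqrt(s2)                                    # sample standard deviation

print(round(mean, 4), round(s2, 4), round(s, 4))     # 15.56 10.0067 3.1633
```

Note that the squaring applies only to the scores, exactly as the fifth column of the table requires; squaring the frequencies as well is a common hand-computation mistake.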
Therefore, the sample standard deviation is

$$s = +\sqrt{s^2} = +\sqrt{10.0067} = 3.1633$$

3.3.4 Summary of Measures of Dispersion

To summarize the measures of dispersion then,

1. The range is the only appropriate measure for ordinal data. The H spread, variance, and standard deviation can be used with interval or ratio measurement scales.
2. There are no measures of dispersion appropriate for nominal data.

A summary of the advantages and disadvantages of each measure is presented in Box 3.2.

STOP AND THINK BOX 3.2
Advantages and Disadvantages of Measures of Dispersion

Range
  Advantages: simple to compute; can be used with ordinal, interval, and ratio measurement scales of variables.
  Disadvantages: influenced by extreme scores; function of only two scores; unstable from sample to sample; cannot be used with nominal data.

H spread
  Advantages: unaffected by extreme scores; can be used with ordinal, interval, and ratio measurement scales of variables.
  Disadvantages: not a function of all scores in the distribution; difficult to deal with mathematically due to its instability; cannot be used with nominal data.

Variance and standard deviation
  Advantages: function of all scores in the distribution; useful for deriving other statistics; can be used with interval and ratio measurement scales of variables.
  Disadvantages: influenced by extreme scores; cannot be used with nominal or ordinal variables.

3.4 SPSS

The purpose of this section is to see what SPSS has to offer in terms of computing measures of central tendency and dispersion. In fact, SPSS provides us with many different ways to obtain such measures. The three programs that we have found to be most useful for generating the descriptive statistics covered in this chapter are "Explore," "Descriptives," and "Frequencies." Instructions for using each are provided as follows.

Explore

Explore: Step 1. The first
program, "Explore," can be invoked by clicking on "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then "Explore." Following the screenshot, as follows, will produce the "Explore" dialog box. For brevity, we have not reproduced this initial screenshot when we discuss the "Descriptives" and "Frequencies" programs; however, you can see here where they can be found from the pulldown menus.

[Screenshot: Explore: Step 1. "Descriptives" and "Frequencies" can also be invoked from this menu.]

Explore: Step 2. Next, from the main "Explore" dialog box, click the variable of interest from the list on the left (e.g., quiz), and move it into the "Dependent List" box by clicking on the arrow button (see screenshot for "Explore: Step 2"). Then click on the "OK" button.

[Screenshot: Explore: Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the "Dependent List" box on the right.]

This will automatically generate the mean, median (approximate), variance, standard deviation, minimum, maximum, exclusive range, and interquartile range (H) (plus skewness and kurtosis, to be covered in Chapter 4). The SPSS output from "Explore" is shown in the top panel of Table 3.5.

Table 3.5
Select SPSS Output for Statistics Quiz Data Using "Explore," "Descriptives," and "Frequencies"

Descriptives (quiz)
                                     Statistic    Std. Error
Mean                                  15.5600       .63267
95% confidence interval for mean
  Lower bound                         14.2542
  Upper bound                         16.8658
5% trimmed mean                       15.6778
Median                                17.0000
Variance                              10.007
Std. deviation                         3.16333
Minimum                                9.00
Maximum                               20.00
Range                                 11.00
Interquartile range                    5.00
Skewness                               -.598        .464
Kurtosis                               -.741        .902

Descriptive Statistics
                     N    Range   Minimum   Maximum     Mean    Std. Deviation   Variance
Quiz                25    11.00     9.00     20.00    15.5600      3.16333        10.007
Valid N (listwise)  25

This is an example of the output generated using the "Descriptives" procedure in SPSS.
This is an example of the output generated using the "Explore" procedure in SPSS. By default, a stem-and-leaf plot and boxplot are also generated from "Explore" (but are not presented here).

Table 3.5 (continued)
Select SPSS Output for Statistics Quiz Data Using "Explore," "Descriptives," and "Frequencies"

Statistics (quiz)
N Valid           25
N Missing          0
Mean              15.5600
Median            16.3333 (a)
Mode              17.00
Std. deviation     3.16333
Variance          10.007
Range             11.00
Minimum            9.00
Maximum           20.00
(a) Calculated from grouped data.

This is an example of the output generated using the "Frequencies" procedure in SPSS. By default, a frequency table is generated from "Frequencies" (but is not presented here).

Descriptives

Descriptives: Step 1. The second program to consider is "Descriptives." It can also be accessed by going to "Analyze" in the top pulldown menu, then selecting "Descriptive Statistics," and then "Descriptives" (see "Explore: Step 1" for a screenshot of this step).

Descriptives: Step 2. This will bring up the "Descriptives" dialog box (see "Descriptives: Step 2" screenshot). From the main "Descriptives" dialog box, click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box by clicking on the arrow. Next, click on the "Options" button.

[Screenshot: Descriptives: Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the "Variable(s)" box on the right. Clicking on "Options" will allow you to select various statistics to be generated.]
Descriptives: Step 3. A new box called "Descriptives: Options" will appear (see "Descriptives: Step 3" screenshot), and you can simply place a checkmark in the boxes for the statistics that you want to generate. From here, you can obtain the mean, variance, standard deviation, minimum, maximum, and exclusive range (among others available). The SPSS output from "Descriptives" is shown in the middle panel of Table 3.5. After making your selections, click on "Continue." You will then be returned to the main "Descriptives" dialog box. From there, click "OK."

[Screenshot: Descriptives: Step 3. Statistics available when clicking on "Options" from the main dialog box for "Descriptives." Placing a checkmark will generate the respective statistic in the output.]

Frequencies

Frequencies: Step 1. The final program to consider is "Frequencies." Go to "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then select "Frequencies." See "Explore: Step 1" for a screenshot of this step.

Frequencies: Step 2. The "Frequencies" dialog box will open (see screenshot for "Frequencies: Step 2"). From this main "Frequencies" dialog box, click the variable of interest from the list on the left (e.g., quiz) and move it into the "Variables" box by clicking on the arrow button. By default, there is a checkmark in the box for "Display frequency tables," and we will keep this checked. This (i.e., selecting "Display frequency tables") will generate a table of frequencies, relative frequencies, and cumulative relative frequencies. Then click on "Statistics" located in the top right corner.

[Screenshot: Frequencies: Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the "Variables" box on the right. Clicking on "Statistics" will allow you to select various statistics to be generated.]
Frequencies: Step 3. A new dialog box labeled "Frequencies: Statistics" will appear (see screenshot for "Frequencies: Step 3"). Here you can obtain the mean, median (approximate), mode, variance, standard deviation, minimum, maximum, and exclusive range (among others). In order to obtain the closest approximation to the median, check the "Values are group midpoints" box, as shown. However, it should be noted that these values are not always as precise as those from the formula given earlier in this chapter. The SPSS output from "Frequencies" is shown in the bottom panel of Table 3.5. After making your selections, click on "Continue." You will then be returned to the main "Frequencies" dialog box. From there, click "OK."

[Screenshot: Frequencies: Step 3. Options available when clicking on "Statistics" from the main dialog box for "Frequencies." Placing a checkmark will generate the respective statistic in the output. Check "Values are group midpoints" for better accuracy with quartiles and percentiles (e.g., the median).]
3.5 Templates for Research Questions and APA-Style Paragraph

As we stated in Chapter 2, depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions and thus no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary. At this time, let us revisit our graduate research assistant, Marie, who was introduced at the beginning of the chapter. As you may recall, her task was to summarize data from 25 students enrolled in a statistics course. The questions that Marie's faculty mentor shared with her were as follows: How can quiz scores of students enrolled in an introductory statistics class be summarized using measures of central tendency? Measures of dispersion? A template for writing descriptive research questions for summarizing data with measures of central tendency and dispersion is presented as follows:

How can [variable] be summarized using measures of central tendency? Measures of dispersion?

Next, we present an APA-like paragraph summarizing the results of the statistics quiz data example, answering the questions posed to Marie:

As shown in Table 3.5, scores ranged from 9 to 20. The mean was 15.56, the approximate median was 17.00 (or 16.33 when calculated from grouped data), and the mode was 17.00. Thus, the scores tended to lump together at the high end of the scale.
A negatively skewed distribution is suggested given that the mean was less than the median and mode. The exclusive range was 11, H spread (interquartile range) was 5.0, variance was 10.007, and standard deviation was 3.1633. From this, we can tell that the scores tended to be quite variable. For example, the middle 50% of the scores had a range of 5 (H spread), indicating that there was a reasonable spread of scores around the median. Thus, despite a high "average" score, there were some low-performing students as well. These results are consistent with those described in Section 2.4.

3.6 Summary

In this chapter, we continued our exploration of descriptive statistics by considering some basic univariate population parameters and sample statistics. First, we examined summation notation, which is necessary in many areas of statistics. Then we looked at the most commonly used measures of central tendency: the mode, the median, and the mean. The next section of the chapter dealt with the most commonly used measures of dispersion. Here we discussed the range (both exclusive and inclusive ranges), H spread, and the population variance and standard deviation, as well as the sample variance and standard deviation. We concluded the chapter with a look at SPSS, a template for writing research questions for summarizing data using measures of central tendency and dispersion, and then developed an APA-style paragraph of results. At this point, you should have met the following objectives: (a) be able to understand and utilize summation notation, (b) be able to determine and interpret the three commonly used measures of central tendency, and (c) be able to determine and interpret different measures of dispersion. A summary of when these descriptive statistics are most appropriate for each of the scales of measurement is shown in Box 3.3. In the next chapter, we will have a more extended discussion of the normal distribution (previously introduced in Chapter 2), as well as the use of standard scores as an alternative to raw scores.

STOP AND THINK BOX 3.3
Appropriate Descriptive Statistics

Measurement Scale   Measure of Central Tendency   Measure of Dispersion
Nominal             Mode                          (none appropriate)
Ordinal             Mode, median                  Range, H spread
Interval/ratio      Mode, median, mean            Range, H spread, variance and standard deviation

Problems

Conceptual Problems

3.1 Adding just one or two extreme scores to the low end of a large distribution of scores will have a greater effect on which one of the following?
  a. Q than the variance.
  b. The variance than Q.
  c. The mode than the median.
  d. None of the above will be affected.

3.2 The variance of a distribution of scores is which one of the following?
  a. Always 1.
  b. May be any number, negative, 0, or positive.
  c. May be any number greater than 0.
  d. May be any number equal to or greater than 0.

3.3 A 20-item statistics test was graded using the following procedure: a correct response is scored +1, a blank response is scored 0, and an incorrect response is scored −1. The highest possible score is +20; the lowest score possible is −20. Because the variance of the test scores for the class was −3, we conclude which one of the following?
  a. The class did very poorly on the test.
  b. The test was too difficult for the class.
  c. Some students received negative scores.
  d. A computational error certainly was made.

3.4 Adding just one or two extreme scores to the high end of a large distribution of scores will have a greater effect on which one of the following?
  a. The mode than the median.
  b. The median than the mode.
  c. The mean than the median.
  d. None of the above will be affected.

3.5 In a negatively skewed distribution, the proportion of scores between Q1 and the median is less than .25. True or false?
3.6 Median is to ordinal as mode is to nominal. True or false?

3.7 I assert that it is appropriate to utilize the mean in dealing with class rank data. Am I correct?

3.8 For a perfectly symmetrical distribution of data, the mean, median, and mode are calculated. I assert that the values of all three measures are necessarily equal. Am I correct?

3.9 In a distribution of 100 scores, the top 10 examinees received an additional bonus of 5 points. Compared to the original median, I assert that the median of the new (revised) distribution will be the same value. Am I correct?

3.10 A set of eight scores was collected, and the variance was found to be 0. I assert that a computational error must have been made. Am I correct?

3.11 Researcher A and Researcher B are using the same dataset (n = 10), where Researcher A computes the sample variance, and Researcher B computes the population variance. The values are found to differ by more than rounding error. I assert that a computational error must have been made. Am I correct?

3.12 For a set of 10 test scores, which of the following values will be different for the sample statistic and population parameter?
  a. Mean
  b. H
  c. Range
  d. Variance

3.13 Median is to H as mean is to standard deviation. True or false?

3.14 The inclusive range will be greater than the exclusive range for any data. True or false?

3.15 For a set of IQ test scores, the median was computed to be 95 and Q1 to be 100. I assert that the statistician is to be commended for their work. Am I correct?
3.16 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that the most frequently occurring number of minutes required for children to participate in physical education classes is 22.00 minutes. Which measure of central tendency does this statement represent?
  a. Mean
  b. Median
  c. Mode
  d. Range
  e. Standard deviation

3.17 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that the fewest number of minutes required per week is 15 minutes and the maximum number of minutes is 45. Which measure of dispersion do these values reflect?
  a. Mean
  b. Median
  c. Mode
  d. Range
  e. Standard deviation

3.18 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that 50% of schools required 20 or more minutes of participation in physical education classes. Which measure of central tendency does this statement represent?
  a. Mean
  b. Median
  c. Mode
  d. Range
  e. Standard deviation

3.19 One item on a survey of recent college graduates asks students to indicate if they plan to live within a 50 mile radius of the university. Responses to the question include "yes" or "no." The researcher who gathers these data computes the variance of this variable. Is this appropriate given the measurement scale of this variable?

3.20 A marriage and family counselor randomly samples 250 clients and collects data on the number of hours they spent in counseling during the past year. What is the most stable measure of central tendency to compute given the measurement scale of this variable?
  a. Mean
  b. Median
  c. Mode
  d. Range
  e. Standard deviation

Computational Problems

3.1 For the population data in Computational Problem 2.1, and again assuming an interval width of 1, compute the following:
  a. Mode
  b. Median
  c. Mean
  d. Exclusive and inclusive range
  e. H spread
  f. Variance and standard deviation

3.2 Given a negatively skewed distribution with a mean of 10, a variance of 81, and N = 500, what is the numerical value of the following?
$$\sum_{i=1}^{N}(X_i - \mu)$$

3.3 For the sample data in Computational Problem 2.2, and again assuming an interval width of 1, compute the following:
  a. Mode
  b. Median
  c. Mean
  d. Exclusive and inclusive range
  e. H spread
  f. Variance and standard deviation

3.4 For the sample data in Computational Problem 4 (classroom test scores) of Chapter 2, and again assuming an interval width of 1, compute the following:
  a. Mode
  b. Median
  c. Mean
  d. Exclusive and inclusive range
  e. H spread
  f. Variance and standard deviation

3.5 A sample of 30 test scores is as follows:

  X     f
  8     1
  9     4
 10     3
 11     7
 12     9
 13     0
 14     0
 15     3
 16     0
 17     0
 18     2
 19     0
 20     1

Compute each of the following statistics:
  a. Mode
  b. Median
  c. Mean
  d. Exclusive and inclusive range
  e. H spread
  f. Variance and standard deviation

3.6 Without doing any computations, which of the following distributions has the largest variance?

  X     f      Y     f      Z     f
 15     6     15     4     15     2
 16     7     16     7     16     7
 17     9     17    11     17    13
 18     9     18    11     18    13
 19     7     19     7     19     7
 20     6     20     4     20     2

3.7 Without doing any computations, which of the following distributions has the largest variance?
  X     f      Y     f      Z     f
  5     3      5     1      5     6
  6     2      6     0      6     2
  7     4      7     4      7     3
  8     3      8     3      8     1
  9     5      9     2      9     0
 10     2     10     1     10     7

Interpretive Problems

3.1 Select one interval or ratio variable from the survey1 sample dataset on the website.
  a. Calculate all of the measures of central tendency and dispersion discussed in this chapter that are appropriate for this measurement scale.
  b. Write an APA-style paragraph which summarizes the findings.

3.2 Select one ordinal variable from the survey1 sample dataset on the website.
  a. Calculate the measures of central tendency and dispersion discussed in this chapter that are appropriate for this measurement scale.
  b. Write an APA-style paragraph which summarizes the findings.

4 Normal Distribution and Standard Scores

Chapter Outline
4.1 Normal Distribution
  4.1.1 History
  4.1.2 Characteristics
4.2 Standard Scores
  4.2.1 z Scores
  4.2.2 Other Types of Standard Scores
4.3 Skewness and Kurtosis Statistics
  4.3.1 Symmetry
  4.3.2 Skewness
  4.3.3 Kurtosis
4.4 SPSS
4.5 Templates for Research Questions and APA-Style Paragraph

Key Concepts
1. Normal distribution (family of distributions, unit normal distribution, area under the curve, points of inflection, asymptotic curve)
2. Standard scores [z, College Entrance Examination Board (CEEB), T, IQ]
3. Symmetry
4. Skewness (positively skewed, negatively skewed)
5. Kurtosis (leptokurtic, platykurtic, mesokurtic)
6. Moments around the mean

In Chapter 3, we continued our discussion of descriptive statistics, previously defined as techniques that allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. There we considered the following three topics: summation notation (method for summing a set of scores), measures of central tendency (measures for boiling down a set of scores into a single value used to represent the data),
and measures of dispersion (measures dealing with the extent to which a collection of scores vary).

In this chapter, we delve more into the field of descriptive statistics in terms of three additional topics. First, we consider the most commonly used distributional shape, the normal distribution. Although in this chapter we discuss the major characteristics of the normal distribution and how it is used descriptively, in later chapters we see how the normal distribution is used inferentially as an assumption for certain statistical tests. Second, several types of standard scores are considered. To this point, we have looked at raw scores and deviation scores. Here we consider scores that are often easier to interpret, known as standard scores. Then we examine two other measures useful for describing a collection of data, namely, skewness and kurtosis. As we show shortly, skewness refers to the lack of symmetry of a distribution of scores, and kurtosis refers to the peakedness of a distribution of scores. Finally, we provide a template for writing research questions, develop an APA-style paragraph of results for an example dataset, and also illustrate the use of SPSS. Concepts to be discussed include the normal distribution (i.e., family of distributions, unit normal distribution, area under the curve, points of inflection, asymptotic curve), standard scores (e.g., z, CEEB, T, IQ), symmetry, skewness (positively skewed, negatively skewed), kurtosis (leptokurtic, platykurtic, mesokurtic), and moments around the mean. Our objectives are that by the end of this chapter, you will be able to (a) understand the normal distribution and utilize the normal table, (b) determine and interpret different types of standard scores, particularly z scores, and (c) understand and interpret skewness and kurtosis statistics.

4.1 Normal Distribution
You may remember the following research scenario that was first introduced in Chapter 2. We will revisit Marie in this chapter.

Marie, a graduate student pursuing a master's degree in educational research, has been assigned to her first task as a research assistant. Her faculty mentor has given Marie quiz data collected from 25 students enrolled in an introductory statistics course and has asked Marie to summarize the data. The faculty member, who continues to be pleased with the descriptive analysis and presentation of results previously shared, has asked Marie to revisit the following research question related to distributional shape: What is the distributional shape of the statistics quiz score? Additionally, Marie's faculty mentor has asked Marie to standardize the quiz score and compare student 1 to student 3 relative to the mean. The corresponding research question that Marie is provided for this analysis is as follows: In standard deviation units, what is the relative standing to the mean of student 1 compared to student 3?
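Marie's standardization task can be sketched in a few lines of code. The 25 quiz scores below are hypothetical placeholders, not the book's actual dataset (which accompanies the text on its website); the standardization step simply expresses each score as a number of standard deviations from the mean.

```python
# Hypothetical stand-in for Marie's quiz data: 25 statistics quiz scores.
# (The actual dataset accompanies the textbook; these values are made up.)
scores = [9, 11, 13, 14, 15, 15, 16, 16, 17, 17, 17, 18, 18,
          18, 18, 19, 19, 19, 20, 20, 21, 21, 22, 23, 24]

n = len(scores)
mean = sum(scores) / n
sd = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5  # population SD

# Standardize: each z score is the number of SDs above/below the mean
z = [(x - mean) / sd for x in scores]
print(f"student 1: z = {z[0]:.2f}")
print(f"student 3: z = {z[2]:.2f}")
```

However the raw scores are scaled, the standardized scores always have a mean of 0 and a standard deviation of 1, so the two students' relative standings can be read directly off their z values.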
Recall from Chapter 2 that there are several commonly seen distributions. The most commonly observed and used distribution is the normal distribution. It has many uses in both descriptive and inferential statistics, as we show. In this section, we discuss the history of the normal distribution and its major characteristics.

4.1.1 History

Let us first consider a brief history of the normal distribution. From the time that data were collected and distributions examined, a particular bell-shaped distribution occurred quite often for many variables in many disciplines (e.g., many physical, cognitive, physiological, and motor attributes). This has come to be known as the normal distribution. Back in the 1700s, mathematicians were called on to develop an equation that could be used to approximate the normal distribution. If such an equation could be found, then the probability associated with any point on the curve could be determined, and the amount of space or area under any portion of the curve could also be determined. For example, one might want to know the probability of a male being taller than 6′2″, given that height is normally distributed for each gender. Until the 1920s, the development of this equation was commonly attributed to Karl Friedrich Gauss, and until that time, this distribution was known as the Gaussian curve. However, in the 1920s, Karl Pearson found this equation in an earlier article written by Abraham DeMoivre in 1733 and renamed the curve the normal distribution. Today the normal distribution is, therefore, attributed to DeMoivre.

4.1.2 Characteristics

There are seven important characteristics of the normal distribution. Because the normal distribution occurs frequently, features of the distribution are standard across all normal distributions. This
“standard� curve”� allows� us� to� make� comparisons� across� two�or�more�normal�distributions�as�well�as�look�at�areas�under�the�curve,�as�becomes� evident� 4.1.2.1   Standard Curve First,�the�normal�distribution�is�a�standard�curve�because�it�is�always�(a)�symmetric�around� the�mean,�(b)�unimodal,�and�(c)�bell-shaped��As�shown�in�Figure�4�1,�if�we�split�the�distri- bution�in�one-half�at�the�mean�(μ),�the�left-hand�half�(below�the�mean)�is�the�mirror�image� of�the�right-hand�half�(above�the�mean)��Also,�the�normal�distribution�has�only�one�mode,� and�the�general�shape�of�the�distribution�is�bell-shaped�(some�even�call�it�the�bell-shaped� curve)��Given�these�conditions,�the�mean,�median,�and�mode�will�always�be�equal�to�one� another�for�any�normal�distribution� –3 –2 –1 1 Mean 2 3 13.59%13.59% 34.13% 34.13% 2.14% 2.14% FIGuRe 4.1 The�normal�distribution� 80 An Introduction to Statistical Concepts 4.1.2.2   Family of Curves Second,�there�is�no�single�normal�distribution,�but�rather�the�normal�distribution�is�a�fam- ily�of�curves��For�instance,�one�particular�normal�curve�has�a�mean�of�100�and�a�vari- ance�of�225�(recall�that�the�standard�deviation�is�the�square�root�of�the�variance;�thus,� the�standard�deviation�in�this�instance�is�15)��This�normal�curve�is�exemplified�by�the� Wechsler� intelligence� scales�� Another� specific� normal� curve� has� a� mean� of� 50� and� a� variance�of�100�(standard�deviation�of�10)��This�normal�curve�is�used�with�most�behav- ior�rating�scales��In�fact,�there�are�an�infinite�number�of�normal�curves,�one�for�every� distinct�pair�of�values�for�the�mean�and�variance��Every�member�of�the�family�of�nor- mal� curves� has� the� same� characteristics;� however,� the� scale� of� X,� the� mean� of� X,� and� the�variance�(and�standard�deviation)�of�X�can�differ�across�different�variables�and/or� populations� To� keep� the� members� of� the� family� distinct,� we� use� the� following� notation�� If� the� 
variable X is normally distributed, we write X ∼ N(μ, σ²). This is read as “X is distributed normally with population mean μ and population variance σ².” This is the general notation; for notation specific to a particular normal distribution, the mean and variance values are given. For our examples, the Wechsler intelligence scales are denoted by X ∼ N(100, 225), whereas the behavior rating scales are denoted by X ∼ N(50, 100). Narratively speaking, therefore, the Wechsler intelligence scale is distributed normally with a population mean of 100 and population variance of 225. A similar interpretation can be made for the behavior rating scale.

4.1.2.3 Unit Normal Distribution

Third, there is one particular member of the family of normal curves that deserves additional attention. This member has a mean of 0 and a variance (and standard deviation) of 1 and thus is denoted by X ∼ N(0, 1). This is known as the unit normal distribution (unit referring to the variance of 1) or as the standard unit normal distribution. On a related matter, let us define a z score as follows:

z_i = (X_i − μ)/σ

The numerator of this equation is actually a deviation score, previously described in Chapter 3, and indicates how far above or below the mean an individual's score falls. When we divide the deviation from the mean (i.e., the numerator) by the standard deviation (i.e., the denominator), the value derived indicates how many standard deviations above or below the mean an individual's score falls. If one individual has a z score of +1.00, then the person falls one standard deviation above the mean. If another individual has a z score of −2.00, then that person falls two standard deviations below the mean. There is more to say about this as we move along in this section.

4.1.2.4 Area

The fourth characteristic of the normal distribution is the ability to determine any area
under the curve. Specifically, we can determine the area above any value, the area below any value, or the area between any two values under the curve. Let us chat about what we mean by area. If you return to Figure 4.1, areas for different portions of the curve are listed. Here area is defined as the percentage or amount of space of a distribution, either above a certain score, below a certain score, or between two different scores. For example, we see that the area between the mean and one standard deviation above the mean is 34.13%. In other words, roughly a third of the entire distribution falls into that region. The entire area under the curve then represents 100%, and smaller portions of the curve represent somewhat less than that.

For example, say you wanted to know what percentage of adults had an IQ score over 120, what percentage of adults had an IQ score under 107, or what percentage of adults had an IQ score between 107 and 120. How can we compute these areas under the curve? A table of the unit normal distribution has been developed for this purpose. Although similar tables could also be developed for every member of the normal family of curves, these are unnecessary, as any normal distribution can be converted to a unit normal distribution. The unit normal table is given in Table A.1.

Turn to Table A.1 now and familiarize yourself with its contents. To help illustrate, a portion of the table is presented in Figure 4.2. The first column simply lists the values of z. These are standardized scores on the X axis. Note that the values of z only range from 0 to 4.0. There are two reasons for this. First, values above 4.0 are rather unlikely, as the area under that portion of the curve is negligible (less than .003%). Second, values below 0 (i.e., negative z scores) are not really necessary to present in the table, as the normal distribution is symmetric around the mean of 0. Thus, that portion of the table would be redundant and is not shown here (we show how to deal with this situation for some example problems in a bit).

The second column, labeled P(z), gives the area below the respective value of z—in other words, the area between that value of z and the most extreme left-hand portion of the curve [i.e., −∞ (negative infinity) on the far negative or left-hand side of 0]. So if we wanted to know what the area was below z = +1.00, we would look in the first column under z = 1.00 and then look in the second column, P(z), to find the area of .8413. This value, .8413, represents the percentage of the distribution that is smaller than z of +1.00. It also represents the probability that a score will be smaller than z of +1.00. In other words, about 84% of the distribution is less than z of +1.00, and the probability that a value will be less than z of +1.00 is about 84%. More examples are considered later in this section.

[FIGURE 4.2 Portion of the z table, listing P(z) for selected z values between .00 and 1.55 in increments of .01. z scores are standardized scores on the X axis; P(z) values indicate the percentage of the z distribution that is smaller than the respective z value, and also represent the probability that a value will be less than that respective z value.]

4.1.2.5 Transformation to Unit Normal Distribution

A fifth characteristic is that any normally distributed variable, regardless of the mean and variance, can be converted into a unit normally distributed variable. Thus, our Wechsler intelligence scales as denoted by X ∼ N(100, 225) can be converted into z ∼ N(0, 1). Conceptually, this transformation is done by moving the curve along the X axis until it is centered at a mean of 0 (by subtracting out the original mean) and then by stretching or compressing the distribution until it has a variance of 1 (remember, however, that the shape of the distribution does not change during the standardization process—only the values on the X axis). This allows us to make the same interpretation about any individual's score on any normally distributed variable. If z = +1.00, then for any variable, this implies that the individual falls one standard deviation above the mean.

This also allows us to make comparisons between two different individuals or across two different variables. If we wanted to make comparisons between two different individuals on the same variable X, then rather than comparing their individual raw scores, X₁ and X₂, we could compare their individual z scores, z₁ and z₂, where

z₁ = (X₁ − μ)/σ  and  z₂ = (X₂ − μ)/σ

This is the reason we only need the unit normal distribution table to determine areas under the curve rather than a table for every member of the normal distribution family. In another situation, we may want to compare scores on the Wechsler intelligence scales [X ∼ N(100, 225)] to scores on behavior rating scales [X ∼ N(50, 100)] for the same individual. We would convert both variables to z scores, and then direct comparisons could be made.
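The table lookups described above can also be checked numerically. The cumulative area below any z under the unit normal distribution is Φ(z) = ½[1 + erf(z/√2)], which Python's standard math module can compute directly. A minimal sketch:

```python
import math

def phi(z):
    """Cumulative area below z under the unit normal distribution, N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(phi(1.00), 4))               # area below z = 1.00  -> 0.8413
print(round(phi(-2.50), 4))              # area below z = -2.50 -> 0.0062
print(round(phi(1.00) - phi(-2.50), 4))  # area between the two -> 0.8351
```

These match the Table A.1 values quoted in this section; because the curve is symmetric, phi(-z) equals 1 - phi(z), which is exactly the trick used for negative z values that the table omits.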
It is important to note that in standardizing a variable, it is only the values on the X axis that change. The shape of the distribution (e.g., skewness and kurtosis) remains the same.

4.1.2.6 Constant Relationship with Standard Deviation

The sixth characteristic is that the normal distribution has a constant relationship with the standard deviation. Consider Figure 4.1 again. Along the X axis, we see values represented in standard deviation increments. In particular, from left to right, the values shown are three, two, and one standard deviation units below the mean and one, two, and three standard deviation units above the mean. Under the curve, we see the percentage of scores that fall under different portions of the curve. For example, the area between the mean and one standard deviation above or below the mean is 34.13%. The area between one standard deviation and two standard deviations on the same side of the mean is 13.59%, the area between two and three standard deviations on the same side is 2.14%, and the area beyond three standard deviations is .13%.

In addition, three other areas are often of interest. The area within one standard deviation of the mean, from one standard deviation below the mean to one standard deviation above the mean, is approximately 68% (or roughly two-thirds of the distribution). The area within two standard deviations of the mean, from two standard deviations below the mean to two standard deviations above the mean, is approximately 95%. The area within three standard deviations of the mean, from three standard deviations below the mean to three standard deviations above the mean, is approximately 99%. In other words, nearly all of the scores will be within two or three standard deviations of the mean for any normal curve.

4.1.2.7 Points of Inflection and Asymptotic Curve

The seventh and final characteristic of the normal
distribution� is� as� follows�� The� points of inflection� are� where� the� curve� changes� from� sloping� down� (concave)� to� sloping� up� (convex)��These�points�occur�precisely�at�one�standard�deviation�unit�above�and�below�the� mean��This�is�more�a�matter�of�mathematical�elegance� than�a�statistical�application��The� curve�also�never�touches�the�X�axis��This�is�because�with�the�theoretical�normal�curve,�all� values�from�negative�infinity�to�positive�infinity�have�a�nonzero�probability�of�occurring�� Thus,� while� the� curve� continues� to� slope� ever-downward� toward� more� extreme� scores,� it�approaches,�but�never�quite�touches,�the�X�axis��The�curve�is�referred�to�here�as�being� asymptotic��This�allows�for�the�possibility�of�extreme�scores� Examples:�Now�for�the�long-awaited�examples�for�finding�area�using�the�unit�normal�dis- tribution��These�examples�require�the�use�of�Table�A�1��Our�personal�preference�is�to�draw� a�picture�of�the�normal�curve�so�that�the�proper�area�is�determined��Let�us�consider�four� examples�of�finding�the�area�below�a�certain�value�of�z:�(1)�below�z�=�−2�50,�(2)�below�z�=�0,� (3)�below�z�=�1�00,�and�(4)�between�z�=�−2�50�and�z�=�1�00� To�determine�the�value�below�z�=�−2�50,�we�draw�a�picture�as�shown�in�Figure�4�3a��We� draw�a�vertical�line�at�the�value�of�z,�then�shade�in�the�area�we�want�to�find��Because�the� shaded�region�is�relatively�small,�we�know�the�area�must�be�considerably�smaller�than��50�� In�the�unit�normal�table,�we�already�know�negative�values�of�z�are�not�included��However,� because�the�normal�distribution�is�symmetric,�we�know�the�area�below�−2�50�is�the�same�as� the�area�above�+2�50��Thus,�we�look�up�the�area�below�+2�50�and�find�the�value�of��9938��We� subtract�this�from�1�0000�and�find�the�value�of��0062,�or��62%,�a�very�small�area�indeed� How�do�we�determine�the�area�below�z�=�0�(i�e�,�the�mean)?�As�shown�in�Figure�4�3b,�we� 
already know from reading this section that the area has to be .5000, or one-half of the total area under the curve. However, let us look in the table again for the area below z = 0, and we find the area is .5000. How do we determine the area below z = 1.00? As shown in Figure 4.3c, this region exists on both sides of 0 and actually constitutes two smaller areas, the first area below 0 and the second area between 0 and 1. For this example, we use the table directly and find the value of .8413. We leave you with two other problems to solve on your own. First, what is the area below z = .50 (answer: .6915)? Second, what is the area below z = 1.96 (answer: .9750)?

Because the unit normal distribution is symmetric, finding the area above a certain value of z is solved in a similar fashion to finding the area below a certain value of z. We need not devote any further attention to that particular situation. However, how do we determine the area between two values of z? This is a little different and needs some additional discussion. Consider as an example finding the area between z = −2.50 and z = 1.00, as depicted in Figure 4.3d. Here we see that the shaded region consists of two smaller areas, the area between −2.50 and the mean and the area between the mean (z = 0) and 1.00. Using the table again, we find the area below 1.00 is .8413 and the area below −2.50 is .0062. Thus, the shaded region is the difference as computed by .8413 − .0062 = .8351. On your own, determine the area between z = −1.27 and z = .50 (answer: .5895).

Finally, what if we wanted to determine areas under the curve for values of X rather than z? The answer here is simple, as you might have guessed. First we convert the value of X to a z score; then we use the unit normal table to determine the area. Because the normal
curve is standard for all members of the family of normal curves, the scale of the variable, X or z, is irrelevant in terms of determining such areas. In the next section, we deal more with such transformations.

[FIGURE 4.3 Examples of area under the unit normal distribution: (a) Area below z = −2.5 (.0062). (b) Area below z = 0 (.5000). (c) Area below z = 1.0 (.8413). (d) Area between z = −2.5 and z = 1.0 (.8351).]

4.2 Standard Scores

We have already devoted considerable attention to z scores, which are one type of standard score. In this section, we describe an application of z scores leading up to a discussion of other types of standard scores. As we show, the major purpose of standard scores is to place scores on the same standard scale so that comparisons can be made across individuals and/or variables. Without some standard scale, comparisons across individuals and/or variables would be difficult to make. Examples are coming right up.

4.2.1 z Scores

A child comes home from school with the results of two tests taken that day. On the math test, she receives a score of 75, and on the social studies test, she receives a score of 60. As a parent, the natural question to ask is, “Which performance was the stronger one?” No information about any of the following is available: maximum score possible, mean of the class (or any other central tendency measure), or standard deviation of the class (or any other dispersion measure). It is possible that the two tests had a different number of possible points, different means, and/or different standard deviations. How can we possibly answer our question?
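Once the class means and standard deviations are known, the comparison reduces to standardizing each score. A minimal sketch (the means and standard deviations used here are the ones given in the chapter's first worked example; any other values would work the same way):

```python
def z_score(x, mean, sd):
    """Number of standard deviations a raw score falls above/below the mean."""
    return (x - mean) / sd

# Chapter's first worked example: math mean 60, SD 15; social studies mean 50, SD 10
z_math = z_score(75, 60, 15)
z_ss = z_score(60, 50, 10)
print(z_math, z_ss)  # 1.0 1.0 -> identical relative performance on both tests
```

Because both z scores are unitless, the raw scales (75 points vs. 60 points) drop out entirely, which is what makes the comparison fair.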
The answer, of course, is to use z scores if the data are assumed to be normally distributed, once the relevant information is obtained. Let us take a minor digression before we return to answer our question in more detail. Recall the formula for standardizing variable X into a z score:

z_i = (X_i − μ_X)/σ_X

where the X subscript has been added to the mean and standard deviation for purposes of clarifying which variable is being considered. If variable X is the number of items correct on a test, then the numerator is the deviation of a student's raw score from the class mean (i.e., the numerator is a deviation score as previously defined in Chapter 3), measured in terms of items correct, and the denominator is the standard deviation of the class, also measured in terms of items correct. Because both the numerator and denominator are measured in terms of items correct, the resultant z score has no units (the units of the numerator and denominator essentially cancel out). As z scores have no units (i.e., the z score is interpreted as the number of standard deviation units above or below the mean), this allows us to compare two different raw score variables with different scales, means, and/or standard deviations. By converting our two variables to z scores, the transformed variables are now on the same z score scale with a mean of 0, and a variance and standard deviation of 1.

Let us return to our previous situation where the math test score is 75 and the social studies test score is 60. In addition, we are provided with information that the standard deviation for the math test is 15 and the standard deviation for the social studies test is 10. Consider the following three examples. In the first example, the means are 60 for the math test and 50 for the social studies test. The z scores are then computed as follows:

z_math = (75 − 60)/15 = 1.0
z_ss = (60 − 50)/10 = 1.0

The conclusion for the first example is that the performance on both tests is the same; that is, the child scored one standard deviation above the mean for both tests.

In the second example, the means are 60 for the math test and 40 for the social studies test. The z scores are then computed as follows:

z_math = (75 − 60)/15 = 1.0
z_ss = (60 − 40)/10 = 2.0

The conclusion for the second example is that performance is better on the social studies test; that is, the child scored two standard deviations above the mean for the social studies test and only one standard deviation above the mean for the math test.

In the third example, the means are 60 for the math test and 70 for the social studies test. The z scores are then computed as follows:

z_math = (75 − 60)/15 = 1.0
z_ss = (60 − 70)/10 = −1.0

The conclusion for the third example is that performance is better on the math test; that is, the child scored one standard deviation above the mean for the math test and one standard deviation below the mean for the social studies test. These examples serve to illustrate a few of the many possibilities, depending on the particular combinations of raw score, mean, and standard deviation for each variable.

Let us conclude this section by mentioning the major characteristics of z scores. The first characteristic is that z scores provide us with comparable distributions, as we just saw in the previous examples. Second, z scores take into account the entire distribution of raw scores. All raw scores can be converted to z scores such that every raw score will have a corresponding z score. Third, we can evaluate an individual's performance relative to the scores in the distribution. For example, saying that an individual's score is one standard deviation above the mean is a measure of relative performance. This implies that approximately 84% of the scores will fall below the performance of that individual. Finally, negative values (i.e., below 0) and decimal values (e.g., z = 1.55) are obviously possible (and will most certainly occur) with z scores. On average, about one-half of the z scores for any distribution will be negative, and some decimal values are quite likely. This last characteristic is bothersome to some individuals and has led to the development of other types of standard scores, as described in the next section.

4.2.2 Other Types of Standard Scores

Over the years, other standard scores besides z scores have been developed, either to alleviate the concern over negative and/or decimal values associated with z scores, or to obtain a particular mean and standard deviation. Let us examine three common examples. The first additional standard score is known as the College Entrance Examination Board (CEEB) score. This standard score is used in exams such as the SAT and the GRE. The subtests for these exams all have a mean of 500 and a standard deviation of 100. A second additional standard score is known as the T score and is used in tests such as most behavior rating scales, as previously mentioned. T scores have a mean of 50 and a standard deviation of 10. A third additional standard score is known as the IQ score and is used in the Wechsler intelligence scales. The IQ score has a mean of 100 and a standard deviation of 15 (the Stanford–Binet intelligence scales have a mean of 100 and a standard deviation of 16).

Say we want to develop our own type of standard score, where we determine in advance the mean and standard deviation that we would like to have. How would that be done? As the equation for z scores is

z_i = (X_i − μ_X)/σ_X

then algebraically the following can be shown:

X_i = μ_X + σ_X z_i

If, for example, we want to develop our own “stat” standardized score, then the following
equation would be used:

stat_i = μ_stat + σ_stat z_i

where stat_i is the “stat” standardized score for a particular individual, μ_stat is the desired mean of the “stat” distribution, and σ_stat is the desired standard deviation of the “stat” distribution. If we want to have a mean of 10 and a standard deviation of 2, then our equation becomes

stat_i = 10 + 2z_i

We would then have the computer simply plug in a z score and compute an individual's “stat” score. Thus, a z score of 1.0 would yield a “stat” standardized score of 12.0.

Consider a realistic example where we have a raw score variable we want to transform into a standard score, and we want to control the mean and standard deviation. For example, we have statistics midterm raw scores with 225 points possible. We want to develop a standard score with a mean of 50 and a standard deviation of 5. We also have scores on other variables that are on different scales with different means and different standard deviations (e.g., statistics final exam scores worth 175 points, a set of 20 lab assignments worth a total of 200 points, a statistics performance assessment worth 100 points). We can standardize each of those variables by placing them on the same scale with the same mean and same standard deviation, thereby allowing comparisons across variables. This is precisely the rationale used by testing companies and researchers when they develop standard scores. In short, from z scores, we can develop a CEEB, T, IQ, “stat,” or any other type of standard score.

4.3 Skewness and Kurtosis Statistics

In previous chapters, we discussed the distributional concepts of symmetry, skewness, central tendency, and dispersion. In this section, we more closely define symmetry as well as the statistics commonly used to measure skewness and kurtosis.

4.3.1 Symmetry

Conceptually, we define a distribution as being symmetric if, when we divide the distribution precisely in one-half, the
left-hand� half� is� a� mirror� image� of� the� right-hand� half�� That� is,� the� distribution� above� the� mean� is� a� mirror� image� of� the� distribution� below�the�mean��To�put�it�another�way,�a�distribution�is�symmetric around the mean� if�for�every�score�q�units�below�the�mean,�there�is�a�corresponding�score�q�units�above� the�mean� 88 An Introduction to Statistical Concepts Two� examples� of� symmetric� distributions� are� shown� in� Figure� 4�4�� In� Figure� 4�4a,� we� have�a�normal�distribution,�which�is�clearly�symmetric�around�the�mean��In�Figure�4�4b,� we� have� a� symmetric� distribution� that� is� bimodal,� unlike� the� previous� example�� From� these�and�other�numerous�examples,�we�can�make�the�following�two�conclusions��First,�if�a� distribution�is�symmetric,�then�the�mean�is�equal�to�the�median��Second,�if�a�distribution�is� symmetric�and�unimodal,�then�the�mean,�median,�and�mode�are�all�equal��This�indicates� we�can�determine�whether�a�distribution�is�symmetric�by�simply�comparing�the�measures� of�central�tendency� 4.3.2   Skewness We� define� skewness� as� the� extent� to� which� a� distribution� of� scores� deviates� from� per- fect�symmetry��This�is�important�as�perfectly�symmetrical�distributions�rarely�occur�with� actual�sample�data�(i�e�,�“real”�data)��A�skewed�distribution�is�known�as�being�asymmetri- cal�� As� shown� in� Figure� 4�5,� there� are� two� general� types� of� skewness,� distributions� that� are�negatively�skewed,�as�in�Figure�4�5a,�and�those�that�are�positively�skewed,�as�in�Figure� 4�5b��Negatively�skewed�distributions,�which�are�skewed�to�the�left,�occur�when�most�of� the�scores�are�toward�the�high�end�of�the�distribution�and�only�a�few�scores�are�toward� the�low�end��If�you�make�a�fist�with�your�thumb�pointing�to�the�left�(skewed�to�the�left),� you� have� graphically� defined� a� negatively� skewed� distribution�� For� a� negatively� skewed� (a) (b) FIGuRe 4.4 
Symmetric�distributions:�(a)�Normal�distribution��(b)�Bimodal�distribution� (a) (b) FIGuRe 4.5 Skewed�distributions:�(a)�Negatively�skewed�distribution��(b)�Positively�skewed�distribution� 89Normal Distribution and Standard Scores distribution,�we�also�find�the�following:�mode > median > mean��This�indicates�that�we�can�
determine whether a distribution is negatively skewed by simply comparing the measures of central tendency.

Positively skewed distributions, which are skewed to the right, occur when most of the scores are toward the low end of the distribution and only a few scores are toward the high end. If you make a fist with your thumb pointing to the right (skewed to the right), you have graphically defined a positively skewed distribution. For a positively skewed
distribution, we also find the following: mode < median < mean. This indicates that we can determine whether a distribution is positively skewed by simply comparing the measures of central tendency.

The most commonly used measure of skewness is known as γ₁ (the Greek letter gamma), which is mathematically defined as follows:

γ₁ = (Σ_{i=1}^{N} z_i³)/N

where we take the z score for each individual, cube it, sum across all N individuals, and then divide by the number of individuals N. This measure is available in nearly all computer packages, so hand computations are not necessary. The characteristics of this measure of skewness are as follows: (a) a perfectly symmetrical distribution has a skewness value of 0, (b) the range of values for the skewness statistic is approximately from −3 to +3, (c) negatively skewed distributions have negative skewness values, and (d) positively skewed distributions have positive skewness values.

There are different rules of thumb for determining how extreme skewness can be while still retaining a relatively normal distribution. One simple rule of thumb is that skewness values within ±2.0 are considered relatively normal, with more conservative researchers applying a ±3.0 guideline, and more stringent researchers using ±1.0. Another rule of thumb for determining how extreme a skewness value must be for the distribution to be considered nonnormal is as follows: skewness values outside the range of ± two standard errors of skewness suggest a distribution that is nonnormal. Applying this rule of thumb, if the standard error of skewness is .85, then anything outside of −2(.85) to +2(.85), or −1.7 to +1.7, would be considered nonnormal. It is important to note that this second rule of thumb is sensitive to small sample sizes and should only be considered a general guide.

4.3.3 Kurtosis

Kurtosis is the fourth and final
property� of� a� distribution� (often� referred� to� as� the� moments around the mean)��These�four�properties�are�central�tendency�(first�moment),� dispersion� (second� moment),� skewness� (third� moment),� and� kurtosis� (fourth� moment)�� Kurtosis�is�conceptually�defined�as�the�“peakedness”�of�a�distribution�(kurtosis�is�Greek� for�peakedness)��Some�distributions�are�rather�flat,�and�others�have�a�rather�sharp�peak�� Specifically,�there�are�three�general�types�of�peakedness,�as�shown�in�Figure�4�6��A�distri- bution�that�is�very�peaked�is�known�as�leptokurtic�(“lepto”�meaning�slender�or�narrow)� (Figure�4�6a)��A�distribution�that�is�relatively�flat�is�known�as�platykurtic�(“platy”�mean- ing�flat�or�broad)�(Figure�4�6b)��A�distribution�that�is�somewhere�in�between�is�known�as� mesokurtic�(“meso”�meaning�intermediate)�(Figure�4�6c)� 90 An Introduction to Statistical Concepts The�most�commonly�used�measure�of�kurtosis�is�known�as�γ2,�which�is�mathematically� defined�as γ 2 4 1 3= −= ∑ z N i i N where�we�take�the�z�score�for�each�individual,�take�it�to�the�fourth�power�(being�the�fourth� moment),� sum� across� all� N� individuals,� divide� by� the� number� of� individuals� N,� and� then� subtract�3��This�measure�is�available�in�nearly�all�computer�packages,�so�hand�computations� are�not�necessary��The�characteristics�of�this�measure�of�kurtosis�are�as�follows:�(a)�a�perfectly� mesokurtic�distribution,�which�would�be�a�normal�distribution,�has�a�kurtosis�value�of�0,� (b)� platykurtic�distributions�have�negative�kurtosis�values�(being�flat�rather�than�peaked),� and�(c)�leptokurtic�distributions�have�positive�kurtosis�values�(being�peaked)��Kurtosis�values� can�range�from�negative�to�positive�infinity� There�are�different�rules�of�thumb�for�determining�how�extreme�kurtosis�can�be�and�still� retain� a� relatively� normal� distribution�� One� simple� rule� of� thumb� is� that� kurtosis� values� 
within ±2.0 are considered relatively normal, with more conservative researchers applying a ±3.0 guideline, and more stringent researchers using ±1.0. A rule of thumb for determining how extreme a kurtosis value may be for the distribution to be considered nonnormal is as follows: Kurtosis values outside the range of ± two standard errors of kurtosis suggest a distribution that is nonnormal. Applying this rule of thumb, if the standard error of kurtosis is 1.20, then anything outside of −2(1.20) to +2(1.20), or −2.40 to +2.40, would be considered nonnormal. It is important to note that this second rule of thumb is sensitive to small sample sizes and should only be considered as a general guide.

Figure 4.6: Distributions of different kurtosis. (a) Leptokurtic distribution. (b) Platykurtic distribution. (c) Mesokurtic distribution.

Skewness and kurtosis statistics are useful for the following two reasons: (a) as descriptive statistics used to describe the shape of a distribution of scores and (b) in inferential statistics, which often assume a normal distribution, so the researcher has some indication of whether the assumption has been met (more about this beginning in Chapter 6).

4.4 SPSS

Here we review what SPSS has to offer for examining distributional shape and computing standard scores. The following programs have proven to be quite useful for these purposes: “Explore,” “Descriptives,” “Frequencies,” “Graphs,” and “Transform.” Instructions for using each are provided as follows.

Explore

Explore: Step 1. The first program, “Explore,” can be invoked by clicking on “Analyze” in the top pulldown menu, then “Descriptive Statistics,” and then “Explore.” Following the screenshot (step 1), as follows, produces the “Explore” dialog box. For brevity, we have not reproduced this initial screenshot when we discuss the “Descriptives”
and “Frequencies” programs; however, you see here where they can be found from the pulldown menus. Note from the step 1 screenshot: “Frequencies” and “Descriptives” can also be invoked from this menu.

Explore: Step 2. Next, from the main “Explore” dialog box, click the variable of interest from the list on the left (e.g., quiz), and move it into the “Dependent List” box by clicking on the arrow button. Next, click on the “Statistics” button located in the top right corner of the main dialog box. Select the variable of interest from the list on the left and use the arrow to move it to the “Dependent List” box on the right. Clicking on “Statistics” will allow you to select descriptive statistics.

Explore: Step 3. A new box labeled “Explore: Statistics” will appear. Simply place a checkmark in the “Descriptives” box. Next click “Continue.” You will then be returned to the main “Explore” dialog box. From there, click “OK.” This will automatically generate the skewness and kurtosis values, as well as measures of central tendency and dispersion which were covered in Chapter 3. The output from this was previously shown in the top panel of Table 3.5.

Descriptives

Descriptives: Step 1.
The second program to consider is “Descriptives.” It can also be accessed by going to “Analyze” in the top pulldown menu, then selecting “Descriptive Statistics,” and then “Descriptives” (see “Explore: Step 1” for screenshots of these steps).

Descriptives: Step 2. This will bring up the “Descriptives” dialog box (see screenshot, step 2). From the main “Descriptives” dialog box, click the variable of interest (e.g., quiz) and move it into the “Variable(s)” box by clicking on the arrow. If you want to obtain z scores for this variable for each case (e.g., person or object that was measured—your unit of analysis), check the “Save standardized values as variables” box located in the bottom left corner of the main “Descriptives” dialog box. This will insert a new variable into your dataset for subsequent analysis (see screenshot for how this will appear in “Data View”). Next, click on the “Options” button. Select the variable of interest from the list on the left and use the arrow to move it to the “Variable(s)” box on the right. Placing a checkmark on “Save standardized values as variables” will generate a new, standardized variable in your datafile for each variable selected. Clicking on “Options” will allow you to select various statistics to be generated.

Descriptives: Step 3. A new box called “Descriptives: Options” will appear (see screenshot, step 3), and you can simply place a checkmark in the boxes for the statistics that you want to generate. This will allow you to obtain the skewness and kurtosis values, as well as measures of central tendency and dispersion discussed in Chapter 3. After making your selections, click on “Continue.” You will then be returned to the main “Descriptives” dialog box. From there, click “OK.” Statistics are available when clicking on “Options” from the main dialog box for Descriptives.
Placing a checkmark will generate the respective statistic in the output.

Descriptives: Saving a standardized variable. If “Save standardized values as variables” was checked on the main “Descriptives” dialog box, a new standardized variable will be created. By default, this variable name is the name of the original variable prefixed with a “Z” (denoting its standardization). It is computed using the unit normal formula:

z = (X − μ) / σ

Frequencies

Frequencies: Step 1. The third program to consider is “Frequencies,” which is also accessible by clicking on “Analyze” in the top pulldown menu, then clicking on “Descriptive Statistics,” and then selecting “Frequencies” (see “Explore: Step 1” for screenshots of these steps).

Frequencies: Step 2. This will bring up the “Frequencies” dialog box. Move the variable of interest (e.g., quiz) into the “Variable(s)” box, then click on the “Statistics” button. Select the variable of interest from the list on the left and use the arrow to move it to the “Variable(s)” box on the right. Clicking on “Charts” will allow you to generate a histogram with normal curve (and other types of graphs). Clicking on “Statistics” will allow you to select various statistics to be generated.
Frequencies: Step 3. A new box labeled “Frequencies: Statistics” will appear. Again, you can simply place a checkmark in the boxes for the statistics that you want to generate. Here you can obtain the skewness and kurtosis values, as well as measures of central tendency and dispersion from Chapter 3. If you click on the “Charts” button, you can also obtain a histogram with a normal curve overlay by clicking the “Histogram” radio button and checking the “With normal curve” box. This histogram output is shown in Figure 4.7. After making your selections, click on “Continue.” You will then be returned to the main “Frequencies” dialog box. From there, click “OK.”

Figure 4.7: SPSS histogram of statistics quiz data with normal distribution overlay.

Options are available when clicking on “Statistics” from the main dialog box for Frequencies. Placing a checkmark will generate the respective statistic in the output. Check this for better accuracy with quartiles and percentiles (i.e., the median).
Graphs

Graphs: Two other programs also yield a histogram with a normal curve overlay. Both can be accessed by first going to “Graphs” in the top pulldown menu. From there, select “Legacy Dialogs,” then “Histogram.” Another option for creating a histogram, starting again from the “Graphs” option in the top pulldown menu, is to select “Legacy Dialogs,” then “Interactive,” and finally “Histogram.” From there, both work similarly to the “Frequencies” program described earlier.

Transform

Transform: Step 1. A final program that comes in handy is for transforming variables, such as creating a standardized version of a variable (most notably standardization other than the application of the unit normal formula, where the unit normal standardization can be easily performed as seen previously by using “Descriptives”). Go to “Transform” from the top pulldown menu, and then select “Compute Variable.” A dialog box labeled “Compute Variable” will appear.

Transform: Step 2. The “Target Variable” is the name of the new variable you are creating, and the “Numeric Expression” box is where you insert the commands of which original variable to transform and how to transform it (e.g., the stat variable). When you are done defining the formula, simply click “OK” to generate the new variable in the data file. The name specified in “Target Variable” becomes the column header in “Data View.” This name must begin with a letter, and no spaces can be included. “Numeric Expression” is where you enter the formula for your new variable. For the user’s convenience, a number of formulas are already defined within SPSS and accessible through the “Function group” formulas listed below.
4.5 Templates for Research Questions and APA-Style Paragraph

As stated in the previous chapter, depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions, and, thus, no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary.

It is time again to revisit our graduate research assistant, Marie, who was reintroduced at the beginning of the chapter. As a reminder, her task was to continue to summarize data from 25 students enrolled in a statistics course, this time paying particular attention to distributional shape and standardization. The questions posed this time by Marie's faculty mentor were as follows: What is the distributional shape of the statistics quiz score? In standard deviation units, what is the relative standing to the mean of student 1 compared to student 3? A template for writing a descriptive research question for summarizing distributional shape is presented as follows (this may sound familiar, as this was first presented in Chapter 2 when we initially discussed distributional shape). This is followed by a template for writing a research question related to standardization:

What is the distributional shape of the [variable]? In standard deviation units, what is the relative standing to the mean of [unit 1] compared to [unit 3]?
Next, we present an APA-style paragraph summarizing the results of the statistics quiz data example answering the questions posed to Marie:

As shown in the top panel of Table 3.5, the skewness value is −.598 (SE = .464) and the kurtosis value is −.741 (SE = .902). Skewness and kurtosis values within the range of +/−2(SE) are generally considered normal. Given our values, skewness is within the range of −.928 to +.928 and kurtosis is within the range of −1.804 to +1.804, and these would be considered normal. Another rule of thumb is that the skewness and kurtosis values should fall within an absolute value of 2.0 to be considered normal. Applying this rule, normality is still evident. The histogram with a normal curve overlay is depicted in Figure 4.7. Taken with the skewness and kurtosis statistics, these results indicate that the quiz scores are reasonably normally distributed. There is a slight negative skew such that there are more scores at the high end of the distribution than in a typical normal distribution. There is also a slight negative kurtosis indicating that the distribution is slightly flatter than a normal distribution, with a few more extreme scores at the low end of the distribution. Again, however, the values are within the range of what is considered a reasonable approximation to the normal curve.

The quiz score data were standardized using the unit normal formula. After standardization, student 1's score was −2.07 and student 3's score was 1.40. This suggests that student 1 was slightly more than two standard deviation units below the mean on the statistics quiz score, while student 3 was nearly 1.5 standard deviation units above the mean.
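The quantities SPSS reports here are easy to compute directly from the chapter's formulas: z = (X − μ)/σ, γ1 = Σzi³/N, and γ2 = Σzi⁴/N − 3. The sketch below is illustrative only (the quiz scores are hypothetical, not Marie's data), and it uses the population (divide-by-N) versions of the formulas as given in the text; SPSS itself reports sample-adjusted skewness and kurtosis estimates, so its values differ somewhat for small N.

```python
import math

def describe_shape(scores):
    """Return z scores, skewness (gamma1), and kurtosis (gamma2)
    using the population formulas from the chapter."""
    n = len(scores)
    mean = sum(scores) / n
    # Population standard deviation (divide by N, matching the z-score formula)
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    z = [(x - mean) / sd for x in scores]
    gamma1 = sum(zi ** 3 for zi in z) / n       # skewness: sum of z^3, over N
    gamma2 = sum(zi ** 4 for zi in z) / n - 3   # kurtosis: sum of z^4, over N, minus 3
    return z, gamma1, gamma2

# Hypothetical quiz scores (for illustration only)
scores = [9, 11, 13, 14, 15, 15, 16, 17, 17, 18, 19, 20]
z, g1, g2 = describe_shape(scores)
print("skewness:", round(g1, 3), "kurtosis:", round(g2, 3))
```

A quick sanity check on the formulas: a perfectly symmetrical set of scores yields γ1 = 0, and a normal distribution would yield γ2 near 0.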
4.6 Summary

In this chapter, we continued our exploration of descriptive statistics by considering an important distribution, the normal distribution, standard scores, and other characteristics of a distribution of scores. First we discussed the normal distribution, with its history and important characteristics. In addition, the unit normal table was introduced and used to determine various areas under the curve. Next we examined different types of standard scores, in particular z scores, as well as CEEB scores, T scores, and IQ scores. Examples of types of standard scores are summarized in Box 4.1. The next section of the chapter included a detailed description of symmetry, skewness, and kurtosis. The different types of skewness and kurtosis were defined and depicted. We finished the chapter by examining SPSS for these statistics as well as how to write up an example set of results. At this point, you should have met the following objectives: (a) understand the normal distribution and utilize the normal table; (b) determine and interpret different types of standard scores, particularly z scores; and (c) understand and interpret skewness and kurtosis statistics. In the next chapter, we move toward inferential statistics through an introductory discussion of probability as well as a more detailed discussion of sampling and estimation.

Stop and Think Box 4.1: Examples of Types of Standard Scores

Standard Score                        Distribution*
z (unit normal)                       N(0, 1)
CEEB score                            N(500, 10,000)
T score                               N(50, 100)
Wechsler intelligence scale           N(100, 225)
Stanford–Binet intelligence scale     N(100, 256)

* Distributions expressed as N(μ, σ²).

Problems

Conceptual Problems

4.1 For which of the following distributions will the skewness value be 0?
 a. N(0, 1)
 b. N(0, 2)
 c. N(10, 50)
 d. All of the above

4.2 For which of the following distributions will the kurtosis value be 0?
 a. N(0, 1)
 b. N(0, 2)
 c. N(10, 50)
 d. All of the above

4.3 A set of 400 scores is approximately normally distributed with a mean of 65 and a standard deviation of 4.5. Approximately 95% of the scores would fall within which range of scores?
 a. 60.5 and 69.5
 b. 56 and 74
 c. 51.5 and 78.5
 d. 64.775 and 65.225

4.4 What is the percentile rank of 60 in the distribution of N(60, 100)?
 a. 10
 b. 50
 c. 60
 d. 100

4.5 Which of the following parameters can be found on the X axis for a frequency polygon of a population distribution?
 a. Skewness
 b. Median
 c. Kurtosis
 d. Q

4.6 The skewness value is calculated for a set of data and is found to be equal to +2.75. This indicates that the distribution of scores is which one of the following?
 a. Highly negatively skewed
 b. Slightly negatively skewed
 c. Symmetrical
 d. Slightly positively skewed
 e. Highly positively skewed

4.7 The kurtosis value is calculated for a set of data and is found to be equal to +2.75. This indicates that the distribution of scores is which one of the following?
 a. Mesokurtic
 b. Platykurtic
 c. Leptokurtic
 d. Cannot be determined

4.8 For a normal distribution, all percentiles above the 50th must yield positive z scores. True or false?

4.9 If one knows the raw score, the mean, and the z score, then one can calculate the value of the standard deviation. True or false?

4.10 In a normal distribution, a z score of 1.0 has a percentile rank of 34. True or false?

4.11 The mean of a normal distribution of scores is always 1. True or false?

4.12 If in a distribution of 200 IQ scores, the mean is considerably above the median, then the distribution is which one of the following?
 a. Negatively skewed
 b. Symmetrical
 c. Positively skewed
 d. Bimodal

4.13 Which of the following is indicative of a distribution that has a skewness value of −3.98 and a kurtosis value of −6.72?
 a. A left tail that is pulled to the left and a very flat distribution
 b. A left tail that is pulled to the left and a distribution that is neither very peaked nor very flat
 c. A right tail that is pulled to the right and a very peaked distribution
 d. A right tail that is pulled to the right and a very flat distribution

4.14 Which of the following is indicative of a distribution that has a kurtosis value of +4.09?
 a. Leptokurtic distribution
 b. Mesokurtic distribution
 c. Platykurtic distribution
 d. Positive skewness
 e. Negative skewness

4.15 For which of the following distributions will the kurtosis value be greatest?

X     A (f)   B (f)   C (f)   D (f)
11      3       4       1       1
12      4       4       3       5
13      6       4      12       8
14      4       4       3       5
15      3       4       1       1

 a. Distribution A
 b. Distribution B
 c. Distribution C
 d. Distribution D

4.16 The distribution of variable X has a mean of 10 and is positively skewed. The distribution of variable Y has the same mean of 10 and is negatively skewed. I assert that the medians for the two variables must also be the same. Am I correct?

4.17 The variance of z scores is always equal to the variance of the raw scores for the same variable. True or false?

4.18 The mode has the largest value of the central tendency measures in a positively skewed distribution. True or false?

4.19 Which of the following represents the highest performance in a normal distribution?
 a. P90
 b. z = +1.00
 c. Q3
 d. IQ = 115

4.20 Suzie Smith came home with two test scores, z = +1 in math and z = −1 in biology. For which test did Suzie perform better?
4.21 A psychologist analyzing data from creative intelligence scores finds a relatively normal distribution with a population mean of 100 and population standard deviation of 10. When standardized into a unit normal distribution, what is the mean of the (standardized) creative intelligence scores?
 a. 0
 b. 70
 c. 100
 d. Cannot be determined from the information provided

Computational Problems

4.1 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = −1.66
 b. The proportion of the area between z = −1.03 and z = +1.03
 c. The fifth percentile of N(20, 36)
 d. The 99th percentile of N(30, 49)
 e. The percentile rank of the score 25 in N(20, 36)
 f. The percentile rank of the score 24.5 in N(30, 49)
 g. The proportion of the area in N(36, 64) between the scores of 18 and 42

4.2 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = −.80
 b. The proportion of the area between z = −1.49 and z = +1.49
 c. The 2.5th percentile of N(50, 81)
 d. The 50th percentile of N(40, 64)
 e. The percentile rank of the score 45 in N(50, 81)
 f. The percentile rank of the score 53 in N(50, 81)
 g. The proportion of the area in N(36, 64) between the scores of 19.7 and 45.1

4.3 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = +1.50
 b. The proportion of the area between z = −.75 and z = +2.25
 c. The 15th percentile of N(12, 9)
 d. The 80th percentile of N(100,000, 5,000)
 e. The percentile rank of the score 300 in N(200, 2500)
 f. The percentile rank of the score 61 in N(60, 9)
 g.
The proportion of the area in N(500, 1600) between the scores of 350 and 550

Interpretive Problems

4.1 Select one interval or ratio variable from the survey 1 dataset on the website (e.g., one idea is to select the same variable you selected for the interpretive problem from Chapter 3).
 a. Determine the measures of central tendency, dispersion, skewness, and kurtosis.
 b. Write a paragraph which summarizes the findings, particularly commenting on the distributional shape.

4.2 Using the same variable selected in the previous problem, standardize it using SPSS.
 a. Determine the measures of central tendency, dispersion, skewness, and kurtosis for the standardized variable.
 b. Determine the measures of central tendency, dispersion, skewness, and kurtosis for the variable in its original scale (i.e., the unstandardized variable).
 c. Compare and contrast the differences between the standardized and unstandardized variables.

5 Introduction to Probability and Sample Statistics

Chapter Outline
5.1 Brief Introduction to Probability
 5.1.1 Importance of Probability
 5.1.2 Definition of Probability
 5.1.3 Intuition Versus Probability
5.2 Sampling and Estimation
 5.2.1 Simple Random Sampling
 5.2.2 Estimation of Population Parameters and Sampling Distributions

Key Concepts
1. Probability
2. Inferential statistics
3. Simple random sampling (with and without replacement)
4. Sampling distribution of the mean
5. Variance and standard error of the mean (sampling error)
6. Confidence intervals (CIs) (point vs. interval estimation)
7. Central limit theorem

In Chapter 4, we extended our discussion of descriptive statistics. There we considered the following three general topics: the normal distribution, standard scores, and skewness and kurtosis. In this chapter, we begin to move from descriptive statistics into inferential statistics (in which normally distributed data play a major role). The two basic topics described in this chapter are probability, and sampling and estimation. First, as a brief introduction to probability, we discuss the importance of probability in statistics, define probability in a conceptual and computational sense, and discuss the notion of intuition versus probability. Second, under sampling and estimation, we formally move into inferential statistics by considering the following topics: simple random sampling (and briefly other types of sampling), and estimation of population parameters and sampling distributions. Concepts to be discussed include probability, inferential statistics, simple random sampling (with and without replacement), sampling distribution of the mean, variance and standard error of the mean (sampling error), CIs (point vs. interval estimation), and the central limit theorem. Our objectives are that by the end of this chapter, you will be able to (a) understand the most basic concepts of probability; (b) understand and conduct simple random sampling; and (c) understand, determine, and interpret the results from the estimation of population parameters via a sample.

5.1 Brief Introduction to Probability

The area of probability became important and began to be developed during the seventeenth and eighteenth centuries, when royalty and other well-to-do gamblers consulted with mathematicians for advice on games of chance. For example, in poker, if you hold two jacks, what are your chances of drawing a third jack? Or in craps, what is the chance of rolling a "7" with two dice? During that time, probability was also used for more practical purposes, such as to help determine life expectancy to underwrite life insurance policies. Considerable development in probability has obviously taken place since that time. In this
section, we discuss the importance of probability, provide a definition of probability, and consider the notion of intuition versus probability. Although there is much more to the topic of probability, here we simply discuss those aspects of probability necessary for the remainder of the text. For additional information on probability, take a look at texts by Rudas (2004) or Tijms (2004).

5.1.1 Importance of Probability

Let us first consider why probability is important in statistics. A researcher is out collecting some sample data from a group of individuals (e.g., students, parents, teachers, voters, corporations, animals). Some descriptive statistics are generated from the sample data. Say the sample mean, X̄, is computed for several variables (e.g., number of hours of study time per week, grade point average, confidence in a political candidate, widget sales, animal food consumption). To what extent can we generalize from these sample statistics to their corresponding population parameters? For example, if the mean amount of study time per week for a given sample of graduate students is X̄ = 10 hours, to what extent are we able to generalize to the population of graduate students on the value of the population mean μ?
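The generalization question above can be made concrete with a small simulation. The sketch below is an illustration of my own (not from the text): it builds a hypothetical population with a known mean μ = 10 study hours, draws repeated simple random samples of n = 25, and shows that individual sample means X̄ scatter around μ.

```python
import random

random.seed(42)

# Hypothetical population: weekly study hours for 10,000 graduate students,
# constructed so the population mean is (approximately) mu = 10.
population = [random.gauss(10, 3) for _ in range(10_000)]
mu = sum(population) / len(population)

# Draw repeated simple random samples (without replacement) of n = 25,
# recording each sample mean X-bar.
sample_means = [sum(random.sample(population, 25)) / 25 for _ in range(1_000)]

# Any one X-bar differs from mu (sampling error), but the collection of
# sample means centers on mu.
print("mu:", round(mu, 2),
      "average of sample means:", round(sum(sample_means) / len(sample_means), 2))
```

The spread of those 1,000 sample means previews the sampling distribution of the mean and its standard error, both developed later in the chapter.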
As we see, beginning in this chapter, inferential statistics involve making an inference about population parameters from sample statistics. We would like to know (a) how much uncertainty exists in our sample statistics as well as (b) how much confidence to place in our sample statistics. These questions can be addressed by assigning a probability value to an inference. As we show beginning in Chapter 6, probability can also be used to make statements about areas under a distribution of scores (e.g., the normal distribution). First, however, we need to provide a definition of probability.

5.1.2 Definition of Probability

In order to more easily define probability, consider a simple example of rolling a six-sided die (as there are dice with different numbers of sides). Each of the six sides, of course, has anywhere from one to six dots. Each side has a different number of dots. What is the probability of rolling a "4"? Technically, there are six possible outcomes or events that can occur. One can also determine how many times a specific outcome or event actually can occur. These two concepts are used to define and compute the probability of a particular outcome or event by

p(A) = S / T

where p(A) is the probability that outcome or event A will occur, S is the number of times that the specific outcome or event A can occur, and T is the total number of outcomes or events possible.

Let us revisit our example, the probability of rolling a "4." A "4" can occur only once, thus S = 1. There are six possible values that can be rolled, thus T = 6. Therefore the probability of rolling a "4" is determined by

p(4) = S / T = 1/6

This assumes, however, that the die is unbiased, which means that the die is fair and that the probability of obtaining any of the six outcomes is the same. For a fair, unbiased die, the probability of obtaining any outcome is 1/6. Gamblers have been known to possess an
unfair, biased die such that the probability of obtaining a particular outcome is different from 1/6 (e.g., to cheat their opponent by shaving one side of the die).

Consider one other classic probability example. Imagine you have an urn (or other container). Inside of the urn and out of view are a total of nine balls (thus T = 9), six of the balls being red (event A; S = 6) and the other three balls being green (event B; S = 3). Your task is to draw one ball out of the urn (without looking) and then observe its color. The probability of each of these two events occurring on the first draw is as follows:

p(A) = S / T = 6/9 = 2/3

p(B) = S / T = 3/9 = 1/3

Thus the probability of drawing a red ball is 2/3, and the probability of drawing a green ball is 1/3.

Two notions become evident in thinking about these examples. First, the sum of the probabilities for all distinct or independent events is precisely 1. In other words, if we take each distinct event and compute its probability, then the sum of those probabilities must be equal to one so as to account for all possible outcomes. Second, the probability of any given event (a) cannot exceed one and (b) cannot be less than zero. Part (a) should be obvious in that the sum of the probabilities for all events cannot exceed one, and therefore the probability of any one event cannot exceed one either (it makes no sense to talk about an event occurring more than all of the time). An event would have a probability of one if no other event can possibly occur, such as the probability that you are currently breathing. For part (b), no event can have a negative probability (it makes no sense to talk about an event occurring less than never); however, an event could have a zero probability if the event can never occur. For instance, in our urn example, one could never draw a purple ball.

5.1.3 Intuition Versus Probability
At this point, you are probably thinking that probability is an interesting topic. However, without extensive training to think in a probabilistic fashion, people tend to let their intuition guide them. This is all well and good, except that intuition can often guide you to a different conclusion than probability. Let us examine two classic examples to illustrate this dilemma. The first classic example is known as the "birthday problem." Imagine you are in a room of 23 people. You ask each person to write down their birthday (month and day) on a piece of paper. What do you think is the probability that in a room of 23 people at least two will have the same birthday?

Assume first that we are dealing with 365 different possible birthdays, where leap year (February 29) is not considered. Also assume the sample of 23 people is randomly drawn from some population of people. Taken together, this implies that each of the 365 different possible birthdays has the same probability (i.e., 1/365). An intuitive thinker might have the following thought process: "There are 365 different birthdays in a year and there are 23 people in the sample. Therefore the probability of two people having the same birthday must be close to zero." We try this on our introductory students each year, and their guesses are usually around zero.

Intuition has led us astray, and we have not used the proper thought process. True, there are 365 days and 23 people. However, the question really deals with pairs of people. There is a fairly large number of different possible pairs of people [i.e., person 1 with 2, 1 with 3, etc., where the total number of different pairs of people is equal to n(n − 1)/2 = 23(22)/2 = 253]. All we need is for one pair to have the same birthday. While the probability computations are a little complex (see Appendix), the probability that at least two individuals will
have the same birthday in a group of 23 is equal to .507. That is right, about one-half of the time a group of 23 people will have two or more with the same birthday. Our introductory classes typically have between 20 and 40 students. More often than not, we are able to find two students with the same birthday. One year one of us wrote each birthday on the board so that students could see the data. The first two students selected actually had the same birthday, so our point was very quickly shown. What was the probability of that event occurring?

The second classic example is the "gambler's fallacy," sometimes referred to as the "law of averages." This works for any game of chance, so imagine you are flipping a coin. Obviously there are two possible outcomes from a coin flip, heads and tails. Assume the coin is fair and unbiased such that the probability of flipping a head is the same as flipping a tail, that is, .5. After flipping the coin nine times, you have observed a tail every time. What is the probability of obtaining a head on the next flip?
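As an aside before taking up the coin question: the .507 birthday figure quoted above is easy to verify numerically. The following is a minimal sketch in Python (the function name is ours, not from the text) implementing the product formula given in the chapter Appendix:

```python
# Probability that at least two of n people share a birthday, assuming
# 365 equally likely birthdays (leap year ignored), per the chapter Appendix.
def birthday_probability(n):
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken: (365 - k)/365.
        p_all_different *= (365 - k) / 365
    return 1 - p_all_different

print(round(birthday_probability(23), 3))  # prints 0.507
```

Increasing n shows how quickly intuition fails: with 50 people the probability is already about .97.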
An intuitive thinker might have the following thought process: "I have just observed a tail each of the last nine flips. According to the law of averages, the probability of observing a head on the next flip must be near certainty. The probability must be nearly one." We also try this on our introductory students every year, and their guesses are almost always near one.

Intuition has led us astray once again, as we have not used the proper thought process. True, we have just observed nine consecutive tails. However, the question really deals with the probability of the 10th flip being a head, not the probability of obtaining 10 consecutive tails. The probability of a head is always .5 with a fair, unbiased coin. The coin has no memory; thus the probability of tossing a head after nine consecutive tails is the same as the probability of tossing a head after nine consecutive heads, .5. In technical terms, the probabilities of each event (each toss) are independent of one another. In other words, the probability of flipping a head is the same regardless of the preceding flips. This is not the same as the probability of tossing 10 consecutive heads, which is rather small (approximately .0010). So when you are gambling at the casino and have lost the last nine games, do not believe that you are guaranteed to win the next game. You can just as easily lose game 10 as you did game 1. The same goes if you have won a number of games. You can just as easily win the next game as you did game 1. To some extent, the casinos count on their customers playing the gambler's fallacy to make a profit.

5.2 Sampling and Estimation

In Chapter 3, we spent some time discussing sample statistics, including the measures of central tendency and dispersion. In this section, we expand upon that discussion by defining inferential statistics, describing different types of sampling, and then moving into the implications of such sampling in terms of estimation and sampling distributions.

Consider the situation where we have a population of graduate students. Population parameters (characteristics of a population) could be determined, such as the population size N, the population mean μ, the population variance σ², and the population standard deviation σ. Through some method of sampling, we then take a sample of students from this population. Sample statistics (characteristics of a sample) could be determined, such as the sample size n, the sample mean X̄, the sample variance s², and the sample standard deviation s.

How often do we actually ever deal with population data? Except when dealing with very small, well-defined populations, we almost never deal with population data. The main reason for this is cost, in terms of time, personnel, and economics. This means then that we are almost always dealing with sample data. With descriptive statistics, dealing with sample data is very straightforward, and we only need to make sure we are using the appropriate sample statistic equation. However, what if we want to take a sample statistic and make some generalization about its relevant population parameter? For example, you have computed a sample mean on grade point average (GPA) of X̄ = 3.25 for a sample of 25 graduate students at State University. You would like to make some generalization from this sample mean to the population mean μ at State University. How do we do this? To what extent can we make such a generalization? How confident are we that this sample mean actually represents the population mean?
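This setting can be made concrete with a short simulation. The sketch below (Python; the population values are invented for illustration and are not data from the text) draws a simple random sample of 25 "GPAs" from a hypothetical population and computes the sample mean as a point estimate of the population mean:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 5000 graduate-student GPAs (invented values).
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(5000)]
mu = sum(population) / len(population)  # population mean, usually unknown in practice

# Simple random sample without replacement, n = 25.
sample = random.sample(population, 25)
x_bar = sum(sample) / len(sample)  # sample mean: a point estimate of mu

print(f"population mean mu = {mu:.3f}")
print(f"sample mean x-bar  = {x_bar:.3f}")
```

Rerunning with a different seed yields a different sample mean; the spread of those means over repeated samples is exactly the sampling distribution of the mean discussed later in this section.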
This brings us to the field of inferential statistics. We define inferential statistics as statistics that allow us to make an inference or generalization from a sample to the population. In terms of reasoning, inductive reasoning is used to infer from the specific (the sample) to the general (the population). Thus inferential statistics is the answer to all of our preceding questions about generalizing from sample statistics to population parameters. How the sample is derived, however, is important in determining to what extent the statistical results we derive can be inferred from the sample back to the population. Thus, it is important to spend a little time talking about simple random sampling, the only sampling procedure that allows generalizations to be made from the sample to the population. (Although there are statistical means to correct for non-simple random samples, they are beyond the scope of this textbook.) In the remainder of this section, and in much of the remainder of this text, we take up the details of inferential statistics for many different procedures.

5.2.1 Simple Random Sampling

There are several different ways in which a sample can be drawn from a population. In this section we introduce simple random sampling, which is a commonly used type of sampling and which is also assumed for many inferential statistics (beginning in Chapter 6). Simple random sampling is defined as the process of selecting sample observations from a population so that each observation has an equal and independent probability of being selected. If the sampling process is truly random, then (a) each observation in the population has an equal chance of being included in the sample, and (b) each observation selected into the sample is independent of (or not affected by)
every other selection. Thus a volunteer or "street-corner" sample would not meet the first condition because members of the population who do not frequent that particular street corner have no chance of being included in the sample.

In addition, if the selection of spouses required the corresponding selection of their respective mates, then the second condition would not be met. For example, if the selection of Mr. Joe Smith III also required the selection of his wife, then these two selections are not independent of one another. Because we selected Mr. Joe Smith III, we must also therefore select his wife. Note that through independent sampling it is possible for Mr. Smith and his wife to both be sampled, but it is not required. Thus, independence implies that each observation is selected without regard to any other observation sampled.

We also would fail to have equal and independent probability of selection if the sampling procedure employed was something other than a simple random sample, because it is only with a simple random sample that we have met conditions (a) and (b) presented earlier in the paragraph. (Although there are statistical means to correct for non-simple random samples, they are beyond the scope of this textbook.) This concept of independence is an important assumption that we will become better acquainted with in the remaining chapters. If we have independence, then generalizations from the sample back to the population can be made (you may remember this as external validity, which was likely introduced in your research methods course) (see Figure 5.1). Because of the connection between simple random sampling and independence, let us expand our discussion on the two types of simple random sampling.

5.2.1.1 Simple Random Sampling With Replacement

There are two specific types of simple random sampling. Simple random sampling with
replacement is conducted as follows. The first observation is selected from the population into the sample, and that observation is then replaced back into the population. The second observation is selected and then replaced in the population. This continues until a sample of the desired size is obtained. The key here is that each observation sampled is placed back into the population and could be selected again.

This scenario makes sense in certain applications and not in others. For example, return to our coin flipping example, where we now want to flip a coin 100 times (i.e., a sample size of 100). How does this operate in the context of sampling? We flip the coin (e.g., heads) and record the result. This "head" becomes the first observation in our sample. This observation is then placed back into the population. Then a second observation is made and is placed back into the population. This continues until our sample size requirement of 100 is reached. In this particular scenario we always sample with replacement, and we automatically do so even if we have never heard of sampling with replacement. If no replacement took place, then we could only ever have a sample size of two, one "head" and one "tail."

5.2.1.2 Simple Random Sampling Without Replacement

In other scenarios, sampling with replacement does not make sense. For example, say we are conducting a poll for the next major election by randomly selecting 100 students (the sample) at a local university (the population). As each student is selected into the sample, they are removed and cannot be sampled again. It simply would make no sense if our sample of 100 students only contained 78 different students due to replacement (as some students were polled more than once). Our polling example represents the
other type of simple random sampling, this time without replacement. Simple random sampling without replacement is conducted in a similar fashion except that once an observation is selected for inclusion in the sample, it is not replaced and cannot be selected a second time.

5.2.1.3 Other Types of Sampling

There are several other types of sampling. These other types include convenience sampling (i.e., the volunteer or "street-corner" sampling previously mentioned), systematic sampling (e.g., select every 10th observation from the population into the sample), cluster sampling (i.e., sample groups or clusters of observations and include all members of the selected clusters in the sample), stratified sampling (i.e., sampling within subgroups or strata to ensure adequate representation of each stratum), and multistage sampling (e.g., stratify at one stage and randomly sample at another stage). These types of sampling are beyond the scope of this text, and the interested reader is referred to sampling texts such as Sudman (1976), Kalton (1983), Jaeger (1984), Fink (1995), or Levy and Lemeshow (1999).

FIGURE 5.1 Cycle of inference. Step 1: Population → Step 2: Draw simple random sample → Step 3: Compute sample statistics → Step 4: Make inference back to the population.

5.2.2 Estimation of Population Parameters and Sampling Distributions

Take as an example the situation where we select one random sample of n females (e.g., n = 20), measure their weight, and then compute the mean weight of the sample. We find the mean of this first sample to be 102 pounds and denote it by X̄₁ = 102, where the subscript identifies the first sample. This one sample mean is known as a point estimate of the population mean μ, as it is simply one value or point. We can then proceed to collect weight data
from a second sample of n females and find that X̄₂ = 110. Next we collect weight data from a third sample of n females and find that X̄₃ = 119. Imagine that we go on to collect such data from many other samples of size n and compute a sample mean for each of those samples.

5.2.2.1 Sampling Distribution of the Mean

At this point, we have a collection of sample means, which we can use to construct a frequency distribution of sample means. This frequency distribution is formally known as the sampling distribution of the mean. To better illustrate this new distribution, let us take a very small population from which we can take many samples. Here we define our population of observations as follows: 1, 2, 3, 5, 9 (in other words, we have five values in our population). As the entire population is known here, we can better illustrate the important underlying concepts. We can determine that the population mean μX = 4 and the population variance σX² = 8, where X indicates the variable we are referring to. Let us first take all possible samples from this population of size 2 (i.e., n = 2) with replacement. As there are only five observations, there will be 25 possible samples, as shown in the upper portion of Table 5.1, called "Samples." Each entry represents the two observations for a particular sample. For instance, in row 1 and column 4, we see 1,5. This indicates that the first observation is a 1 and the second observation is a 5. If sampling was done without replacement, then the diagonal of the table from upper left to lower right would not exist. For instance, a 1,1 sample could not be selected if sampling without replacement.

Now that we have all possible samples of size 2, let us compute the sample means for each of the 25 samples. The sample means are shown in the middle portion of Table 5.1,
called "Sample means." Just eyeballing the table, we see the means range from 1 to 9 with numerous different values in between. We then compute the mean of the 25 sample means to be 4, as shown in the bottom portion of Table 5.1, called "Mean of the sample means."

This is a matter for some discussion, so consider the following three points. First, the distribution of X̄ for all possible samples of size n is known as the sampling distribution of the mean. In other words, if we were to take all of the "sample mean" values in Table 5.1 and construct a histogram of those values, then that is what is referred to as a "sampling distribution of the mean." It is simply the distribution (i.e., histogram) of all the "sample mean" values. Second, the mean of the sampling distribution of the mean for all possible samples of size n is equal to μX̄. As the mean of the sampling distribution of the mean is denoted by μX̄ (the mean of the X̄s), we see for the example that μX̄ = μX = 4. In other words, the mean of the sampling distribution of the mean is simply the average of all of the "sample means" in Table 5.1. The mean of the sampling distribution of the mean will always be equal to the population mean.

Third, we define sampling error in this context as the difference (or deviation) between a particular sample mean and the population mean, denoted as X̄ − μX. A positive sampling error indicates a sample mean greater than the population mean, where the sample mean is known as an overestimate of the population mean. A zero sampling error indicates a sample mean exactly equal to the population mean. A negative sampling error indicates a sample mean less than the population mean, where the sample mean is known as an underestimate of the population mean. As a researcher, we want the
sampling error to be as close to zero as possible, to suggest that the sample reflects the population well.

5.2.2.2 Variance Error of the Mean

Now that we have a measure of the mean of the sampling distribution of the mean, let us consider the variance of this distribution. We define the variance of the sampling distribution of the mean, known as the variance error of the mean, as σX̄². This will provide us with a dispersion measure of the extent to which the sample means vary and will also provide some indication of the confidence we can place in a particular sample mean. The variance error of the mean is computed as

σX̄² = σX²/n

where
σX² is the population variance of X
n is the sample size

TABLE 5.1 All Possible Samples and Sample Means for n = 2 From the Population of 1, 2, 3, 5, 9

Samples (rows = first observation; columns = second observation):

First Obs.     1      2      3      5      9
1             1,1    1,2    1,3    1,5    1,9
2             2,1    2,2    2,3    2,5    2,9
3             3,1    3,2    3,3    3,5    3,9
5             5,1    5,2    5,3    5,5    5,9
9             9,1    9,2    9,3    9,5    9,9

Sample means:

First Obs.     1      2      3      5      9
1             1.0    1.5    2.0    3.0    5.0
2             1.5    2.0    2.5    3.5    5.5
3             2.0    2.5    3.0    4.0    6.0
5             3.0    3.5    4.0    5.0    7.0
9             5.0    5.5    6.0    7.0    9.0
Column sums:  ΣX̄ = 12.5, 15.0, 17.5, 22.5, 32.5

Mean of the sample means: μX̄ = ΣX̄/(number of samples) = 100/25 = 4.0
Variance of the sample means: σX̄² = [ΣX̄² − (ΣX̄)²/(number of samples)]/(number of samples) = [500 − (100)²/25]/25 = (500 − 400)/25 = 4.0

For the example, we have already determined that σX² = 8 and that n = 2; therefore,

σX̄² = σX²/n = 8/2 = 4

This is verified in the bottom portion of Table 5.1, called "Variance of the sample means," where the variance error is computed from the collection of sample means.

What will happen if we increase the size of the sample? If we increase the sample size to n = 4, then the variance error is reduced to 2. Thus we see that as the size of the sample n increases, the magnitude of the sampling error decreases. Why? Conceptually, as sample size increases, we are sampling a larger portion of the population. In doing so, we are also obtaining a sample that is likely more representative of the population. In addition, the larger the sample size, the less likely it is to obtain a sample mean that is far from the population mean. Thus, as sample size increases, we home in closer and closer to the population mean and have less and less sampling error.

For example, say we are sampling from a voting district with a population of 5000 voters. A survey is developed to assess how satisfied the district voters are with their local state representative. Assume the survey generates a 100-point satisfaction scale. First we determine that the population mean of satisfaction is 75. Next we take samples of different sizes. For a sample size of 1, we find sample means that range from 0 to 100 (i.e., each mean really only represents a single observation). For a sample size of 10, we find sample means that range from 50 to 95. For a sample size of 100, we find sample means that range from 70 to 80. We see then that as sample size increases, our sample means become closer and
closer to the population mean, and the variability of those sample means becomes smaller and smaller.

5.2.2.3 Standard Error of the Mean

We can also compute the standard deviation of the sampling distribution of the mean, known as the standard error of the mean, by

σX̄ = σX/√n

Thus for the example we have

σX̄ = σX/√n = 2.8284/√2 = 2

Because the applied researcher typically does not know the population variance, the population variance error of the mean and the population standard error of the mean can be estimated by the following, respectively:

sX̄² = sX²/n

and

sX̄ = sX/√n

5.2.2.4 Confidence Intervals

Thus far we have illustrated how a sample mean is a point estimate of the population mean and how a variance error gives us some sense of the variability among the sample means. Putting these concepts together, we can also build an interval estimate for the population mean to give us a sense of how confident we are in our particular sample mean. We can form a confidence interval (CI) around a particular sample mean as follows. As we learned in Chapter 4, for a normal distribution, 68% of the distribution falls within one standard deviation of the mean. A 68% CI for a sample mean can be formed as follows:

68% CI = X̄ ± σX̄

Conceptually, this means that if we form 68% CIs for 100 sample means, then 68 of those 100 intervals would contain or include the population mean (it does not mean that there is a 68% probability of the interval containing the population mean; the interval either contains it or does not). Because the applied researcher typically only has one sample mean and does not know the population mean, he or she has no way of knowing if this one CI actually contains the population mean or not. If one wanted
to be more confident in a sample mean, then a 90% CI, a 95% CI, or a 99% CI could be formed as follows:

90% CI = X̄ ± 1.645 σX̄
95% CI = X̄ ± 1.96 σX̄
99% CI = X̄ ± 2.5758 σX̄

Thus for the 90% CI, the population mean will be contained in 90 out of 100 CIs; for the 95% CI, the population mean will be contained in 95 out of 100 CIs; and for the 99% CI, the population mean will be contained in 99 out of 100 CIs. The critical values of 1.645, 1.96, and 2.5758 come from the standard unit normal distribution table (Table A.1) and indicate the width of the CI. Wider CIs, such as the 99% CI, enable greater confidence. For example, with a sample mean of 70 and a standard error of the mean of 3, the following CIs result: 68% CI = (67, 73) [i.e., ranging from 67 to 73]; 90% CI = (65.065, 74.935); 95% CI = (64.12, 75.88); and 99% CI = (62.2726, 77.7274). We can see here that to be assured that 99% of the CIs contain the population mean, our interval must be wider (i.e., ranging from about 62.27 to 77.73, or a range of about 15) than the CIs at lower levels of confidence (e.g., the 95% CI ranges from 64.12 to 75.88, or a range of about 11).

In general, a CI for any level of confidence (i.e., XX% CI) can be computed by the following general formula:

XX% CI = X̄ ± zcv σX̄

where zcv is the critical value taken from the standard unit normal distribution table for that particular level of confidence, and the other values are as before.

5.2.2.5 Central Limit Theorem

In our discussion of CIs, we used the normal distribution to help determine the width of the intervals. Many inferential statistics assume the population distribution is normal in shape. Because we are looking at sampling distributions in this chapter, does the shape of the original population distribution have any relationship to the sampling distribution of the mean we obtain? For example, if the population distribution is nonnormal, what form
does the sampling distribution of the mean take (i.e., is the sampling distribution of the mean also nonnormal)? There is a nice concept, known as the central limit theorem, to assist us here. The central limit theorem states that as sample size n increases, the sampling distribution of the mean from a random sample of size n more closely approximates a normal distribution. If the population distribution is normal in shape, then the sampling distribution of the mean is also normal in shape. If the population distribution is not normal in shape, then the sampling distribution of the mean becomes more nearly normal as sample size increases. This concept is graphically depicted in Figure 5.2.

FIGURE 5.2 Central limit theorem for normal and positively skewed population distributions (population in the top row; sampling distributions for n = 1, 2, 4, and 25 below).

The top row of the figure depicts two population distributions, the left one being normal and the right one being positively skewed. The remaining rows are for the various sampling distributions, depending on the sample size. The second row shows the sampling distributions of the mean for n = 1. Note that these sampling distributions look precisely like the population distributions, as each observation is literally a sample mean. The next row gives the sampling distributions for n = 2; here we see for the skewed population that the sampling distribution is slightly less skewed. This is because the more extreme observations are now being averaged in with less extreme observations, yielding less extreme means. For n = 4, the sampling distribution in the skewed case is even less skewed than for n = 2. Eventually we reach the n = 25 sampling distribution, where the sampling distribution for the skewed case is nearly normal and
nearly matches the sampling distribution for the normal case. This phenomenon will occur for other nonnormal population distributions as well (e.g., negatively skewed). The moral of the story here is a good one. If the population distribution is nonnormal, then this will have minimal effect on the sampling distribution of the mean except for rather small samples. This can come into play with inferential statistics when the assumption of normality is not satisfied, as we see in later chapters.

5.3 Summary

In this chapter, we began to move from descriptive statistics to the realm of inferential statistics. The two main topics we considered were probability, and sampling and estimation. First we briefly introduced probability by looking at the importance of probability in statistics, defining probability, and comparing conclusions often reached by intuition versus probability. The second topic involved sampling and estimation, a topic we return to in most of the remaining chapters. In the sampling section, we defined and described simple random sampling, both with and without replacement, and briefly outlined other types of sampling. In the estimation section, we examined the sampling distribution of the mean, the variance and standard error of the mean, CIs around the mean, and the central limit theorem. At this point you should have met the following objectives: (a) be able to understand the most basic concepts of probability, (b) be able to understand and conduct simple random sampling, and (c) be able to understand, determine, and interpret the results from the estimation of population parameters via a sample. In the next chapter we formally discuss our first inferential statistics situation, testing hypotheses about a single mean.

Appendix: Probability That at Least Two Individuals Have the Same Birthday

This probability can be shown
by either of the following equations. Note that there are n = 23 individuals in the room. One method is as follows:

1 − [365 × 364 × 363 × ⋯ × (365 − n + 1)]/365^n = 1 − (365 × 364 × 363 × ⋯ × 343)/365^23 = .507

An equivalent method is as follows:

1 − [(365/365) × (364/365) × (363/365) × ⋯ × ((365 − n + 1)/365)] = 1 − [(365/365) × (364/365) × (363/365) × ⋯ × (343/365)] = .507

Problems

Conceptual Problems

5.1 The standard error of the mean is which one of the following?
a. Standard deviation of a sample distribution
b. Standard deviation of the population distribution
c. Standard deviation of the sampling distribution of the mean
d. Mean of the sampling distribution of the standard deviation

5.2 An unbiased six-sided die is tossed on two consecutive trials, and the first toss results in a "2." What is the probability that a "2" will result on the second toss?
a. Less than 1/6
b. 1/6
c. More than 1/6
d. Cannot be determined

5.3 An urn contains 9 balls: 3 green, 4 red, and 2 blue. The probability that a ball selected at random is blue is equal to which one of the following?
a. 2/9
b. 5/9
c. 6/9
d. 7/9

5.4 Sampling error is which one of the following?
a. The amount by which a sample mean is greater than the population mean
b. The amount of difference between a sample statistic and a population parameter
c. The standard deviation divided by the square root of n
d. When the sample is not drawn randomly

5.5 What does the central limit theorem state?
a. The means of many random samples from a population will be normally distributed.
b. The raw scores of many natural events will be normally distributed.
c. z scores will be normally distributed.
d. None of the above.

5.6 For a normal population, the variance of the sampling distribution of the mean increases as sample size increases. True or false?
5.7 All other things being equal, as the sample size increases, the standard error of a statistic decreases. True or false?

5.8 I assert that the 95% CI has a larger (or wider) range than the 99% CI for the same parameter using the same data. Am I correct?

5.9 I assert that the 90% CI has a smaller (or narrower) range than the 68% CI for the same parameter using the same data. Am I correct?

5.10 I assert that the mean and median of any random sample drawn from a symmetric population distribution will be equal. Am I correct?

5.11 A random sample is to be drawn from a symmetric population with mean 100 and variance 225. I assert that the sample mean is more likely to have a value larger than 105 if the sample size is 16 than if the sample size is 25. Am I correct?

5.12 A gambler is playing a card game where the known probability of winning is .40 (win 40% of the time). The gambler has just lost 10 consecutive hands. What is the probability of the gambler winning the next hand?
a. Less than .40
b. Equal to .40
c. Greater than .40
d. Cannot be determined without observing the gambler

5.13 On the evening news, the anchorwoman announces that the state's lottery has reached $72 billion and reminds the viewing audience that there has not been a winner in over 5 years. In researching lottery facts, you find a report that states the probability of winning the lottery is 1 in 2 million (i.e., a very, very small probability). What is the probability that you will win the lottery?
a. Less than 1 in 2 million
b. Equal to 1 in 2 million
c. Greater than 1 in 2 million
d. Cannot be determined without additional statistics

5.14 The probability of being selected into a sample is the same for every individual in the population for the convenience method of sampling. True or false?
5.15 Malani is conducting research on elementary teacher attitudes toward changes in mathematics standards. Malani's population consists of all elementary teachers within one district in the state. Malani wants her sampling method to be such that every teacher in the population has an equal and independent probability of selection. Which of the following is the most appropriate sampling method?
 a. Convenient sampling
 b. Simple random sampling with replacement
 c. Simple random sampling without replacement
 d. Systematic sampling

5.16 Sampling error increases with larger samples. True or false?

5.17 If a population distribution is highly positively skewed, then the distribution of the sample means for samples of size 500 will be
 a. Highly negatively skewed
 b. Highly positively skewed
 c. Approximately normally distributed
 d. Cannot be determined without further information

Computational Problems

5.1 The population distribution of variable X, the number of pets owned, consists of the five values of 1, 4, 5, 7, and 8.
 a. Calculate the values of the population mean and variance.
 b. List all possible samples of size 2 where samples are drawn with replacement.
 c. Calculate the values of the mean and variance of the sampling distribution of the mean.

5.2 The following is a random sampling distribution of the mean number of children for samples of size 3, where samples are drawn with replacement.

 Sample Mean   f
 1             1
 2             2
 3             4
 4             2
 5             1

 a. What is the population mean?
 b. What is the population variance?
 c. What is the mean of the sampling distribution of the mean?
 d. What is the variance error of the mean?

5.3 In a study of the entire student body of a large university, if the standard error of the mean is 20 for n = 16, what must the sample size be to reduce the standard error to 5?
5.4 A random sample of 13 statistics texts had a mean number of pages of 685 and a standard deviation of 42. First calculate the standard error of the mean. Then calculate the 95% CI for the mean length of statistics texts.

5.5 A random sample of 10 high schools employed a mean number of guidance counselors of 3 and a standard deviation of 2. First calculate the standard error of the mean. Then calculate the 90% CI for the mean number of guidance counselors.

Interpretive Problems

5.1 Take a six-sided die, where the population values are obviously 1, 2, 3, 4, 5, and 6. Take 20 samples, each of size 2 (e.g., every two rolls is one sample). For each sample, calculate the mean. Then determine the mean of the sampling distribution of the mean and the variance error of the mean. Compare your results to those of your colleagues.

5.2 You will need 20 plain M&M candy pieces and one cup. Put the candy pieces in the cup and toss them onto a flat surface. Count the number of candy pieces that land with the "M" facing up. Write down that number. Repeat these steps five times. These steps will constitute one sample. Next, generate four additional samples (i.e., repeat the process of tossing the candy pieces, counting the "Ms," and writing down that number). Then determine the mean of the sampling distribution of the mean and the variance error of the mean. Compare your results to those of your colleagues.

6 Introduction to Hypothesis Testing: Inferences About a Single Mean

Chapter Outline
6.1 Types of Hypotheses
6.2 Types of Decision Errors
 6.2.1 Example Decision-Making Situation
 6.2.2 Decision-Making Table
6.3 Level of Significance (α)
6.4 Overview of Steps in Decision-Making Process
6.5 Inferences About μ When σ Is Known
 6.5.1 z Test
 6.5.2 Example
 6.5.3 Constructing Confidence Intervals Around the Mean
6.6 Type II Error (β) and Power (1 − β)
 6.6.1 Full Decision-Making Context
 6.6.2 Power Determinants
6.7
Statistical Versus Practical Significance
6.8 Inferences About μ When σ Is Unknown
 6.8.1 New Test Statistic t
 6.8.2 t Distribution
 6.8.3 t Test
 6.8.4 Example
6.9 SPSS
6.10 G*Power
6.11 Template and APA-Style Write-Up

Key Concepts
1. Null or statistical hypothesis versus scientific or research hypothesis
2. Type I error (α), Type II error (β), and power (1 − β)
3. Two-tailed versus one-tailed alternative hypotheses
4. Critical regions and critical values
5. z test statistic
6. Confidence interval (CI) around the mean
7. t test statistic
8. t distribution, degrees of freedom, and table of t distributions

In Chapter 5, we began to move into the realm of inferential statistics. There we considered the following general topics: probability, sampling, and estimation. In this chapter, we move fully into the domain of inferential statistics, where the concepts involved in probability, sampling, and estimation can be implemented. The overarching theme of the chapter is the use of a statistical test to make inferences about a single mean. In order to properly cover this inferential test, a number of basic foundational concepts are described in this chapter. Many of these concepts are utilized throughout the remainder of this text. The topics described include the following: types of hypotheses, types of decision errors, level of significance (α), overview of steps in the decision-making process, inferences about μ when σ is known, Type II error (β) and power (1 − β), statistical versus practical significance, and inferences about μ when σ is unknown. Concepts to be discussed include the following: null or statistical hypothesis versus scientific or research hypothesis; Type I error (α), Type II error (β), and power (1 − β); two-tailed versus one-tailed alternative hypotheses; critical regions and critical values; z test statistic; confidence interval (CI) around the mean; t test statistic; and t distribution, degrees of freedom, and table of t distributions. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts of hypothesis testing; (b) utilize the normal and t tables; and (c) understand, determine, and interpret the results from the z test, t test, and CI procedures.

6.1 Types of Hypotheses

You may remember Marie from previous chapters. We now revisit Marie in this chapter.

Marie, a graduate student pursuing a master's degree in educational research, has completed her first tasks as a research assistant: determining a number of descriptive statistics on data provided to her by her faculty mentor. The faculty member was so pleased with the descriptive analyses and presentation of results previously shared that she has asked Marie to consult with a local hockey coach, Oscar, who is interested in examining team skating performance. Based on Oscar's research question, Is the mean skating speed of a hockey team different from the league mean speed of 12 seconds?, Marie suggests a one-sample test of means as the test of inference. Her task is to assist Oscar in generating the test of inference to answer his research question.

Hypothesis testing is a decision-making process where two possible decisions are weighed in a statistical fashion. In a way, this is much like any other decision involving two possibilities, such as whether to carry an umbrella with you today or not. In statistical decision-making, the two possible decisions are known as hypotheses. Sample data are then used to help us select one of these decisions. The two types of hypotheses competing against one another are known as the null or statistical hypothesis, denoted by H0, and the scientific, alternative, or research hypothesis, denoted by H1.
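The choice between H0 and H1 is eventually made by computing a test statistic from sample data. As a preview of the one-sample z test developed later in this chapter, here is a minimal Python sketch; the function name and all sample values are hypothetical illustrations, not from the text:

```python
import math

def one_sample_z(sample_mean, mu0, sigma, n):
    """Return the z test statistic for a one-sample test of the mean,
    assuming the population standard deviation sigma is known."""
    std_error = sigma / math.sqrt(n)      # standard error of the mean
    return (sample_mean - mu0) / std_error

# Hypothetical numbers: a sample of n = 36 adults with mean IQ 105,
# tested against the hypothesized value mu0 = 100, where the known
# population standard deviation is 15.
z = one_sample_z(sample_mean=105, mu0=100, sigma=15, n=36)
print(z)   # prints 2.0  (standard error = 15/6 = 2.5; z = 5/2.5)
```

For a nondirectional (two-tailed) test at α = .05, the critical values are ±1.96, so a z of 2.0 would lead to rejecting H0; how those critical values arise is covered in the sections that follow.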
The null or statistical hypothesis is a statement about the value of an unknown population parameter. Considering the procedure we are discussing in this chapter, the one-sample mean test, one example H0 might be that the population mean IQ score is 100, which we denote as

 H0: μ = 100 or H0: μ − 100 = 0

Mathematically, both equations say the same thing. The version on the left is the more traditional form of the null hypothesis involving a single mean. However, the version on the right makes clear to the reader why the term "null" is appropriate. That is, there is no difference, or a "null" difference, between the population mean and the hypothesized mean value of 100. In general, the hypothesized mean value is denoted by μ0 (here μ0 = 100). Another H0 might be that the statistics exam population means are the same for male and female students, which we denote as

 H0: μ1 − μ2 = 0

where
 μ1 is the population mean for males
 μ2 is the population mean for females

Here there is no difference, or a "null" difference, between the two population means. The test of the difference between two means is presented in Chapter 7. As we move through subsequent chapters, we become familiar with null hypotheses that involve other population parameters such as proportions, variances, and correlations.

The null hypothesis is basically set up by the researcher in an attempt to reject the null hypothesis in favor of our own personal scientific, alternative, or research hypothesis. In other words, the scientific hypothesis is what we believe the outcome of the study will be, based on previous theory and research. Thus, we are trying to reject the null hypothesis and find evidence in favor of our scientific hypothesis. The scientific hypotheses H1 for our two examples are

 H1: μ ≠ 100 or H1: μ − 100 ≠ 0

and

 H1: μ1 − μ2 ≠ 0
Based on the sample data, hypothesis testing involves making a decision as to whether the null or the research hypothesis is supported. Because we are dealing with sample statistics in our decision-making process, and trying to make an inference back to the population parameter(s), there is always some risk of making an incorrect decision. In other words, the sample data might lead us to make a decision that is not consistent with the population. We might decide to take an umbrella and it does not rain, or we might decide to leave the umbrella at home and it rains. Thus, as in any decision, the possibility always exists that an incorrect decision may be made. This uncertainty is due to sampling error, which, as we will see, can be described by a probability statement. That is, because the decision is made based on sample data, the sample may not be very representative of the population and therefore leads us to an incorrect decision. If we had population data, we would always make the correct decision about a population parameter. Because we usually do not, we use inferential statistics to help make decisions from sample data and infer those results back to the population. The nature of such decision errors and the probabilities we can attribute to them are described in the next section.

6.2 Types of Decision Errors

In this section, we consider more specifically the types of decision errors that might be made in the decision-making process. First an example decision-making situation is presented. This is followed by a decision-making table whereby the types of decision errors are easily depicted.

6.2.1 Example Decision-Making Situation

Let us propose an example decision-making situation using an adult intelligence instrument. It is known somehow that the population standard deviation of the instrument is
15 (i.e., σ2 = 225, σ = 15). (In the real world, it is rare that the population standard deviation is known, and we return to reality later in the chapter when the basic concepts have been covered. But for now, assume that we know the population standard deviation.) Our null and alternative hypotheses, respectively, are as follows:

 H0: μ = 100 or H0: μ − 100 = 0

 H1: μ ≠ 100 or H1: μ − 100 ≠ 0

Thus, we are interested in testing whether the population mean for the intelligence instrument is equal to 100, our hypothesized mean value, or not equal to 100.

Next we take several random samples of individuals from the adult population. We find for our first sample Y̅1 = 105 (i.e., denoting the mean for sample 1). Eyeballing the information for sample 1, the sample mean is one-third of a standard deviation above the hypothesized value [i.e., by computing a z score of (105 − 100)/15 = .3333], so our conclusion would probably be to fail to reject H0. In other words, if the population mean actually is 100, then we believe that one is quite likely to observe a sample mean of 105. Thus, our decision for sample 1 is to fail to reject H0; however, there is some likelihood or probability that our decision is incorrect.

We take a second sample and find Y̅2 = 115 (i.e., denoting the mean for sample 2). Eyeballing the information for sample 2, the sample mean is one standard deviation above the hypothesized value [i.e., z = (115 − 100)/15 = 1.0000], so our conclusion would probably be to fail to reject H0. In other words, if the population mean actually is 100, then we believe that it is somewhat likely to observe a sample mean of 115. Thus, our decision for sample 2 is to fail to reject H0. However, there is an even greater likelihood or probability that our decision is incorrect than was the case for sample 1; this is because the sample mean is further away from the hypothesized value.

We take a third sample and find Y̅3 = 190 (i.e., denoting the mean for sample 3). Eyeballing the information for sample 3, the sample mean is six standard deviations above the hypothesized value [i.e., z = (190 − 100)/15 = 6.0000], so our conclusion would probably be to reject H0. In other words, if the population mean actually is 100, then we believe that it is quite unlikely to observe a sample mean of 190. Thus, our decision for sample 3 is to reject H0; however, there is some small likelihood or probability that our decision is incorrect.

6.2.2 Decision-Making Table

Let us consider Table 6.1 as a mechanism for sorting out the possible outcomes in the statistical decision-making process. The table consists of the general case and a specific case. First, in part (a) of the table, we have the possible outcomes for the general case. For the state of nature or reality (i.e., how things really are in the population), there are two distinct possibilities as depicted by the rows of the table. Either H0 is indeed true or H0 is indeed false. In other words, according to the real-world conditions in the population, either H0 is actually true or H0 is actually false. Admittedly, we usually do not know what the state of nature truly is; however, it does exist in the population data. It is the state of nature that we are trying to best approximate when making a statistical decision based on sample data.

For our statistical decision, there are two distinct possibilities as depicted by the columns of the table. Either we fail to reject H0 or we reject H0. In other words, based on our sample data, we either fail to reject H0 or reject H0. As our goal is usually to reject H0 in favor of our research hypothesis, we prefer the term fail to reject rather than accept. Accept implies you are willing to throw out your research hypothesis
and admit defeat based on one sample. Fail to reject implies you still have some hope for your research hypothesis, despite evidence from a single sample to the contrary.

If we look inside of the table, we see four different outcomes based on a combination of our statistical decision and the state of nature. Consider the first row of the table, where H0 is in actuality true. First, if H0 is true and we fail to reject H0, then we have made a correct decision; that is, we have correctly failed to reject a true H0. The probability of this first outcome is known as 1 − α (where α represents alpha). Second, if H0 is true and we reject H0, then we have made a decision error known as a Type I error. That is, we have incorrectly rejected a true H0. Our sample data have led us to a different conclusion than the population data would have. The probability of this second outcome is known as α. Therefore, if H0 is actually true, then our sample data lead us to one of two conclusions: either we correctly fail to reject H0, or we incorrectly reject H0. The sum of the probabilities for these two outcomes when H0 is true is equal to 1 [i.e., (1 − α) + α = 1].

Consider now the second row of the table, where H0 is in actuality false. First, if H0 is really false and we fail to reject H0, then we have made a decision error known as a Type II error. That is, we have incorrectly failed to reject a false H0. Our sample data have led us to a different conclusion than the population data would have. The probability of this outcome is known as β (beta). Second, if H0 is really false and we reject H0, then we have made a correct decision; that is, we have correctly rejected a false H0. The probability of this second outcome is known as 1 − β, or power (to be more fully discussed later in this chapter). Therefore, if H0 is actually false, then our sample data lead us to one of two conclusions: either we incorrectly fail to reject H0, or we correctly reject H0. The sum of the probabilities for these two outcomes when H0 is false is equal to 1 [i.e., β + (1 − β) = 1].

Table 6.1 Statistical Decision Table

                            Decision
 State of Nature (Reality)   Fail to Reject H0                 Reject H0
 (a) General case
  H0 is true                 Correct decision (1 − α)          Type I error (α)
  H0 is false                Type II error (β)                 Correct decision (1 − β) = power
 (b) Example rain case
  H0 is true (no rain)       Correct decision (do not take     Type I error (take umbrella
                             umbrella and no umbrella          and look silly) (α)
                             needed) (1 − α)
  H0 is false (rains)        Type II error (do not take        Correct decision (take umbrella
                             umbrella and get wet) (β)         and stay dry) (1 − β) = power

As an application of this table, consider the following specific case, as shown in part (b) of Table 6.1. We wish to test the following hypotheses about whether or not it will rain tomorrow:

 H0: no rain tomorrow
 H1: rains tomorrow

We collect some sample data from prior years for the same month and day, and go to make our statistical decision. Our two possible statistical decisions are (a) we do not believe it will rain tomorrow and therefore do not bring an umbrella with us, or (b) we do believe it will rain tomorrow and therefore do bring an umbrella.

Again there are four potential outcomes. First, if H0 is really true (no rain) and we do not carry an umbrella, then we have made a correct decision, as no umbrella is necessary (probability = 1 − α). Second, if H0 is really true (no rain) and we carry an umbrella, then we have made a Type I error, as we look silly carrying that umbrella around all day (probability = α). Third, if H0 is really false (rains) and we do not carry an umbrella, then we have made a Type II error and we get wet (probability = β). Fourth, if H0 is really false (rains) and we carry an umbrella, then we have made the correct decision, as the umbrella keeps
us dry (probability = 1 − β).

Let us make two concluding statements about the decision table. First, one can never prove the truth or falsity of H0 in a single study. One only gathers evidence in favor of or in opposition to the null hypothesis. Something is proven in research when an entire collection of studies or evidence reaches the same conclusion time and time again. Scientific proof is difficult to achieve in the social and behavioral sciences, and we should not use the term prove or proof loosely. As researchers, we gather multiple pieces of evidence that eventually lead to the development of one or more theories. When a theory is shown to be unequivocally true (i.e., in all cases), then proof has been established.

Second, let us consider the decision errors in a different light. One can totally eliminate the possibility of a Type I error by deciding to never reject H0. That is, if we always fail to reject H0 (do not ever carry an umbrella), then we can never make a Type I error (look silly with an unnecessary umbrella). Although this strategy sounds fine, it totally takes the decision-making power out of our hands. With this strategy, we do not even need to collect any sample data, as we have already decided to never reject H0.

One can totally eliminate the possibility of a Type II error by deciding to always reject H0. That is, if we always reject H0 (always carry an umbrella), then we can never make a Type II error (get wet without an umbrella). Although this strategy also sounds fine, it totally takes the decision-making power out of our hands. With this strategy, we do not even need to collect any sample data, as we have already decided to always reject H0. Taken together, one can never totally eliminate the possibility of both a Type I and a Type II error. No matter what decision we make, there is always some possibility of making a Type I and/or Type II error. Therefore, as researchers, our job is to make conscious decisions in designing and conducting our study and in analyzing the data so that the possibility of decision error is minimized.

6.3 Level of Significance (α)

We have already stated that a Type I error occurs when the decision is to reject H0 when in fact H0 is actually true. We defined the probability of a Type I error as α, which is also known as the level of significance or significance level. We now examine α as a basis for helping us make statistical decisions. Recall from a previous example that the null and alternative hypotheses, respectively, are as follows:

 H0: μ = 100 or H0: μ − 100 = 0

 H1: μ ≠ 100 or H1: μ − 100 ≠ 0

We need a mechanism for deciding how far away a sample mean needs to be from the hypothesized mean value of μ0 = 100 in order to reject H0. In other words, at a certain point or distance away from 100, we will decide to reject H0. We use α to determine that point for us, where in this context, α is known as the level of significance. Figure 6.1a shows a sampling distribution of the mean where the hypothesized value μ0 is depicted at the center of the distribution. Toward both tails of the distribution, we see two shaded regions known as the critical regions or regions of rejection. The combined area of the two shaded regions is equal to α, and, thus, the area of either the upper or the lower tail critical region is equal to α/2 (i.e., we split α in half by dividing by two). If the sample mean is far enough away from the hypothesized mean value, μ0, that it falls into either critical region, then our statistical decision is to reject H0. In this case, our decision is to reject H0 at the α level of significance. If, however, the sample mean is close enough to μ0 that it falls into the unshaded region (i.e., not into either critical region), then our statistical decision is to fail to reject H0. The precise points on the X axis at which the critical regions are divided from the unshaded region are known as the critical values. Determining critical values is discussed later in this chapter.

[Figure 6.1 Alternative hypotheses and critical regions: (a) two-tailed test; (b) one-tailed, right-tailed test; (c) one-tailed, left-tailed test.]

Note that under the alternative hypothesis H1, we are willing to reject H0 when the sample mean is either significantly greater than or significantly less than the hypothesized mean value μ0. This particular alternative hypothesis is known as a nondirectional alternative hypothesis, as no direction is implied with respect to the hypothesized value. That is, we will reject the null hypothesis in favor of the alternative hypothesis in either direction, either above or below the hypothesized mean value. This also results in what is known as a two-tailed test of significance, in that we are willing to reject the null hypothesis in either tail or critical region.

Two other alternative hypotheses are also possible, depending on the researcher's scientific hypothesis, which are known as directional alternative hypotheses. One directional alternative is that the population mean is greater than the hypothesized mean value, also known as a right-tailed test, as denoted by

 H1: μ > 100 or H1: μ − 100 > 0
Mathematically, both of these equations say the same thing. With a right-tailed alternative hypothesis, the entire region of rejection is contained in the upper tail, with an area of α, known as a one-tailed test of significance (and specifically the right tail). If the sample mean is significantly greater than the hypothesized mean value of 100, then our statistical decision is to reject H0. If, however, the sample mean falls into the unshaded region, then our statistical decision is to fail to reject H0. This situation is depicted in Figure 6.1b.
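The two-tailed and one-tailed decision rules can be encoded directly. The following Python sketch (not part of the text; the function name and α default are illustrative) uses the unit normal distribution to look up the critical value(s) and decide whether a given z falls in the critical region:

```python
from statistics import NormalDist

def z_decision(z, alternative="two-tailed", alpha=0.05):
    """Decide whether z falls in the critical region for the chosen
    alternative hypothesis; returns 'reject H0' or 'fail to reject H0'."""
    unit_normal = NormalDist()  # z ~ N(0, 1)
    if alternative == "two-tailed":
        # alpha split in half; e.g., critical values of +/-1.96 for alpha = .05
        critical = unit_normal.inv_cdf(1 - alpha / 2)
        in_critical_region = abs(z) > critical
    elif alternative == "right-tailed":
        # entire region of rejection in the upper tail; e.g., 1.645
        critical = unit_normal.inv_cdf(1 - alpha)
        in_critical_region = z > critical
    elif alternative == "left-tailed":
        # entire region of rejection in the lower tail; e.g., -1.645
        critical = unit_normal.inv_cdf(alpha)
        in_critical_region = z < critical
    else:
        raise ValueError("unknown alternative")
    return "reject H0" if in_critical_region else "fail to reject H0"

# A z of 1.80 is significant for a right-tailed test at alpha = .05
# (1.80 > 1.645) but not for a two-tailed test (|1.80| < 1.96).
print(z_decision(1.80, "right-tailed"))   # prints: reject H0
print(z_decision(1.80, "two-tailed"))     # prints: fail to reject H0
```

The example with z = 1.80 also illustrates why the alternative hypothesis must be chosen before the data are examined: the same test statistic can lead to different decisions under different alternatives.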
A second directional alternative is that the population mean is less than the hypothesized mean value, also known as a left-tailed test, as denoted by
 H1: μ < 100 or H1: μ − 100 < 0

Mathematically, both of these equations say the same thing. With a left-tailed alternative hypothesis, the entire region of rejection is contained in the lower tail, with an area of α, also known as a one-tailed test of significance (and specifically the left tail). If the sample mean is significantly less than the hypothesized mean value of 100, then our statistical decision is to reject H0. If, however, the sample mean falls into the unshaded region, then our statistical decision is to fail to reject H0. This situation is depicted in Figure 6.1c.

There is some potential for misuse of the different alternatives, which we consider to be an ethical matter. For example, a researcher conducts a one-tailed test with an upper tail critical region and fails to reject H0. However, the researcher notices that the sample mean is considerably below the hypothesized mean value and then decides to change the alternative hypothesis to either a nondirectional test or a one-tailed test in the other tail. This is unethical, as the researcher has examined the data and changed the alternative hypothesis. The moral of the story is this: If there is previous and consistent empirical evidence to use a specific directional alternative hypothesis, then you should do so. If, however, there is minimal or inconsistent empirical evidence to use a specific directional alternative, then you should not.
Instead, you should use a nondirectional alternative. Once you have decided which alternative hypothesis to go with, then you need to stick with it for the duration of the statistical decision. If you find contrary evidence, then report it, as it may be an important finding, but do not change the alternative hypothesis in midstream.

6.4 Overview of Steps in Decision-Making Process

Before we get into the specific details of conducting the test of a single mean, we want to discuss the basic steps for hypothesis testing of any inferential test:

1. State the null and alternative hypotheses.
2. Select the level of significance (i.e., alpha, α).
3. Calculate the test statistic value.
4. Make a statistical decision (reject or fail to reject H0).

Step 1: The first step in the decision-making process is to state the null and alternative hypotheses. Recall from our previous example that the null and nondirectional alternative hypotheses, respectively, for a two-tailed test are as follows:

 H0: μ = 100 or H0: μ − 100 = 0

 H1: μ ≠ 100 or H1: μ − 100 ≠ 0

One could also choose one of the other directional alternative hypotheses described previously.

If we choose to write our null hypothesis as H0: μ = 100, we would want to write our research hypothesis in a consistent manner, H1: μ ≠ 100 (rather than H1: μ − 100 ≠ 0). In publication, many researchers opt to present the hypotheses in narrative form (e.g., "the null hypothesis states that the population mean will equal 100, and the alternative hypothesis states that the population mean will not equal 100"). How you present your hypotheses (mathematically or using statistical notation) is up to you.

Step 2: The second step in the decision-making process is to select a level of significance α. There are two considerations to make in terms of selecting a level of significance. One consideration is the cost associated with making a Type I error, which is what α really reflects. Recall that alpha is the probability of rejecting the null hypothesis if in reality the null hypothesis is true. When a Type I error is made, that means evidence is building in favor of the research hypothesis (which is actually false). Let us take an example of a new drug. To test the efficacy of the drug, an experiment is conducted where some individuals take the new drug while others receive a placebo. The null hypothesis, stated nondirectionally, would essentially indicate that the effects of the drug and placebo are the same. Rejecting that null hypothesis would mean that the effects are not equal, suggesting that perhaps this new drug, which in reality is not any better than a placebo, is being touted as effective medication. That is obviously problematic and potentially very hazardous.

Thus, if there is a relatively high cost associated with a Type I error (for example, lives are lost, as in the medical profession), then one would want to select a relatively small level of significance (e.g., .01 or smaller). A small alpha would translate to a very small probability of rejecting the null if it were really true (i.e., a small probability of making an incorrect decision). If there is a relatively low cost associated with a Type I error (for example, children have to eat the second-rated candy rather than the first), then selecting a larger level of significance may be appropriate (e.g., .05 or larger). Costs are not always known, however. A second consideration is the level of significance commonly used in your field of study. In many disciplines, the .05 level of significance has become the standard (although no one seems to have a really good rationale). This is true in many of the social
and behavioral sciences. Thus, you would do well to consult the published literature in your field to see if some standard is commonly used and to consider it for your own research.

Step 3: The third step in the decision-making process is to calculate the test statistic. For the one-sample mean test, we will compute the sample mean Y̅ and compare it to the hypothesized value μ0. This allows us to determine the size of the difference between Y̅ and μ0 and, subsequently, the probability associated with the difference. The larger the difference, the more likely it is that the sample mean really differs from the hypothesized mean value, and the smaller the probability of observing such a difference if the null hypothesis were true.

Step 4: The fourth and final step in the decision-making process is to make a statistical decision regarding the null hypothesis H0. That is, a decision is made whether to reject H0 or to fail to reject H0. If the difference between the sample mean and the hypothesized value is large enough relative to the critical value (we will talk about critical values in more detail later), then our decision is to reject H0. If the difference between the sample mean and the hypothesized value is not large enough relative to the critical value, then our decision is to fail to reject H0. This is the basic four-step process for hypothesis testing of any inferential test. The specific details for the test of a single mean are given in the following section.

6.5 Inferences About μ When σ Is Known

In this section, we examine how tests of hypotheses about a single mean are conducted when the population standard deviation is known. Specifically, we consider the z test, an example illustrating the use of the z test, and how to construct a CI around the mean.

6.5.1 z Test

Recall from Chapter 4 the definition of a z score as

 z = (Yi − μ) / σY

where
 Yi is the score on variable Y for individual i
 μ is the population mean for variable Y
σY is the population standard deviation for variable Y

The z score is used to tell us how many standard deviation units an individual's score is from the mean.

In the context of this chapter, however, we are concerned with the extent to which a sample mean differs from some hypothesized mean value. We can construct a variation of the z score for testing hypotheses about a single mean. In this situation, we are concerned with the sampling distribution of the mean (introduced in Chapter 5), so the equation must reflect means rather than raw scores. Our z score equation for testing hypotheses about a single mean becomes

    z = (Ȳ − μ0) / σȲ

where
Ȳ is the sample mean for variable Y
μ0 is the hypothesized mean value for variable Y
σȲ is the population standard error of the mean for variable Y

From Chapter 5, recall that the population standard error of the mean σȲ is computed by

    σȲ = σY / √n

where
σY is the population standard deviation for variable Y
n is the sample size

Thus, the numerator of the z score equation is the difference between the sample mean and the hypothesized value of the mean, and the denominator is the standard error of the mean. What we are really determining here is how many standard deviation (or standard error) units the sample mean is from the hypothesized mean. Henceforth, we call this variation of the z score the test statistic for the test of a single mean, also known as the z test. This is the first of several test statistics we describe in this text; every inferential test requires some test statistic for purposes of testing hypotheses.

We need to make a statistical assumption regarding this hypothesis testing situation. We assume that z is normally distributed with a mean of 0 and a standard deviation of 1. This is written statistically as z ∼ N(0, 1), following the notation we developed in Chapter 4. Thus,
the assumption is that z follows the unit normal distribution (in other words, the shape of the distribution is approximately normal). An examination of our test statistic z reveals that only the sample mean can vary from sample to sample. The hypothesized value and the standard error of the mean are constant for every sample of size n from the same population.

In order to make a statistical decision, the critical regions need to be defined. As the test statistic is z and we have assumed normality, the relevant theoretical distribution to which we compare the test statistic is the unit normal distribution. We previously discussed this distribution in Chapter 4, and the table of values is given in Table A.1. If the alternative hypothesis is nondirectional, then there would be two critical regions, one in the upper tail and one in the lower tail. Here we would split the area of the critical region, known as α, in two. If the alternative hypothesis is directional, then there would be only one critical region, either in the upper tail or in the lower tail, depending on which direction one is willing to reject H0.

6.5.2 Example

Let us illustrate the use of this inferential test through an example. We are interested in testing whether the population of undergraduate students from Awesome State University (ASU) has a mean intelligence test score different from the hypothesized mean value of μ0 = 100 (remember that the hypothesized mean value does not come from our sample but from another source; in this example, let us say that this value of 100 is the national norm as presented in the technical manual of this particular intelligence test).

Recall that our first step in hypothesis testing is to state the hypothesis. A nondirectional alternative hypothesis is of interest, as we simply want to know if this population has a mean
intelligence different from the hypothesized value, either greater than or less than. Thus, the null and alternative hypotheses can be written respectively as follows:

    H0: μ = 100  or  H0: μ − 100 = 0
    H1: μ ≠ 100  or  H1: μ − 100 ≠ 0

A sample mean of Ȳ = 103 is observed for a sample of n = 100 ASU undergraduate students. From the development of this intelligence test, we know that the theoretical population standard deviation is σY = 15 (again, for purposes of illustration, let us say that the population standard deviation of 15 was noted in the technical manual for this test).

Our second step is to select a level of significance. The standard level of significance in this field is the .05 level; thus, we perform our significance test at α = .05.

The third step is to compute the test statistic value. To compute our test statistic value, first we compute the standard error of the mean (the denominator of our test statistic formula) as follows:

    σȲ = σY / √n = 15 / √100 = 1.5000

Then we compute the test statistic z, where the numerator is the difference between the mean of our sample (Ȳ = 103) and the hypothesized mean value (μ0 = 100), and the denominator is the standard error of the mean:

    z = (Ȳ − μ0) / σȲ = (103 − 100) / 1.5000 = 2.0000
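As a quick check of this arithmetic, the standard error, the z statistic, and the exact two-tailed probability can be computed in a few lines. A sketch in Python (the function name `z_test` and the use of Python here are illustrative; the text itself works from the unit normal table):

```python
import math

def z_test(sample_mean, mu0, sigma, n):
    """One-sample z test: returns (standard error, z statistic, two-tailed p)."""
    se = sigma / math.sqrt(n)              # standard error of the mean
    z = (sample_mean - mu0) / se           # test statistic
    # Exact two-tailed probability via the normal tail: p = 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# The ASU example: Y-bar = 103, mu0 = 100, sigma = 15, n = 100
se, z, p = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(se, z, round(p, 4))  # 1.5 2.0 0.0455
```

Note that `math.erfc` gives the exact two-tailed probability, about .0455; doubling the rounded tabled area of .0228 instead yields .0456.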
Finally, in the last step, we make our statistical decision by comparing the test statistic z to the critical values. To determine the critical values for the z test, we use the unit normal distribution in Table A.1. Since α = .05 and we are conducting a nondirectional test, we need to find critical values for the upper and lower tails, where the area of each of the two critical regions is equal to .025 (i.e., splitting alpha in half: α/2 = .05/2 = .025). From the unit normal table, we find these critical values to be +1.96 (the point on the X axis where the area above that point is equal to .025) and −1.96 (the point on the X axis where the area below that point is equal to .025). As shown in Figure 6.2, the test statistic z = 2.00 falls into the upper tail critical region, just slightly larger than the upper tail critical value of +1.96. Our decision is to reject H0 and conclude that the ASU population from which the sample was selected has a mean intelligence score that is statistically significantly different from the hypothesized mean of 100 at the .05 level of significance.

A more precise way of thinking about this process is to determine the exact probability of observing a sample mean that differs from the hypothesized mean value. From the unit normal table, the area above z = 2.00 is equal to .0228. Therefore, the area below z = −2.00 is also equal to .0228. Thus, the probability p of observing, by chance, a sample mean 2.00 or more standard errors (i.e., z = 2.00) from the hypothesized mean value of 100, in either direction, is two times the observed probability level, or p = (2)(.0228) = .0456. To put this in the context of the values in this example, there is a relatively small probability (less than 5%) of observing a sample mean of 103 just by chance if the true population mean is really 100. As this exact probability (p = .0456) is
smaller than our level of significance α = .05, we reject H0. Thus, there are two approaches to dealing with probability. One approach is a decision based solely on the critical values: we reject or fail to reject H0 at a given α level, but no other information is provided. The other approach is a decision based on comparing the exact probability to the given α level: we reject or fail to reject H0 at a given α level, but we also have information available about the closeness of, or confidence in, that decision.

For this example, the findings in a manuscript would be reported based on comparing the p value to alpha and reported either as z = 2 (p < .05) or as z = 2 (p = .0456). (You may want to refer to the style manual relevant to your discipline, such as the Publication Manual of the American Psychological Association (2010), for information on the recommended reporting style.) Obviously the conclusion is the same with either approach; it is just a matter of how the results are reported. Most statistical computer programs, including SPSS, report the exact probability so that the reader can make a decision based on their own selected level of significance. These programs do not provide the critical value(s), which are found only in the appendices of statistics textbooks.

6.5.3 Constructing Confidence Intervals Around the Mean

Recall our discussion from Chapter 5 on CIs. CIs are often quite useful in inferential statistics for providing the researcher with an interval estimate of a population parameter. Although the sample mean gives us a point estimate (i.e., just one value) of a population mean, a CI gives us an interval estimate of a population mean and allows us to determine the accuracy or precision of the sample mean. For the inferential test of a single mean, a CI around the sample mean Ȳ is formed from

    Ȳ ± zcv σȲ

where
zcv is the critical value from the unit normal distribution
σȲ is the population standard error of the mean

Figure 6.2. Critical regions for the example: the unit normal distribution centered on the hypothesized value μ0, with α/2 critical regions in each tail, critical values of −1.96 and +1.96, and the test statistic value of +2.00 falling in the upper critical region.

CIs are typically formed for nondirectional or two-tailed tests, as shown in the equation. A CI will generate a lower and an upper limit. If the hypothesized mean value falls within the lower and upper limits, then we would fail to reject H0. In other words, if the hypothesized mean is contained in (or falls within) the CI around the sample mean, then we conclude that the sample mean and the hypothesized mean are not significantly different and that the sample mean could have come from a population with the hypothesized mean. If the hypothesized mean value falls outside the limits of the interval, then we would reject H0. Here we conclude that it is unlikely that the sample mean could have come from a population with the hypothesized mean.

One way to think about CIs is as follows. Imagine we take 100 random samples of the same sample size n, compute each sample mean, and then construct each 95% CI. Then we can say that 95% of these CIs will contain the population parameter and 5% will not. In short, 95% of similarly constructed CIs will contain the population parameter. It should also be mentioned that, at a particular level of significance, one will always obtain the same statistical decision with both the hypothesis test and the CI. The two procedures use precisely the same information. The hypothesis test is based on a point estimate; the CI is based on an interval estimate, providing the researcher with a little more information.

For the ASU example situation, the 95% CI would be computed by

    Ȳ ± zcv σȲ = 103 ± 1.96(1.5) = 103 ± 2.94 = (100.06, 105.94)

Thus, the 95% CI ranges from 100.06 to 105.94. Because the interval does not contain the hypothesized mean value of 100, we reject H0 (the same decision we arrived at by walking through the steps for hypothesis testing). Thus, it is quite unlikely that our sample mean could have come from a population distribution with a mean of 100.

6.6 Type II Error (β) and Power (1 − β)

In this section, we complete our discussion of Type II error (β) and power (1 − β). First we return to our rain example and discuss the entire decision-making context. Then we describe the factors that determine power.

6.6.1 Full Decision-Making Context

Previously, we defined Type II error as the probability of failing to reject H0 when H0 is really false. In other words, in reality, H0 is false, yet we made a decision error and did not reject H0. The probability associated with a Type II error is denoted by β. Power is a related concept and is defined as the probability of rejecting H0 when H0 is really false. In other words, in reality, H0 is false, and we made the correct decision to reject H0. The probability associated with power is denoted by 1 − β. Let us return to our "rain" example to describe Type I and Type II errors and power more completely.

The full decision-making context for the "rain" example is given in Figure 6.3. The distribution on the left-hand side of the figure is the sampling distribution when H0 is true, meaning in reality it does not rain. The vertical line represents the critical value for deciding whether to carry an umbrella or not. To the left of the vertical line, we do not carry an umbrella, and to the right side of the vertical line, we do carry an umbrella. For the no-rain sampling distribution on the left, there are two possibilities. First, we do not carry
an umbrella and it does not rain. This is the unshaded portion under the no-rain sampling distribution to the left of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − α. Second, we do carry an umbrella and it does not rain. This is the shaded portion under the no-rain sampling distribution to the right of the vertical line. This is an incorrect decision, a Type I error, and the probability associated with this decision is α/2 in either the upper or lower tail, and α collectively.

The distribution on the right-hand side of the figure is the sampling distribution when H0 is false, meaning in reality, it does rain. For the rain sampling distribution, there are two possibilities. First, we do carry an umbrella and it does rain. This is the unshaded portion under the rain sampling distribution to the right of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − β, or power. Second, we do not carry an umbrella and it does rain. This is the shaded portion under the rain sampling distribution to the left of the vertical line. This is an incorrect decision, a Type II error, and the probability associated with this decision is β.

As a second illustration, consider again the example intelligence test situation. This situation is depicted in Figure 6.4. The distribution on the left-hand side of the figure is the sampling distribution of Ȳ when H0 is true, meaning in reality, μ = 100. The distribution on the right-hand side of the figure is the sampling distribution of Ȳ when H1 is true, meaning in reality, μ = 115 (and in this example, while there are two critical values, only the right tail matters, as that relates to the H1 sampling distribution). The vertical line represents the critical value for deciding whether to reject the null hypothesis or not. To the left of the
vertical line, we do not reject H0, and to the right of the vertical line, we reject H0. For the H0-is-true sampling distribution on the left, there are two possibilities. First, we do not reject H0 and H0 is really true. This is the unshaded portion under the H0-is-true sampling distribution to the left of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − α. Second, we reject H0 and H0 is true. This is the shaded portion under the H0-is-true sampling distribution to the right of the vertical line. This is an incorrect decision, a Type I error, and the probability associated with this decision is α/2 in either the upper or lower tail, and α collectively.

Figure 6.3. Sampling distributions for the rain case: the no-rain (H0 true) and rain (H0 false) distributions, showing the regions for the two correct decisions, the Type I error (did not need umbrella), and the Type II error (got wet).

The distribution on the right-hand side of the figure is the sampling distribution when H0 is false, and in particular, when H1: μ = 115 is true. This is a specific sampling distribution when H0 is false, and other possible sampling distributions can also be examined (e.g., μ = 85, 110). For the H1: μ = 115 is true sampling distribution, there are two possibilities. First, we do reject H0, as H0 is really false, and H1: μ = 115 is really true. This is the unshaded portion under the H1: μ = 115 is true sampling distribution to the right of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − β, or power. Second, we do not reject H0, H0 is really false, and H1: μ = 115 is really true. This is the shaded portion under the H1: μ = 115 is true sampling distribution to the left of the
vertical line. This is an incorrect decision, a Type II error, and the probability associated with this decision is β.

Figure 6.4. Sampling distributions for the intelligence test case: the H0: μ = 100 is true and H1: μ = 115 is true distributions, with the critical values marked, showing the regions for the correct decisions (1 − α and 1 − β), the Type I error (α/2 in each tail), and the Type II error (β).

6.6.2 Power Determinants

Power is determined by five different factors: (1) level of significance, (2) sample size, (3) population standard deviation, (4) difference between the true population mean μ and the hypothesized mean value μ0, and (5) directionality of the test (i.e., one- or two-tailed test). Let us talk about each of these factors in more detail.

First, power is determined by the level of significance α. As α increases, power increases. Thus, if α increases from .05 to .10, then power will increase. This would occur in Figure 6.4 if the vertical line were shifted to the left (thus creating a larger critical region and thereby making it easier to reject the null hypothesis). This would increase the α level and also increase power. This factor is under the control of the researcher.

Second, power is determined by sample size. As sample size n increases, power increases. Thus, if sample size increases, meaning we have a sample that consists of a larger proportion of the population, this will cause the standard error of the mean to decrease, as there is less sampling error with larger samples. This would also result in the vertical line being moved to the left (again creating a larger critical region and thereby making it easier to reject the null hypothesis). This factor is also under the control of the researcher. In addition, because a larger sample yields a smaller standard error, it will be easier to reject H0 (all else being equal), and the CIs generated will also be narrower.

Third, power is determined by the size of the population standard deviation σ. Although not under the researcher's control, as σ increases, power decreases. Thus, if σ increases, meaning the variability in the population is larger, this will cause the standard error of the mean to increase, as there is more sampling error with larger variability. This would result in the vertical line being moved to the right. If σ decreases, meaning the variability in the population is smaller, this will cause the standard error of the mean to decrease, as there is less sampling error with smaller variability. This would result in the vertical line being moved to the left. Considering, for example, the one-sample mean test, the standard error of the mean is the denominator of the test statistic formula. When the standard error term decreases, the denominator is smaller and thus the test statistic value becomes larger (and thereby easier to reject the null hypothesis).

Fourth, power is determined by the difference between the true population mean μ and the hypothesized mean value μ0. Although not always under the researcher's control (only in true experiments as described in Chapter 14), as the difference between the true population mean and the hypothesized mean value increases, power increases. Thus, if the difference between the true population mean and the hypothesized mean value is large, it will be easier to correctly reject H0. This would result in greater separation between the two sampling distributions. In other words, the entire H1-is-true sampling distribution would be shifted to the right. Consider, for example, the one-sample mean test. The numerator is the difference between the means. The larger the numerator (holding the denominator constant), the more likely it will be to reject the null hypothesis.

Finally, power is determined by directionality and type of statistical procedure: whether we conduct a one- or a two-tailed test, as well as the type of test of inference. There is greater power in a one-tailed test, such as when μ > 100, than in a two-tailed test.
In a one-tailed test, the vertical line will be shifted to the left, creating a larger rejection region. This factor is under the researcher's control. There is also often greater power in conducting parametric as compared to nonparametric tests of inference (we will talk more about parametric versus nonparametric tests in later chapters). This factor is under the researcher's control to some extent, depending on the scale of measurement of the variables and the extent to which the assumptions of parametric tests are met.
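These five determinants can be verified numerically. A sketch in Python using the standard library's `statistics.NormalDist` (the function `power_z_test` and its parameter names are illustrative, not from the text); it computes the approximate power of the z test and confirms the direction in which each factor moves power:

```python
from statistics import NormalDist

def power_z_test(mu, mu0, sigma, n, alpha=0.05, two_tailed=True):
    """Approximate power of the one-sample z test with known sigma."""
    nd = NormalDist()
    se = sigma / n ** 0.5                       # standard error of the mean
    shift = (mu - mu0) / se                     # how far H1 sits from H0, in SE units
    zcrit = nd.inv_cdf(1 - alpha / 2) if two_tailed else nd.inv_cdf(1 - alpha)
    power = 1 - nd.cdf(zcrit - shift)           # upper-tail rejection region
    if two_tailed:
        power += nd.cdf(-zcrit - shift)         # (tiny) lower-tail contribution
    return power

# Each factor moves power in the direction the text describes:
base = power_z_test(mu=115, mu0=100, sigma=15, n=10)
assert power_z_test(mu=115, mu0=100, sigma=15, n=10, alpha=0.10) > base  # larger alpha
assert power_z_test(mu=115, mu0=100, sigma=15, n=25) > base              # larger n
assert power_z_test(mu=115, mu0=100, sigma=30, n=10) < base              # larger sigma
assert power_z_test(mu=120, mu0=100, sigma=15, n=10) > base              # bigger difference
assert power_z_test(mu=115, mu0=100, sigma=15, n=10, two_tailed=False) > base  # one-tailed
```

The assertions mirror the five factors in order: level of significance, sample size, population standard deviation, mean difference, and directionality.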
Power has become of much greater interest and concern to the applied researcher in recent years. We begin by distinguishing between a priori power, when power is determined as a study is being planned or designed (i.e., prior to the study), and post hoc power, when power is determined after the study has been conducted and the data analyzed.
For a priori power, if you want to ensure a certain amount of power in a study, then you can determine what sample size would be needed to achieve such a level of power. This requires the input of characteristics such as α, σ, the difference between μ and μ0, and one- versus two-tailed test. Alternatively, one could determine power given each of those characteristics. This can be done either by using statistical software [such as Power and Precision, Ex-Sample, G*Power (freeware), or a CD provided with the Murphy, Myors, and Wolach (2008) text] or by using tables [the most definitive collection of tables being in Cohen (1988)].
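For the one-sample z test, the a priori sample size calculation that such software performs reduces to a short formula, n = ((z_α/2 + z_power)·σ / |μ − μ0|)². A sketch (the function name `required_n` is illustrative; this uses the usual normal approximation, which ignores the negligible opposite-tail term):

```python
import math
from statistics import NormalDist

def required_n(mu, mu0, sigma, alpha=0.05, power=0.80, two_tailed=True):
    """A priori sample size for the one-sample z test (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_tailed else nd.inv_cdf(1 - alpha)
    z_beta = nd.inv_cdf(power)                 # z value corresponding to desired power
    delta = abs(mu - mu0) / sigma              # standardized effect size
    return math.ceil(((z_alpha + z_beta) / delta) ** 2)

# e.g., to detect a half-standard-deviation difference (7.5 IQ points)
# with 80% power at alpha = .05, two-tailed:
print(required_n(mu=107.5, mu0=100, sigma=15))  # 32
```

This reproduces the familiar result that a standardized difference of .5 requires roughly 32 cases at 80% power with a two-tailed α of .05.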
For post hoc power (also called observed power), most statistical software packages (e.g., SPSS, SAS, STATGRAPHICS) will compute this as part of the analysis for many types of inferential statistics (e.g., analysis of variance). However, even though post hoc power is routinely reported in some journals, it has been found to have some flaws. For example, Hoenig and Heisey (2001) concluded that it should not be used to aid in interpreting nonsignificant results. They found that low power may indicate a small effect (e.g., a small mean difference) rather than an underpowered study. Thus, increasing sample size may not make much of a difference. Yuan and Maxwell (2005) found that observed power is almost always biased (too high or too low), except when true power is .50. Thus, we do not recommend the sole use of post hoc power to determine sample size in the next study; rather, it is recommended that CIs be used in addition to post hoc power. (An example presented later in this chapter will use G*Power to illustrate both a priori sample size requirements given desired power and post hoc power analysis.)
6.7 Statistical Versus Practical Significance
We have discussed the inferential test of a single mean in terms of statistical significance. However, are statistically significant results always practically significant? In other words, if a result is statistically significant, should we make a big deal out of this result in a practical sense? Consider again the simple example where the null and alternative hypotheses are as follows:

    H0: μ = 100  or  H0: μ − 100 = 0
    H1: μ ≠ 100  or  H1: μ − 100 ≠ 0
A sample mean intelligence test score of Ȳ = 101 is observed for a sample size of n = 2000 and a known population standard deviation of σY = 15. If we perform the test at the .01 level of significance, we find we are able to reject H0 even though the observed mean is only 1 unit away from the hypothesized mean value. The reason is that, because the sample size is rather large, a rather small standard error of the mean is computed (σȲ = 0.3354), and we thus reject H0 as the test statistic (z = 2.9815) exceeds the critical value (z = 2.5758). Holding the mean and standard deviation constant, if we had a sample size of 200 instead of 2000, the standard error becomes much larger (σȲ = 1.0607), and we thus fail to reject H0 as the test statistic (z = 0.9428) does not exceed the critical value (z = 2.5758). From this example, we can see how the sample size can drive the results of the hypothesis test, and how it is possible that statistical significance can be influenced simply as an artifact of sample size.
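The sample-size effect described above is easy to reproduce. A sketch in Python mirroring the two scenarios (the function name `z_statistic` is illustrative):

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic for the one-sample z test."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z_crit = 2.5758  # two-tailed critical value at alpha = .01 (from the unit normal table)
for n in (2000, 200):
    z = z_statistic(101, 100, 15, n)
    print(n, round(z, 4), abs(z) > z_crit)
# n = 2000: z is about 2.98 (reject H0); n = 200: z is about 0.94 (fail to reject H0)
```

(A full-precision computation gives z = 2.9814 for n = 2000; the text's 2.9815 comes from using the rounded standard error 0.3354.)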
Should we make a big deal out of an intelligence test sample mean that is 1 unit away from the hypothesized mean intelligence? The answer is "maybe not." If we gather enough sample data, any difference, no matter how small, can wind up being statistically significant. Thus, larger samples are more likely to yield statistically significant results. Practical significance is not entirely a statistical matter. It is also a matter for the substantive field under investigation. Thus, the meaningfulness of a small difference is for the substantive area to determine. All that inferential statistics can really determine is statistical significance. However, we should always keep practical significance in mind when interpreting our findings.
In recent years, a major debate has been ongoing in the statistical community about the role of significance testing. The debate centers on whether null hypothesis significance testing (NHST) best suits the needs of researchers. At one extreme, some argue that NHST is fine as is. At the other extreme, others argue that NHST should be totally abandoned. In the middle, yet others argue that NHST should be supplemented with measures of effect size. In this text, we have taken the middle road, believing that more information is a better choice.
Let us formally introduce the notion of effect size. While there are a number of different measures of effect size, the most commonly used measure is Cohen's δ (delta) or d (1988). For the population case of the one-sample mean test, Cohen's delta is computed as follows:

    δ = (μ − μ0) / σ

For the corresponding sample case, Cohen's d is computed as follows:

    d = (Ȳ − μ0) / s
For the one-sample mean test, d indicates how many standard deviations the sample mean is from the hypothesized mean. Thus, if d = 1.0, the sample mean is one standard deviation away from the hypothesized mean. Cohen has proposed the following subjective standards for the social and behavioral sciences as a convention for interpreting d: small effect size, d = .2; medium effect size, d = .5; large effect size, d = .8. Interpretation of effect size should always be made first based on a comparison to similar studies; what is considered a "small" effect using Cohen's rule of thumb may actually be quite large in comparison to other related studies that have been conducted. In lieu of a comparison to other studies, such as in those cases where there are no or minimal related studies, Cohen's subjective standards may be appropriate.
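A minimal sketch of the sample computation and Cohen's conventions (the function names are illustrative; the value of s = 15 is assumed here for the sake of the example, and the cutoffs apply only in lieu of comparable studies, as noted above):

```python
def cohens_d(sample_mean, mu0, s):
    """Sample Cohen's d for the one-sample mean test."""
    return (sample_mean - mu0) / s

def interpret(d):
    """Cohen's subjective conventions: .2 small, .5 medium, .8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below small"

# e.g., a sample mean of 103 against mu0 = 100 with s = 15:
d = cohens_d(sample_mean=103, mu0=100, s=15)
print(round(d, 2), interpret(d))  # 0.2 small
```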
Computing CIs for effect sizes is also valuable. The benefit in creating CIs for effect size values is similar to that of creating CIs for parameter estimates: CIs for the effect size provide an added measure of precision that is not obtained from knowledge of the effect size alone. Computing CIs for effect size indices, however, is not as straightforward as simply plugging known values into a formula. This is because d is a function of both the population mean and population standard deviation (Finch & Cumming, 2009). Thus, specialized software must be used to compute the CIs for effect sizes, and interested readers are referred to appropriate sources (e.g., Algina & Keselman, 2003; Algina, Keselman, & Penfield, 2005; Cumming & Finch, 2001).
While a complete discussion of these issues is beyond this text, further information on effect sizes can be found in special sections of Educational and Psychological Measurement (2001a, 2001b) and in Grissom and Kim (2005), while additional material on NHST can be found in Harlow, Mulaik, and Steiger (1997) and a special section of Educational and Psychological Measurement (2000, October). Additionally, style manuals (e.g., American Psychological Association, 2010) often provide useful guidelines on reporting effect size.
6.8 Inferences About μ When σ Is Unknown
We have already considered the inferential test involving a single mean when the population standard deviation σ is known. However, rarely is σ known to the applied researcher. When σ is unknown, the z test previously discussed is no longer appropriate. In this section, we consider the following: the test statistic for inferences about the mean when the population standard deviation is unknown, the t distribution, the t test, and an example using the t test.
6.8.1  New Test Statistic t
What is the applied researcher to do then when σ is unknown? The answer is to estimate σ by the sample standard deviation s. This changes the standard error of the mean to be

    sȲ = sY / √n

Now we are estimating two population parameters: (1) the population mean, μY, is being estimated by the sample mean, Ȳ; and (2) the population standard deviation, σY, is being estimated by the sample standard deviation, sY. Both Ȳ and sY can vary from sample to sample. Thus, although the sampling error of the mean is taken into account explicitly in the z test, we also need to take into account the sampling error of the standard deviation, which the z test does not consider at all. We now develop a new inferential test for the situation where σ is unknown. The test statistic is known as the t test and is computed as follows:

    t = (Ȳ − μ0) / sȲ
The t test was developed by William Sealy Gosset, also known by the pseudonym Student, previously mentioned in Chapter 1. The unit normal distribution cannot be used here for the unknown-σ situation. A different theoretical distribution must be used for determining critical values for the t test, known as the t distribution.
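A sketch of the t test computation on hypothetical data (the scores and the function name `one_sample_t` are illustrative; judging significance still requires critical values from the t distribution discussed next):

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """One-sample t test statistic and its degrees of freedom."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)             # estimated standard error of the mean
    return (mean - mu0) / se, n - 1

# Hypothetical intelligence test scores for n = 8 students:
scores = [98, 105, 110, 103, 99, 107, 101, 104]
t, df = one_sample_t(scores, mu0=100)
print(round(t, 3), df)  # 2.367 7
```

Note that both the mean and the standard deviation here are computed from the sample, which is precisely why the sampling error of s must be taken into account.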
6.8.2  t Distribution
The t distribution is the theoretical distribution used for determining the critical values of the t test. Like the normal distribution, the t distribution is actually a family of distributions. There is a different t distribution for each value of degrees of freedom. However, before we look more closely at the t distribution, some discussion of the degrees of freedom concept is necessary.
As an example, say we know a sample mean Ȳ = 6 for a sample size of n = 5. How many of those five observed scores are free to vary? The answer is that four scores are free to vary. If the four known scores are 2, 4, 6, and 8 and the mean is 6, then the remaining score must be 10. The remaining score is not free to vary, but is already totally determined. We see this in the following equation where, to arrive at a solution of 6, the sum in the numerator must equal 30, and Y5 must be 10:

Y
Y
n
Y
Y
i
i
n
i
i= = =
+ + + +
== =
∑ ∑
1 1
5
5
5
2 4 6 8
5
6
Therefore, the number of degrees of freedom is equal to 4 in this particular case and n − 1 in general. For the t test being considered here, we specify the degrees of freedom as ν = n − 1 (ν is the Greek letter "nu"). We use ν often in statistics to denote some type of degrees of freedom.
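This determinacy of the final score is easy to verify numerically (a small sketch using the text's scores 2, 4, 6, and 8):

```python
known = [2, 4, 6, 8]   # the four scores that were free to vary
n, ybar = 5, 6

# With the mean fixed, the numerator must sum to n * ybar = 30,
# so the fifth score is fully determined
y5 = n * ybar - sum(known)
print(y5)  # 10
```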
Another way to think about degrees of freedom is that we know the sum of the deviations from the mean must equal 0 (recall the unsquared numerator of the variance conceptual formula). For example, if n = 10, there are 10 deviations from the mean. Once the mean is known, only nine of the deviations are free to vary. A final way to think about this is that, in general, df = (n − number of restrictions). For the one-sample t test, because the population variance is unknown, we have to estimate it, resulting in one restriction. Thus, df = (n − 1) for this particular inferential test.
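The restriction that the n deviations from the mean must sum to 0 (so that only n − 1 of them are free) can be checked for any sample; the data below are illustrative only:

```python
y = [8, 12, 9, 7, 8, 10, 9, 11]   # any sample will do
ybar = sum(y) / len(y)
deviations = [yi - ybar for yi in y]

# Deviations from the mean always sum to (essentially) zero,
# so once n - 1 of them are known, the last one is fixed
print(abs(sum(deviations)) < 1e-12)  # True
```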
Several members of the family of t distributions are shown in Figure 6.5. The distribution for ν = 1 has thicker tails than the unit normal distribution and a shorter peak. This indicates that there is considerable sampling error of the sample standard deviation with only two observations (as ν = 2 − 1 = 1). For ν = 5, the tails are thinner and the peak is taller than for ν = 1. As the degrees of freedom increase, the t distribution becomes more nearly normal. For ν = ∞ (i.e., infinity), the t distribution is precisely the unit normal distribution.
A few important characteristics of the t distribution are worth mentioning. First, like the unit normal distribution, the mean of any t distribution is 0, and the t distribution is symmetric around the mean and unimodal. Second, unlike the unit normal distribution, which has a variance of 1, the variance of a t distribution is as follows:

σ² = ν / (ν − 2)   for ν > 2

Thus, the variance of a t distribution is somewhat greater than 1 but approaches 1 as ν increases.
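The variance formula can be checked against a statistics library; a sketch assuming Python with scipy is available:

```python
from scipy import stats

# Variance of the t distribution: nu / (nu - 2), defined for nu > 2
for nu in (3, 5, 30, 1000):
    print(nu, round(stats.t(nu).var(), 4), round(nu / (nu - 2), 4))
# The two computed columns agree; the variance exceeds 1
# but approaches 1 (the unit normal variance) as nu grows
```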
The table for the t distribution is given in Table A.2, and a snapshot of the table is presented in Figure 6.6 for illustration purposes. In looking at the table, each column header has two values. The top value is the significance level for a one-tailed test, denoted by α1. Thus, if you were doing a one-tailed test at the .05 level of significance, you want to look in the second column of numbers. The bottom value is the significance level for a two-tailed test, denoted by α2. Thus, if you were doing a two-tailed test at the .05 level of significance, you want to look in the third column of numbers. The rows of the table denote the various degrees of freedom ν.
[Figure 6.5: Several members of the family of t distributions. The plot shows relative frequency (0 to 0.4) against t (−4 to 4) for ν = 1, ν = 5, and the normal distribution.]
Thus, if ν = 3, meaning n = 4, you want to look in the third row of numbers. If ν = 3 for α1 = .05, the tabled value is 2.353. This value represents the 95th percentile point in a t distribution with three degrees of freedom. This is because the table only presents the upper tail percentiles. As the t distribution is symmetric around 0, the lower tail percentiles are the same values except for a change in sign. The fifth percentile for three degrees of freedom then is −2.353. Thus, for a right-tailed directional hypothesis, the critical value will be +2.353, and for a left-tailed directional hypothesis, the critical value will be −2.353.

If ν = 120 for α1 = .05, then the tabled value is 1.658. Thus, as sample size and degrees of freedom increase, the value of t decreases. This makes it easier to reject the null hypothesis when sample size is large.
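Both tabled values quoted above can be reproduced from a t distribution's percent point (inverse CDF) function; a sketch assuming Python with scipy:

```python
from scipy import stats

# 95th percentile (one-tailed alpha_1 = .05) of the t distribution
print(round(stats.t.ppf(0.95, 3), 3))     # 2.353 (the tabled value for nu = 3)
print(round(stats.t.ppf(0.95, 120), 3))   # 1.658 (the tabled value for nu = 120)

# As nu grows, the percentile approaches the unit normal value
print(round(stats.norm.ppf(0.95), 3))     # 1.645
```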
6.8.3  t Test
Now that we have covered the theoretical distribution underlying the test of a single mean for an unknown σ, we can go ahead and look at the inferential test. First, the null and alternative hypotheses for the t test are written in the same fashion as for the z test presented earlier. Thus, for a two-tailed test, we have the same notation as previously presented:

H0: μ = 100  or  H0: μ − 100 = 0
H1: μ ≠ 100  or  H1: μ − 100 ≠ 0
The test statistic t is written as follows:

t = (Ȳ − μ0) / s_Ȳ
In order to use the theoretical t distribution to determine critical values, we must assume that Yi ~ N(μ, σ²) and that the observations are independent of each other (also referred to as "independent and identically distributed" or IID). In terms of the distribution of scores on Y, in other words, we assume that the population of scores on Y is normally distributed with some population mean μ and some population variance σ². The most important assumption for the t test is normality of the population. Conventional research has shown that the t test is very robust to nonnormality for a two-tailed test except for very small
samples (e.g., n < 5). The t test is not as robust to nonnormality for a one-tailed test, even for samples as large as 40 or more (e.g., Noreen, 1989; Wilcox, 1993). Recall from Chapter 5 on the central limit theorem that when sample size increases, the sampling distribution of the mean becomes more nearly normal. As the shape of a population distribution may be unknown, conservatively one would do better to conduct a two-tailed test when sample size is small, unless some normality evidence is available.

Figure 6.6: Snapshot of t distribution table.

ν    α1 = .10   .05     .025     .01      .005     .0025    .001     .0005
     α2 = .20   .10     .050     .02      .010     .0050    .002     .0010
1        3.078   6.314  12.706   31.821   63.657   127.32   318.31   636.62
2        1.886   2.920   4.303    6.965    9.925    14.089   22.327   31.598
3        1.638   2.353   3.182    4.541    5.841     7.453   10.214   12.924
…

However, recent research (e.g., Basu & DasGupta, 1995; Wilcox, 1997, 2003) suggests that small departures from normality can inflate the standard error of the mean (as the standard deviation is larger). This can reduce power and also affect control over Type I error. Thus, a cavalier attitude about ignoring nonnormality may not be the best approach, and if nonnormality is an issue, other procedures, such as the nonparametric Kolmogorov–Smirnov one-sample test, may be considered. In terms of the assumption of independence, this assumption is met when the cases or units in your sample have been randomly selected from the population. Thus, the extent to which this assumption is met is dependent on your sampling design. In reality, random selection is often difficult in education and the social sciences and may or may not be feasible given your study.

The critical values for the t distribution are obtained from the t table in Table A.2, where you take into account the α level, whether the test is one- or two-tailed, and the degrees of freedom ν = n − 1. If the test statistic falls into a critical region, as defined by the critical values, then our conclusion is to reject H0. If the test statistic does not fall into a critical region, then our conclusion is to fail to reject H0. For the t test, the critical values depend on sample size, whereas for the z test, the critical values do not.

As was the case for the z test, for the t test, a CI for μ can be developed. The (1 − α)% CI is formed from

Ȳ ± t_cv s_Ȳ

where t_cv is the critical value from the t table. If the hypothesized mean value μ0 is not contained in the interval, then our conclusion is to reject H0. If the hypothesized mean value μ0 is contained in the interval, then our conclusion is to fail to reject H0. The CI procedure for the t test then is comparable to that for the z test.

6.8.4  Example

Let us consider an example of the entire t test process. A hockey coach wanted to determine whether the mean skating speed of his team differed from the hypothesized league mean speed of 12 seconds. The hypotheses are developed as a two-tailed test and written as follows:

H0: μ = 12  or  H0: μ − 12 = 0
H1: μ ≠ 12  or  H1: μ − 12 ≠ 0

Skating speed around the rink was timed for each of 16 players (data are given in Table 6.2 and on the website as chap6data). The mean speed of the team was Ȳ = 10 seconds with a standard deviation of s_Y = 1.7889 seconds. The standard error of the mean is then computed as follows:

s_Ȳ = s_Y / √n = 1.7889 / √16 = 0.4472

We wish to conduct a t test at α = .05, where we compute the test statistic t as

t = (Ȳ − μ0) / s_Ȳ = (10 − 12) / 0.4472 = −4.4722

Table 6.2: SPSS Output for Skating Example

Raw data: 8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5, 10.5, 9.5, 11.5, 12.5, 9.5, 10.5

One-Sample Statistics
       N    Mean     Std. Deviation   Std. Error Mean
Time   16   10.000   1.7889           .4472

One-Sample Test (Test Value = 12)
       t        df   Sig. (2-Tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
Time   −4.472   15   .000              −2.0000           (−2.953, −1.047)

Notes on the SPSS output:
- The table labeled "One-Sample Statistics" provides basic descriptive statistics for the sample. The standard error of the mean is s_Ȳ = s_Y/√n = 1.7889/√16 = .4472.
- "t" is the t test statistic value, t = (Ȳ − μ0)/s_Ȳ = (10 − 12)/.4472 = −4.472.
- df are the degrees of freedom; for the one-sample t test, they are calculated as n − 1.
- "Sig." is the observed p value. It is interpreted as: there is less than a 1% probability of a sample mean of 10.00 occurring by chance if the null hypothesis is really true (i.e., if the population mean is really 12).
- The mean difference is simply the difference between the sample mean value (in this case, 10) and the hypothesized mean value (in this example, 12); in other words, 10 − 12 = −2.00.
- SPSS reports the 95% confidence interval of the difference, which means that in 95% of sample CIs, the true population mean difference will fall between −2.953 and −1.047. It is computed as the mean difference ± t_cv s_Ȳ = −2.00 ± (2.131)(.4472).
- The 95% confidence interval of the mean (although not provided by SPSS) could also be calculated as Ȳ ± t_cv s_Ȳ = 10 ± 2.131(.4472) = 10 ± .9530 = [9.047, 10.953].

We turn to the t table in Table A.2 and determine the critical values based on α2 = .05 and ν = 15 degrees of freedom. The critical values are +2.131, which defines the upper tail critical region, and −2.131, which defines the lower tail critical region. As the test statistic t (i.e., −4.4722) falls into the lower tail critical region (i.e., the test statistic is less than the lower tail critical value), our decision is to reject H0 and conclude that the mean skating speed of this team is significantly different from the hypothesized league mean speed at the .05 level of significance. A 95% CI can be computed as follows:

Ȳ ± t_cv s_Ȳ = 10 ± 2.131(0.4472) = 10 ± .9530 = (9.0470, 10.9530)

As the CI does not contain the hypothesized mean value of 12, our conclusion is to again reject H0. Thus, there is evidence to suggest that the mean skating speed of the team differs from the hypothesized league mean speed of 12 seconds.

6.9 SPSS

Here we consider what SPSS has to offer in the way of testing hypotheses about a single mean. As with most statistical software, the t test is included as an option in SPSS, but the z test is not. Instructions for determining the one-sample t test using SPSS are presented first. This is followed by additional steps for examining the normality assumption.

One-Sample t Test

Step 1: To conduct the one-sample t test, go to "Analyze" in the top pulldown menu, then select "Compare Means," and then select "One-Sample T Test." Following the screenshot (step 1) produces the "One-Sample T Test" dialog box.

Step 2: Next, from the main "One-Sample T Test" dialog box, click the variable of interest from the list on the left (e.g., time), and move it into the "Test Variable" box by clicking on the arrow button. At the bottom right of the screen is a box for "Test Value," where you indicate the hypothesized value (e.g., 12).
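For readers working outside SPSS, the hand computations above, the SPSS results in Table 6.2, and the Shapiro–Wilk normality evidence discussed later in this section can be reproduced in Python; a sketch assuming scipy is available and using the 16 skating times from Table 6.2:

```python
import numpy as np
from scipy import stats

time = np.array([8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5,
                 10.5, 9.5, 11.5, 12.5, 9.5, 10.5])

# One-sample t test against the hypothesized league mean of 12
t_stat, p_value = stats.ttest_1samp(time, popmean=12)
print(round(t_stat, 3))        # -4.472
print(p_value < 0.001)         # True (SPSS reports Sig. = .000)

# 95% CI of the mean: Ybar +/- t_cv * s_Ybar
n = len(time)
se = time.std(ddof=1) / np.sqrt(n)
tcv = stats.t.ppf(0.975, n - 1)
lower, upper = time.mean() - tcv * se, time.mean() + tcv * se
print(round(lower, 3), round(upper, 3))   # 9.047 10.953

# Shapiro-Wilk normality evidence (SPSS reports S-W = .982, p = .978)
w, p_sw = stats.shapiro(time)
print(round(w, 3))             # close to the SPSS value of .982
```

Since the CI (9.047, 10.953) excludes 12, the CI approach leads to the same decision as the test statistic: reject H0.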
[Screenshot, step 2: Select the variable of interest from the list on the left and use the arrow to move it to the "Test Variable" box on the right. Clicking on "Options" will allow you to define a confidence interval percentage. The default is 95% (corresponding to an alpha of .05).]

Step 3 (Optional): The default alpha level in SPSS is .05, and, thus, the default corresponding CI is 95%. If you wish to test your hypothesis at an alpha level other than .05 (and thus obtain CIs other than 95%), then click on the "Options" button located in the top right corner of the main dialog box. From here, the CI percentage can be adjusted to correspond to the alpha level at which your hypothesis is being tested. (For purposes of this example, the test has been generated using an alpha level of .05.)

The one-sample t test output for the skating example is provided in Table 6.2.

Using Explore to Examine Normality of Sample Distribution

Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape of your variable, specifically the extent to which normality is a reasonable assumption, is important. In earlier chapters, we saw how we could use the "Explore" tool in SPSS to generate a number of useful descriptive statistics. In conducting our one-sample t test, we can again use "Explore" to examine the extent to which the assumption of normality is met for our sample distribution. As the general steps for accessing "Explore" have been presented in previous chapters (e.g., Chapter 4), they will not be reiterated here. After the variable of interest has been selected and moved to the "Dependent List" box on the main "Explore" dialog box, click on "Plots" in the upper right corner. Place a checkmark in the boxes for "Normality plots with tests" and also for "Histogram."

[Screenshot: Select the variable of interest from the list on the left and use the arrow to move it to the "Dependent List" box on the right.]
[Screenshot caption, continued: Then click on "Plots."]

Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots. Using our hockey data, the skewness statistic is .299 and kurtosis is −.483, both within the range of an absolute value of 2.0, suggesting some evidence of normality. The histogram also suggests relative normality.

[Histogram of time, in seconds (8.0 to 14.0): Mean = 10.0, Std. dev. = 1.789, N = 16.]

There are a few other statistics that can be used to gauge normality as well. Using SPSS, we can obtain two statistical tests of normality. The Kolmogorov–Smirnov (K–S) test (Chakravarti, Laha, & Roy, 1967) with Lilliefors significance (Lilliefors, 1967) and the Shapiro–Wilk (S–W) test (Shapiro & Wilk, 1965) provide evidence of the extent to which our sample distribution is statistically different from a normal distribution. The K–S test tends to be conservative, whereas the S–W test is usually considered the more powerful of the two for testing normality and is recommended for use with small sample sizes (n < 50). Both of these statistics are generated from the selection of "Normality plots with tests." The output for the K–S and S–W tests is presented as follows. As we have learned in this chapter, when the observed probability (i.e., the p value, which is reported in SPSS as "Sig.") is less than our stated alpha level, then we reject the null hypothesis. We follow those same rules of interpretation here. Regardless of which test (K–S or S–W) we examine, both provide the same evidence: our sample distribution is not statistically significantly different than what would be expected from a normal distribution.

Tests of Normality
       Kolmogorov–Smirnov(a)        Shapiro–Wilk
       Statistic   df   Sig.        Statistic   df   Sig.
Time   .110        16   .200*       .982        16   .978

a. Lilliefors significance correction.
* This is a lower bound of the true significance.

Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality. Q–Q plots are graphs that depict quantiles of the sample distribution against quantiles of the theoretical normal distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of our hockey skating time provides another form of evidence of normality.

[Normal Q–Q plot of time: expected normal values plotted against observed values (6 to 14), with points falling close to the diagonal.]

The detrended normal Q–Q plot shows deviations of the observed values from the theoretical normal distribution. Evidence of normality is suggested when the points exhibit little or no pattern around 0 (the horizontal line); however, due to subjectivity in determining the extent of a pattern, this graph can often be difficult to interpret. Thus, in many cases, you may wish to rely more heavily on the other forms of evidence of normality.

[Detrended normal Q–Q plot of time: deviation from normal plotted against observed values (8 to 14).]

6.10 G*Power

In our discussion of power presented earlier in this chapter, we indicated that the sample size to achieve a desired level of power can be determined a priori (before the study is conducted), and observed power can also be determined post hoc (after the study is conducted) using statistical software or power tables. One freeware program for calculating power is G*Power (http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/), which can be used to compute both a priori sample size and post hoc power analyses (among other things). Using the results of the one-sample t test just conducted, let us utilize G*Power to first determine the required sample size given various estimated parameters and then compute the post hoc power of our test.

A Priori Sample Size Using G*Power

Step 1 (A priori sample size): As seen in step 1, there are several decisions that need to be made from the initial G*Power screen. First, the correct test family needs to be selected. In our case, we conducted a one-sample t test; therefore, the default selection of "t tests" is the correct test family. Next, we need to select the appropriate statistical test. The default is "Correlation: Point biserial model." This is not the correct option for us, and so we use the arrow to toggle to "Means: Difference from constant (one sample case)," which is the test needed for a one-sample t test.

[Screenshot, step 1: The default selection for "Test Family" is "t tests." The default selection for "Statistical Test" is "Correlation: Point biserial model"; use the arrow to toggle to "Means: Difference from constant (one sample case)."]
Step 2 (A priori sample size): The "Type of Power Analysis" desired then needs to be selected. The default is "A priori: Compute required sample size - given α, power, and effect size." For this illustration, we will first conduct an example of computing the a priori sample size (i.e., the default option), and then we will compute post hoc power. Although we do not illustrate the use of these here, we see that there are also three additional forms of power analysis that can be conducted using G*Power: (1) compromise, (2) criterion, and (3) sensitivity.

Step 3 (A priori sample size): The "Input Parameters" must then be specified: (1) one- versus two-tailed test, (2) anticipated effect size, (3) alpha level, and (4) desired power. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test, so we use the arrow to toggle "Tails" to "Two." For a priori power, we have to indicate the anticipated effect size. Your best estimate of the effect size you can anticipate achieving is usually obtained by relying on previous studies that are similar to yours. In G*Power, the default effect size is d = .50. For purposes of this illustration, let us use the default. The alpha level must also be defined. The default significance level in G*Power is .05, which is the alpha level we will be using for our example. The desired level of power must also be defined. The G*Power default for power is .95. Many researchers in education and the behavioral sciences indicate that a power of .80 or above is usually sufficient; thus, .95 may be higher than what many would consider necessary. For purposes of this example, however, we will use the default power of .95. Once the parameters are specified, simply click on "Calculate" to generate the a priori power statistics.

Step 4 (A priori sample size): The "Output Parameters" provide the relevant statistics given the input specified. In this example, we were interested in determining the a priori sample size given a two-tailed test, with an anticipated effect size of .50, an alpha level of .05, and desired power of .95. Based on those criteria, the required sample size for our one-sample t test is 54. In other words, if we have a sample size of 54 individuals or cases in our study, testing at an alpha level of .05, with a two-tailed test, and achieving a moderate effect size of .50, then the power of our test will be .95: the probability of rejecting the null hypothesis when it is really false will be 95%.

If we had anticipated a smaller effect size, say .20, but left all of the other input parameters the same, the required sample size needed to achieve a power of .95 increases greatly, to 327.
Post Hoc Power Using G*Power

Now, let us use G*Power to compute post hoc power. Step 1, as presented earlier for a priori power, remains the same; thus, we will start from step 2.

Step 2 (Post hoc power): The "Type of Power Analysis" desired then needs to be selected. The default is "A priori: Compute required sample size - given α, power, and effect size." To compute post hoc power, we need to select "Post hoc: Compute achieved power - given α, sample size, and effect size."

Step 3 (Post hoc power): The "Input Parameters" must then be specified: (1) one- versus two-tailed test, (2) actual effect size (for post hoc power), (3) alpha level, and (4) total sample size. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test, so we use the arrow to toggle "Tails" to "Two." The achieved or observed effect size was −1.117. The alpha level we tested at was .05, and the actual sample size was 16. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.

Step 4 (Post hoc power): The "Output Parameters" provide the relevant statistics given the input specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of −1.117, an alpha level of .05, and a sample size of 16. Based on those criteria, the post hoc power was .96. In other words, with a sample size of 16 skaters in our study, testing at an alpha level of .05, with a two-tailed test, and observing a large effect size of −1.117, the power of our test was .96: the probability of rejecting the null hypothesis when it is really false will be 96%, an excellent level of power. Keep in mind that conducting power analysis a priori is highly recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level).

6.11 Template and APA-Style Write-Up

Let us revisit our graduate research assistant, Marie, who was working with Oscar, a local hockey coach, to assist in analyzing his team's data. As a reminder, her task was to assist Oscar in generating the test of inference to answer his research question, "Is the mean skating speed of our hockey team different from the league mean speed of 12 seconds?" Marie suggested a one-sample test of means as the test of inference. A template for writing a research question for a one-sample test of inference (i.e., one-sample t test) is presented as follows:

Is the mean of [sample variable] different from [hypothesized mean value]?

It may be helpful to preface the results of the one-sample t test with the information we gathered to examine the extent to which the assumption of normality was met. This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.

The distributional shape of skating speed was examined to determine the extent to which the assumption of normality was met. Skewness (.299, SE = .564), kurtosis (−.483, SE = 1.091), and the Shapiro–Wilk test of normality (S–W = .982, df = 16, p = .978) suggest that normality is a reasonable assumption. Visually, a relatively bell-shaped distribution displayed in the histogram (reflected similarly in the boxplot) as well as a Q–Q plot with points adhering closely to the diagonal line also suggest evidence of normality. Additionally, the boxplot did not suggest the presence of any potential outliers. These indices suggest evidence that the assumption of normality was met.
An additional assumption of the one-sample t test is the assumption of independence. This assumption is met when the cases in our sample have been randomly selected from the population. This is an often overlooked, but important, assumption for researchers when presenting the results of their test. One or two sentences are usually sufficient to indicate if this assumption was met.

Because the skaters in this sample represented a random sample, the assumption of independence was met.

It is also desirable to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our skating example, we find an effect size of −1.117, interpreted according to Cohen's (1988) guidelines as a large effect:

d = (Ȳ − μ0) / s_Y = (10 − 12) / 1.7889 = −1.117

Remember that for the one-sample mean test, d indicates how many standard deviations the sample mean is from the hypothesized mean. Thus, with an effect size of −1.117, there is slightly more than one standard deviation unit between our sample mean and the hypothesized mean. The negative sign simply indicates that our sample mean was the smaller mean (as it is the first value in the numerator of the formula). In this particular example, the negative effect is desired as it suggests the team's average skating time is quicker than the league mean.

Here is an example APA-style paragraph of results for the skating data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).

A one-sample t test was conducted at an alpha level of .05 to answer the research question: Is the mean skating speed of a hockey team different from the league mean speed of 12 seconds?
The null hypothesis stated that the team mean speed would not differ from the league mean speed of 12. The alternative hypothesis stated that the team average speed would differ from the league mean. As depicted in Table 6.2, based on a random sample of 16 skaters, there was a mean time of 10 seconds and a standard deviation of 1.7889 seconds. When compared against the hypothesized mean of 12 seconds, the one-sample t test was shown to be statistically significant (t = −4.472, df = 15, p < .001). Therefore, the null hypothesis that the team average time would be 12 seconds was rejected. This provides evidence to suggest that the sample mean skating time for this particular team was statistically different from the hypothesized mean skating time of the league. Additionally, the effect size d was −1.117, generally interpreted as a large effect (Cohen, 1988) and indicating that there is more than a one standard deviation difference between the team and league mean skating times. The post hoc power of the test, given the sample size, two-tailed test, alpha level, and observed effect size, was .96.
6.12 Summary

In this chapter, we considered our first inferential testing situation, testing hypotheses about a single mean. A number of topics and new concepts were discussed. First, we introduced the types of hypotheses utilized in inferential statistics, that is, the null or statistical hypothesis versus the scientific or alternative or research hypothesis. Second, we moved on to the types of decision errors (i.e., Type I and Type II errors) as depicted by the decision table and illustrated by the rain example. Third, the level of significance was introduced as well as the types of alternative hypotheses (i.e., nondirectional vs. directional alternative hypotheses). Fourth, an overview of the steps in the decision-making process of inferential statistics was given. Fifth, we examined the z test, which is the inferential test about a single mean when the population standard deviation is known. This was followed by a more formal description of Type II error and power. We then discussed the notion of statistical significance versus practical significance. Finally, we considered the t test, which is the inferential test about a single mean when the population standard deviation is unknown, and then completed the chapter with an example, SPSS information, a G*Power illustration, and an APA-style write-up of results. At this point, you should have met the following objectives: (a) be able to understand the basic concepts of hypothesis testing, (b) be able to utilize the normal and t tables, and (c) be able to understand, determine, and interpret the results from the z test, t test, and CI procedures. Many of the concepts in this chapter carry over into other inferential tests. In the next chapter, we discuss inferential tests involving the difference between two means. Other inferential tests will be considered in subsequent chapters.

Problems

Conceptual Problems

6.1 In hypothesis testing, the probability of failing to reject H0 when H0 is false is denoted by
a. α
b. 1 − α
c. β
d. 1 − β

6.2 The probability of observing the sample mean (or some value greater than the sample mean) by chance if the null hypothesis is really true is which one of the following?
a. α
b. Level of significance
c. p value
d. Test statistic value

6.3 When testing the hypotheses presented in the following, at a .05 level of significance with the t test, where is the rejection region?
H0: μ = 100
H1: μ < 100
a. The upper tail
b. The lower tail
c. Both the upper and lower tails
d. Cannot be determined

6.4 A research question asks, "Is the mean age of children who enter preschool different from 48 months?" Which one of the following is implied?
a. Left-tailed test
b. Right-tailed test
c. Two-tailed test
d. Cannot be determined based on this information

6.5 The probability of making a Type II error when rejecting H0 at the .05 level of significance is which one of the following?
a. 0
b. .05
c. Between .05 and .95
d. .95

6.6 If the 90% CI does not include the value for the parameter being estimated in H0, then which one of the following is a correct statement?
a. H0 cannot be rejected at the .10 level.
b. H0 can be rejected at the .10 level.
c. A Type I error has been made.
d. A Type II error has been made.

6.7 Other things being equal, which of the values of t given next is least likely to result when H0 is true, for a two-tailed test?
a. 2.67
b. 1.00
c. 0.00
d. −1.96
e. −2.70

6.8 The fundamental difference between the z test and the t test for testing hypotheses about a population mean is which one of the following?
a. Only z assumes the population distribution be normal.
b. z is a two-tailed test, whereas t is one-tailed.
c. Only t becomes more powerful as sample size increases.
d. Only z requires the population variance be known.

6.9 If one fails to reject a true H0, one is making a Type I error. True or false?

6.10 Which one of the following is a correct interpretation of d?
a. Alpha level
b. CI
c. Effect size
d. Observed probability
e. Power

6.11 A one-sample t test is conducted at an alpha level of .10. The researcher finds a p value of .08 and concludes that the test is statistically significant. Is the researcher correct?

6.12 When testing the following hypotheses at the .01 level of significance with the t test, a sample mean of 301 is observed. I assert that if I calculate the test statistic and compare it to the t distribution with n − 1 degrees of freedom, it is possible to reject H0. Am I correct?
H0: μ = 295
H1: μ < 295

6.13 If the sample mean exceeds the hypothesized mean by 200 points, I assert that H0 can be rejected. Am I correct?

6.14 I assert that H0 can be rejected with 100% confidence if the sample consists of the entire population. Am I correct?

6.15 I assert that the 95% CI has a larger width than the 99% CI for a population mean using the same data. Am I correct?

6.16 I assert that the critical value of z, for a test of a single mean, will increase as sample size increases. Am I correct?

6.17 The mean of the t distribution increases as degrees of freedom increase. True or false?

6.18 It is possible that the results of a one-sample t test and of the corresponding CI will differ for the same dataset and level of significance. True or false?

6.19 The width of the 95% CI does not depend on the sample mean. True or false?
 a. Only z assumes the population distribution be normal.
 b. z is a two-tailed test, whereas t is one-tailed.
 c. Only t becomes more powerful as sample size increases.
 d. Only z requires the population variance be known.

6.9 If one fails to reject a true H0, one is making a Type I error. True or false?

6.10 Which one of the following is a correct interpretation of d?
 a. Alpha level
 b. CI
 c. Effect size
 d. Observed probability
 e. Power

6.11 A one-sample t test is conducted at an alpha level of .10. The researcher finds a p value of .08 and concludes that the test is statistically significant. Is the researcher correct?

6.12 When testing the following hypotheses at the .01 level of significance with the t test, a sample mean of 301 is observed. I assert that if I calculate the test statistic and compare it to the t distribution with n − 1 degrees of freedom, it is possible to reject H0. Am I correct?

 H0: μ = 295
 H1: μ < 295

6.13 If the sample mean exceeds the hypothesized mean by 200 points, I assert that H0 can be rejected. Am I correct?

6.14 I assert that H0 can be rejected with 100% confidence if the sample consists of the entire population. Am I correct?

6.15 I assert that the 95% CI has a larger width than the 99% CI for a population mean using the same data. Am I correct?

6.16 I assert that the critical value of z, for a test of a single mean, will increase as sample size increases. Am I correct?

6.17 The mean of the t distribution increases as degrees of freedom increase. True or false?

6.18 It is possible that the results of a one-sample t test and of the corresponding CI will differ for the same dataset and level of significance. True or false?

6.19 The width of the 95% CI does not depend on the sample mean. True or false?
6.20 The null hypothesis is a numerical statement about which one of the following?
 a. An unknown parameter
 b. A known parameter
 c. An unknown statistic
 d. A known statistic

Computational problems

6.1 Using the same data and the same method of analysis, the following hypotheses are tested about whether mean height is 72 inches. Researcher A uses the .05 level of significance, and Researcher B uses the .01 level of significance:

 H0: μ = 72
 H1: μ ≠ 72

 a. If Researcher A rejects H0, what is the conclusion of Researcher B?
 b. If Researcher B rejects H0, what is the conclusion of Researcher A?
 c. If Researcher A fails to reject H0, what is the conclusion of Researcher B?
 d. If Researcher B fails to reject H0, what is the conclusion of Researcher A?

6.2 Give a numerical value for each of the following descriptions by referring to the t table.
 a. The percentile rank of t5 = 1.476
 b. The percentile rank of t10 = 3.169
 c. The percentile rank of t21 = 2.518
 d. The mean of the distribution of t23
 e. The median of the distribution of t23
 f. The variance of the distribution of t23
 g. The 90th percentile of the distribution of t27

6.3 Give a numerical value for each of the following descriptions by referring to the t table.
 a. The percentile rank of t5 = 2.015
 b. The percentile rank of t20 = 1.325
 c. The percentile rank of t30 = 2.042
 d. The mean of the distribution of t10
 e. The median of the distribution of t10
 f. The variance of the distribution of t10
 g. The 95th percentile of the distribution of t14

6.4 The following random sample of weekly student expenses is obtained from a normally distributed population of undergraduate students with unknown parameters:

 68 56 76 75 62 81 72 69 91 84 49 75 69 59 70 53 65 78 71 87 71 74 69 65 64

 a. Test the following hypotheses at the .05 level of significance:

 H0: μ = 74
 H1: μ ≠ 74

 b. Construct a 95% CI.

6.5 The following random sample of hours spent per day answering e-mail is obtained from a normally distributed population of community college faculty with unknown parameters:

 2 3.5 4 1.25 2.5 3.25 4.5 4.25 2.75 3.25 1.75 1.5 2.75 3.5 3.25 3.75 2.25 1.5 1.25 3.25

 a. Test the following hypotheses at the .05 level of significance:

 H0: μ = 3.0
 H1: μ ≠ 3.0

 b. Construct a 95% CI.

6.6 In the population, it is hypothesized that flags have a mean usable life of 100 days. Twenty-five flags are flown in the city of Tuscaloosa and are found to have a sample mean usable life of 200 days with a standard deviation of 216 days. Does the sample mean in Tuscaloosa differ from that of the population mean?
 a. Conduct a two-tailed t test at the .01 level of significance.
 b. Construct a 99% CI.

Interpretive problems

6.1 Using item 7 from the survey 1 dataset accessible from the website, use SPSS to conduct a one-sample t test to determine whether the mean number of compact disks owned is significantly different from 25, at the .05 level of significance. Test for the extent to which the assumption of normality has been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph reporting your results.

6.2 Using item 14 from the survey 1 dataset accessible from the website, use SPSS to conduct a one-sample t test to determine whether the mean number of hours slept is significantly different from 8, at the .05 level of significance. Test for the extent to which the assumption of normality has been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph reporting your results.

7 Inferences About the Difference Between Two Means

Chapter Outline
7.1 New Concepts
 7.1.1 Independent Versus Dependent Samples
 7.1.2 Hypotheses
7.2 Inferences About Two Independent Means
 7.2.1 Independent t Test
 7.2.2 Welch t′ Test
 7.2.3 Recommendations
7.3 Inferences About Two Dependent Means
 7.3.1 Dependent t Test
 7.3.2 Recommendations
7.4 SPSS
7.5 G*Power
7.6 Template and APA-Style Write-Up

Key Concepts
1. Independent versus dependent samples
2. Sampling distribution of the difference between two means
3. Standard error of the difference between two means
4. Parametric versus nonparametric tests

In Chapter 6, we introduced hypothesis testing and ultimately considered our first inferential statistic, the one-sample t test. There we examined the following general topics: types of hypotheses, types of decision errors, level of significance, steps in the decision-making process, inferences about a single mean when the population standard deviation is known (the z test), power, statistical versus practical significance, and inferences about a single mean when the population standard deviation is unknown (the t test).

In this chapter, we consider inferential tests involving the difference between two means. In other words, our research question is the extent to which two sample means are statistically different and, by inference, the extent to which their respective population means are different. Several inferential tests are covered in this chapter, depending on whether the two samples are selected in an independent or dependent manner, and on whether the statistical assumptions are met. More specifically, the topics described include the following inferential tests: for two independent samples—the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test; and for two dependent samples—the dependent t test and briefly the Wilcoxon signed ranks test. We use many of the foundational concepts previously covered in Chapter 6. New concepts to be discussed include the following:
independent versus dependent samples, the sampling distribution of the difference between two means, and the standard error of the difference between two means. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying the inferential tests of two means, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.

7.1 New Concepts

Remember Marie, our very capable educational researcher graduate student? Let us see what Marie has in store for her now…

Marie's first attempts at consulting went so well that her faculty advisor has assigned Marie two additional consulting responsibilities with individuals from their community. Marie has been asked to consult with a local nurse practitioner, JoAnn, who is studying cholesterol levels of adults and how they differ based on gender. Marie suggests the following research question: Is there a mean difference in cholesterol level between males and females? Marie suggests an independent samples t test as the test of inference. Her task is then to assist JoAnn in generating the test of inference to answer her research question.

Marie has also been asked to consult with the swimming coach, Mark, who works with swimming programs that are offered through their local Parks and Recreation Department. Mark has just conducted an intensive 2-month training program for a group of 10 swimmers. He wants to determine if, on average, their time in the 50-meter freestyle event is different after the training. The following research question is suggested by Marie: Is there a mean difference in swim time for the 50-meter freestyle event before participation in an intensive training program as compared to swim time for the 50-meter freestyle event after participation in an intensive training program? Marie suggests a dependent
samples t test as the test of inference. Her task is then to assist Mark in generating the test of inference to answer his research question.

Before we proceed to inferential tests of the difference between two means, a few new concepts need to be introduced. The new concepts are the difference between the selection of independent samples and dependent samples, the hypotheses to be tested, and the sampling distribution of the difference between two means.

7.1.1 Independent Versus Dependent Samples

The first new concept to address is to make a distinction between the selection of independent samples and dependent samples. Two samples are independent when the method of sample selection is such that those individuals selected for sample 1 do not have any relationship to those individuals selected for sample 2. In other words, the selection of individuals to be included in the two samples is unrelated or uncorrelated such that they have absolutely nothing to do with one another. You might think of the samples as being selected totally separately from one another. Because the individuals in the two samples are independent of one another, their scores on the dependent variable, Y, should also be independent of one another. The independence condition leads us to consider, for example, the independent samples t test. (This should not, however, be confused with the assumption of independence, which was introduced in the previous chapter. The assumption of independence still holds for the independent samples t test, and we will talk later about how this assumption can be met with this particular procedure.)

Two samples are dependent when the method of sample selection is such that those individuals selected for sample 1 do have a relationship to those individuals selected for
sample 2. In other words, the selections of individuals to be included in the two samples are related or correlated. You might think of the samples as being selected simultaneously such that there are actually pairs of individuals. Consider the following two typical examples. First, if the same individuals are measured at two points in time, such as during a pretest and a posttest, then we have two dependent samples. The scores on Y at time 1 will be correlated with the scores on Y at time 2 because the same individuals are assessed at both time points. Second, if husband-and-wife pairs are selected, then we have two dependent samples. That is, if a particular wife is selected for the study, then her corresponding husband is also automatically selected—this is an example where individuals are paired or matched in some way such that they share characteristics that make the score of one person related to (i.e., dependent on) the score of the other person. In both examples, we have natural pairs of individuals or scores. The dependence condition leads us to consider the dependent samples t test, alternatively known as the correlated samples t test or the paired samples t test. As we show in this chapter, whether the samples are independent or dependent determines the appropriate inferential test.

7.1.2 Hypotheses

The hypotheses to be evaluated for detecting a difference between two means are as follows. The null hypothesis H0 is that there is no difference between the two population means, which we denote as the following:

 H0: μ1 − μ2 = 0 or H0: μ1 = μ2

where
 μ1 is the population mean for sample 1
 μ2 is the population mean for sample 2

Mathematically, both equations say the same thing. The version on the left makes it clear to the reader why the term "null" is appropriate. That is, there is no difference or a "null" difference between the two population means. The version on the right indicates that the population mean of
sample 1 is the same as the population mean of sample 2—another way of saying that there is no difference between the means (i.e., they are the same). The nondirectional scientific or alternative hypothesis H1 is that there is a difference between the two population means, which we denote as follows:

 H1: μ1 − μ2 ≠ 0 or H1: μ1 ≠ μ2

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population means are different. As we have not specified a direction on H1, we are willing to reject either if μ1 is greater than μ2 or if μ1 is less than μ2. This alternative hypothesis results in a two-tailed test.

Directional alternative hypotheses can also be tested if we believe μ1 is greater than μ2, denoted as follows:

 H1: μ1 − μ2 > 0 or H1: μ1 > μ2
In this case, the equation on the left tells us that when μ2 is subtracted from μ1, a positive
value will result (i.e., some value greater than 0). The equation on the right makes it somewhat clearer what we hypothesize.
Or if we believe μ1 is less than μ2, the directional alternative hypotheses will be denoted as we see here:
 H1: μ1 − μ2 < 0 or H1: μ1 < μ2

In this case, the equation on the left tells us that when μ2 is subtracted from μ1, a negative value will result (i.e., some value less than 0). The equation on the right makes it somewhat clearer what we hypothesize. Regardless of how they are denoted, directional alternative hypotheses result in a one-tailed test.

The underlying sampling distribution for these tests is known as the sampling distribution of the difference between two means. This makes sense, as the hypotheses examine the extent to which two sample means differ. The mean of this sampling distribution is 0, as that is the hypothesized difference between the two population means μ1 − μ2. The more the two sample means differ, the more likely we are to reject the null hypothesis. As we show later, the test statistics in this chapter all deal in some way with the difference between the two means and with the standard error (or standard deviation) of the difference between two means.

7.2 Inferences About Two Independent Means

In this section, three inferential tests of the difference between two independent means are described: the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test. The section concludes with a list of recommendations.

7.2.1 Independent t Test

First, we need to determine the conditions under which the independent t test is appropriate. In part, this has to do with the statistical assumptions associated with the test itself. The assumptions of the independent t test are that the scores on the dependent variable Y (a) are normally distributed within each of the two populations, (b) have equal population variances (known as homogeneity of variance or homoscedasticity), and (c) are independent. (The assumptions of normality and independence should sound familiar as they
were introduced as we learned about the one-sample t test.) Later in the chapter, we more fully discuss the assumptions for this particular procedure. When these assumptions are not met, other procedures may be more appropriate, as we also show later.

The measurement scales of the variables must also be appropriate. Because this is a test of means, the dependent variable must be measured on an interval or ratio scale. The independent variable, however, must be nominal or ordinal, and only two categories or groups of the independent variable can be used with the independent t test. (In later chapters, we will learn about analysis of variance (ANOVA), which can accommodate an independent variable with more than two categories.) It is not a condition of the independent t test that the sample sizes of the two groups be the same. An unbalanced design (i.e., unequal sample sizes) is perfectly acceptable.

The test statistic for the independent t test is known as t and is denoted by the following formula:

 t = (Ȳ1 − Ȳ2) / s(Ȳ1−Ȳ2)

where
 Ȳ1 and Ȳ2 are the means for sample 1 and sample 2, respectively
 s(Ȳ1−Ȳ2) is the standard error of the difference between two means

This standard error is the standard deviation of the sampling distribution of the difference between two means and is computed as follows:

 s(Ȳ1−Ȳ2) = sp √(1/n1 + 1/n2)

where sp is the pooled standard deviation computed as

 sp = √{[(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)}

and where
 s1² and s2² are the sample variances for groups 1 and 2, respectively
 n1 and n2 are the sample sizes for groups 1 and 2, respectively

Conceptually, the standard error s(Ȳ1−Ȳ2) is a pooled standard deviation weighted by the two sample sizes; more specifically, the two sample variances are weighted by their respective sample sizes and then pooled. This is conceptually similar to the standard
error for the one-sample t test, which you will recall from Chapter 6 as

 sȲ = sY / √n

where we also have a standard deviation weighted by sample size. If the sample variances are not equal, as the test assumes, then you can see why we might not want to take a pooled or weighted average (i.e., as it would not represent well the individual sample variances).

The test statistic t is then compared to a critical value(s) from the t distribution. For a two-tailed test, from Table A.2, we would use the appropriate α2 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom for this test are n1 + n2 − 2. Conceptually, we lose one degree of freedom from each sample for estimating the population variances (i.e., there are two restrictions along the lines of what was discussed in Chapter 6). The critical values are denoted as ±(α2)t(n1+n2−2). The subscript α2 of the critical values reflects the fact that this is a two-tailed test, and the subscript n1 + n2 − 2 indicates these particular degrees of freedom. (Remember that the critical value can be found based on the knowledge of the degrees of freedom and whether it is a one- or two-tailed test.) If the test statistic falls into either critical region, then we reject H0; otherwise, we fail to reject H0.

For a one-tailed test, from Table A.2, we would use the appropriate α1 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom are again n1 + n2 − 2. The critical value is denoted as +(α1)t(n1+n2−2) for the alternative hypothesis H1: μ1 − μ2 > 0 (i.e., right-tailed test so the critical
value will be positive), and as −(α1)t(n1+n2−2) for the alternative hypothesis H1: μ1 − μ2 < 0 (i.e., left-tailed test and thus a negative critical value). If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.

7.2.1.1 Confidence Interval

For the two-tailed test, a (1 − α)% confidence interval (CI) can also be examined. The CI is formed as follows:

 (Ȳ1 − Ȳ2) ± (α2)t(n1+n2−2) s(Ȳ1−Ȳ2)

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. The interpretation and use of CIs is similar to that of the one-sample test described in Chapter 6. Imagine we take 100 random samples from each of two populations and construct 95% CIs. Then 95% of the CIs will contain the true population mean difference μ1 − μ2 and 5% will not. In short, 95% of similarly constructed CIs will contain the true population mean difference.

7.2.1.2 Effect Size

Next we extend Cohen's (1988) sample measure of effect size d from Chapter 6 to the two independent samples situation. Here we compute d as follows:

 d = (Ȳ1 − Ȳ2) / sp

The numerator of the formula is the difference between the two sample means. The denominator is the pooled standard deviation, for which the formula was presented previously. Thus, the effect size d is measured in standard deviation units, and again we use Cohen's proposed subjective standards for interpreting d: small effect size, d = .2; medium effect size, d = .5; large effect size, d = .8. Conceptually, this is similar to d in the one-sample case from Chapter 6. The effect size d is considered a standardized group difference type of effect size (Huberty, 2002). There are other types of effect sizes, however. Another is eta squared (η²), also a standardized effect size, and it is considered a relationship type of effect size (Huberty,
2002). For the independent t test, eta squared can be calculated as follows:

 η² = t² / (t² + df) = t² / [t² + (n1 + n2 − 2)]

Here the numerator is the squared t test statistic value, and the denominator is the sum of the squared t test statistic value and the degrees of freedom. Values for eta squared range from 0 to +1.00, where values closer to one indicate a stronger association. In terms of what this effect size tells us, eta squared is interpreted as the proportion of variance accounted for in the dependent variable by the independent variable and indicates the degree of the relationship between the independent and dependent variables. If we use Cohen's (1988) metric for interpreting eta squared: small effect size, η² = .01; moderate effect size, η² = .06; large effect size, η² = .14.

7.2.1.3 Example of the Independent t Test

Let us now consider an example where the independent t test is implemented. Recall from Chapter 6 the basic steps for hypothesis testing for any inferential test: (1) State the null and alternative hypotheses, (2) select the level of significance (i.e., alpha, α), (3) calculate the test statistic value, and (4) make a statistical decision (reject or fail to reject H0). We will follow these steps again in conducting our independent t test. In our example, samples of 8 female and 12 male middle-age adults are randomly and independently sampled from the populations of female and male middle-age adults, respectively. Each individual is given a cholesterol test through a standard blood sample. The null hypothesis to be tested is that males and females have equal cholesterol levels. The alternative hypothesis is that males and females will not have equal cholesterol levels, thus necessitating a nondirectional or two-tailed test. We will conduct our test using an alpha level of .05. The raw data and summary statistics are presented in Table 7.1. For the female sample (sample 1), the mean and
variance are 185.0000 and 364.2857, respectively, and for the male sample (sample 2), the mean and variance are 215.0000 and 913.6363, respectively.

In order to compute the test statistic t, we first need to determine the standard error of the difference between the two means. The pooled standard deviation is computed as

 sp = √{[(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)} = √{[(8 − 1)(364.2857) + (12 − 1)(913.6363)] / (8 + 12 − 2)} = 26.4575

and the standard error of the difference between two means is computed as

 s(Ȳ1−Ȳ2) = sp √(1/n1 + 1/n2) = 26.4575 √(1/8 + 1/12) = 12.0752

The test statistic t can then be computed as

 t = (Ȳ1 − Ȳ2) / s(Ȳ1−Ȳ2) = (185 − 215) / 12.0752 = −2.4844

The next step is to use Table A.2 to determine the critical values. As there are 18 degrees of freedom (n1 + n2 − 2 = 8 + 12 − 2 = 18), using α = .05 and a two-tailed or nondirectional test, we find the critical values using the appropriate α2 column to be +2.101 and −2.101. Since the test statistic falls beyond the critical values as shown in Figure 7.1, we therefore reject the null hypothesis that the means are equal in favor of the nondirectional alternative that the means are not equal. Thus, we conclude that the mean cholesterol levels for males and females are not equal at the .05 level of significance (denoted by p < .05).

The 95% CI can also be examined. For the cholesterol example, the CI is formed as follows:

 (Ȳ1 − Ȳ2) ± (α2)t(n1+n2−2) s(Ȳ1−Ȳ2) = (185 − 215) ± 2.101(12.0752) = −30 ± 25.3700 = (−55.3700, −4.6300)

Table 7.1
Cholesterol Data for Independent Samples

 Female (Sample 1): 205 160 170 180 190 200 210 165
 Male (Sample 2): 245 170 180 190 200 210 220 230 240 250 260 185

 Ȳ1 = 185.0000  Ȳ2 = 215.0000
 s1² = 364.2857  s2² = 913.6363
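The computations in this example can be sketched in a few lines of Python (a minimal illustration using the Table 7.1 data; variable names are ours, and tiny rounding differences from the hand calculations are expected):

```python
from math import sqrt
from statistics import mean, variance

# Cholesterol data from Table 7.1
female = [205, 160, 170, 180, 190, 200, 210, 165]                     # sample 1
male = [245, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 185]  # sample 2

n1, n2 = len(female), len(male)
y1, y2 = mean(female), mean(male)
v1, v2 = variance(female), variance(male)  # unbiased sample variances

# Pooled standard deviation and standard error of the mean difference
sp = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
se = sp * sqrt(1 / n1 + 1 / n2)

t = (y1 - y2) / se   # independent t statistic
df = n1 + n2 - 2     # degrees of freedom

print(round(sp, 4), round(t, 4), df)  # prints: 26.4575 -2.4842 18
```

If SciPy is available, `scipy.stats.ttest_ind(female, male)` (with its default `equal_var=True`) returns the same t along with its p value.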
FIGURE 7.1
Critical regions and test statistics for the cholesterol example (two-tailed test with α = .025 in each tail; critical values −2.101 and +2.101; t test statistic value −2.4844; Welch t′ test statistic value −2.7197).

As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean gender difference in cholesterol levels was not equal to 0 at the .05 level of significance (p < .05). In other words, there is evidence to suggest that males and females differ, on average, on cholesterol level. More specifically, the mean cholesterol level for males is greater than the mean cholesterol level for females.

The effect size for this example is computed as follows:

 d = (Ȳ1 − Ȳ2) / sp = (185 − 215) / 26.4575 = −1.1339

According to Cohen's recommended subjective standards, this would certainly be a rather large effect size, as the difference between the two sample means is larger than one standard deviation. Rather than d, had we wanted to compute eta squared, we would have also found a large effect:

 η² = t² / (t² + df) = (−2.4844)² / [(−2.4844)² + 18] = .2553
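These effect-size computations are equally easy to script. Below is a minimal sketch; the t value and pooled standard deviation are simply the values from the worked example above, entered by hand:

```python
# Effect sizes for the cholesterol example (values from the worked example)
sp = 26.4575  # pooled standard deviation
t = -2.4844   # independent t statistic
df = 18       # n1 + n2 - 2

d = (185 - 215) / sp         # Cohen's d: standardized mean difference
eta_sq = t**2 / (t**2 + df)  # eta squared: proportion of variance explained

print(round(d, 4), round(eta_sq, 4))  # prints: -1.1339 0.2553
```

Because η² depends on t only through t², the sign of the t statistic does not affect it.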
An eta squared value of .26 indicates a large relationship between the independent and dependent variables, with 26% of the variance in the dependent variable (i.e., cholesterol level) accounted for by the independent variable (i.e., gender).

7.2.1.4 Assumptions

Let us return to the assumptions of normality, independence, and homogeneity of variance. For the independent t test, the assumption of normality is met when the dependent variable is normally distributed for each sample (i.e., each category or group) of the independent variable. The normality assumption is made because we are dealing with a parametric inferential test. Parametric tests assume a particular underlying theoretical population distribution, in this case, the normal distribution. Nonparametric tests do not assume a particular underlying theoretical population distribution.

Conventional wisdom tells us the following about nonnormality. When the normality assumption is violated with the independent t test, the effects on Type I and Type II errors are minimal when using a two-tailed test (e.g., Glass, Peckham, & Sanders, 1972; Sawilowsky & Blair, 1992). When using a one-tailed test, the effect of violating the normality assumption is minimal for samples larger than 10 and disappears for samples of at least 20 (Sawilowsky & Blair, 1992; Tiku & Singh, 1981). The simplest methods for detecting violation of the normality assumption are graphical methods, such as stem-and-leaf plots, box plots, histograms, or Q–Q plots, statistical procedures such as the Shapiro–Wilk (S–W) test (1965), and/or skewness and kurtosis statistics. However, more recent research by Wilcox (2003) indicates that power for both the independent t and Welch t′ can be reduced even for slight departures from normality, with outliers also contributing to the problem. Wilcox recommends several
procedures not readily available and beyond the scope of this text (such as bootstrap methods, trimmed means, medians). Keep in mind, though, that the independent t test is fairly robust to nonnormality in most situations.

The independence assumption is also necessary for this particular test. For the independent t test, the assumption of independence is met when there is random assignment of individuals to the two groups or categories of the independent variable. Random assignment to the two samples being studied provides for greater internal validity—the ability to state with some degree of confidence that the independent variable caused the outcome (i.e., the dependent variable). If the independence assumption is not met, then probability statements about the Type I and Type II errors will not be accurate; in other words, the probability of a Type I or Type II error may be increased as a result of the assumption not being met. Zimmerman (1997) found that Type I error was affected even for relatively small relations or correlations between the samples (i.e., even as small as .10 or .20). In general, the assumption can be met by (a) keeping the assignment of individuals to groups separate through the design of the experiment (specifically random assignment—not to be confused with random selection), and (b) keeping the individuals separate from one another through experimental control so that the scores on the dependent variable Y for sample 1 do not influence the scores for sample 2. Zimmerman also stated that independence can be violated for supposedly independent samples due to some type of matching in the design of the experiment (e.g., matched pairs based on gender, age, and weight). If the observations are not independent, then the dependent t test, discussed further in the chapter, may be appropriate.
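As one illustration of the normality screening mentioned above, moment-based skewness and excess kurtosis can be computed for each group directly (a minimal sketch with illustrative formulas; in practice one would rely on SPSS's S–W test and Q–Q plots, and there are several competing skewness/kurtosis estimators):

```python
def skewness(data):
    """Moment-based sample skewness: g1 = m3 / m2**1.5 (0 for symmetry)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    """Moment-based excess kurtosis: g2 = m4 / m2**2 - 3 (0 for a normal)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

# Screen each group separately, e.g., the female cholesterol sample from Table 7.1
female = [205, 160, 170, 180, 190, 200, 210, 165]
print(round(skewness(female), 3), round(excess_kurtosis(female), 3))
```

Values far from 0 on either statistic would flag a potential departure from normality; the female sample happens to be perfectly symmetric about its mean, so its skewness is exactly 0.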
Of potentially more serious concern is violation of the homogeneity of variance assumption. Homogeneity of variance is met when the variances of the dependent variable for the two samples (i.e., the two groups or categories of the independent variable) are the same. Research has shown that the effect of heterogeneity (i.e., unequal variances) is minimal when the sizes of the two samples, n1 and n2, are equal; this is not the case when the sample sizes are not equal. When the larger variance is associated with the smaller sample size (e.g., group 1 has the larger variance and the smaller n), then the actual α level is larger than the nominal α level. In other words, if you set α at .05, then you are not really conducting the test at the .05 level, but at some larger value. When the larger variance is associated with the larger sample size (e.g., group 1 has the larger variance and the larger n), then the actual α level is smaller than the nominal α level. In other words, if you set α at .05, then you are not really conducting the test at the .05 level, but at some smaller value.

One can use statistical tests to detect violation of the homogeneity of variance assumption, although the most commonly used tests are somewhat problematic. These tests include Hartley's Fmax test (for equal n's, but sensitive to nonnormality; it is the unequal n's situation that we are concerned with anyway), Cochran's test (for equal n's, but even more sensitive to nonnormality than Hartley's test; concerned with the unequal n's situation anyway), Levene's test (for equal n's, but sensitive to nonnormality; concerned with the unequal n's situation anyway) (available in SPSS), the Bartlett test (for unequal n's, but very sensitive to nonnormality), the Box–Scheffé–Anderson test (for unequal n's, fairly robust to nonnormality), and the Brown–Forsythe test (for unequal n's, more robust to nonnormality than the Box–Scheffé–Anderson test and therefore recommended). When the variances are unequal and the sample sizes are unequal, the usual method to use as an alternative to the independent t test is the Welch t′ test described in the next section. Inferential tests for evaluating homogeneity of variance are more fully considered in Chapter 9.

7.2.2 Welch t′ Test

The Welch t′ test is usually appropriate when the population variances are unequal and the sample sizes are unequal. The Welch t′ test assumes that the scores on the dependent variable Y (a) are normally distributed in each of the two populations and (b) are independent.

The test statistic is known as t′ and is denoted by

t′ = (Ȳ1 − Ȳ2) / s_(Ȳ1−Ȳ2) = (Ȳ1 − Ȳ2) / √(s²_Ȳ1 + s²_Ȳ2) = (Ȳ1 − Ȳ2) / √(s1²/n1 + s2²/n2)

where Ȳ1 and Ȳ2 are the means for samples 1 and 2, respectively, and s²_Ȳ1 and s²_Ȳ2 are the variance errors of the means for samples 1 and 2, respectively. Here we see that the denominator of this test statistic is conceptually similar to the one-sample t and the independent t test statistics. The variance errors of the mean are computed for each group by

s²_Ȳ1 = s1²/n1        s²_Ȳ2 = s2²/n2

where s1² and s2² are the sample variances for groups 1 and 2, respectively. The square root of the variance error of the mean is the standard error of the mean (i.e., s_Ȳ1 and s_Ȳ2). Thus, we see that rather than take a pooled or weighted average of the two sample variances as we did with the independent t test, the two sample variances are treated separately with the Welch t′ test.

The test statistic t′ is then compared to a critical value(s) from the t distribution in Table A.2. We again use the appropriate α column depending on the desired level of significance and whether the test is one- or two-tailed (i.e., α1 and α2), and the appropriate row for the degrees of freedom.
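Although the text carries out these computations by hand (and later in SPSS), the t′ statistic and the variance errors of the mean are easy to sketch in code. The following is an illustrative Python sketch of the formulas above, not part of the original text; the function names are our own:

```python
import math

def variance_error(s2, n):
    # Variance error of the mean: s^2 / n
    return s2 / n

def welch_t(mean1, s2_1, n1, mean2, s2_2, n2):
    # t' = (Ybar1 - Ybar2) / sqrt(s1^2/n1 + s2^2/n2)
    se = math.sqrt(variance_error(s2_1, n1) + variance_error(s2_2, n2))
    return (mean1 - mean2) / se

# Cholesterol example from this chapter: females (n=8, mean 185, s^2=364.2857)
# and males (n=12, mean 215, s^2=913.6363)
t_prime = welch_t(185, 364.2857, 8, 215, 913.6363, 12)
print(round(t_prime, 4))  # -2.7197
```

Note that the two sample variances enter the denominator separately rather than being pooled, which is the defining feature of the Welch procedure.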
The degrees of freedom for this test are a bit more complicated than for the independent t test. The degrees of freedom are adjusted from n1 + n2 − 2 for the independent t test to the following value for the Welch t′ test:

ν = (s²_Ȳ1 + s²_Ȳ2)² / [ (s²_Ȳ1)²/(n1 − 1) + (s²_Ȳ2)²/(n2 − 1) ]

The degrees of freedom ν are approximated by rounding to the nearest whole number prior to using the table. If the test statistic falls into a critical region, then we reject H0; otherwise, we fail to reject H0.

For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:

(Ȳ1 − Ȳ2) ± (α2)t_ν · s_(Ȳ1−Ȳ2)

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. Thus, interpretation of this CI is the same as with the independent t test.

Consider again the example cholesterol data where the sample variances were somewhat different and the sample sizes were different. The variance errors of the mean are computed for each sample as follows:

s²_Ȳ1 = s1²/n1 = 364.2857/8 = 45.5357        s²_Ȳ2 = s2²/n2 = 913.6363/12 = 76.1364

The t′ test statistic is computed as

t′ = (Ȳ1 − Ȳ2) / √(s²_Ȳ1 + s²_Ȳ2) = (185 − 215) / √(45.5357 + 76.1364) = −30/11.0305 = −2.7197

Finally, the degrees of freedom ν are determined to be

ν = (45.5357 + 76.1364)² / [ (45.5357)²/(8 − 1) + (76.1364)²/(12 − 1) ] = 17.9838

which is rounded to 18, the nearest whole number. The degrees of freedom remain 18 as they were for the independent t test, and thus, the critical values are still +2.101 and −2.101. As the test statistic falls beyond the critical values, as shown in Figure 7.1, we reject the null hypothesis that the means are equal in favor of the alternative that the means are not equal. Thus, as with the independent t test, with the Welch t′ test, we conclude that the mean cholesterol levels for males and females are not equal at the .05 level of significance. In this particular example, then, we see that the unequal sample variances and unequal sample sizes did not alter the outcome when comparing the independent t test result with the Welch t′ test result. However, note that the results for these two tests may differ with other data.

Finally, the 95% CI can be examined. For the example, the CI is formed as follows:

(Ȳ1 − Ȳ2) ± (α2)t_ν · s_(Ȳ1−Ȳ2) = (185 − 215) ± 2.101(11.0305) = −30 ± 23.1751 = (−53.1751, −6.8249)
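The adjusted degrees of freedom and the 95% CI can be verified the same way. Again, this is a Python sketch of our own (not from the text), using the chapter's summary statistics and the tabled two-tailed critical value 2.101 for 18 degrees of freedom:

```python
import math

n1, s2_1 = 8, 364.2857     # females: n and sample variance
n2, s2_2 = 12, 913.6363    # males: n and sample variance
mean_diff = 185 - 215

v1 = s2_1 / n1             # variance error of the mean, group 1
v2 = s2_2 / n2             # variance error of the mean, group 2

# Adjusted degrees of freedom for the Welch t' test
nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# 95% CI using the tabled critical value for the rounded df of 18
se = math.sqrt(v1 + v2)
ci = (mean_diff - 2.101 * se, mean_diff + 2.101 * se)
print(round(nu, 4))                      # 17.9838
print(round(ci[0], 4), round(ci[1], 4))  # -53.1751 -6.8249
```

Because ν depends on the sample variances, it generally comes out as a non-integer and is rounded before entering the t table, exactly as in the hand computation.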
As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean gender difference was not equal to 0 at the .05 level of significance (p < .05).

7.2.3 Recommendations

The following four recommendations are made regarding the two independent samples case. Although there is no total consensus in the field, our recommendations take into account, as much as possible, the available research and statistical software. First, if the normality assumption is satisfied, the following recommendations are made: (a) the independent t test is recommended when the homogeneity of variance assumption is met; (b) the independent t test is recommended when the homogeneity of variance assumption is not met and when there are an equal number of observations in the samples; and (c) the Welch t′ test is recommended when the homogeneity of variance assumption is not met and when there are an unequal number of observations in the samples.

Second, if the normality assumption is not satisfied, the following recommendations are made: (a) if the homogeneity of variance assumption is met, then the independent t test using ranked scores (Conover & Iman, 1981), rather than raw scores, is recommended; and (b) if the homogeneity of variance assumption is not met, then the Welch t′ test using ranked scores is recommended, regardless of whether there are an equal number of observations in the samples. Using ranked scores means you rank order the observations from highest to lowest regardless of group membership, then conduct the appropriate t test with ranked scores rather than raw scores.

Third, the dependent t test is recommended when there is some dependence between the groups (e.g., matched pairs or the same individuals measured on two occasions), as described later in this chapter. Fourth, the nonparametric Mann–Whitney–Wilcoxon test is not recommended. Among the disadvantages of this test are that (a) the critical values are not extensively tabled, (b) tied ranks can affect the results and no optimal procedure has yet been developed (Wilcox, 1996), and (c) Type I error appears to be inflated regardless of the status of the assumptions (Zimmerman, 2003). For these reasons, the Mann–Whitney–Wilcoxon test is not further described here. Note that most major statistical packages, including SPSS, have options for conducting the independent t test, the Welch t′ test, and the Mann–Whitney–Wilcoxon test. Alternatively, one could conduct the Kruskal–Wallis nonparametric one-factor ANOVA, which is also based on ranked data and is appropriate for comparing the means of two or more independent groups. This test is considered more fully in Chapter 11. These recommendations are summarized in Box 7.1.

STOP AND THINK BOX 7.1
Recommendations for the Independent and Dependent Samples Tests Based on Meeting or Violating the Assumption of Normality

Normality is met
  Independent samples tests:
  • Use the independent t test when homogeneity of variances is met.
  • Use the independent t test when homogeneity of variances is not met, but there are equal sample sizes in the groups.
  • Use the Welch t′ test when homogeneity of variances is not met and there are unequal sample sizes in the groups.
  Dependent samples tests:
  • Use the dependent t test.

Normality is not met
  Independent samples tests:
  • Use the independent t test with ranked scores when homogeneity of variances is met.
  • Use the Welch t′ test with ranked scores when homogeneity of variances is not met, regardless of equal or unequal sample sizes in the groups.
  • Use the Kruskal–Wallis nonparametric procedure.
  Dependent samples tests:
  • Use the dependent t test with ranked scores, or alternative procedures including bootstrap methods, trimmed means, medians, or Stein's method.
  • Use the Wilcoxon signed ranks test when data are both nonnormal and have extreme outliers.
  • Use the Friedman nonparametric procedure.

7.3 Inferences About Two Dependent Means

In this section, two inferential tests of the difference between two dependent means are described: the dependent t test and, briefly, the Wilcoxon signed ranks test. The section concludes with a list of recommendations.

7.3.1 Dependent t Test

As you may recall, the dependent t test is appropriate to use when there are two samples that are dependent—the individuals in sample 1 have some relationship to the individuals in sample 2. First, we need to determine the conditions under which the dependent t test is appropriate. In part, this has to do with the statistical assumptions associated with the test itself—that is, (a) normality of the distribution of the differences of the dependent variable Y, (b) homogeneity of variance of the two populations, and (c) independence of the observations within each sample. Like the independent t test, the dependent t test is reasonably robust to violation of the normality assumption, as we show later. Because this is a test of means, the dependent variable must be measured on an interval or ratio scale. For example, the same individuals may be measured at two points in time on the same interval-scaled pretest and posttest, or some matched pairs (e.g., twins or husbands–wives) may be assessed with the same ratio-scaled measure (e.g., weight measured in pounds).

Although there are several methods for computing the test statistic t, the most direct method and the one most closely aligned conceptually with the one-sample t test is as follows:

t = d̄ / s_d̄

where d̄ is the mean difference and s_d̄ is the standard error of the mean difference. Conceptually, this test statistic looks just like the one-sample t test statistic,
except now the notation has been changed to denote that we are dealing with difference scores rather than raw scores.

The standard error of the mean difference is computed by

s_d̄ = s_d / √n

where s_d is the standard deviation of the difference scores (i.e., like any other standard deviation, only this one is computed from the difference scores rather than raw scores) and n is the total number of pairs. Conceptually, this standard error looks just like the standard error for the one-sample t test. If we were doing hand computations, we would compute a difference score for each pair of scores (i.e., Y1 − Y2). For example, if sample 1 were wives and sample 2 were their husbands, then we calculate a difference score for each couple. From this set of difference scores, we then compute the mean of the difference scores d̄ and the standard deviation of the difference scores s_d. This leads us directly into the computation of the t test statistic. Note that although there are n scores in sample 1, n scores in sample 2, and thus 2n total scores, there are only n difference scores, which is what the analysis is actually based upon.

The test statistic t is then compared with a critical value(s) from the t distribution. For a two-tailed test, from Table A.2, we would use the appropriate α2 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom for this test are n − 1. Conceptually, we lose one degree of freedom from the number of differences (or pairs) because we are estimating the population variance (or standard deviation) of the difference. Thus, there is one restriction along the lines of our discussion of degrees of freedom in Chapter 6. The critical values are denoted as ±(α2)t_(n−1). The subscript α2 of the critical values reflects the fact that this is a two-tailed test, and the subscript n − 1 indicates the degrees of freedom. If the test statistic falls into either critical region, then we reject H0; otherwise, we fail to reject H0.

For a one-tailed test, from Table A.2, we would use the appropriate α1 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom are again n − 1. The critical value is denoted as +(α1)t_(n−1) for the alternative hypothesis H1: μ1 − μ2 > 0 and as −(α1)t_(n−1) for the alternative hypothesis H1: μ1 − μ2 < 0. If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.

7.3.1.1 Confidence Interval

For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:

d̄ ± (α2)t_(n−1) · s_d̄

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. The interpretation of these CIs is the same as those previously discussed for the one-sample t and the independent t.

7.3.1.2 Effect Size

The effect size can be measured using Cohen's (1988) d, computed as follows:

Cohen's d = d̄ / s_d

where the label "Cohen's d" is simply used to distinguish among the various uses and slight differences in the computation of d. Interpretation of the value of d would be the same as for the one-sample t and the independent t previously discussed—specifically, the number of standard deviation units for which the mean(s) differ(s).

7.3.1.3 Example of the Dependent t Test

Let us consider an example for purposes of illustrating the dependent t test. Ten young swimmers participated in an intensive 2 month training program. Prior to the program, each swimmer was timed during a 50 meter freestyle event. Following the program, the same swimmers were timed in the 50 meter freestyle event again. This is a classic pretest-posttest design. For illustrative purposes, we will conduct a two-tailed test, although a case might also be made for a one-tailed test, in that the coach might want to see improvement only. Conducting a two-tailed test, however, allows us to examine the CI for purposes of illustration. The raw scores, the difference scores, and the mean and standard deviation of the difference scores are shown in Table 7.2. The pretest mean time was 64 seconds and the posttest mean time was 59 seconds.
To determine our test statistic value, t, first we compute the standard error of the mean difference as follows:

s_d̄ = s_d / √n = 2.1602 / √10 = 0.6831

Next, using this value for the denominator, the test statistic t is then computed as follows:

t = d̄ / s_d̄ = 5 / 0.6831 = 7.3196

We then use Table A.2 to determine the critical values. As there are nine degrees of freedom (n − 1 = 10 − 1 = 9), using α = .05 and a two-tailed or nondirectional test, we find the critical values using the appropriate α2 column to be +2.262 and −2.262. Since the test statistic falls beyond the critical values, as shown in Figure 7.2, we reject the null hypothesis that the means are equal in favor of the nondirectional alternative that the means are not equal. Thus, we conclude that the mean swimming performance changed from pretest to posttest at the .05 level of significance (p < .05).

The 95% CI is computed to be the following:

d̄ ± (α2)t_(n−1) · s_d̄ = 5 ± 2.262(0.6831) = 5 ± 1.5452 = (3.4548, 6.5452)

TABLE 7.2
Swimming Data for Dependent Samples

Swimmer   Pretest Time (s)   Posttest Time (s)   Difference (d)
1         58                 54                  4 (i.e., 58 − 54)
2         62                 57                  5
3         60                 54                  6
4         61                 56                  5
5         63                 61                  2
6         65                 59                  6
7         66                 64                  2
8         69                 62                  7
9         64                 60                  4
10        72                 63                  9
                                                 d̄ = 5.0000
                                                 s_d = 2.1602

As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean pretest-posttest difference was not equal to 0 at the .05 level of significance (p < .05).

The effect size is computed to be the following:

Cohen's d = d̄ / s_d = 5 / 2.1602 = 2.3146
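The full dependent t example can be reproduced in a few lines. This is a Python sketch of our own (not part of the text), using the raw scores from Table 7.2 and the tabled critical value 2.262 for 9 degrees of freedom. Note that the text's t value of 7.3196 comes from the rounded standard error 0.6831; carrying full precision gives approximately 7.3193:

```python
import math

pre  = [58, 62, 60, 61, 63, 65, 66, 69, 64, 72]   # pretest times (s)
post = [54, 57, 54, 56, 61, 59, 64, 62, 60, 63]   # posttest times (s)

d = [a - b for a, b in zip(pre, post)]            # difference scores
n = len(d)
d_bar = sum(d) / n                                # mean difference: 5.0
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))  # about 2.1602
se = s_d / math.sqrt(n)                           # standard error, about 0.6831
t = d_bar / se                                    # about 7.319

ci = (d_bar - 2.262 * se, d_bar + 2.262 * se)     # about (3.4548, 6.5452)
cohens_d = d_bar / s_d                            # about 2.3146
print(round(t, 4), round(cohens_d, 4))
```

Observe that only the n difference scores enter the analysis, matching the discussion above: the 2n raw scores appear nowhere after the differences are formed.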
This is interpreted as approximately a two and one-third standard deviation difference between the pretest and posttest mean swimming times, a very large effect size according to Cohen's subjective standard.

FIGURE 7.2: Critical regions and test statistic for the swimming example (critical values −2.262 and +2.262, with α = .025 in each tail; t test statistic value +7.3196).

7.3.1.4 Assumptions

Let us return to the assumptions of normality, independence, and homogeneity of variance. For the dependent t test, the assumption of normality is met when the difference scores are normally distributed. Normality of the difference scores can be examined as discussed previously—graphical methods (such as stem-and-leaf plots, box plots, histograms, and/or Q–Q plots), statistical procedures such as the S–W test (1965), and/or skewness and kurtosis statistics. The assumption of independence is met when the cases in our sample have been randomly selected from the population. If the independence assumption is not met, then probability statements about the Type I and Type II errors will not be accurate; in other words, the probability of a Type I or Type II error may be increased as a result of the assumption not being met. Homogeneity of variance refers to equal variances of the two populations. In later chapters, we will examine procedures for formally testing for equal variances. For the moment, if the ratio of the smallest to largest sample variance is within 1:4, then we have evidence to suggest the assumption of homogeneity of variances is met. Research has shown that the effect of heterogeneity (i.e., unequal variances) is minimal when the sizes of the two samples, n1 and n2, are equal, as is the case with the dependent t test by definition (unless there are missing data).

7.3.2 Recommendations

The following three recommendations are made regarding the two dependent samples case. First, the dependent t test is recommended when the normality assumption is met. Second, the dependent t test using ranks (Conover & Iman, 1981) is recommended when the normality assumption is not met. Here you rank order the difference scores from highest to lowest, then conduct the test on the ranked difference scores rather than on the raw difference scores. However, more recent research by Wilcox (2003) indicates that power for the dependent t can be reduced even for slight departures from normality. Wilcox recommends several procedures not readily available and beyond the scope of this text (bootstrap methods, trimmed means, medians, Stein's method). Keep in mind, though, that the dependent t test is fairly robust to nonnormality in most situations.

Third, the nonparametric Wilcoxon signed ranks test is recommended when the data are nonnormal with extreme outliers (one or a few observations that behave quite differently from the rest). However, among the disadvantages of this test are that (a) the critical values are not extensively tabled and two different tables exist depending on sample size, and (b) tied ranks can affect the results and no optimal procedure has yet been developed (Wilcox, 1996). For these reasons, the details of the Wilcoxon signed ranks test are not described here. Note that most major statistical packages, including SPSS, include options for conducting the dependent t test and the Wilcoxon signed ranks test. Alternatively, one could conduct the Friedman nonparametric one-factor ANOVA, which is also based on ranked data and is appropriate for comparing two or more dependent sample means. This test is considered more fully in Chapter 15. These recommendations are summarized in Box 7.1.

7.4 SPSS

Instructions for determining the independent samples t test using SPSS are presented first. This is followed by additional steps for examining the assumption of normality for the independent t test. Next, instructions for determining the dependent samples t test using SPSS are presented, followed by additional steps for examining the assumptions of normality and homogeneity.

Independent t Test

Step 1: In order to conduct an independent t test, your dataset needs to include a dependent variable Y that is measured on an interval or ratio scale (e.g., cholesterol), as well as a grouping variable X that is measured on a nominal or ordinal scale (e.g., gender). For the grouping variable, if there are more than two categories available, only two categories can be selected when running the independent t test (the ANOVA is required for examining more than two categories). To conduct the independent t test, go to "Analyze" in the top pulldown menu, then select "Compare Means," and then select "Independent-Samples T Test." Following the screenshot (step 1) produces the "Independent-Samples T Test" dialog box.

[Screenshot: Independent t test: Step 1]

Step 2: Next, from the main "Independent-Samples T Test" dialog box, click the dependent variable (e.g., cholesterol) and move it into the "Test Variable" box by clicking on the arrow button. Next, click the grouping variable (e.g., gender) and move it into the "Grouping Variable" box by clicking on the arrow button. You will notice that there are two question marks next to the name of your grouping variable. This is SPSS letting you know that you need to define (numerically) which two categories of the grouping variable you want to include in your analysis. To do that, click on "Define Groups."

[Screenshot caption: Clicking on "Options" will allow you to define a confidence interval percentage. The default is 95% (corresponding to an alpha of .05).]
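Before turning to the SPSS output, it may help to see where its numbers come from. The following Python sketch (our own illustration, not an SPSS feature) reproduces the "equal variances assumed" results from the group summary statistics alone:

```python
import math

# Group statistics for the cholesterol example:
# females n=8, mean 185, SD 19.08627; males n=12, mean 215, SD 30.22642
n1, m1, sd1 = 8, 185.0, 19.08627
n2, m2, sd2 = 12, 215.0, 30.22642

# Pooled variance and standard error of the mean difference,
# as used when equal variances are assumed
sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
t = (m1 - m2) / se
df = n1 + n2 - 2
print(round(t, 3), df, round(se, 5))   # -2.484 18 12.07615
```

These match the t, df, and standard error of the difference that SPSS reports for this example; the "equal variances not assumed" row instead uses the Welch computations shown earlier in the chapter.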
[Screenshot: Independent t test: Step 2 — select the variable of interest from the list on the left and use the arrow to move it to the "Test Variable" box on the right; clicking on "Define Groups" allows you to define the two numeric values of the categories for the independent variable.]

Step 3: From the "Define Groups" dialog box, enter the numeric value designated for each of the two categories or groups of your independent variable. Where it says "Group 1," type in the value designated for your first group (e.g., 1, which in our case indicated that the individual was a female), and where it says "Group 2," type in the value designated for your second group (e.g., 2, in our example, a male) (see step 3 screenshot).

[Screenshot: Independent t test: Step 3]

Click on "Continue" to return to the original dialog box (see step 2 screenshot) and then click on "OK" to run the analysis. The output is shown in Table 7.3.

Changing the alpha level (optional): The default alpha level in SPSS is .05, and thus, the default corresponding CI is 95%. If you wish to test your hypothesis at an alpha level other than .05 (and thus obtain CIs other than 95%), click on the "Options" button located in the top right corner of the main dialog box (see step 2 screenshot). From here, the CI percentage can be adjusted to correspond to the alpha level at which you wish your hypothesis to be tested (see Chapter 6, screenshot step 3). (For purposes of this example, the test has been generated using an alpha level of .05.)

Interpreting the output: The top table provides various descriptive statistics for each group, while the bottom box gives the results of the requested procedure. There you see the following three different inferential tests that are automatically provided: (1) Levene's test of the homogeneity of variance assumption (the first two columns of results), (2) the independent t test (which SPSS calls "Equal variances assumed") (the top row of the remaining columns of results), and (3) the Welch t′ test (which SPSS calls "Equal variances not assumed") (the bottom row of the remaining columns of results).

The first interpretation that must be made is for Levene's test of equal variances. The assumption of equal variances is met when Levene's test is not statistically significant. We can determine statistical significance by reviewing the p value for the F test. In this example, the p value is .090, greater than our alpha level of .05 and thus not statistically significant. Levene's test tells us that the variance for cholesterol level for males is not statistically significantly different than the variance for cholesterol level for females. Having met the assumption of equal variances, the values in the rest of the table will be drawn from the row labeled "Equal Variances Assumed." Had we not met the assumption of equal variances (p < α), we would report the Welch t′, for which the statistics are presented on the row labeled "Equal Variances Not Assumed."

After determining that the variances are equal, the next step is to examine the results of the independent t test. The t test statistic value is −2.4842, and the associated p value is .023. Since p is less than α, we reject the null hypothesis. There is evidence to suggest that the mean cholesterol level for males is different than the mean cholesterol level for females.

TABLE 7.3
SPSS Results for Independent t Test

Group Statistics
                    Gender   N    Mean       Std. Deviation   Std. Error Mean
Cholesterol level   Female   8    185.0000   19.08627         6.74802
                    Male     12   215.0000   30.22642         8.72562

Independent Samples Test (cholesterol level)
                              Levene's Test        t-Test for Equality of Means
                              F       Sig.    t        df       Sig.         Mean         Std. Error   95% CI of the
                                                                (2-Tailed)   Difference   Difference   Difference
Equal variances assumed       3.201   .090    −2.484   18       .023         −30.00000    12.07615     (−55.37104, −4.62896)
Equal variances not assumed                   −2.720   17.984   .014         −30.00000    11.03051     (−53.17573, −6.82427)

Annotations to Table 7.3:
• The table labeled "Group Statistics" provides basic descriptive statistics for the dependent variable by group.
• The F test (and p value) of Levene's Test for Equality of Variances is reviewed to determine if the equal variances assumption has been met. The result of this test determines which row of statistics to utilize. In this case, we meet the assumption and use the statistics reported in the top row.
• "t" is the t test statistic value. The t value in the top row is used when the assumption of equal variances has been met and is calculated as t = (Ȳ1 − Ȳ2)/s_(Ȳ1−Ȳ2) = (185 − 215)/12.075 = −2.484. The t value in the bottom row is the Welch t′ and is used when the assumption of equal variances has not been met.
• df are the degrees of freedom. For the independent samples t test, they are calculated as n1 + n2 − 2.
• "Sig." is the observed p value for the independent t test. It is interpreted as follows: there is less than a 3% probability of a sample mean difference of −30 or more extreme occurring by chance if the null hypothesis is really true (i.e., if the population mean difference is 0).
• The mean difference is simply the difference between the sample mean cholesterol values: 185 − 215 = −30.
• The standard error of the mean difference is calculated as s_(Ȳ1−Ȳ2) = s_p √(1/n1 + 1/n2).
• SPSS reports the 95% confidence interval of the difference, interpreted to mean that 95% of the CIs generated across samples will contain the true population mean difference.

Using "Explore" to Examine Normality of Distribution of Dependent Variable by Categories of Independent Variable

Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important. For the independent t test, the distributional shape for the dependent variable should be normally distributed for each category/group of the independent variable. As with our one-sample t test, we can again use "Explore" to examine the extent to which the assumption of normality is met.

The general steps for accessing "Explore" have been presented in previous chapters (e.g., Chapter 4), and they will not be reiterated here. Normality of the dependent variable must be examined for each category of the independent variable, so we must tell SPSS to split the examination of normality by group. Click the dependent variable (e.g., cholesterol) and move it into the "Dependent List" box by clicking on the arrow button. Next, click the grouping variable (e.g., gender) and move it into the "Factor List" box by clicking on the arrow button. The procedures for selecting normality statistics were presented in Chapter 6, and they remain the same here: click on "Plots" in the upper right corner. Place a checkmark in the boxes for "Normality plots with tests" and also for "Histogram." Then click "Continue" to return to the main "Explore" dialog screen. From there, click "OK" to generate the output.

[Screenshot: Generating normality evidence by group — select the dependent variable from the list on the left and use the arrow to move it to the "Dependent List" box on the right, and the independent variable from the list on the left and use the arrow to move it to the "Factor List" box on the right.]
Then click on "Plots."

Generating normality evidence by group

Inferences About the Difference Between Two Means

Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots. As we examine the "Descriptives" table, we see the output for the cholesterol statistics is separated for male (top portion) and female (bottom portion).

Descriptives: Cholesterol level by gender
Male: mean = 215.0000 (std. error = 8.72562); 95% confidence interval for mean = 195.7951 to 234.2049; 5% trimmed mean = 215.0000; median = 215.0000; variance = 913.636; std. deviation = 30.22642; minimum = 170.00; maximum = 260.00; range = 90.00; interquartile range = 57.50; skewness = .000 (std. error = .637); kurtosis = −1.446 (std. error = 1.232)
Female: mean = 185.0000 (std. error = 6.74802); 95% confidence interval for mean = 169.0435 to 200.9565; 5% trimmed mean = 185.0000; median = 185.0000; variance = 364.286; std. deviation = 19.08627; minimum = 160.00; maximum = 210.00; range = 50.00; interquartile range = 37.50; skewness = .000 (std. error = .752); kurtosis = −1.790 (std. error = 1.481)

The skewness statistic of cholesterol level for the males is .000 and kurtosis is −1.446, both within the range of an absolute value of 2.0, suggesting some evidence of normality of the dependent variable for males. Evidence of normality for the distributional shape of cholesterol level for females is also present: skewness = .000 and kurtosis = −1.790.

The histogram of cholesterol level for males is not exactly what most researchers would consider a classic normally shaped distribution. Although the histogram of cholesterol level for females is not presented here, it follows a similar distributional shape.

Histogram for group = Male (mean = 215.00, std. dev. = 30.226, N = 12)

There are a few other statistics that can be used to gauge normality as well. Our formal test of normality, the Shapiro–Wilk test (S–W) (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented in the following and suggests that our sample distribution for cholesterol level is not statistically significantly different from what would be expected from a normal distribution, and this is true for both males (S–W = .949, df = 12, p = .617) and females (S–W = .931, df = 8, p = .525).

Tests of Normality: Cholesterol level by gender
Kolmogorov–Smirnov(a): male statistic = .129, df = 12, Sig. = .200*; female statistic = .159, df = 8, Sig. = .200*
Shapiro–Wilk: male statistic = .949, df = 12, Sig. = .617; female statistic = .931, df = 8, Sig. = .525
a. Lilliefors significance correction
*. This is a lower bound of the true significance.

Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality. Q–Q plots are graphs that plot quantiles of the theoretical normal distribution against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. Similar to what we saw with the histogram, the Q–Q plot of cholesterol level for both males and females (although not shown here) suggests some nonnormality. Keep in mind that we have a relatively small sample size. Thus, interpreting the visual graphs (e.g., histograms and Q–Q plots) can be challenging, although we have plenty of other evidence for normality.

Normal Q–Q plot of cholesterol level for group = male (expected normal value against observed value)

Examination of the boxplots suggests a relatively normal distributional shape of cholesterol level for both males and females and no outliers.

Boxplots of cholesterol level by gender (male, female)

Considering the forms of evidence we have examined, skewness and kurtosis statistics, the S–W test, and the boxplots, all suggest normality is a reasonable assumption. Although the histograms and Q–Q plots suggest some nonnormality, this is somewhat expected given the small sample size. Generally, we can be reasonably assured we have met the assumption of normality of the dependent variable for each group of the independent variable. Additionally, recall that when the assumption of normality is violated with the independent t test, the effects on Type I and Type II errors are minimal when using a two-tailed test, as we are conducting here (e.g., Glass, Peckham, & Sanders, 1972; Sawilowsky & Blair, 1992).

Dependent t Test

Step 1: To conduct a dependent t test, your dataset needs to include the two variables (i.e., for the paired samples) whose means you wish to compare (e.g., pretest and posttest). To conduct the dependent t test, go to "Analyze" in the top pulldown menu, then select "Compare Means," and then select "Paired-Samples T Test." Following the screenshot (step 1) as follows produces the "Paired-Samples T Test" dialog box.

Dependent t test: Step 1

Step 2: Click both variables (e.g., pretest and posttest as variable 1 and variable 2, respectively) and move them into the "Paired Variables" box by clicking the arrow button. Both variables should now appear in the box as shown in screenshot step 2. Then click on "Ok" to run the analysis and generate the output.

Select the paired samples from the list on the left and use the arrow to move them to the "Paired Variables" box on the right.
Then click on "Ok."

Dependent t test: Step 2

The output appears in Table 7.4, where again the top box provides descriptive statistics, the middle box provides a bivariate correlation coefficient, and the bottom box gives the results of the dependent t test procedure.

Table 7.4: SPSS Results for Dependent t Test
Paired Samples Statistics: Pair 1 pretest mean = 64.0000, N = 10, std. deviation = 4.21637, std. error mean = 1.33333; posttest mean = 59.0000, N = 10, std. deviation = 3.62093, std. error mean = 1.14504
Paired Samples Correlations: Pair 1 pretest and posttest, N = 10, correlation = .859, Sig. = .001
Paired Samples Test (paired differences, pretest − posttest): mean = 5.00000, std. deviation = 2.16025, std. error mean = .68313, 95% confidence interval of the difference = 3.45465 to 6.54535, t = 7.319, df = 9, Sig. (2-tailed) = .000

The table labeled "Paired Samples Statistics" provides basic descriptive statistics for the paired samples. The table labeled "Paired Samples Correlations" provides the Pearson product moment correlation coefficient value, a bivariate correlation coefficient, between the pretest and posttest values. In this example, there is a strong correlation (r = .859) and it is statistically significant (p = .001). The values in the "Paired Samples Test" section of the table are calculated based on paired differences (i.e., the difference values between pretest and posttest scores). "Sig." is the observed p value for the dependent t test. It is interpreted as: there is less than a 1% probability of a sample mean difference of 5 or greater occurring by chance if the null hypothesis is really true (i.e., if the population mean difference is 0). df are the degrees of freedom. For the dependent samples t test, they are calculated as n − 1. "t" is the t test statistic value.
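For readers who want to verify Table 7.4 by hand, the dependent t statistic and the 95% CI can be recomputed from the paired-differences summary alone. This is a sketch assuming Python with scipy is available; none of it is required for the SPSS procedure.

```python
import math
from scipy import stats

# Paired-differences summary from the SPSS "Paired Samples Test" table
n = 10
mean_diff = 5.0       # pretest mean 64 minus posttest mean 59
sd_diff = 2.16025     # std. deviation of the difference scores

se_diff = sd_diff / math.sqrt(n)      # standard error of the mean difference
t = mean_diff / se_diff               # dependent t statistic
df = n - 1
p = 2 * stats.t.sf(abs(t), df)        # two-tailed p value

# 95% CI of the mean difference
margin = stats.t.ppf(0.975, df) * se_diff
ci = (mean_diff - margin, mean_diff + margin)

print(round(se_diff, 5), round(t, 3), df)  # ≈ 0.68313, 7.319, 9
print(round(ci[0], 4), round(ci[1], 4))    # ≈ 3.4546, 6.5454
```

The tiny discrepancy in the CI's last decimal relative to SPSS (3.45465, 6.54535) comes from using the rounded standard deviation 2.16025 as input.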
The t value is calculated as t = d̄ / s_d̄ = 5 / 0.6831 = 7.319.

Using "Explore" to Examine Normality of Distribution of Difference Scores

Generating normality evidence: As with the other t tests we have studied, understanding the distributional shape and the extent to which normality is a reasonable assumption is important. For the dependent t test, the distributional shape for the difference scores should be normally distributed. Thus, we first need to create a new variable in our dataset to reflect the difference scores (in this case, the difference between the pre- and posttest values). To do this, go to "Transform" in the top pulldown menu, then select "Compute Variable." Following the screenshot (step 1) as follows produces the "Compute Variable" dialog box.

Computing the difference score: Step 1

From the "Compute Variable" dialog screen, we can define the column header for our variable by typing in a name in the "Target Variable" box (no spaces, no special characters, and it cannot begin with a numeric value). The formula for computing our difference score is inserted in the "Numeric Expression" box. To create this formula, (1) click on "pretest" in the left list of variables and use the arrow key to move it into the "Numeric Expression" box; (2) use your keyboard or the keyboard within the dialog box to insert a minus sign (i.e., dash) after "pretest" in the "Numeric Expression" box; (3) click on "posttest" in the left list of variables and use the arrow key to move it into the "Numeric Expression" box; and (4) click on "OK" to create the new difference score variable in your dataset.

Computing the difference score: Step 2

We can again use "Explore" to examine the extent to which the assumption of normality is met for the distributional shape of our newly created difference score. The general steps for
accessing "Explore" (see, e.g., Chapter 4) and for generating normality evidence for one variable (see Chapter 6) have been presented in previous chapters, and they will not be reiterated here.

Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots. The skewness statistic for the difference score is .248 and kurtosis is .050, both within the range of an absolute value of 2.0, suggesting one form of evidence of normality of the differences.

The histogram for the difference scores (not presented here) is not necessarily what most researchers would consider a normally shaped distribution. Our formal test of normality, the Shapiro–Wilk (S–W) test (Shapiro & Wilk, 1965), suggests that our sample distribution for differences is not statistically significantly different from what would be expected from a normal distribution (S–W = .956, df = 10, p = .734). Similar to what we saw with the histogram, the Q–Q plot of differences suggests some nonnormality in the tails (as the farthest points are not falling on the diagonal line). Keep in mind that we have a small sample size. Thus, interpreting the visual graphs (e.g., histograms and Q–Q plots) can be difficult. Examination of the boxplot suggests a relatively normal distributional shape. Considering the forms of evidence we have examined, skewness and kurtosis, the S–W test of normality, and boxplots, all suggest normality is a reasonable assumption. Although the histograms and Q–Q plots suggested some nonnormality, this is somewhat expected given the small sample size. Generally, we can be reasonably assured we have met the assumption of normality of the difference scores.

Generating evidence of homogeneity of variance of difference scores:
Without conducting a formal test of equality of variances (as we do in Chapter 9), a rough benchmark for having met the assumption of homogeneity of variances when conducting the dependent t test is that the ratio of the smallest to largest variance of the paired samples is no greater than 1:4. The variance can be computed easily by any number of procedures in SPSS (e.g., refer back to Chapter 3), and these steps will not be repeated here. For our paired samples, the variance of the pretest score is 17.778 and the variance of the posttest score is 13.111, well within the range of 1:4, suggesting that homogeneity of variances is reasonable.

7.5 G*Power

Using the results of the independent samples t test just conducted, let us use G*Power to compute the post hoc power of our test.

Post Hoc Power for the Independent t Test Using G*Power

The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted an independent samples t test; therefore, the default selection of "t tests" is the correct test family. Next, we need to select the appropriate statistical test. We use the arrow to toggle to "Means: Difference between two independent means (two groups)." The "Type of Power Analysis" desired then needs to be selected. To compute post hoc power, we need to select "Post hoc: Compute achieved power – given α, sample size, and effect size."

The "Input Parameters" must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test so we use the arrow to toggle to "Two." The achieved or observed effect size was −1.1339. The alpha level we tested at was .05, and the sample
size for females was 8 and for males, 12. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.

The "Output Parameters" provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of −1.1339, an alpha level of .05, and sample sizes of 8 (females) and 12 (males). Based on those criteria, the post hoc power was .65. In other words, with a sample size of 8 females and 12 males in our study, testing at an alpha level of .05 and observing a large effect of −1.1339, the power of our test was .65: the probability of rejecting the null hypothesis when it is really false will be 65%, which is only moderate power. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level). We were fortunate in this example in that we were still able to detect a statistically significant difference in cholesterol levels between males and females; however, we will likely not always be that lucky.

Independent t test: the "Input Parameters" for computing post hoc power must be specified, including: 1. one- versus two-tailed test; 2. observed effect size d; 3. alpha level; and 4. sample size for each group of the independent variable. Once the parameters are specified, click on "Calculate."
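G*Power's achieved-power value can be approximated directly from the noncentral t distribution, which is the machinery behind power for this test. The sketch below is an illustration assuming Python with scipy; the noncentrality parameter follows the standard two-sample formula δ = d·√(n1·n2/(n1 + n2)).

```python
import math
from scipy import stats

# Post hoc power for a two-tailed independent t test
d = 1.1339          # magnitude of the observed effect size
n1, n2 = 8, 12      # female and male sample sizes
alpha = 0.05

df = n1 + n2 - 2
ncp = d * math.sqrt(n1 * n2 / (n1 + n2))   # noncentrality parameter
t_crit = stats.t.ppf(1 - alpha / 2, df)    # two-tailed critical value

# Probability the noncentral t falls beyond either critical value
power = (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)
print(round(power, 2))  # G*Power reports .65 for these inputs
```

This reproduces the moderate power reported above for the cholesterol example.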
Post Hoc Power for the Dependent t Test Using G*Power

Now, let us use G*Power to compute post hoc power for the dependent t test. First, the correct test family needs to be selected. In our case, we conducted a dependent samples t test; therefore, the default selection of "t tests" is the correct test family. Next, we need to select the appropriate statistical test. We use the arrow to toggle to "Means: Difference between two dependent means (matched pairs)." The "Type of Power Analysis" desired then needs to be selected. To compute post hoc power, we need to select "Post hoc: Compute achieved power – given α, sample size, and effect size."

The "Input Parameters" must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test, so we use the arrow to toggle to "Two." The achieved or observed effect size was 2.3146. The alpha level we tested at was .05, and the total sample size was 10. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.

The "Output Parameters" provide the relevant statistics given the input specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of 2.3146, an alpha level of .05, and total sample size of 10. Based on those criteria, the post hoc power was .99. In other words, with a total sample size of 10, testing at an alpha level of .05 and observing a large effect of 2.3146, the power of our test was over .99: the probability of rejecting the null hypothesis when it is really false will be greater than 99%, about the strongest power that can be achieved. Again, conducting power analysis a priori is recommended so that you avoid a situation where, post
hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level).

Dependent t test: the "Input Parameters" for computing post hoc power must be specified, including: 1. one- versus two-tailed test; 2. observed effect size d; 3. alpha level; and 4. sample size. Once the parameters are specified, click on "Calculate."

7.6 Template and APA-Style Write-Up

Next we develop APA-style paragraphs describing the results for both examples. First is a paragraph describing the results of the independent t test for the cholesterol example, and this is followed by the dependent t test for the swimming example.

Independent t Test

Recall that our graduate research assistant, Marie, was working with JoAnn, a local nurse practitioner, to assist in analyzing cholesterol levels. Her task was to assist JoAnn with writing her research question (Is there a mean difference in cholesterol level between males and females?) and generating the test of inference to answer her question. Marie suggested an independent samples t test as the test of inference. A template for writing a research question for an independent t test is presented as follows:

Is there a mean difference in [dependent variable] between [group 1 of the independent variable] and [group 2 of the independent variable]?

It may be helpful to preface the results of the independent samples t test with information on an examination of the extent to which the assumptions were met (recall there are three assumptions: normality, homogeneity of variances, and independence). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.

An independent samples t test was conducted to determine if the mean cholesterol level of males differed from females.
The assumption of normality was tested and met for the distributional shape of the dependent variable (cholesterol level) for females. Review of the S–W test for normality (S–W = .931, df = 8, p = .525) and skewness (.000) and kurtosis (−1.790) statistics suggested that normality of cholesterol levels for females was a reasonable assumption. Similar results were found for male cholesterol levels. Review of the S–W test for normality (S–W = .949, df = 12, p = .617) and skewness (.000) and kurtosis (−1.446) statistics suggested that normality of male cholesterol levels was a reasonable assumption. The boxplots suggested a relatively normal distributional shape (with no outliers) of cholesterol levels for both males and females. The Q–Q plots and histograms suggested some minor nonnormality for both male and female cholesterol levels. Due to the small sample, this was anticipated. Although normality indices generally suggest the assumption is met, even if there are slight departures from normality, the effects on Type I and Type II errors will be minimal given the use of a two-tailed test (e.g., Glass, Peckham, & Sanders, 1972; Sawilowsky & Blair, 1992). According to Levene's test, the homogeneity of variance assumption was satisfied (F = 3.2007, p = .090). Because there was no random assignment of the individuals to gender, the assumption of independence was not met, creating a potential for an increased probability of a Type I or Type II error.

It is also desirable to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our cholesterol example, we find an effect size d of −1.1339, which is interpreted according to Cohen's (1988) guidelines as a large effect:

d = (Ȳ1 − Ȳ2) / s_p = (185 − 215) / 26.4575 = −1.1339
Remember that for the two-sample mean test, d indicates how many standard deviations the mean of sample 1 is from the mean of sample 2. Thus, with an effect size of −1.1339, there are nearly one and one-quarter standard deviation units between the mean cholesterol levels of males as compared to females. The negative sign simply indicates that group 1 has the smaller mean (as it is the first value in the numerator of the formula; in our case, the mean cholesterol level of females).

Here is an APA-style example paragraph of results for the cholesterol level data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).

As shown in Table 7.3, cholesterol data were gathered from samples of 12 males and 8 females, with a female sample mean of 185 (SD = 19.09) and a male sample mean of 215 (SD = 30.22). The independent t test indicated that the cholesterol means were statistically significantly different for males and females (t = −2.4842, df = 18, p = .023). Thus, the null hypothesis that the cholesterol means were the same by gender was rejected at the .05 level of significance. The effect size d (calculated using the pooled standard deviation) was −1.1339. Using Cohen's (1988) guidelines, this is interpreted as a large effect. The results provide evidence to support the conclusion that males and females differ in cholesterol levels, on average. More specifically, males were observed to have larger cholesterol levels, on average, than females.
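The arithmetic behind this effect size is easy to verify: the pooled standard deviation comes from the two group variances, and d is the mean difference in pooled-SD units. A minimal sketch in plain Python (an illustration only, not part of the SPSS workflow):

```python
import math

# Group summary statistics from the cholesterol example
n1, s1, m1 = 8, 19.08627, 185.0    # females
n2, s2, m2 = 12, 30.22642, 215.0   # males

# Pooled standard deviation across the two groups
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Cohen's d: mean difference expressed in pooled-SD units
d = (m1 - m2) / sp

print(round(sp, 4), round(d, 4))  # ≈ 26.4575, -1.1339
```

Both values match the ones quoted in the write-up above.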
Parenthetically, notice that the results of the Welch t′ test were the same as for the independent t test (Welch t′ = −2.7197, rounded df = 18, p = .014). Thus, any deviation from homogeneity of variance did not affect the results.

Dependent t Test

Marie, as you recall, was also working with Mark, a local swimming coach, to assist in analyzing freestyle swimming time before and after swimmers participated in an intensive training program. Marie suggested a research question (Is there a mean difference in swim time for the 50-meter freestyle event before participation in an intensive training program as compared to swim time for the 50-meter freestyle event after participation in an intensive training program?) and assisted in generating the test of inference (specifically the dependent t test) to answer her question. A template for writing a research question for a dependent t test is presented as follows.

Is there a mean difference in [paired sample 1] as compared to [paired sample 2]?

It may be helpful to preface the results of the dependent samples t test with information on the extent to which the assumptions were met (recall there are three assumptions: normality, homogeneity of variance, and independence). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.

A dependent samples t test was conducted to determine if there was a difference in the mean swim time for the 50 meter freestyle before participation in an intensive training program as compared to the mean swim time for the 50 meter freestyle after participation in an intensive training program. The assumption of normality was tested and met for the distributional shape of the paired differences.
Review of the S–W test for normality (S–W = .956, df = 10, p = .734) and skewness (.248) and kurtosis (.050) statistics suggested that normality of the paired differences was reasonable. The boxplot suggested a relatively normal distributional shape, and there were no outliers present. The Q–Q plot and histogram suggested minor nonnormality. Due to the small sample, this was anticipated. Homogeneity of variance was tested by reviewing the ratio of the raw score variances. The ratio of the smallest (posttest = 13.111) to largest (pretest = 17.778) variance was less than 1:4; therefore, there is evidence of the equal variance assumption. The individuals were not randomly selected; therefore, the assumption of independence was not met, creating a potential for an increased probability of a Type I or Type II error.

It is also important to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our swimming example, we find an effect size d of 2.3146, which is interpreted according to Cohen's (1988) guidelines as a large effect:

Cohen's d = d̄ / s_d = 5 / 2.1602 = 2.3146

With an effect size of 2.3146, there are about two and a third standard deviation units between the pretraining mean swim time and the posttraining mean swim time.

Here is an APA-style example paragraph of results for the swimming data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).

From Table 7.4, we see that pretest and posttest data were collected from a sample of 10 swimmers, with a pretest mean of 64 seconds (SD = 4.22) and a posttest mean of 59 seconds (SD = 3.62). Thus, swimming times decreased from pretest to posttest.
The dependent t test was conducted to determine if this difference was statistically significantly different from 0, and the results indicate that the pretest and posttest means were statistically different (t = 7.319, df = 9, p < .001). Thus, the null hypothesis that the freestyle swimming means were the same at both points in time was rejected at the .05 level of significance. The effect size d (calculated as the mean difference divided by the standard deviation of the difference) was 2.3146. Using Cohen's (1988) guidelines, this is interpreted as a large effect. The results provide evidence to support the conclusion that the mean 50 meter freestyle swimming time prior to intensive training is different than the mean 50 meter freestyle swimming time after intensive training.

7.7 Summary

In this chapter, we considered a second inferential testing situation, testing hypotheses about the difference between two means. Several inferential tests and new concepts were discussed. New concepts introduced were independent versus dependent samples, the sampling distribution of the difference between two means, the standard error of the difference between two means, and parametric versus nonparametric tests. We then moved on to describe the following three inferential tests for determining the difference between two independent means: the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test. The following two tests for determining the difference between two dependent means were considered: the dependent t test and briefly the Wilcoxon signed ranks test. In addition, examples were presented for each of the t tests, and recommendations were made as to when each test is most appropriate. The chapter concluded with a look at SPSS and G*Power (for post hoc power) as well as developing an APA-style write-up of results. At this point, you should have met the following objectives: (a) be able to understand the basic concepts underlying the inferential tests of two means, (b) be able to select the appropriate test, and (c) be able to determine and interpret the results from the appropriate test. In the next chapter, we discuss inferential tests involving proportions. Other inferential tests are covered in subsequent chapters.

Problems

Conceptual problems

7.1 We test the following hypotheses:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
The level of significance is .05 and H0 is rejected. Assuming all assumptions are met and H0 is true, the probability of committing a Type I error is which one of the following?
a. 0
b. 0.05
c. Between .05 and .95
d. 0.95
e. 1.00
7.2 When H0 is true, the difference between two independent sample means is a function of which one of the following?
a. Degrees of freedom
b. The standard error
c. The sampling distribution
d. Sampling error
7.3 The denominator of the independent t test is known as the standard error of the difference between two means, and may be defined as which one of the following?
a. The difference between the two group means
b. The amount by which the difference between the two group means differs from the population mean
c. The standard deviation of the sampling distribution of the difference between two means
d. All of the above
e. None of the above
7.4 In the independent t test, the homoscedasticity assumption states what?
a. The two population means are equal.
b. The two population variances are equal.
c. The two sample means are equal.
d. The two sample variances are equal.
7.5 Sampling error increases with larger samples. True or false?
7.6 At a given level of significance, it is possible that the significance test and the CI results will differ for the same dataset. True or false?
7.7 I assert that the critical value of t required for statistical significance is smaller (in absolute value or ignoring the sign) when using a directional rather than a nondirectional test. Am I correct?
7.8 If a 95% CI from an independent t test ranges from −.13 to +1.67, I assert that the null hypothesis would not be rejected at the .05 level of significance. Am I correct?
7.9 A group of 15 females was compared to a group of 25 males with respect to intelligence. To test if the sample sizes are significantly different, which of the following tests would you use?
a. Independent t test
b. Dependent t test
c. z test
d. None of the above
7.10 The mathematic ability of 10 preschool children was measured when they entered their first year of preschool and then again in the spring of their kindergarten year. To test for pre- to post-mean differences, which of the following tests would be used?
a. Independent t test
b. Dependent t test
c. z test
d. None of the above
7.11 A researcher collected data to answer the following research question: Are there mean differences in science test scores for middle school students who participate in school-sponsored athletics as compared to students who do not participate? Which of the following tests would be used to answer this question?
a. Independent t test
b. Dependent t test
c. z test
d. None of the above
7.12 The number of degrees of freedom for an independent t test with 15 females and 25 males is 40. True or false?
7.13 I assert that the critical value of t, for a test of two dependent means, will increase as the samples become larger. Am I correct?
7.14 Which of the following is NOT an assumption of the independent t test?
a. Normality
b. Independence
c. Equal sample sizes
d. Homogeneity of variance
7.15 For which of the following assumptions of the independent t test is evidence provided in the SPSS output by default?
a. Normality
b. Independence
c. Equal sample sizes
d. Homogeneity of variance

Computational problems

7.1 The following two independent samples of older and younger adults were measured on an attitude toward violence test:
Sample 1 (Older Adult) and Sample 2 (Younger Adult) data: 42 36 47 45 50 57 35 46 37 58 43 52 52 44 47 43 60 41 51 56 54 49 44 51 55 50 40 49 55 56 40 46 41
a. Test the following hypotheses at the .05 level of significance:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
b. Construct a 95% CI.
7.2 The following two independent samples of male and female undergraduate students were measured on an English literature quiz:
Sample 1 (Male) and Sample 2 (Female) data: 5 7 8 9 9 11 10 11 11 13 15 18 13 15 19 20
a. Test the following hypotheses at the .05 level of significance:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
b. Construct a 95% CI.
7.3 The following two independent samples of preschool children (who were demographically similar but differed in Head Start participation) were measured on teacher-reported social skills during the spring of kindergarten:
Sample 1 (Head Start) and Sample 2 (Non-Head Start) data: 18 14 12 15 12 9 16 10 17 10 18 12 20 16 19 11 8 11 15 13 22 13 10 14
a. Test the following hypothesis at the .05 level of significance:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
b. Construct a 95% CI.
7.4 The following is a random sample of paired values of weight measured before (time 1) and after (time 2) a weight-reduction program:
Pair (time 1, time 2): 1 (127, 130); 2 (126, 124); 3 (129, 135); 4 (123, 127); 5 (124, 127); 6 (129, 128); 7 (132, 136); 8 (125, 130); 9 (135, 131); 10 (126, 128)
a. Test the following hypothesis at the .05 level of significance:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
b. Construct a 95% CI.
7.5 Individuals were measured on the number of words spoken during the 1 minute prior to exposure to a confrontational situation. During the 1 minute after exposure, the individuals were again measured on the number of words spoken. The data are as follows:
Person (pre, post): 1 (60, 50); 2 (80, 70); 3 (120, 80); 4 (100, 90); 5 (90, 100); 6 (85, 70); 7 (70, 40); 8 (90, 70); 9 (100, 60); 10 (110, 100); 11 (80, 100); 12 (100, 70); 13 (130, 90); 14 (120, 80); 15 (90, 50)
a. Test the following hypotheses at the .05 level of significance:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
b. Construct a 95% CI.
7.6 The following is a random sample of scores on an attitude toward abortion scale for husband (sample 1) and wife (sample 2) pairs:
Pair (husband, wife): 1 (1, 3); 2 (2, 3); 3 (4, 6); 4 (4, 5); 5 (5, 7); 6 (7, 8); 7 (7, 9); 8 (8, 10)
a. Test the following hypotheses at the .05 level of significance:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
b. Construct a 95% CI.
7.7 For two dependent samples, test the following hypothesis at the .05 level of significance:
Sample statistics: n = 121; d̄ = 10; s_d = 45.
H0: μ1 − μ2 = 0
H1: μ1 − μ2 > 0
7.8 For two dependent samples, test the following hypothesis at the .05 level of significance.
 Sample statistics: n = 25; d̄ = 25; sd = 14.
    H0: μ1 − μ2 = 0
    H1: μ1 − μ2 > 0
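Problems 7.7 and 7.8 supply only summary statistics, so the dependent t statistic can be computed directly as t = d̄/(sd/√n) with df = n − 1. A minimal Python sketch (the function name is ours, not from the text):

```python
import math

def dependent_t_from_summary(n, d_bar, s_d):
    """Dependent-samples t from summary statistics:
    t = d_bar / (s_d / sqrt(n)), with df = n - 1."""
    se = s_d / math.sqrt(n)      # standard error of the mean difference
    return d_bar / se, n - 1

# Problem 7.7: n = 121, mean difference = 10, s_d = 45
t_77, df_77 = dependent_t_from_summary(121, 10, 45)
# Problem 7.8: n = 25, mean difference = 25, s_d = 14
t_78, df_78 = dependent_t_from_summary(25, 25, 14)

print(round(t_77, 4), df_77)   # t ≈ 2.4444, df = 120
print(round(t_78, 4), df_78)   # t ≈ 8.9286, df = 24
```

Each t would then be compared against the one-tailed critical t value for its df at α = .05, since both alternative hypotheses are directional.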
Interpretive problems
7.1 Using the survey 1 dataset from the website, use SPSS to conduct an independent t test, where gender is the grouping variable and the dependent variable is a variable of interest to you. Test for the extent to which the assumptions have been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph describing the results.
7.2 Using the survey 1 dataset from the website, use SPSS to conduct an independent t test, where the grouping variable is whether or not the person could tell the difference between Pepsi and Coke and the dependent variable is a variable of interest to you. Test for the extent to which the assumptions have been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph describing the results.
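SPSS carries out these steps through its Independent-Samples T Test procedure. The same pooled-variance t statistic and a Cohen's d effect size can be sketched in Python; here we use the quiz scores from computational problem 7.2 (reading the flattened table as male/female column pairs is our assumption, and the function name is ours):

```python
import math
import statistics

def independent_t(x, y):
    """Pooled-variance independent t test with Cohen's d effect size."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)   # n-1 denominators
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)     # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    d = (m1 - m2) / math.sqrt(sp2)                            # Cohen's d
    return t, n1 + n2 - 2, d

# English literature quiz scores from computational problem 7.2
male = [5, 8, 9, 10, 11, 15, 13, 19]
female = [7, 9, 11, 11, 13, 18, 15, 20]
t, df, d = independent_t(male, female)
print(round(t, 4), df, round(d, 4))
```

Post hoc power would still come from G*Power or similar, using the computed d, df, and α.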

8
Inferences About Proportions
Chapter Outline
8.1 Inferences About Proportions Involving the Normal Distribution
 8.1.1 Introduction
 8.1.2 Inferences About a Single Proportion
 8.1.3 Inferences About Two Independent Proportions
 8.1.4 Inferences About Two Dependent Proportions
8.2 Inferences About Proportions Involving the Chi-Square Distribution
 8.2.1 Introduction
 8.2.2 Chi-Square Goodness-of-Fit Test
 8.2.3 Chi-Square Test of Association
8.3 SPSS
8.4 G*Power
8.5 Template and APA-Style Write-Up
Key Concepts
 1. Proportion
 2. Sampling distribution and standard error of a proportion
 3. Contingency table
 4. Chi-square distribution
 5. Observed versus expected proportions
In Chapters 6 and 7, we considered testing inferences about means, first for a single mean (Chapter 6) and then for two means (Chapter 7). The major concepts discussed in those two chapters included the following: types of hypotheses, types of decision errors, level of significance, power, confidence intervals (CIs), effect sizes, sampling distributions involving the mean, standard errors involving the mean, inferences about a single mean, inferences about the difference between two independent means, and inferences about the difference between two dependent means. In this chapter, we consider inferential tests involving proportions. We define a proportion as the percentage of scores falling into particular categories. Thus, the tests described in this chapter deal with variables that are categorical in nature and thus are nominal or ordinal variables (see Chapter 1), or have been collapsed from higher-level variables into nominal or ordinal variables (e.g., high and low scorers on an achievement test).

The tests that we cover in this chapter are considered nonparametric procedures, also sometimes referred to as distribution-free procedures, as there is no requirement that the data adhere to a particular distribution (e.g., normal distribution). Nonparametric procedures are often less preferable than parametric procedures (e.g., t tests, which assume normality of the distribution) for the following reasons: (1) parametric procedures are often robust to assumption violations; in other words, the results are often still interpretable even if there may be assumption violations; (2) nonparametric procedures have lower power relative to sample size; in other words, rejecting the null hypothesis if it is false requires a larger sample size with nonparametric procedures; and (3) the types of research questions that can be addressed by nonparametric procedures are often quite simple (e.g., while complex interactions of many different variables can be tested with parametric procedures such as factorial analysis of variance, this cannot be done with nonparametric procedures). Nonparametric procedures can still be valuable to use given the measurement scale(s) of the variable(s) and the research question; however, at the same time, it is important that researchers recognize the limitations in using these types of procedures.
Research questions to be asked of proportions include the following examples:
 1. Is the quarter in my hand a fair or biased coin; in other words, over repeated samples, is the proportion of heads equal to .50 or not?
 2. Is there a difference between the proportions of Republicans and Democrats who support the local school bond issue?
 3. Is there a relationship between ethnicity (e.g., African-American, Caucasian) and type of criminal offense (e.g., petty theft, rape, murder); in other words, is the proportion of one ethnic group different from another in terms of the types of crimes committed?
Several inferential tests are covered in this chapter, depending on (a) whether there are one or two samples, (b) whether the two samples are selected in an independent or dependent manner, and (c) whether there are one or more categorical variables. More specifically, the topics described include the following inferential tests: testing whether a single proportion is different from a hypothesized value, testing whether two independent proportions are different, testing whether two dependent proportions are different, and the chi-square goodness-of-fit test and chi-square test of association. We use many of the foundational concepts previously covered in Chapters 6 and 7. New concepts to be discussed include the following: proportion, sampling distribution and standard error of a proportion, contingency table, chi-square distribution, and observed versus expected frequencies. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying tests of proportions, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.
8.1 Inferences About Proportions Involving the Normal Distribution
We have been following Marie, an educational research graduate student, as she completes tasks assigned to her by her faculty advisor.

Marie's advisor has received two additional calls from individuals in other states who are interested in assistance with statistical analysis. Knowing the success Marie has had with the previous consultations, Marie's advisor requests that Marie work with Tami, a staff member in the Undergraduate Services Office at Ivy-Covered University (ICU), and Matthew, a lobbyist from a state that is considering legalizing gambling.
In conversation with Marie, Tami shares that she recently read a report that provided national statistics on the proportion of students that major in various disciplines. Tami wants to know if there are similar proportions at their institution. Marie suggests the following research question: Are the sample proportions of undergraduate student college majors at Ivy-Covered University in the same proportions as those nationally? Marie suggests a chi-square goodness-of-fit test as the test of inference. Her task is then to assist Tami in generating the test of inference to answer her research question.
Marie then speaks with Matthew, a lobbyist who is lobbying against legalizing gambling in his state. Matthew wants to determine if there is a relationship between level of education and stance on a proposed gambling amendment. Matthew suspects that the proportions supporting gambling vary as a function of education level. The following research question is suggested by Marie: Is there an association between level of education and stance on gambling? Marie suggests a chi-square test of association as the test of inference. Her task is then to assist Matthew in generating the test of inference to answer his research question.
This section deals with concepts and procedures for testing inferences about proportions that involve the normal distribution. Following a discussion of the concepts related to tests of proportions, inferential tests are presented for situations when there is a single proportion, two independent proportions, and two dependent proportions.
8.1.1   Introduction
Let us examine in greater detail the concepts related to tests of proportions. First, a proportion represents the percentage of individuals or objects that fall into a particular category. For instance, the proportion of individuals who support a particular political candidate might be of interest. Thus, the variable here is a dichotomous, categorical, nominal variable, as there are only two categories represented: support or do not support the candidate.
For notational purposes, we define the population proportion π (pi) as

π = f/N

where f is the number of frequencies in the population that fall into the category of interest (e.g., the number of individuals in the population who support the candidate), and N is the total number of individuals in the population.
For example, if the population consists of 100 individuals and 58 support the candidate, then π = .58 (i.e., 58/100). If the proportion is multiplied by 100%, this yields the percentage of individuals in the population who support the candidate, which in the example would be 58%. At the same time, 1 − π represents the population proportion of individuals who do not support the candidate, which for this example would be 1 − .58 = .42. If this is multiplied by 100%, this yields the percentage of individuals in the population who do not support the candidate, which in the example would be 42%.

In a fashion, the population proportion is conceptually similar to the population mean if the category of interest (support of candidate) is coded as 1 and the other category (no support) is coded as 0. In the case of the example with 100 individuals, there are 58 individuals coded 1, 42 individuals coded 0, and therefore, the mean would be .58. To this point then, we have π representing the population proportion of individuals supporting the candidate and 1 − π representing the population proportion of individuals not supporting the candidate.
The population variance of a proportion can also be determined by σ² = π(1 − π), and thus, the population standard deviation of a proportion is σ = √(π(1 − π)). These provide us with measures of variability that represent the extent to which the individuals in the population vary in their support of the candidate. For the example population then, the variance is computed to be σ² = π(1 − π) = .58(1 − .58) = .58(.42) = .2436, and the standard deviation is σ = √(π(1 − π)) = √(.58(1 − .58)) = √(.58(.42)) = √.2436 = .4936.
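The candidate example can be checked numerically with a few lines of Python (a sketch, using only the values given in the text):

```python
import math

# Population example from the text: 58 of 100 individuals support the candidate
f, N = 58, 100
pi = f / N                  # population proportion
var = pi * (1 - pi)         # population variance of a proportion
sd = math.sqrt(var)         # population standard deviation of a proportion

print(pi, round(var, 4), round(sd, 4))   # 0.58 0.2436 0.4936
```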
For the population parameters, we now have the population proportion (or mean), the population variance, and the population standard deviation. The next step is to discuss the corresponding sample statistics for the proportion. The sample proportion p is defined as

p = f/n

where f is the number of frequencies in the sample that fall into the category of interest (e.g., the number of individuals who support the candidate), and n is the total number of individuals in the sample.
The sample proportion p is thus a sample estimate of the population proportion π. One way we can estimate the population variance is by the sample variance s² = p(1 − p), and the population standard deviation of a proportion can be estimated by the sample standard deviation s = √(p(1 − p)).
The next concept to discuss is the sampling distribution of the proportion. This is comparable to the sampling distribution of the mean discussed in Chapter 5. If one were to take many samples, and for each sample, compute the sample proportion p, then we could generate a distribution of p. This is known as the sampling distribution of the proportion. For example, imagine that we take 50 samples of size 100 and determine the proportion for each sample. That is, we would have 50 different sample proportions, each based on 100 observations. If we construct a frequency distribution of these 50 proportions, then this is actually the sampling distribution of the proportion.
In theory, the sample proportions for this example could range from .00 (p = 0/100) to 1.00 (p = 100/100) given that there are 100 observations in each sample. One could also examine the variability of these 50 sample proportions. That is, we might be interested in the extent to which the sample proportions vary. We might have, for one example, most of the sample proportions falling near the mean proportion of .60. This would indicate for the candidate data that (a) the samples generally support the candidate, as the average proportion is .60, and (b) the support for the candidate is fairly consistent across samples, as the sample proportions tend to fall close to .60. Alternatively, in a second example, we might find the sample proportions varying quite a bit around the mean of .60, say ranging from .20 to .80. This would indicate that (a) the samples generally support the candidate again, as the average proportion is .60, and (b) the support for the candidate is not very consistent across samples, leading one to believe that some groups support the candidate and others do not.
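This thought experiment is easy to simulate. The sketch below assumes a true support proportion of .60 (an assumption chosen to match the example) and draws 50 samples of size 100:

```python
import random
import statistics

random.seed(7)   # fix the draws for reproducibility (seed is arbitrary)

TRUE_PI = 0.60   # assumed population proportion of supporters
N_SAMPLES, N = 50, 100

# Each sample proportion is the share of simulated supporters in one sample
props = [
    sum(random.random() < TRUE_PI for _ in range(N)) / N
    for _ in range(N_SAMPLES)
]

print(round(statistics.mean(props), 3))   # near .60
print(round(statistics.stdev(props), 3))  # near sqrt(.60(.40)/100), about .049
```

The spread of `props` previews the standard error of the proportion introduced next.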

The variability of the sampling distribution of the proportion can be determined as follows. The population variance of the sampling distribution of the proportion is known as the variance error of the proportion, denoted by σp². The variance error is computed as

σp² = π(1 − π)/n

where π is again the population proportion and n is sample size (i.e., the number of observations in a single sample).
The population standard deviation of the sampling distribution of the proportion is known as the standard error of the proportion, denoted by σp. The standard error is an index of how variable a sample statistic (in this case, the sample proportion) is when multiple samples of the same size are drawn, and is computed as follows:

σp = √(π(1 − π)/n)

This situation is quite comparable to the sampling distribution of the mean discussed in Chapter 5. There we had the variance error and standard error of the mean as measures of the variability of the sample means.
Technically speaking, the binomial distribution is the exact sampling distribution for the proportion; binomial here refers to a categorical variable with two possible categories, which is certainly the situation here. However, except for rather small samples, the normal distribution is a reasonable approximation to the binomial distribution and is therefore typically used. The reason we can rely on the normal distribution is the central limit theorem, previously discussed in Chapter 5. For proportions, the central limit theorem states that as sample size n increases, the sampling distribution of the proportion from a random sample of size n more closely approximates a normal distribution. If the population distribution is normal in shape, then the sampling distribution of the proportion is also normal in shape. If the population distribution is not normal in shape, then the sampling distribution of the proportion becomes more nearly normal as sample size increases. As previously shown in Figure 5.1 in the context of the mean, the bottom line is that if the population is nonnormal, this will have a minimal effect on the sampling distribution of the proportion except for rather small samples.
Because the applied researcher nearly always has access to only a single sample, the population variance error and standard error of the proportion must be estimated. The sample variance error of the proportion is denoted by sp² and computed as

sp² = p(1 − p)/n

where p is again the sample proportion and n is sample size.
The sample standard error of the proportion is denoted by sp and computed as

sp = √(p(1 − p)/n)
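For a sample of n = 100 with p = .58 (the candidate example treated as sample data), the estimated variance error and standard error work out as follows (a sketch; the function name is ours):

```python
import math

def se_proportion(p, n):
    """Sample standard error of a proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

sp = se_proportion(0.58, 100)
print(round(sp ** 2, 6))   # variance error, p(1 - p)/n = .2436/100
print(round(sp, 4))        # standard error, about .0494
```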

8.1.2 Inferences About a Single Proportion
In the first inferential testing situation for proportions, the researcher would like to know whether the population proportion is equal to some hypothesized proportion or not. This is comparable to the one-sample t test described in Chapter 6 where a population mean was compared against some hypothesized mean. First, the hypotheses to be evaluated for detecting whether a population proportion differs from a hypothesized proportion are as follows. The null hypothesis H0 is that there is no difference between the population proportion π and the hypothesized proportion π0, which we denote as

H0: π = π0

Here there is no difference, or a "null" difference, between the population proportion and the hypothesized proportion. For example, if we are seeking to determine whether the quarter you are flipping is a biased coin or not, then a reasonable hypothesized value would be .50, as an unbiased coin should yield "heads" about 50% of the time.
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportion π and the hypothesized proportion π0, which we denote as

H1: π ≠ π0

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportion is different from the hypothesized proportion. As we have not specified a direction on H1, we are willing to reject H0 either if π is greater than π0 or if π is less than π0. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π is greater than π0 or that π is less than π0. In either case, the more the resulting sample proportion differs from the hypothesized proportion, the more likely we are to reject the null hypothesis.
It is assumed that the sample is randomly drawn from the population (i.e., the assumption of independence) and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as

z = (p − π0)/sp̂ = (p − π0)/√(π0(1 − π0)/n)

where sp̂ is estimated based on the hypothesized proportion π0.
The test statistic z is then compared to a critical value(s) from the unit normal distribution. For a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1. If the test statistic z falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +αz for the alternative hypothesis H1: π > π0 (i.e., a right-tailed test) and as −αz for the alternative hypothesis H1: π < π0 (i.e., a left-tailed test). If the test statistic z falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.
For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:

p ± α/2z(sp̂)

where p is the observed sample proportion, ±α/2z is the tabled critical value, and sp̂ is the sample standard error of the proportion.
If the CI contains the hypothesized proportion π0, then the conclusion is to fail to reject H0; otherwise, we reject H0. Simulation research has shown that this CI procedure works fine for small samples when the sample proportion is near .50; that is, the normal distribution is a reasonable approximation in this situation. However, as the sample proportion moves closer to 0 or 1, larger samples are required for the normal distribution to be reasonably approximate. Alternative approaches have been developed that appear to be more widely applicable. The interested reader is referred to Ghosh (1979) and Wilcox (1996).
Several points should be noted about each of the z tests for proportions developed in this chapter. First, the interpretation of CIs described in this chapter is the same as those in Chapter 7. Second, Cohen's (1988) measure of effect size for proportion tests using z is known as h. Unfortunately, h involves the use of arcsine transformations of the proportions, which is beyond the scope of this text. In addition, standard statistical software, such as SPSS, does not provide measures of effect size for any of these tests.
Let us consider an example to illustrate the use of the test of a single proportion. We follow the basic steps for hypothesis testing that we applied in previous chapters. These steps include the following:
 1. State the null and alternative hypotheses.
 2. Select the level of significance (i.e., alpha, α).
 3. Calculate the test statistic value.
 4. Make a statistical decision (reject or fail to reject H0).
Suppose a researcher conducts a survey in a city that is voting on whether or not to have an elected school board. Based on informal conversations with a small number of influential citizens, the researcher is led to hypothesize that 50% of the voters are in favor of an elected school board. Through the use of a scientific poll, the researcher would like to know whether the population proportion is different from this hypothesized value; thus, a nondirectional, two-tailed alternative hypothesis is utilized. The null and alternative hypotheses are denoted as follows:

H0: π = π0
H1: π ≠ π0

If the null hypothesis is rejected, this would indicate that scientific polls of larger samples yield different results and are important in this situation. If the null hypothesis is not rejected, this would indicate that informal conversations with a small sample are just as accurate as a scientific larger-sized sample.
A random sample of 100 voters is taken, and 60 indicate their support of an elected school board (i.e., p = .60). In an effort to minimize the Type I error rate, the significance level is set at α = .01. The test statistic z is computed as

z = (p − π0)/√(π0(1 − π0)/n) = (.60 − .50)/√(.50(1 − .50)/100) = .10/√(.50(.50)/100) = .10/.0500 = 2.0000

Note that the final value for the denominator is the standard error of the proportion (i.e., sp̂ = .0500), which we will need for computing the CI. From Table A.1, we determine the critical values to be ±α/2z = ±.005z = ±2.58; in other words, the z value that corresponds to the P(z) value closest to .995 is when z is equal to 2.58. As the test statistic (i.e., z = 2.0000) does not exceed the critical values and thus fails to fall into a critical region, our decision is to fail to reject H0. Our conclusion then is that the accuracy of the scientific poll is not any different from the hypothesized value of .50 as determined informally.
The 99% CI for the example would be computed as follows:

p ± α/2z(sp̂) = .60 ± 2.58(.0500) = .60 ± .129 = (.471, .729)

Because the CI contains the hypothesized value of .50, our conclusion is to fail to reject H0 (the same result found when we conducted the statistical test). The conclusion derived from the test statistic is always consistent with the conclusion derived from the CI. We can interpret the CI as follows: 99% of similarly constructed CIs will contain the hypothesized value of .50.

8.1.3 Inferences About Two Independent Proportions
In our second inferential testing situation for proportions, the researcher would like to know whether the population proportion for one group is different from the population proportion for a second independent group. This is comparable to the independent t test described in Chapter 7, where one population mean was compared to a second independent population mean. Once again, we have two independently drawn samples, as discussed in Chapter 7.
First, the hypotheses to be evaluated for detecting whether two independent population proportions differ are as follows. The null hypothesis H0 is that there is no difference between the two population proportions π1 and π2, which we denote as

H0: π1 − π2 = 0

Here there is no difference, or a "null" difference, between the two population proportions. For example, we may be seeking to determine whether the proportion of Democratic senators who support gun control is equal to the proportion of Republican senators who support gun control.
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportions π1 and π2, which we denote as

H1: π1 − π2 ≠ 0

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportions are different. As we have not specified a direction on H1, we are willing to reject either if π1 is greater than π2 or if π1 is less than π2. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π1 is greater than π2 or that π1 is less than π2. In either case, the more the resulting sample proportions differ from one another, the more likely we are to reject the null hypothesis.
It is assumed that the two samples are independently and randomly drawn from their respective populations (i.e., the assumption of independence) and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as

z = (p1 − p2)/sp1−p2 = (p1 − p2)/√(p(1 − p)(1/n1 + 1/n2))

where n1 and n2 are the sample sizes for samples 1 and 2, respectively, and

p = (f1 + f2)/(n1 + n2)

where f1 and f2 are the number of observed frequencies for samples 1 and 2, respectively. The denominator of the z test statistic, sp1−p2, is known as the standard error of the difference between two proportions and provides an index of how variable the sample statistic (in this case, the sample proportion) is when multiple samples of the same size are drawn.
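Both z statistics introduced so far can be sketched in a few lines of Python. The function names are ours; the counts echo the worked examples in this section (60 of 100 voters for the single-proportion test, and, from the taste-test example later in this section, 68 of 100 children versus 54 of 100 adults):

```python
import math

def one_proportion_z(p, pi0, n):
    """z = (p - pi0)/sqrt(pi0(1 - pi0)/n); SE based on the hypothesized pi0."""
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    return (p - pi0) / se0, se0

def two_proportion_z(f1, n1, f2, n2):
    """z test for two independent proportions using the pooled proportion p."""
    p1, p2 = f1 / n1, f2 / n2
    p = (f1 + f2) / (n1 + n2)                        # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # SE of the difference
    return (p1 - p2) / se, se

z1, se1 = one_proportion_z(0.60, 0.50, 100)    # elected school board example
z2, se2 = two_proportion_z(68, 100, 54, 100)   # taste-test example
print(round(z1, 4), round(se1, 4))
print(round(z2, 4), round(se2, 4))
```

Each z would then be compared against the appropriate critical values from the unit normal distribution (±2.58 for α = .01, ±1.96 for α = .05, two-tailed).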
This�test�statistic�is�conceptually�similar�to�the�test�statistic�for�the�independent�t�test� The�test�statistic�z�is�then�compared�to�a�critical�value(s)�from�the�unit�normal�distribu- tion��For�a�two-tailed�test,�the�critical�values�are�denoted�as�±α/2z�and�are�found�in�Table� A�1�� If� the� test� statistic� z� falls� into� either� critical� region,� then� we� reject� H0;� otherwise,� we� fail�to�reject�H0��For�a�one-tailed�test,�the�critical�value�is�denoted�as�+αz�for�the�alternative� hypothesis�H1:�π1�−�π2�>�0�(i�e�,�a�right-tailed�test)�and�as�−αz�for�the�alternative�hypothesis�
H1:�π1�−�π2�<�0�(i�e�,�a�left-tailed�test)��If�the�test�statistic�z�falls�into�the�appropriate�critical� region,�then�we�reject�H0;�otherwise,�we�fail�to�reject�H0��It�should�be�noted�that�other�alter- natives�to�this�test�have�been�proposed�(e�g�,�Storer�&�Kim,�1990)� For�the�two-tailed�test,�a�(1�−�α)%�CI�can�also�be�examined��The�CI�is�formed�as�follows: ( ) ( )/p p z sp p1 2 2 1 2− ± −α If� the� CI� contains� 0,� then� the� conclusion� is� to� fail� to� reject� H0;� otherwise,� we� reject� H0�� Alternative�methods�are�described�by�Beal�(1987)�and�Coe�and�Tamhane�(1993)� Let�us�consider�an�example�to�illustrate�the�use�of�the�test�of�two�independent�propor- tions��Suppose�a�researcher�is�taste-testing�a�new�chocolate�candy�(“chocolate�yummies”)� and� wants� to� know� the� extent� to� which� individuals� would� likely� purchase� the� product�� 214 An Introduction to Statistical Concepts As�taste�in�candy�may�be�different�for�adults�versus�children,�a�study�is�conducted�where� independent� samples� of� adults� and� children� are� given� “chocolate� yummies”� to� eat� and� asked�whether�they�would�buy�them�or�not��The�researcher�would�like�to�know�whether� the� population� proportion� of� individuals� who� would� purchase� “chocolate� yummies”� is� different�for�adults�and�children��Thus,�a�nondirectional,�two-tailed�alternative�hypothesis� is�utilized��The�null�and�alternative�hypotheses�are�denoted�as�follows: H H 0 1 2 1 1 2 0 0 : : π π π π − = − ≠ If�the�null�hypothesis�is�rejected,�this�would�indicate�that�interest�in�purchasing�the�prod- uct�is�different�in�the�two�groups,�and�this�might�result�in�different�marketing�and�packag- ing�strategies�for�each�group��If�the�null�hypothesis�is�not�rejected,�then�this�would�indicate� the�product�is�equally�of�interest�to�both�adults�and�children,�and�different�marketing�and� packaging�strategies�are�not�necessary� A�random�sample�of�100�children�(sample�1)�and�a�random�sample�of�100�adults�(sam- ple� 2)� are� 
independently� selected�� Each� individual� consumes� the� product� and� indicates� whether�or�not�he�or�she�would�purchase�it��Sixty-eight�of�the�children�and�54�of�the�adults� state�they�would�purchase�“chocolate�yummies”�if�they�were�available��The�level�of�signifi- cance�is�set�at�α�=��05��The�test�statistic�z�is�computed�as�follows��We�know�that�n1�=�100,� n2�=�100,�f1�=�68,�f2�=�54,�p1�=��68,�and�p2�=��54��We�compute�p�to�be p f f n n = + + = + + = =1 2 1 2 68 54 100 100 122 200 6100. This�allows�us�to�compute�the�test�statistic�z�as z p p p p n n = − − +     = . − . − +    1 2 61 1 61 1 100 1 100 (1 ) 1 1 68 54 . ( . ) 1 2  = = = . (. )(. )(. ) . . 2.0290 14 61 39 02 14 0690 The�denominator�of�the�z�test�statistic,�sp p1 2− �=��0690,�is�the�standard�error�of�the�difference� between�two�proportions,�which�we�will�need�for�computing�the�CI� The�test�statistic�z�is�then�compared�to�the�critical�values�from�the�unit�normal�distribu- tion��As�this�is�a�two-tailed�test,�the�critical�values�are�denoted�as�±α/2z�and�are�found�in� Table�A�1�to�be�±α/2z�=�±�025z�=�±1�9600��In�other�words,�this�is�the�z�value�that�is�closest�to� a�P(z)�of��975��As�the�test�statistic�z�falls�into�the�upper�tail�critical�region,�we�reject�H0�and� conclude�that�the�adults�and�children�are�not�equally�interested�in�the�product� Finally,�we�can�compute�the�95%�CI�as�follows: ( ) ( ) (. . ) . (. ) (. ) (. ) (/p p z sp p1 2 2 1 2 68 54 1 96 0690 14 1352− ± = − ± = ± =−α .. , . 
8.1.4 Inferences About Two Dependent Proportions

In our third inferential testing situation for proportions, the researcher would like to know whether the population proportion for one group is different from the population proportion for a second dependent group. This is comparable to the dependent t test described in Chapter 7, where one population mean was compared to a second dependent population mean. Once again, we have two dependently drawn samples, as discussed in Chapter 7. For example, we may have a pretest–posttest situation where a comparison of proportions over time for the same individuals is conducted. Alternatively, we may have pairs of matched individuals (e.g., spouses, twins, brother–sister) for which a comparison of proportions is of interest.

First, the hypotheses to be evaluated for detecting whether two dependent population proportions differ are as follows. The null hypothesis H0 is that there is no difference between the two population proportions π1 and π2, which we denote as

H0: π1 − π2 = 0

Here there is no difference, or a "null" difference, between the two population proportions. For example, a political analyst may be interested in determining whether the approval rating of the president is the same just prior to and immediately following his annual State of the Union address (i.e., a pretest–posttest situation). As a second example, a marriage counselor wants to know whether husbands and wives equally favor a particular training program designed to enhance their relationship (i.e., a couple situation).
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportions π1 and π2, which we denote as follows:

H1: π1 − π2 ≠ 0

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportions are different. As we have not specified a direction on H1, we are willing to reject either if π1 is greater than π2 or if π1 is less than π2. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π1 is greater than π2 or that π1 is less than π2. The more the resulting sample proportions differ from one another, the more likely we are to reject the null hypothesis.

Before we examine the test statistic, let us consider a table in which the proportions are often presented. As shown in Table 8.1, the contingency table lists proportions for each of the different possible outcomes.

Table 8.1 Contingency Table for Two Samples

                            Sample 1
  Sample 2              "Unfavorable"   "Favorable"   Marginal Proportions
  "Favorable"                a               b             p2
  "Unfavorable"              c               d             1 − p2
  Marginal proportions     1 − p1            p1

The columns indicate the proportions for sample 1. The left column contains those proportions related to the "unfavorable" condition (or disagree or no, depending on the situation), and the right column, those proportions related to the "favorable" condition (or agree or yes, depending on the situation). At the bottom of the columns are the marginal proportions shown for the "unfavorable" condition, denoted by 1 − p1, and for the "favorable" condition, denoted by p1. The rows indicate the proportions for sample 2. The top row contains those proportions for the "favorable" condition, and the bottom row contains those proportions for the "unfavorable" condition. To the right of the rows are the marginal proportions shown for the "favorable" condition, denoted by p2, and for the "unfavorable" condition, denoted by 1 − p2.
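In this notation the marginals follow directly from the four cell proportions: p2 = a + b (sample 2's "favorable" row) and p1 = b + d (sample 1's "favorable" column). A quick sketch makes that bookkeeping explicit, using the cell proportions that appear later in the headache-medication example (Table 8.2):

```python
# Cell proportions for a paired (dependent) design, laid out as in Table 8.1:
#   a: "unfavorable" in sample 1, "favorable" in sample 2
#   b: "favorable" in both samples
#   c: "unfavorable" in both samples
#   d: "favorable" in sample 1, "unfavorable" in sample 2
a, b, c, d = 0.40, 0.25, 0.20, 0.15

assert abs((a + b + c + d) - 1.0) < 1e-12  # the four cells exhaust the table

p1 = b + d   # sample 1 marginal: the "favorable" column
p2 = a + b   # sample 2 marginal: the "favorable" row
print(round(p1, 2), round(p2, 2))   # 0.4 0.65
```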
Within the box of the table are the proportions for the different combinations of conditions across the two samples. The upper left-hand cell is the proportion of observations that are "unfavorable" in sample 1 and "favorable" in sample 2 (i.e., dissimilar across samples), denoted by a. The upper right-hand cell is the proportion of observations that are "favorable" in sample 1 and "favorable" in sample 2 (i.e., similar across samples), denoted by b. The lower left-hand cell is the proportion of observations that are "unfavorable" in sample 1 and "unfavorable" in sample 2 (i.e., similar across samples), denoted by c. The lower right-hand cell is the proportion of observations that are "favorable" in sample 1 and "unfavorable" in sample 2 (i.e., dissimilar across samples), denoted by d.

It is assumed that the two samples are randomly drawn from their respective populations and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as

z = (p1 − p2)/s_{p1−p2} = (p1 − p2)/√[(a + d)/n]

where n is the total number of pairs. The denominator of the z test statistic, s_{p1−p2}, is again known as the standard error of the difference between two proportions and provides an index of how variable the sample statistic (i.e., the difference between two sample proportions) is when multiple samples of the same size are drawn. This test statistic is conceptually similar to the test statistic for the dependent t test.

The test statistic z is then compared to a critical value(s) from the unit normal distribution. For a two-tailed test, the critical values are denoted as ±(α/2 z) and are found in Table A.1. If the test statistic z falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +(α z) for the alternative hypothesis H1: π1 − π2 > 0 (i.e., a right-tailed test) and as −(α z) for the alternative hypothesis H1: π1 − π2 < 0 (i.e., a left-tailed test). If the test statistic z falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It should be noted that other alternatives to this test have been proposed (e.g., the chi-square test as described in the following section). Unfortunately, the z test does not yield an acceptable CI procedure.
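A short sketch of this dependent-proportions z test follows; the helper function is our own, using the cell labels of Table 8.1 and the cell proportions from the headache-medication example worked in the next section:

```python
import math

def dependent_proportion_z(a, b, d, n):
    """z test for two dependent proportions; a, b, d are the cell
    proportions of Table 8.1 and n is the number of pairs."""
    p1 = b + d                     # sample 1 "favorable" marginal
    p2 = a + b                     # sample 2 "favorable" marginal
    se = math.sqrt((a + d) / n)    # standard error of p1 - p2
    return (p1 - p2) / se

# Headache-medication example: a = .40, b = .25, d = .15, with n = 100 pairs
z = dependent_proportion_z(0.40, 0.25, 0.15, 100)
print(round(z, 2))   # about -3.37 (the text's -3.3693 uses the rounded SE .0742)
```

Note that cell c does not enter the statistic: pairs who answer the same way in both samples along the "unfavorable" diagonal contribute nothing to the difference p1 − p2.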
Let us consider an example to illustrate the use of the test of two dependent proportions. Suppose a medical researcher is interested in whether husbands and wives agree on the effectiveness of a new headache medication, "No-Head." A random sample of 100 husband–wife couples were selected and asked to try "No-Head" for 2 months. At the end of 2 months, each individual was asked whether the medication was effective or not at reducing headache pain. The researcher wants to know whether the medication is differentially effective for husbands and wives. Thus, a nondirectional, two-tailed alternative hypothesis is utilized.

The resulting proportions are presented as a contingency table in Table 8.2.

Table 8.2 Contingency Table for Headache Example

                            Husband Sample
  Wife Sample           "Ineffective"   "Effective"   Marginal Proportions
  "Effective"             a = .40         b = .25        p2 = .65
  "Ineffective"           c = .20         d = .15        1 − p2 = .35
  Marginal proportions   1 − p1 = .60     p1 = .40

The level of significance is set at α = .05. The test statistic z is computed as follows:

z = (p1 − p2)/s_{p1−p2} = (p1 − p2)/√[(a + d)/n] = (.40 − .65)/√[(.40 + .15)/100] = −.25/.0742 = −3.3693

The test statistic z is then compared to the critical values from the unit normal distribution. As this is a two-tailed test, the critical values are denoted as ±(α/2 z) and are found in Table A.1 to be ±(α/2 z) = ±(.025 z) = ±1.96. In other words, this is the z value that is closest to a P(z) of .975. As the test statistic z falls into the lower tail critical region, we reject H0 and conclude that the husbands and wives do not believe equally in the effectiveness of "No-Head."

8.2 Inferences About Proportions Involving the Chi-Square Distribution

This section deals with concepts and procedures for testing inferences about proportions that involve the chi-square distribution. Following a discussion of the chi-square distribution relevant to tests of proportions, inferential tests are presented for the chi-square goodness-of-fit test and the chi-square test of association.

8.2.1 Introduction

The previous tests of proportions in this chapter were based on the unit normal distribution, whereas the tests of proportions in the remainder of the chapter are based on the chi-square distribution. Thus, we need to become familiar with this new distribution. Like the normal and t distributions, the chi-square distribution is really a family of distributions. Also, like the t distribution, the chi-square distribution family members depend on the number of degrees of freedom represented. As we shall see, the degrees of freedom for the chi-square goodness-of-fit test are calculated as the number of categories (denoted as J) minus 1. For example, the chi-square distribution for one degree of freedom (i.e., for a variable which has two categories) is denoted by χ²₁, as shown in Figure 8.1. This particular chi-square distribution is especially positively skewed and leptokurtic (sharp peak).
Figure 8.1 Several members of the family of the chi-square distribution. [Graph: relative frequency curves for chi-square distributions with 1, 5, and 10 degrees of freedom, plotted over the range 0 to 25.]

The figure also describes graphically the distributions for χ²₅ and χ²₁₀. As you can see in the figure, as the degrees of freedom increase, the distribution becomes less skewed and less leptokurtic; in fact, the distribution becomes more nearly normal in shape as the number of degrees of freedom increases. For extremely large degrees of freedom, the chi-square distribution is approximately normal. In general, we denote a particular chi-square distribution with ν degrees of freedom as χ²ν. The mean of any chi-square distribution is ν, the mode is ν − 2 when ν is at least 2, and the variance is 2ν. The value of chi-square can range from 0 to positive infinity. A table of different percentile values for many chi-square distributions is given in Table A.3. This table is utilized in the following two chi-square tests.

One additional point that should be noted about each of the chi-square tests of proportions developed in this chapter is that there are no CI procedures for either the chi-square goodness-of-fit test or the chi-square test of association.

8.2.2 Chi-Square Goodness-of-Fit Test

The first test to consider is the chi-square goodness-of-fit test. This test is used to determine whether the observed proportions in two or more categories of a categorical variable differ from what we would expect a priori. For example, a researcher is interested in whether the current undergraduate student body at ICU is majoring in disciplines according to an a priori or expected set of proportions. Based on research at the national level, the expected proportions of undergraduate college majors are as follows: .20 education, .40 arts and sciences, .10 communications, and .30 business. In a random sample of 100 undergraduates at ICU, the observed proportions are as follows: .25 education, .50 arts and sciences, .10 communications, and .15 business. Thus, the researcher would like to know whether the sample proportions observed at ICU fit the expected national proportions. In essence, the chi-square goodness-of-fit test is used to test proportions for a single categorical variable (i.e., nominal or ordinal measurement scale).

The observed proportions are denoted by pj, where p represents a sample proportion and j represents a particular category (e.g., education majors), where j = 1, …, J categories. The expected proportions are denoted by πj, where π represents an expected proportion and j represents a particular category. The null and alternative hypotheses are denoted as follows, where the null hypothesis states that the difference between the observed and expected proportions is 0 for all categories:

H0: (pj − πj) = 0 for all j
H1: (pj − πj) ≠ 0 for all j

The test statistic is a chi-square and is computed by

χ² = n Σ (pj − πj)²/πj,  summed over j = 1, …, J

where n is the size of the sample. The test statistic is compared to a critical value from the chi-square table (Table A.3), (α χ²ν), where ν = J − 1. The degrees of freedom are 1 less than the total number of categories J because the proportions must total 1.00; thus, only J − 1 are free to vary.

If the test statistic is larger than the critical value, then the null hypothesis is rejected in favor of the alternative. This would indicate that the observed and expected proportions were not equal for all categories. The larger the differences are between one or more observed and expected proportions, the larger the value of the test statistic, and the more likely it is to reject the null hypothesis. Otherwise, we would fail to reject the null hypothesis, indicating that the observed and expected proportions were approximately equal for all categories.
If the null hypothesis is rejected, one may wish to determine which sample proportions are different from their respective expected proportions. Here we recommend you conduct tests of a single proportion as described in the preceding section. If you would like to control the experimentwise Type I error rate across a set of such tests, then the Bonferroni method is recommended, where the α level is divided up among the number of tests conducted. For example, with an overall α = .05 and five categories, one would conduct five tests of a single proportion, each at the .01 level of α.

Another way to determine which cells are statistically different in observed to expected proportions is to examine the standardized residuals, which can be computed as follows:

R = (O − E)/√E

Standardized residuals that are greater (in absolute value terms) than 1.96 (when α = .05) or 2.58 (when α = .01) have different observed to expected frequencies and are contributing to the statistically significant chi-square statistic. The sign of the residual provides information on whether the observed frequency is greater than the expected frequency (i.e., positive value) or less than the expected frequency (i.e., negative value).

Let us return to the example and conduct the chi-square goodness-of-fit test. The test statistic is computed as follows:

χ² = n Σ (pj − πj)²/πj
   = 100[(.25 − .20)²/.20 + (.50 − .40)²/.40 + (.10 − .10)²/.10 + (.15 − .30)²/.30]
   = 100(.0125 + .0250 + .0000 + .0750) = 100(.1125) = 11.25

The test statistic is compared to the critical value, from Table A.3, of (.05 χ²₃) = 7.8147. Because the test statistic is larger than the critical value, we reject the null hypothesis and conclude that the sample proportions from ICU are different from the expected proportions at the national level. Follow-up tests to determine which cells are statistically different in their observed to expected proportions involve examining the standardized residuals. In this example, the standardized residuals are computed as follows:

R(Education) = (O − E)/√E = (25 − 20)/√20 = 1.118
R(Arts and sciences) = (50 − 40)/√40 = 1.581
R(Communications) = (10 − 10)/√10 = 0
R(Business) = (15 − 30)/√30 = −2.739

The standardized residual for business is greater (in absolute value terms) than 1.96 (as α = .05) and thus suggests that there are different observed to expected frequencies for students majoring in business at ICU compared to national estimates, and that this category is the one which is contributing most to the statistically significant chi-square statistic.

8.2.2.1 Effect Size

An effect size for the chi-square goodness-of-fit test can be computed by hand as follows, where N is the total sample size and J is the number of categories in the variable:

Effect size = χ²/[N(J − 1)]

This effect size statistic can range from 0 to +1, where 0 indicates no difference between the sample and hypothesized proportions (and thus no effect). Positive one indicates the maximum difference between the sample and hypothesized proportions (and thus a large effect). Given the range of this value (0 to +1.0) and its similarity to a correlation coefficient, it is reasonable to apply Cohen's interpretations for correlations as a rule of thumb. These include the following: small effect size = .10; medium effect size = .30; and large effect size = .50. For the previous example, the effect size would be calculated as follows and would be interpreted as a small effect:

Effect size = χ²/[N(J − 1)] = 11.25/[100(4 − 1)] = 11.25/300 = .0375
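The goodness-of-fit statistic, the standardized residuals, and the effect size above can all be reproduced with a short sketch (our own helper, working from observed counts rather than proportions, which is algebraically equivalent since O = n·pj and E = n·πj):

```python
import math

def chi_square_gof(observed, expected_props, n):
    """Chi-square goodness-of-fit from observed counts and a-priori proportions.
    Returns the statistic, the standardized residuals, and chi2 / [N(J - 1)]."""
    chi2 = 0.0
    residuals = []
    for obs, pi in zip(observed, expected_props):
        exp = n * pi                                      # expected count for this category
        chi2 += (obs - exp) ** 2 / exp
        residuals.append((obs - exp) / math.sqrt(exp))    # standardized residual R = (O - E)/sqrt(E)
    effect_size = chi2 / (n * (len(observed) - 1))
    return chi2, residuals, effect_size

# ICU college-major example: observed counts vs. national proportions
observed = [25, 50, 10, 15]                 # education, arts & sciences, communications, business
expected_props = [0.20, 0.40, 0.10, 0.30]
chi2, residuals, es = chi_square_gof(observed, expected_props, 100)
print(round(chi2, 2))                       # 11.25
print([round(r, 3) for r in residuals])     # [1.118, 1.581, 0.0, -2.739]
print(round(es, 4))                         # 0.0375
```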
8.2.2.2 Assumptions

Two assumptions are made for the chi-square goodness-of-fit test: (1) observations are independent (which is met when a random sample of the population is selected), and (2) the expected frequency is at least 5 per cell (in the case of the chi-square goodness-of-fit test, this translates to an expected frequency of at least 5 per category, as there is only one variable included in the analysis). When the expected frequency is less than 5, that particular cell (i.e., category) has undue influence on the chi-square statistic. In other words, the chi-square goodness-of-fit test becomes too sensitive when the expected values are less than 5.

8.2.3 Chi-Square Test of Association

The second test to consider is the chi-square test of association. This test is equivalent to the chi-square test of independence and the chi-square test of homogeneity, which are not further discussed; the chi-square test of association incorporates both of these tests (e.g., Glass & Hopkins, 1996). The chi-square test of association is used to determine whether there is an association or relationship between two or more categorical (i.e., nominal or ordinal) variables. Our discussion is, for the most part, restricted to the two-variable situation where each variable has two or more categories. The chi-square test of association is the logical extension of the chi-square goodness-of-fit test, which is concerned with one categorical variable. Unlike the chi-square goodness-of-fit test, where the expected proportions are known a priori, for the chi-square test of association the expected proportions are not known a priori but must be estimated from the sample data.

For example, suppose a researcher is interested in whether there is an association between level of education and stance on a proposed amendment to legalize gambling.
Thus, one categorical variable is level of education, with the categories being as follows: (1) less than a high school education, (2) high school graduate, (3) undergraduate degree, and (4) graduate school degree. The other categorical variable is stance on the gambling amendment, with the following categories: (1) in favor of the gambling bill and (2) opposed to the gambling bill. The null hypothesis is that there is no association between level of education and stance on gambling, whereas the alternative hypothesis is that there is some association between level of education and stance on gambling. The alternative would be supported if individuals at one level of education felt differently about the bill than individuals at another level of education.

The data are shown in Table 8.3, known as a contingency table (or crosstab table). As there are two categorical variables, we have a two-way or two-dimensional contingency table. Each combination of the two variables is known as a cell. For example, the cell for row 1, favor bill, and column 2, high school graduate, is denoted as cell 12, the first value (i.e., 1) referring to the row and the second value (i.e., 2) to the column. Thus, the first subscript indicates the particular row r, and the second subscript indicates the particular column c. The row subscript ranges from r = 1, …, R, and the column subscript ranges from c = 1, …, C, where R is the last row and C is the last column. This example contains a total of eight cells, two rows times four columns, denoted by R × C = 2 × 4 = 8.

Each cell in the table contains two pieces of information: the number (or count or frequency) of observations in that cell and the observed proportion in that cell. For cell 12, there are 13 observations, denoted by n12 = 13, and an observed proportion of .65, denoted by p12 = .65.
The observed proportion is computed by taking the number of observations in the cell and dividing by the number of observations in the column. Thus, for cell 12, 13 of the 20 high school graduates favor the bill, or 13/20 = .65. The column information is given at the bottom of each column, known as the column marginals. Here we are given the number of observations in a column, denoted by n·c, where the "·" indicates we have summed across rows and c indicates the particular column. For column 2 (reflecting high school graduates), there are 20 observations, denoted by n·2 = 20.

There is also row information contained at the end of each row, known as the row marginals. Two values are listed in the row marginals. First, the number of observations in a row is denoted by nr·, where r indicates the particular row and the "·" indicates we have summed across the columns. Second, the expected proportion for a specific row is denoted by πr·, where again r indicates the particular row and the "·" indicates we have summed across the columns. The expected proportion for a particular row is computed by taking the number of observations in that row, nr·, and dividing by the total number of observations, n··. Note that the total number of observations is given in the lower right-hand portion of the table and denoted as n·· = 80. Thus, for the first row, the expected proportion is computed as π1· = n1·/n·· = 44/80 = .55.

The null and alternative hypotheses can be written as follows:

H0: (prc − πr·) = 0 for all cells
H1: (prc − πr·) ≠ 0 for all cells

Table 8.3 Contingency Table for Gambling Example

                                Level of Education
  Stance on     Less than
  Gambling      High School   High School   Undergraduate   Graduate    Row Marginals
  "Favor"       n11 = 16      n12 = 13      n13 = 10        n14 = 5     n1· = 44
                p11 = .80     p12 = .65     p13 = .50       p14 = .25   π1· = .55
  "Opposed"     n21 = 4       n22 = 7       n23 = 10        n24 = 15    n2· = 36
                p21 = .20     p22 = .35     p23 = .50       p24 = .75   π2· = .45
  Column        n·1 = 20      n·2 = 20      n·3 = 20        n·4 = 20    n·· = 80
  marginals

The test statistic is a chi-square and is computed by

χ² = Σ Σ n·c (prc − πr·)²/πr·,  summed over r = 1, …, R and c = 1, …, C

The test statistic is compared to a critical value from the chi-square table (Table A.3), (α χ²ν), where ν = (R − 1)(C − 1). That is, the degrees of freedom are 1 less than the number of rows times 1 less than the number of columns.

If the test statistic is larger than the critical value, then the null hypothesis is rejected in favor of the alternative. This would indicate that the observed and expected proportions were not equal across cells, such that the two categorical variables have some association. The larger the differences between the observed and expected proportions, the larger the value of the test statistic and the more likely it is to reject the null hypothesis. Otherwise, we would fail to reject the null hypothesis, indicating that the observed and expected proportions were approximately equal, such that the two categorical variables have no association.

If the null hypothesis is rejected, then one may wish to determine for which combination of categories the sample proportions are different from their respective expected proportions. Here we recommend you construct 2 × 2 contingency tables as subsets of the larger table and conduct chi-square tests of association. If you would like to control the experimentwise Type I error rate across the set of tests, then the Bonferroni method is recommended, where the α level is divided up among the number of tests conducted. For example, with α = .05 and five 2 × 2 tables, one would conduct five tests, each at the .01 level of α. As with the chi-square goodness-of-fit test, it is also possible to examine the standardized residuals (which can be requested in SPSS) to determine the cells that have significantly different observed to expected proportions. Cells where the standardized residuals are greater (in absolute value terms) than 1.96 (when α = .05) or 2.58 (when α = .01) are significantly different in observed to expected frequencies.
Finally, it should be noted that we have only considered two-way contingency tables here. Multiway contingency tables can also be constructed and the chi-square test of association utilized to determine whether there is an association among several categorical variables.

Let us complete the analysis of the example data. The test statistic is computed as

χ² = Σ Σ n·c (prc − πr·)²/πr·
   = 20(.80 − .55)²/.55 + 20(.20 − .45)²/.45 + 20(.65 − .55)²/.55 + 20(.35 − .45)²/.45
     + 20(.50 − .55)²/.55 + 20(.50 − .45)²/.45 + 20(.25 − .55)²/.55 + 20(.75 − .45)²/.45
   = 2.2727 + 2.7778 + .3636 + .4444 + .0909 + .1111 + 3.2727 + 4.0000 = 13.3332

The test statistic is compared to the critical value, from Table A.3, of (.05 χ²₃) = 7.8147. Because the test statistic is larger than the critical value, we reject the null hypothesis and conclude that there is an association between level of education and stance on the gambling bill. In other words, stance on gambling is not the same for all levels of education. The cells with the largest contribution to the test statistic give some indication as to where the observed and expected proportions are most different. Here the first and fourth columns have the largest contributions to the test statistic and have the greatest differences between the observed and expected proportions; these would be of interest in a 2 × 2 follow-up test.

8.2.3.1 Effect Size

Several measures of effect size, such as correlation coefficients and measures of association, can be requested in SPSS and are commonly reported effect size indices for results from the chi-square test of association. Which effect size value is selected depends in part on the measurement scale of the variables. For example, researchers working with nominal data can select a contingency coefficient, phi (for 2 × 2 tables), Cramer's V (for tables larger than 2 × 2), lambda, or an uncertainty coefficient. Correlation options available for ordinal data include gamma, Somers' d, Kendall's tau-b, and Kendall's tau-c. From the contingency coefficient, C, we can compute Cohen's w as follows:

w = √[C²/(1 − C²)]

Cohen's recommended subjective standard for interpreting w (as well as the other correlation coefficients presented) is as follows: small effect size, w = .10; medium effect size, w = .30; and large effect size, w = .50. See Cohen (1988) for further details.

8.2.3.2 Assumptions

The same two assumptions that apply to the chi-square goodness-of-fit test also apply to the chi-square test of association: (1) observations are independent (which is met when a random sample of the population is selected), and (2) the expected frequency is at least 5 per cell. When the expected frequency is less than 5, that particular cell has undue influence on the chi-square statistic. In other words, the chi-square test of association becomes too sensitive when the expected values are less than 5.
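The association test and Cohen's w can be sketched from the table of counts alone. The helper below is our own and uses the book's proportion-based formulation; the contingency coefficient formula C = √[χ²/(χ² + n)] is the standard definition, assumed here since the excerpt does not spell it out:

```python
import math

def chi_square_association(counts):
    """Chi-square test of association from a table of observed counts
    (rows x columns), using the proportion-based formulation
    chi2 = sum over cells of n_col * (p_rc - pi_r)^2 / pi_r."""
    R, C = len(counts), len(counts[0])
    col_totals = [sum(counts[r][c] for r in range(R)) for c in range(C)]
    row_totals = [sum(counts[r]) for r in range(R)]
    n = sum(row_totals)
    chi2 = 0.0
    for r in range(R):
        pi_r = row_totals[r] / n                      # expected (row-marginal) proportion
        for c in range(C):
            p_rc = counts[r][c] / col_totals[c]       # observed cell proportion
            chi2 += col_totals[c] * (p_rc - pi_r) ** 2 / pi_r
    return chi2, n

# Gambling example (Table 8.3): rows = favor/opposed, columns = education levels
counts = [[16, 13, 10, 5],
          [4, 7, 10, 15]]
chi2, n = chi_square_association(counts)
print(round(chi2, 4))        # 13.3333

# Cohen's w via the contingency coefficient (assumed formula, see lead-in)
C = math.sqrt(chi2 / (chi2 + n))
w = math.sqrt(C ** 2 / (1 - C ** 2))
print(round(w, 3))           # 0.408, a medium-to-large effect by Cohen's standards
```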
association,� as� follows:� (1)� observations� are� independent� (which� is� met� when� a� random� sample� of� the� population� is� selected),� and� (2)� expected� frequency� is�at�least�5�per�cell��When�the�expected�frequency�is�less�than�5,�that�particular�cell�has� undue�influence�on�the�chi-square�statistic��In�other�words,�the�chi-square�test�of�association� becomes�too�sensitive�when�the�expected�values�are�less�than�5� 8.3 SPSS Once� again,� we� consider� the� use� of� SPSS� for� the� example� datasets�� While� SPSS� does� not� have�any�of�the�z�procedures�described�in�the�first�part�of�this�chapter,�it�is�capable�of�con- ducting�both�of�the�chi-square�procedures�covered�in�the�second�part�of�this�chapter,�as� described�in�the�following� 225Inferences About Proportions Chi-Square Goodness-of-Fit Test Step 1: To� conduct� the� chi-square� goodness-of-fit� test,� you� need� one� variable� that� is� either�nominal�or�ordinal�in�scale�(e�g�,�major)��To�conduct�the�chi-square�goodness-of-fit� test,�go�to�“Analyze”�in�the�top�pulldown�menu,�then�select�“Nonparametric Tests,” followed� by�“Legacy Dialogs,”� and� then�“Chi-Square.”� Following� the� screenshot� (step�1)�as�follows�produces�the�“Chi-Square Goodness-of-Fit”�dialog�box� A B C D Chi-square goodness-of-fit: Step 1 Step 2:� Next,� from� the� main�“Chi-Square Goodness-of-Fit”� dialog� box,� click� the� variable�(e�g�,�major)�and�move�it�into�the�“Test Variable List”�box�by�clicking�on�the� arrow�button��In�the�lower�right-hand�portion�of�the�screen�is�a�section�labeled�“Expected Values.”� The� default� is� to� conduct� the� analysis� with� the� expected� values� equal� for� each� category� (you� will� see� that� the� radio� button� for� “All categories equal”� is� prese- lected)��Much�of�the�time,�you�will�want�to�use�different�expected�values��To�define�different� expected�values,�click�on�the�“Values”�radio�button��Enter�each�expected�value�in�the�box� 
below "Values," in the same order as the categories (e.g., first enter the expected value for category 1 and then the expected value for category 2), and then click "Add" to define the value in the box. This sets up an expected value for each category. Repeat this process for every category of your variable.

[Screenshot: Chi-square goodness-of-fit: Step 2a. Enter the expected value for the category that corresponds to the first numeric value (e.g., 1). Click on "Add" to define the value expected in each group. Repeat this for each category.]

[Screenshot: Chi-square goodness-of-fit: Step 2b. The expected values will appear in rank order from the first category to the last category.]

Then click on "OK" to run the analysis. The output is shown in Table 8.4.

Table 8.4
SPSS Results for Undergraduate Majors Example

College Major (Chi-Square Test Frequencies)

College Major        Observed N   Expected N   Residual
Education                25          20.0          5.0
Arts and sciences        50          40.0         10.0
Communications           10          10.0           .0
Business                 15          30.0        -15.0
Total                   100

Test Statistics

Chi-square    11.250a
df                 3
Asymp. sig.     .010

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 10.0.

Notes on the output:
- Observed N reflects the observed frequencies from your sample.
- Expected N reflects the expected values that were input by the researcher.
- The residual is simply the difference between the observed and expected frequencies.
- "Chi-square" is the test statistic value and is calculated as

  chi-square = n * sum over j = 1 to J of (pj − πj)²/πj
             = 100[(.25 − .20)²/.20 + (.50 − .40)²/.40 + (.10 − .10)²/.10 + (.15 − .30)²/.30]
             = 11.25

- df are the degrees of freedom. For the chi-square goodness-of-fit test, they are calculated as J − 1 (i.e., one less than the number of categories).
- "Asymp. sig." is the observed p value for the chi-square goodness-of-fit test. It is interpreted as: there is about a 1% probability of the sample proportions occurring by chance if the null hypothesis is really true (i.e., if the population proportions are .20, .40, .10, and .30).

Interpreting the output: The top table provides the frequencies observed in the sample ("Observed N") and the expected frequencies given the values defined by the researcher ("Expected N"). The "Residual" is simply the difference between the two Ns. The chi-square test statistic value is 11.25, and the associated p value is .01. Since p is less than α, we reject the null hypothesis. Let us translate this back to the purpose of our null hypothesis statistical test. There is evidence to suggest that the sample proportions observed differ from the proportions of college majors nationally. Follow-up tests to determine which cells are statistically different in the observed to expected proportions can be conducted by examining the standardized residuals. In this example, the standardized residuals were computed previously as follows, using R = (O − E)/√E:

  R(Education) = (25 − 20)/√20 = 1.118
  R(Arts and sciences) = (50 − 40)/√40 = 1.581
  R(Communications) = (10 − 10)/√10 = 0
  R(Business) = (15 − 30)/√30 = −2.739

The standardized residual for business is greater (in absolute value terms) than 1.96 (given α = .05) and thus suggests that there are different observed to expected frequencies for students majoring in business at ICU compared to national estimates. This category is the one contributing most to the statistically significant chi-square statistic.

The effect size can be calculated as follows:

  Effect size = chi-square/[N(J − 1)] = 11.25/[100(4 − 1)] = 11.25/300 = .0375
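The arithmetic behind Table 8.4 can be verified outside SPSS. The following Python sketch (plain arithmetic with the counts and national proportions from the text, not SPSS output) reproduces the chi-square statistic, the standardized residuals, and the effect size:

```python
import math

# Chi-square goodness-of-fit for the college-majors example in Table 8.4.
# Observed counts and nationally expected proportions come from the text.
observed = {"Education": 25, "Arts and sciences": 50,
            "Communications": 10, "Business": 15}
expected_prop = {"Education": .20, "Arts and sciences": .40,
                 "Communications": .10, "Business": .30}

n = sum(observed.values())                       # total sample size, 100
expected = {k: n * p for k, p in expected_prop.items()}

# Test statistic: sum over categories of (O - E)^2 / E
chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k]
                 for k in observed)

# Standardized residuals, R = (O - E) / sqrt(E); flag cells beyond +-1.96
residuals = {k: (observed[k] - expected[k]) / math.sqrt(expected[k])
             for k in observed}

# Effect size: chi-square / [N(J - 1)]
J = len(observed)
effect_size = chi_square / (n * (J - 1))

print(round(chi_square, 3))                      # 11.25
print({k: round(r, 3) for k, r in residuals.items()})
print(round(effect_size, 4))                     # 0.0375
```

This reproduces chi-square = 11.25, the standardized residuals 1.118, 1.581, 0, and −2.739, and the effect size of .0375 reported in the text.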
Chi-Square Test of Association

Step 1: To conduct a chi-square test of association, you need two categorical variables (nominal and/or ordinal) whose frequencies you wish to associate (e.g., education level and gambling stance). To compute the chi-square test of association, go to "Analyze" in the top pulldown, then select "Descriptive Statistics," and then select the "Crosstabs" procedure.

[Screenshot: Chi-square test of association: Step 1]

Step 2: Select the dependent variable and move it into the "Row(s)" box by clicking on the arrow key [e.g., here we used stance on gambling as the dependent variable (1 = support; 0 = not support)]. Then select the independent variable and move it into the "Column(s)" box [in this example, level of education is the independent variable (1 = less than high school; 2 = high school; 3 = undergraduate; 4 = graduate)].

[Screenshot: Chi-square test of association: Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the boxes on the right. The dependent variable should be displayed in the row(s) and the independent variable in the column(s). Clicking on "Statistics" will allow you to select various statistics to generate (including the chi-square test statistic value and various correlation coefficients); see step 3. Clicking on "Cells" provides options for what statistics to display in the cells; see step 4. Clicking on "Format" allows the option of displaying the categories in ascending or descending order; see step 5.]
Step 3: In the top right corner of the "Crosstabs" dialog box (see screenshot step 2), click on the button labeled "Statistics." From here, placing a checkmark in the box for "Chi-square" will produce the chi-square test statistic value and resulting null hypothesis statistical test results (including degrees of freedom and p value). Also from "Statistics," you can select various measures of association that can serve as an effect size (i.e., correlation coefficient values). Which correlation is selected should depend on the measurement scales of your variables. We are working with two nominal variables; thus, for purposes of this illustration, we will select both "Phi and Cramer's V" and "Contingency coefficient" just to illustrate two different effect size indices (although it is standard practice to use and report only one effect size). We will use the contingency coefficient to compute Cohen's w. Click on "Continue" to return to the main "Crosstabs" dialog box.

[Screenshot: Chi-square test of association: Step 3]

Step 4: In the top right corner of the "Crosstabs" dialog box (see screenshot step 2), click on the button labeled "Cells." From the "Cells" dialog box, options are available for selecting counts and percentages. We have requested "Observed" and "Expected" counts, "Column" percentages, and "Standardized" residuals. We will review the expected counts to determine if the assumption of five expected frequencies per cell is met. We will use the standardized residuals post hoc if the results of the test are statistically significant to determine which cell(s) is most influencing the chi-square value. Click "Continue" to return to the main "Crosstabs" dialog box.

[Screenshot: Chi-square test of association: Step 4]

Step 5: In the top right corner of the "Crosstabs" dialog box (see screenshot
step 2), click on the button labeled "Format." From the "Format" dialog box, options are available for determining which order, "Ascending" or "Descending," you want the row values presented in the contingency table (we asked for descending in this example, such that row 1 was gambling = 1 and row 2 was gambling = 0). Click "Continue" to return to the main "Crosstabs" dialog box. Then click on "OK" to run the analysis.

[Screenshot: Chi-square test of association: Step 5]

Interpreting the Output: The output appears in Table 8.5, where the top box ("Case Processing Summary") provides information on the sample size and frequency of missing data (if any). The "Crosstabulation" table is next and provides the contingency table (i.e., counts, percentages, and standardized residuals). The "Chi-Square Tests" box gives the results of the procedure (including the chi-square test statistic value labeled "Pearson Chi-Square," degrees of freedom, and p value labeled as "Asymp. Sig."). The likelihood ratio chi-square uses a different mathematical formula than the Pearson chi-square; however, for large sample sizes, the values for the likelihood ratio chi-square and the Pearson chi-square should be similar (and rarely should the two statistics suggest different conclusions in terms of rejecting or failing to reject the null hypothesis). The linear by linear association statistic, also known as the Mantel-Haenszel chi-square, is based on the Pearson correlation and tests whether there is a linear association between the two variables (and thus should not be used for nominal variables).

For the contingency coefficient, C, of .378, we compute Cohen's w effect size as follows:

  w = √[C²/(1 − C²)] = √[.378²/(1 − .378²)] = √[.143/(1 − .143)] = √.167 = .408
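The Crosstabs results can likewise be checked by hand. This Python sketch uses the observed counts from Table 8.5 (rows are stance on gambling, columns are level of education) to reproduce the Pearson chi-square, the contingency coefficient, Cramer's V, and Cohen's w; it mirrors what SPSS reports but is not SPSS itself:

```python
import math

# Observed counts from the gambling example (Table 8.5).
table = [
    [16, 13, 10, 5],   # support: <HS, HS, undergraduate, graduate
    [4,  7, 10, 15],   # do not support
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)                                   # 80

# Pearson chi-square: sum of (O - E)^2 / E with E = (row total)(col total)/n
chi_square = 0.0
for r, row in enumerate(table):
    for c, obs in enumerate(row):
        exp = row_totals[r] * col_totals[c] / n
        chi_square += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(col_totals) - 1)         # (R - 1)(C - 1) = 3

# Effect sizes: contingency coefficient C, Cramer's V, Cohen's w from C.
C = math.sqrt(chi_square / (chi_square + n))
V = math.sqrt(chi_square / (n * (min(len(table), len(col_totals)) - 1)))
w = math.sqrt(C ** 2 / (1 - C ** 2))

print(round(chi_square, 3), df)                       # 13.333 3
print(round(C, 3), round(V, 3), round(w, 3))          # 0.378 0.408 0.408
```

It yields chi-square = 13.333 with df = 3, C = .378, and V = w = .408, matching the SPSS output and the hand computation above.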
Cohen's w of .408 would be interpreted as a moderate to large effect. Cramer's V, as seen in the output, is .408 and would be interpreted similarly, a moderate to large effect.

8.4 G*Power

A priori power can be determined using specialized software (e.g., Power and Precision, Ex-Sample, G*Power) or power tables (e.g., Cohen, 1988), as previously described. However, since SPSS does not provide power information for the results of the chi-square test of association just conducted, let us use G*Power to compute the post hoc power of our test.

Table 8.5
SPSS Results for Gambling Example

Notes on the output:
- Review the standardized residuals to determine which cell(s) are contributing to the statistically significant chi-square value. Standardized residuals greater than an absolute value of 1.96 (the critical value when alpha = .05) indicate that the cell is contributing to the association between the variables. In this case, only one cell, graduate/do not support, has a standardized residual of 2.0 and thus is contributing to the relationship.
- When analyzing the percentages in the crosstab table, compare the categories of the dependent variable (rows) across the columns of the independent variable (columns). For example, of respondents with a high school diploma, 65% support gambling.
- Zero cells have expected counts less than five; thus, we have met this assumption of the chi-square test of association.
- Degrees of freedom are computed as (Rows − 1)(Columns − 1) = (2 − 1)(4 − 1) = 3.
- The probability is less than 1% (see "Asymp. sig.") that we would see these proportions by random chance if the proportions were all equal (i.e., if the null hypothesis were really true).
- We have a 2 × 4 table; thus, Cramer's V is appropriate. It is statistically significant, and using Cohen's interpretations, reflects a moderate to large effect size.
- "Pearson Chi-square" is the test statistic value and is calculated as (see Section 8.2.3 for the full computation):

  chi-square = sum over r = 1 to R, sum over c = 1 to C, of n.c (prc − πr.)²/πr.

- The contingency coefficient can be used to compute Cohen's w, a measure of effect size, as follows: w = √[C²/(1 − C²)] = √[.378²/(1 − .378²)] = .408.
- The crosstabulation reports both observed and expected counts.

Case Processing Summary

                          Valid           Missing          Total
                          N    Percent    N    Percent     N    Percent
Stance on gambling *      80   100.0%     0     .0%        80   100.0%
level of education

Stance on Gambling * Level of Education Crosstabulation

                                      Less Than    High     Under-
                                      High School  School   graduate  Graduate  Total
Support         Count                     16         13        10         5       44
                Expected count            11.0       11.0      11.0      11.0     44.0
                % within level of educ.   80.0%      65.0%     50.0%     25.0%    55.0%
                Std. residual              1.5         .6       -.3      -1.8
Do not support  Count                      4          7        10        15       36
                Expected count             9.0        9.0       9.0       9.0     36.0
                % within level of educ.   20.0%      35.0%     50.0%     75.0%    45.0%
                Std. residual             -1.7        -.7        .3       2.0
Total           Count                     20         20        20        20       80
                % within level of educ.  100.0%     100.0%    100.0%    100.0%   100.0%

Chi-Square Tests

                               Value     df   Asymp. Sig. (2-Sided)
Pearson chi-square            13.333a     3         .004
Likelihood ratio              13.969      3         .003
Linear-by-linear association  12.927      1         .000
N of valid cases                  80

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.00.

Symmetric Measures

                                         Value   Approx. Sig.
Nominal by     Phi                        .408       .004
nominal        Cramer's V                 .408       .004
               Contingency coefficient    .378       .004
N of valid cases                           80

Post Hoc Power for the Chi-Square Test of Association Using G*Power

The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a chi-square test of association; therefore, the toggle button must be used to change the test family to chi-square. Next, we need to select the appropriate statistical test. We toggle to "Goodness-of-fit tests: Contingency tables." The "Type of power analysis" desired then needs to be selected. To compute post hoc power, we need to select "Post hoc: Compute achieved power—given α, sample size, and effect size."

The "Input Parameters" must then be specified. The first parameter is specification of the effect size w (this was computed by hand from the contingency coefficient and w = .408). The alpha level we tested at was .05, the sample size was 80, and the degrees of freedom were 3. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.

The "Output Parameters" provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of .408, an alpha level of .05, and sample size of 80. Based on those criteria, the post hoc power was .88. In other words, with a sample size of 80, testing at an alpha level of .05 and observing a moderate to large effect of .408, the power of our test was .88—the probability of rejecting the null hypothesis when it is really false was 88%, which is very high power. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample
size was not sufficient to reach the desired level of power (given the observed effect size and alpha level).

[Screenshot: G*Power, chi-square test of association. The "Input Parameters" for computing post hoc power must be specified, including: 1. Observed effect size w; 2. Alpha level; 3. Total sample size; 4. Degrees of freedom. Once the parameters are specified, click on "Calculate."]

8.5 Template and APA-Style Write-Up

We finish the chapter by presenting templates and APA-style write-ups for our examples. First we present an example paragraph detailing the results of the chi-square goodness-of-fit test and then follow this by the chi-square test of association.

Chi-Square Goodness-of-Fit Test

Recall that our graduate research assistant, Marie, was working with Tami, a staff member in the Undergraduate Services Office at ICU, to assist in analyzing the proportions of students enrolled in undergraduate majors. Her task was to assist Tami with writing her research question (Are the sample proportions of undergraduate student college majors at ICU in the same proportions as those nationally?) and generating the statistical test of inference to answer her question. Marie suggested a chi-square goodness-of-fit test as the test of inference. A template for writing a research question for a chi-square goodness-of-fit test is presented as follows:

Are the sample proportions of [units in categories] in the same proportions of those [identify the source to which the comparison is being made]?
It may be helpful to include in the results of the chi-square goodness-of-fit test information on an examination of the extent to which the assumptions were met (recall there are two assumptions: independence and expected frequency of at least 5 per cell). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.

A chi-square goodness-of-fit test was conducted to determine if the sample proportions of undergraduate student college majors at ICU were in the same proportions of those reported nationally. The test was conducted using an alpha of .05. The null hypothesis was that the proportions would be as follows: .20 education, .40 arts and sciences, .10 communications, and .30 business. The assumption of an expected frequency of at least 5 per cell was met. The assumption of independence was met via random selection.

As shown in Table 8.4, there was a statistically significant difference between the proportion of undergraduate majors at ICU and those reported nationally (chi-square = 11.250, df = 3, p = .010). Thus, the null hypothesis that the proportions of undergraduate majors at ICU parallel those expected at the national level was rejected at the .05 level of significance. The effect size (chi-square/[N(J − 1)]) was .0375, and interpreted using Cohen's guide (1988) as a very small effect.

Follow-up tests were conducted by examining the standardized residuals. The standardized residual for business was −2.739 and thus suggests that there are different observed to expected frequencies for students majoring in business at ICU compared to national estimates. Therefore, business is the college major that is contributing most to the statistically significant chi-square statistic.
Chi-Square Test of Association

Marie, our graduate research assistant, was also working with Matthew, a lobbyist interested in examining the association between education level and stance on gambling. Marie was tasked with assisting Matthew in writing his research question (Is there an association between level of education and stance on gambling?) and generating the test of inference to answer his question. Marie suggested a chi-square test of association as the test of inference. A template for writing a research question for a chi-square test of association is presented as follows:

Is there an association between [independent variable] and [dependent variable]?

It may be helpful to include in the results of the chi-square test of association information on the extent to which the assumptions were met (recall there are two assumptions: independence and expected frequency of at least 5 per cell). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference. It is also desirable to include a measure of effect size. Given the contingency coefficient, C, of .378, we computed Cohen's w effect size to be .408, which would be interpreted as a moderate to large effect.

A chi-square test of association was conducted to determine if there was a relationship between level of education and stance on gambling. The test was conducted using an alpha of .01. It was hypothesized that there was an association between the two variables. The assumption of an expected frequency of at least 5 per cell was met. The assumption of independence was not met since the respondents were not randomly selected; thus, there is an increased probability of a Type I error.

From Table 8.5, we can see from the row marginals that 55% of the individuals overall support gambling.
However, lower levels of edu- cation have a much higher percentage of support, while the highest level of education has a much lower percentage of support. Thus, there appears to be an association or relationship between gambling stance and level of education. This is subsequently supported sta- tistically from the chi-square test (χ2 = 13.333, df = 3, p = .004). Thus, the null hypothesis that there is no association between stance on gambling and level of education was rejected at the .01 level of significance. Examination of the standardized residuals suggests that respondents who hold a graduate degree are signifi- cantly more likely not to support gambling (standardized residual = 2.0) as compared to all other respondents. The effect size, Cohen’s w, was computed to be .408, which is interpreted to be a moderate to large effect (Cohen, 1988). 236 An Introduction to Statistical Concepts 8.6 Summary In�this�chapter,�we�described�a�third�inferential�testing�situation:�testing�hypotheses�about� proportions��Several�inferential�tests�and�new�concepts�were�discussed��The�new�concepts� introduced� were� proportions,� sampling� distribution� and� standard� error� of� a� proportion,� contingency�table,�chi-square�distribution,�and�observed�versus�expected�frequencies��The� inferential� tests� described� involving� the� unit� normal� distribution� were� tests� of� a� single� proportion,� of� two� independent� proportions,� and� of� two� dependent� proportions�� These� tests�are�parallel�to�the�tests�of�one�or�two�means�previously�discussed�in�Chapters�6�and�7�� The�inferential�tests�described�involving�the�chi-square�distribution�consisted�of�the�chi- square� goodness-of-fit� test� and� the� chi-square� test� of� association�� In� addition,� examples� were�presented�for�each�of�these�tests��Box�8�1�summarizes�the�tests�reviewed�in�this�chap- ter�and�the�key�points�related�to�each�(including�the�distribution�involved�and�recommen- dations�for�when�to�use�the�test)� STOp 
aNd ThINk bOx 8.1 Characteristics�and�Recommendations�for�Inferences�About�Proportions Test Distribution When to Use Inferences�about�a� single�proportion� (akin�to�one-sample� mean�test) Unit�normal,�z •��To�determine�if�the�sample�proportion� differs�from�a�hypothesized�proportion •�One�variable,�nominal�or�ordinal�in�scale Inferences�about�two� independent� proportions�(akin�to� the�independent�t�test) Unit�normal,�z •��To�determine�if�the�population�proportion� for�one�group�differs�from�the�population� proportion�for�a�second�independent�group •��Two�variables,�both�nominal�and�ordinal� in scale Inferences�about�two� dependent� proportions�(akin�to� the�dependent�t�test) Unit�normal,�z •��To�determine�if�the�population�proportion� for�one�group�is�different�than�the� population�proportion�for�a�second� dependent�group •��Two�variables�of�the�same�measure,�both� nominal�and�ordinal�in�scale Chi-square�goodness- of-fit�test Chi-square •��To�determine�if�observed�proportions�differ� from�what�would�be�expected�a�priori •�One�variable,�nominal�or�ordinal�in�scale Chi-square�test�of� association Chi-square •��To�determine�association/relationship� between�two�variables�based�on�observed� proportions •��Two�variables,�both�nominal�and�ordinal� in scale At�this�point,�you�should�have�met�the�following�objectives:�(a)�be�able�to�understand�the� basic�concepts�underlying�tests�of�proportions,�(b)�be�able�to�select�the�appropriate�test,�and� (c)�be�able�to�determine�and�interpret�the�results�from�the�appropriate�test��In�Chapter�9,�we� discuss�inferential�tests�involving�variances� 237Inferences About Proportions Problems Conceptual problems 8.1� How�many�degrees�of�freedom�are�there�in�a�5�×�7�contingency�table�when�the�chi- square�test�of�association�is�used? � a�� 12 � b�� 24 � c�� 30 � d�� 35 8.2� The�more�that�two�independent�sample�proportions�differ,�all�else�being�equal,�the� smaller�the�z�test�statistic��True�or�false? 
8.3 The null hypothesis is a numerical statement about an unknown parameter. True or false?

8.4 In testing the null hypothesis that the proportion is .50, the critical value of z increases as degrees of freedom increase. True or false?

8.5 A consultant found a sample proportion of individuals favoring the legalization of drugs to be −.50. I assert that a test of whether that sample proportion is different from 0 would be rejected. Am I correct?

8.6 Suppose we wish to test the following hypotheses at the .10 level of significance:

  H0: π = .60
  H1: π > .60

A sample proportion of .15 is observed. I assert if I conduct the z test that it is possible to reject the null hypothesis. Am I correct?

8.7 When the chi-square test statistic for a test of association is less than the corresponding critical value, I assert that I should reject the null hypothesis. Am I correct?

8.8 Other things being equal, the larger the sample size, the smaller the value of sp. True or false?

8.9 In the chi-square test of association, as the difference between the observed and expected proportions increases,
 a. The critical value for chi-square increases.
 b. The critical value for chi-square decreases.
 c. The likelihood of rejecting the null hypothesis decreases.
 d. The likelihood of rejecting the null hypothesis increases.

8.10 When the hypothesized value of the population proportion lies outside of the CI around a single sample proportion, I assert that the researcher should reject the null hypothesis. Am I correct?

8.11 Statisticians at a theme park want to know if the same proportions of visitors select the Jungle Safari as their favorite ride as compared to the Mountain Rollercoaster. They sample 150 visitors and collect data on one variable: favorite ride (two categories: Jungle Safari and Mountain Rollercoaster). Which statistical procedure is most appropriate to use to test the hypothesis?
 a. Chi-square goodness-of-fit test
 b. Chi-square test of association

8.12 Sophie is a reading teacher. She is researching the following question: is there a relationship between a child's favorite genre of book and their socioeconomic status category? She collects data from 35 children on two variables: (a) favorite genre of book (two categories: fiction, nonfiction) and (b) socioeconomic status category (three categories: low, middle, high). Which statistical procedure is most appropriate to use to test the hypothesis?
 a. Chi-square goodness-of-fit test
 b. Chi-square test of association
Computational problems

8.1 For a random sample of 40 widgets produced by the Acme Widget Company, 30 successes and 10 failures are observed. Test the following hypotheses at the .05 level of significance:

  H0: π = .60
  H1: π ≠ .60
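Because SPSS does not include the z procedures for proportions (as noted in Section 8.3), a quick way to check the arithmetic for this problem is by hand or with a short script. The following Python sketch computes the z test of a single proportion using the standard error under the null hypothesis:

```python
import math

# z test of a single proportion for Computational Problem 8.1:
# 30 successes in 40 trials, testing H0: pi = .60.
successes, n, pi0 = 30, 40, .60

p = successes / n                         # sample proportion, .75
se = math.sqrt(pi0 * (1 - pi0) / n)       # standard error under H0
z = (p - pi0) / se

print(round(z, 3))                        # 1.936
```

Assuming the two-tailed alternative shown, z ≈ 1.94 falls just inside the critical values of ±1.96, so the null hypothesis would not be rejected at the .05 level.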

8.2 The following data are calculated for two independent random samples of male and female teenagers, respectively, on whether they expect to attend graduate school: n1 = 48, p1 = 18/48, n2 = 52, p2 = 33/52. Test the following hypotheses at the .05 level of significance:

  H0: π1 − π2 = 0
  H1: π1 − π2 ≠ 0
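The same kind of hand check works for the test of two independent proportions. This Python sketch uses the pooled-proportion standard error, one common large-sample form of the test:

```python
import math

# z test of two independent proportions for Computational Problem 8.2:
# males 18/48 and females 33/52 expect to attend graduate school.
f1, n1 = 18, 48
f2, n2 = 33, 52

p1, p2 = f1 / n1, f2 / n2
pooled = (f1 + f2) / (n1 + n2)            # pooled proportion, .51
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(round(z, 3))                        # -2.595
```

Here |z| ≈ 2.59 exceeds the two-tailed .05 critical value of 1.96, so the null hypothesis of equal population proportions would be rejected.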
8.3 The following frequencies of successes and failures are obtained for two dependent random samples measured at the pretest and posttest of a weight training program:

                Pretest
Posttest    Success   Failure
Failure       18        30
Success       33        19

Test the following hypotheses at the .05 level of significance:

  H0: π1 − π2 = 0
  H1: π1 − π2 ≠ 0
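For two dependent proportions, one common large-sample statistic (equivalent to McNemar's test) uses only the two cells in which status changed between pretest and posttest. A Python sketch under that formulation, reading the changed cells from the table above:

```python
import math

# z test of two dependent proportions for Computational Problem 8.3,
# using the change cells only (a McNemar-type large-sample form).
success_to_failure = 18    # pretest success, posttest failure
failure_to_success = 19    # pretest failure, posttest success

z = (failure_to_success - success_to_failure) / math.sqrt(
    failure_to_success + success_to_failure)

print(round(z, 3))         # 0.164
```

Here z ≈ 0.16, far inside the two-tailed .05 critical values of ±1.96, so by this formulation the null hypothesis would not be rejected. Note that the cell labels in the comments are an interpretation of the table's layout.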

8.4 A chi-square goodness-of-fit test is to be conducted with six categories of professions to determine whether the sample proportions of those supporting the current government differ from a priori national proportions. The chi-square test statistic is equal to 16.00. Determine the result of this test by looking up the critical value and making a statistical decision, using α = .01.

8.5 A chi-square goodness-of-fit test is to be conducted to determine whether the sample proportions of families in Florida who select various schooling options (five categories including home school, public school, public charter school, private school, and other) differ from the proportions reported nationally. The chi-square test statistic is equal to 9.00. Determine the result of this test by looking up the critical value and making a statistical decision, using α = .05.
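For Problems 8.4 and 8.5 the only computation left is the decision rule. This sketch hard-codes the tabled chi-square critical values (plain Python has no chi-square inverse CDF in the standard library):

```python
# Decision rule for Computational Problems 8.4 and 8.5.
# Critical values are taken from a standard chi-square table.
problems = {
    # label: (test statistic, df, alpha, tabled critical value)
    "8.4": (16.00, 6 - 1, .01, 15.086),
    "8.5": (9.00, 5 - 1, .05, 9.488),
}

decisions = {}
for label, (stat, df, alpha, critical) in problems.items():
    decisions[label] = "reject H0" if stat > critical else "fail to reject H0"
    print(f"{label}: chi2 = {stat}, df = {df}, "
          f"critical({alpha}) = {critical} -> {decisions[label]}")
```

With a critical value of 15.086 for 5 degrees of freedom at α = .01, the statistic of 16.00 leads to rejection in 8.4; with a critical value of 9.488 for 4 degrees of freedom at α = .05, the statistic of 9.00 does not reach significance in 8.5.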
8.6 A random sample of 30 voters was classified according to their general political beliefs (liberal vs. conservative) and also according to whether they voted for or against the incumbent representative in their town. The results were placed into the following contingency table:

        Liberal   Conservative
Yes       10           5
No         5          10

Use the chi-square test of association to determine whether political belief is independent of voting behavior at the .05 level of significance.
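The 2 × 2 table in Problem 8.6 can be worked by hand or with a few lines of Python; the sketch below computes the Pearson chi-square from the observed counts, with expected counts E = (row total)(column total)/n:

```python
# Pearson chi-square for the 2 x 2 voting table in Computational Problem 8.6.
table = [[10, 5],    # voted yes: liberal, conservative
         [5, 10]]    # voted no:  liberal, conservative

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi_square = sum(
    (table[r][c] - row_totals[r] * col_totals[c] / n) ** 2
    / (row_totals[r] * col_totals[c] / n)
    for r in range(2) for c in range(2))

print(round(chi_square, 3))   # 3.333; df = (2 - 1)(2 - 1) = 1
```

The result, chi-square ≈ 3.333 with df = 1, is below the .05 critical value of 3.841, so political belief and voting behavior would be judged independent at that level.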
8.7 A random sample of 40 kindergarten children was classified according to whether they attended at least 1 year of preschool prior to entering kindergarten and also according to gender. The results were placed into the following contingency table:

               Boy   Girl
Preschool       12    10
No preschool     8    10

Use the chi-square test of association to determine whether enrollment in preschool is independent of gender at the .10 level of significance.
Interpretive problem

There are numerous ways to use the survey 1 dataset from the website as there are several categorical variables. Here are some examples for the tests described in this chapter:

 a. Conduct a test of a single proportion: Is the sample proportion of females equal to .50?
 b. Conduct a test of two independent proportions: Is there a difference between the sample proportion of females who are right-handed and the sample proportion of males who are right-handed?
 c. Conduct a test of two dependent proportions: Is there a difference between the sample proportion of students' mothers who are right-handed and the sample proportion of students' fathers who are right-handed?

 d. Conduct a chi-square goodness-of-fit test: Do the sample proportions for the political view categories differ from their expected proportions (very liberal = .10, liberal = .15, middle of the road = .50, conservative = .15, very conservative = .10)? Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 e. Conduct a chi-square goodness-of-fit test to determine if there are similar proportions of respondents who can (vs. cannot) tell the difference between Pepsi and Coke. Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 f. Conduct a chi-square test of association: Is there an association between political view and gender? Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 g. Compute a chi-square test of association to examine the relationship between if a person smokes and their political view. Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.

9
Inferences About Variances
Chapter Outline

9.1 New Concepts
9.2 Inferences About a Single Variance
9.3 Inferences About Two Dependent Variances
9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests)
 9.4.1 Traditional Tests
 9.4.2 Brown–Forsythe Procedure
 9.4.3 O'Brien Procedure
9.5 SPSS
9.6 Template and APA-Style Write-Up

Key Concepts

1. Sampling distributions of the variance
2. The F distribution
3. Homogeneity of variance tests
In Chapters 6 through 8, we looked at testing inferences about means (Chapters 6 and 7) and about proportions (Chapter 8). In this chapter, we examine inferential tests involving variances. Tests of variances are useful in two applications: (a) as an inferential test by itself and (b) as a test of the homogeneity of variance assumption for another procedure (e.g., t test, analysis of variance [ANOVA]). First, a researcher may want to perform inferential tests on variances for their own sake, in the same fashion that we described for the one- and two-sample t tests on means. For example, we may want to assess whether the variance of undergraduates at Ivy-Covered University (ICU) on an intelligence measure is the same as the theoretically derived variance of 225 (from when the test was developed and normed). In other words, is the variance at a particular university greater than or less than 225? As another example, we may want to determine whether the variances on an intelligence measure are consistent across two or more groups; for example, is the variance of the intelligence measure at ICU different from that at Podunk University?
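As a preview of Section 9.2, the usual statistic for testing a single variance against a hypothesized value is chi-square = (n − 1)s²/σ0², referred to the chi-square distribution with n − 1 degrees of freedom. A minimal Python sketch for the ICU example (the sample size and sample variance below are hypothetical, chosen only for illustration):

```python
# Chi-square statistic for a test of a single variance, sketched for the
# ICU example: is the variance on an intelligence measure different from 225?
n = 30            # hypothetical sample size
s2 = 289.0        # hypothetical sample variance
sigma2_0 = 225.0  # hypothesized population variance (from the test norms)

chi_square = (n - 1) * s2 / sigma2_0   # distributed chi-square with n - 1 df
df = n - 1

print(round(chi_square, 3), df)        # 37.249 29
```

The resulting statistic would then be compared with tabled chi-square critical values for 29 degrees of freedom.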

Second, for some procedures such as the independent t test (Chapter 7) and the ANOVA (Chapter 11), it is assumed that the variances for two or more independent samples are equal (known as the homogeneity of variance assumption). Thus, we may want to use an inferential test of variances to assess whether this assumption has been violated or not. The following inferential tests of variances are covered in this chapter: (a) testing whether a single variance is different from a hypothesized value, (b) testing whether two dependent variances are different, and (c) testing whether two or more independent variances are different. We utilize many of the foundational concepts previously covered in Chapters 6 through 8. New concepts to be discussed include the following: the sampling distributions of the variance, the F distribution, and homogeneity of variance tests. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying tests of variances, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.
9.1 New Concepts

As you remember, Marie is a graduate student working on a degree in educational research. She has been building her statistical skills and is becoming quite adept at applying her skills as she completes tasks assigned to her by her faculty advisor. We revisit Marie again.

Another call has been fielded by Marie's advisor for assistance with statistical analysis. This time, it is Jessica, an elementary teacher within the community. Having built quite a reputation for success in statistical consultations, Marie's advisor requests that Marie work with Jessica.

Jessica shares with Marie that she is conducting a teacher research project related to achievement of first-grade students at her school. Jessica wants to determine if the variances of the achievement scores differ when children begin school in the fall as compared to when they end school in the spring. Marie suggests the following research question: Are the variances of achievement scores for first-grade children the same in the fall as compared to the spring? Marie suggests a test of variance as the test of inference. Her task is then to assist Jessica in generating the test of inference to answer her research question.
This�section�deals�with�concepts�for�testing�inferences�about�variances,�in�particular,�the�
sampling�distributions�underlying�such�tests��Subsequent�sections�deal�with�several�infer-
ential�tests�of�variances��Although�the�sampling�distribution�of�the�mean�is�a�normal�distri-
bution�(Chapters�6�and�7),�and�the�sampling�distribution�of�a�proportion�is�either�a�normal�
or�chi-square�distribution�(Chapter�8),�the�sampling distribution of a variance�is�either�a�
chi-square�distribution�for�a�single�variance,�a�t�distribution�for�two�dependent�variances,�
or� an� F� distribution� for� two� or� more� independent� variances�� Although� we� have� already�
discussed�the�t�distribution�in�Chapter�6�and�the�chi-square�distribution�in�Chapter�8,�we�
need�to�discuss�the�F�distribution�(named�in�honor�of�the�famous�statistician�Sir�Ronald�A��
Fisher)�in�some�detail�here�

Like the normal, t, and chi-square distributions, the F distribution is really a family of distributions. Also, like the t and chi-square distributions, the F distribution family members depend on the number of degrees of freedom represented. Unlike any previously discussed distribution, the F distribution family members actually depend on a combination of two different degrees of freedom, one for the numerator and one for the denominator. The reason is that the F distribution is a ratio of two chi-square variables. To be more precise, F with ν1 degrees of freedom for the numerator and ν2 degrees of freedom for the denominator is actually a ratio of the following chi-square variables:

Fν1,ν2 = (χ²ν1/ν1) / (χ²ν2/ν2)
For example, the F distribution for 1 degree of freedom numerator and 10 degrees of freedom denominator is denoted by F1,10. The F distribution is generally positively skewed and leptokurtic in shape (like the chi-square distribution) and has a mean of ν2/(ν2 − 2) when ν2 > 2 (where ν2 represents the denominator degrees of freedom). A few examples of the F distribution are shown in Figure 9.1 for the following pairs of degrees of freedom (i.e., numerator, denominator): F10,10; F20,20; and F40,40.
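The definition of the F distribution as a ratio of independent chi-square variables, and its mean of ν2/(ν2 − 2), can be checked with a short simulation. This sketch is not part of the original text, and the function names are ours:

```python
import random

def chi_square_draw(df, rng):
    # A chi-square variable with df degrees of freedom is the sum of
    # df squared standard normal variables.
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))

def f_draw(df1, df2, rng):
    # F is the ratio of two independent chi-square variables, each
    # divided by its own degrees of freedom.
    return (chi_square_draw(df1, rng) / df1) / (chi_square_draw(df2, rng) / df2)

rng = random.Random(20111003)
df1, df2 = 10, 10
samples = [f_draw(df1, df2, rng) for _ in range(50000)]
mean_f = sum(samples) / len(samples)
print(mean_f)  # close to df2/(df2 - 2) = 1.25 for F10,10
```

With 50,000 draws the simulated mean lands near 1.25, the theoretical mean of F10,10, and every draw is positive, as a ratio of chi-square variables must be.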
Critical values for several levels of α of the F distribution at various combinations of degrees of freedom are given in Table A.4. The numerator degrees of freedom are given in the columns of the table (ν1), and the denominator degrees of freedom are shown in the rows of the table (ν2). Only the upper-tail critical values are given in the table (e.g., percentiles of .90, .95, .99 for α = .10, .05, .01, respectively). The reason is that most inferential tests involving the F distribution are one-tailed tests using the upper-tail critical region. Thus, to find the upper-tail critical value for .05F1,10, we look on the second page of the table (α = .05), in the first column of values on that page for ν1 = 1, and where it intersects with the 10th row of values for ν2 = 10. There you should find .05F1,10 = 4.96.
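The table lookup can be mimicked by simulation as well: the 95th percentile of a large sample of simulated F1,10 values should fall near the tabled .05F1,10 = 4.96. This is only a sketch; Monte Carlo error makes the match approximate:

```python
import random

def f_draw(df1, df2, rng):
    # Ratio of two independent chi-square variables (each a sum of
    # squared standard normals), each divided by its degrees of freedom.
    chi1 = sum(rng.gauss(0, 1) ** 2 for _ in range(df1))
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(df2))
    return (chi1 / df1) / (chi2 / df2)

rng = random.Random(1950)
samples = sorted(f_draw(1, 10, rng) for _ in range(100000))
upper_05 = samples[int(0.95 * len(samples))]  # empirical 95th percentile
print(upper_05)  # near the tabled value 4.96
```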
FIGURE 9.1 Several members of the family of F distributions (relative frequency plotted against F for F10,10, F20,20, and F40,40).

9.2 Inferences About a Single Variance
In our initial inferential testing situation for variances, the researcher would like to know whether the population variance is equal to some hypothesized variance or not. First, the hypotheses to be evaluated for detecting whether a population variance differs from a hypothesized variance are as follows. The null hypothesis H0 is that there is no difference between the population variance σ² and the hypothesized variance σ0², which we denote as

H0: σ² = σ0²

Here there is no difference or a "null" difference between the population variance and the hypothesized variance. For example, if we are seeking to determine whether the variance on an intelligence measure at ICU is different from the overall adult population, then a reasonable hypothesized value would be 225, as this is the theoretically derived variance for the adult population.
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population variance σ² and the hypothesized variance σ0², which we denote as

H1: σ² ≠ σ0²

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population variance is different from the hypothesized variance. As we have not specified a direction on H1, we are willing to reject either if σ² is greater than σ0² or if σ² is less than σ0². This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that σ² is greater than σ0² or that σ² is less than σ0². In either case, the more the resulting sample variance differs from the hypothesized variance, the more likely we are to reject the null hypothesis.

It is assumed that the sample is randomly drawn from the population (i.e., the assumption of independence) and that the population of scores is normally distributed. Because we are testing a variance, a condition of the test is that the variable must be interval or ratio in scale.
The next step is to compute the test statistic χ² as

χ² = νs²/σ0²

where
s² is the sample variance
ν = n − 1
The test statistic χ² is then compared to a critical value(s) from the chi-square distribution. For a two-tailed test, the critical values are denoted as α/2χ²ν and 1−α/2χ²ν and are found in Table A.3 (recall that unlike z and t critical values, two unique χ² critical values must be found from the table as the χ² distribution is not symmetric like z or t). If the test statistic χ² falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as αχ²ν for the alternative hypothesis H1: σ² < σ0² and as 1−αχ²ν for the alternative hypothesis H1: σ² > σ0². If the test statistic χ² falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It has been noted by statisticians such as Wilcox (1996) that the chi-square distribution does not perform adequately when sampling from a nonnormal distribution, as the actual Type I error rate can differ greatly from the nominal α level (the level set by the researcher). However, Wilcox stated "it appears that a completely satisfactory solution does not yet exist, although many attempts have been made to find one" (p. 85).
For the two-tailed test, a (1 − α)% confidence interval (CI) can also be examined and is formed as follows. The lower limit of the CI is

νs² / (1−α/2χ²ν)

whereas the upper limit of the CI is

νs² / (α/2χ²ν)

If the CI contains the hypothesized value σ0², then the conclusion is to fail to reject H0; otherwise, we reject H0.
Now consider an example to illustrate use of the test of a single variance. We follow the basic steps for hypothesis testing that we applied in previous chapters. These steps include the following:

1. State the null and alternative hypotheses.
2. Select the level of significance (i.e., alpha, α).
3. Calculate the test statistic value.
4. Make a statistical decision (reject or fail to reject H0).
A researcher at the esteemed ICU is interested in determining whether the population variance in intelligence at the university is different from the norm-developed hypothesized variance of 225. Thus, a nondirectional, two-tailed alternative hypothesis is utilized. If the null hypothesis is rejected, this would indicate that the intelligence level at ICU is more or less diverse or variable than the norm. If the null hypothesis is not rejected, this would indicate that the intelligence level at ICU is as equally diverse or variable as the norm.

The researcher takes a random sample of 101 undergraduates from throughout the university and computes a sample variance of 149. The test statistic χ² is computed as follows:

χ² = νs²/σ0² = 100(149)/225 = 66.2222
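This computation, together with the Table A.3 critical values used in this example (74.2219 and 129.561 for a two-tailed test at α = .05 with ν = 100), can be sketched in a few lines. The variable names are ours:

```python
# Chi-square test of a single variance for the ICU example:
# n = 101 undergraduates, sample variance 149, hypothesized variance 225.
n, s2, sigma0_sq = 101, 149.0, 225.0
nu = n - 1                       # degrees of freedom
chi_sq = nu * s2 / sigma0_sq     # test statistic

# Two-tailed critical values from Table A.3 (alpha = .05, nu = 100).
lower_crit, upper_crit = 74.2219, 129.561
reject = chi_sq < lower_crit or chi_sq > upper_crit

# 95% CI for the population variance (note the swapped critical values).
ci_lower = nu * s2 / upper_crit
ci_upper = nu * s2 / lower_crit

print(round(chi_sq, 4), reject)                 # 66.2222 True
print(round(ci_lower, 4), round(ci_upper, 4))   # 115.0037 200.7494
```

The printed CI limits match the interval computed in this example, and the decision (reject H0) agrees with the hypothesis test.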
From Table A.3, and using an α level of .05, we determine the critical values to be .025χ²100 = 74.2219 and .975χ²100 = 129.561. As the test statistic falls beyond one of the critical values into the lower-tail critical region (i.e., 66.2222 < 74.2219), our decision is to reject H0. Our conclusion then is that the variance of the undergraduates at ICU is different from the hypothesized value of 225.

The 95% CI for the example is computed as follows. The lower limit of the CI is

νs² / (.975χ²100) = 100(149)/129.561 = 115.0037

and the upper limit of the CI is

νs² / (.025χ²100) = 100(149)/74.2219 = 200.7494

As the limits of the CI (i.e., 115.0037, 200.7494) do not contain the hypothesized variance of 225, the conclusion is to reject H0. As always, the CI procedure leads us to the same conclusion as the hypothesis testing procedure for the same α level.

9.3 Inferences About Two Dependent Variances

In our second inferential testing situation for variances, the researcher would like to know whether the population variance for one group is different from the population variance for a second dependent group. This is comparable to the dependent t test described in Chapter 7 where one population mean was compared to a second dependent population mean. Once again, we have two dependently drawn samples.

First, the hypotheses to be evaluated for detecting whether two dependent population variances differ are as follows. The null hypothesis H0 is that there is no difference between the two population variances σ1² and σ2², which we denote as

H0: σ1² − σ2² = 0

Here there is no difference or a "null" difference between the two population variances. For example, we may be seeking to determine whether the variance of income of husbands is equal to the variance of their wives' incomes. Thus, the husband and wife samples are drawn as couples in pairs or dependently, rather than individually or independently.

The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population variances σ1² and σ2², which we denote as

H1: σ1² − σ2² ≠ 0

The null hypothesis H0 is rejected here in favor of the alternative hypothesis H1 if the population variances are different. As we have not specified a direction on H1, we are willing to reject either if σ1² is greater than σ2² or if σ1² is less than σ2². This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that σ1² is greater than σ2² or that σ1² is less than σ2². In either case, the more the resulting sample variances differ from one another, the more likely we are to reject the null hypothesis.

It is assumed that the two samples are dependently and randomly drawn from their respective populations, that both populations are normal in shape, and that the t distribution is the appropriate sampling distribution. Since we are testing variances, a condition of the test is that the variable must be interval or ratio in scale.

The next step is to compute the test statistic t as follows:

t = (s1² − s2²) / [2 s1 s2 √((1 − r12²)/ν)]

where
s1² and s2² are the sample variances for samples 1 and 2 respectively
s1 and s2 are the sample standard deviations for samples 1 and 2 respectively
r12 is the correlation between the scores from sample 1 and sample 2 (which is then squared)
ν is the number of degrees of freedom, ν = n − 2, with n being the number of paired observations (not the number of total observations)

Although correlations are not formally discussed until Chapter 10, conceptually the correlation is a measure of the relationship between two variables. This test statistic is conceptually somewhat similar to the test statistic for the dependent t test.

The test statistic t is then compared to a critical value(s) from the t distribution. For a two-tailed test, the critical values are denoted as ±α/2tν and are found in Table A.2. If the test statistic t falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +αtν for the alternative hypothesis H1: σ1² − σ2² > 0 and as −αtν for the alternative hypothesis H1: σ1² − σ2² < 0. If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It is thought that this test is not particularly robust to nonnormality (Wilcox, 1987). As a result, other procedures have been developed that are thought to be more robust. However, little in the way of empirical results is known at this time. Some of the new procedures can also be used for testing inferences involving the equality of two or more dependent variances. In addition, note that acceptable CI procedures are not currently available.

Let us consider an example to illustrate use of the test of two dependent variances. The same basic steps for hypothesis testing that we applied in previous chapters will be applied here as well. These steps include the following:

1. State the null and alternative hypotheses.
2. Select the level of significance (i.e., alpha, α).
3. Calculate the test statistic value.
4. Make a statistical decision (reject or fail to reject H0).

A researcher is interested in whether there is greater variation in achievement test scores at the end of the first grade as compared to the beginning of the first grade. Thus, a directional, one-tailed alternative hypothesis is utilized. If the null hypothesis is rejected, this would indicate that first graders' achievement test scores are more variable at the end of the year than at the beginning of the year. If the null hypothesis is not rejected, this would indicate that first graders' achievement test scores have approximately the same variance at both the end of the year and at the beginning of the year.

A random sample of 62 first-grade children is selected and given the same achievement test at the beginning of the school year (September) and at the end of the school year (April). Thus, the same students are tested twice with the same instrument, thereby resulting in dependent samples at time 1 and time 2. The level of significance is set at α = .01. The test statistic t is computed as follows. We determine that n = 62, ν = 60, s1² = 100, s1 = 10, s2² = 169, s2 = 13, and r12 = .80. We compute the test statistic t to be as follows:

t = (s1² − s2²) / [2 s1 s2 √((1 − r12²)/ν)] = (100 − 169) / [2(10)(13)√((1 − .64)/60)] = −3.4261

The test statistic t is then compared to the critical value from the t distribution. As this is a one-tailed test, the critical value is denoted as −αtν and is determined from Table A.2 to be −.01t60 = −2.390. The test statistic t falls into the lower-tail critical region, as it is less than the critical value (i.e., −3.4261 < −2.390), so we reject H0 and conclude that the variance in achievement test scores increases from September to April.

9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests)

In our third and final inferential testing situation for variances, the researcher would like to know whether the population variance for one group is different from the population variance for one or more other independent groups. In this section, we first describe the somewhat cloudy situation that exists for the traditional tests. Then we provide details on two recommended tests, the Brown–Forsythe procedure and the O'Brien procedure.

9.4.1 Traditional Tests

One of the more heavily studied inferential testing situations in recent years has been for testing whether differences exist among two or more independent group variances. These tests are often referred to as homogeneity of variance tests. Here we briefly discuss the more traditional tests and their associated problems. In the sections that follow, we then recommend two of the "better" tests. As was noted in the previous procedures, the variable for which the variance(s) is computed must be interval or ratio in scale.

Several tests have traditionally been
used to test for the equality of independent variances. An early simple test for two independent variances is to form a ratio of the two sample variances, which yields the following F test statistic:

F = s1²/s2²

This F ratio test assumes that the two populations are normally distributed. However, it is known that the F ratio test is not very robust to violation of the normality assumption, except for when the sample sizes are equal (i.e., n1 = n2). In addition, the F ratio test can only be used for the two-group situation.

Subsequently, more general tests were developed to cover the multiple-group situation. One such popular test is Hartley's Fmax test (developed in 1950), which is simply a more general version of the F ratio test just described. The test statistic for Hartley's Fmax test is the following:

Fmax = s²largest / s²smallest

where
s²largest is the largest variance in the set of variances
s²smallest is the smallest variance in the set

Hartley's Fmax test assumes normal population distributions and requires equal sample sizes. We also know that Hartley's Fmax test is not very robust to violation of the normality assumption. Cochran's C test (developed in 1941) is also an F test statistic and is computed by taking the ratio of the largest variance to the sum of all of the variances. Cochran's C test also assumes normality, requires equal sample sizes, and has been found to be even less robust to nonnormality than Hartley's Fmax test. As we see in Chapter 11 for the ANOVA, it is when we have unequal sample sizes that unequal variances are a problem; for these reasons, none of these tests can be recommended, which is the same situation we encountered with the independent t test.

Bartlett's χ² test (developed in 1937) does not have the stringent requirement of equal sample sizes; however, it does still assume normality. Bartlett's test is very sensitive to nonnormality and is therefore not recommended either. Since 1950, the development of homogeneity tests has proliferated, with the goal being to find a test that is fairly robust to nonnormality. Seemingly as each new test was developed, later research would show that the test was not very robust. Today there are well over 60 such tests available for examining homogeneity of variance (e.g., a bootstrap method developed by Wilcox [2002]). Rather than engage in a protracted discussion of these tests and their associated limitations, we simply present two tests that have been shown to be most robust to nonnormality in several recent studies. These are the Brown–Forsythe procedure and the O'Brien procedure. Unfortunately, neither of these tests is available in the major statistical packages (e.g., SPSS), which only include some of the problematic tests previously described.

9.4.2 Brown–Forsythe Procedure

The Brown–Forsythe procedure is a variation of Levene's test developed in 1960. Levene's test is essentially an ANOVA on the transformed variable:

Zij = |Yij − Ȳ.j|

where ij designates the ith observation in group j. Zij is computed for each individual by taking their score Yij, subtracting from it the group mean Ȳ.j (the "." indicating we have averaged across all i observations in group j), and then taking the absolute value (i.e., by removing the sign). Unfortunately, Levene's test is not very robust to nonnormality, except when sample sizes are equal.

Developed in 1974, the Brown–Forsythe procedure has been shown to be quite robust to nonnormality in numerous studies (e.g., Olejnik & Algina, 1987; Ramsey, 1994). Based on this and other research, the Brown–Forsythe procedure is recommended for leptokurtic distributions (i.e., those with sharp peaks), as it is robust to nonnormality and provides adequate Type I error protection and excellent power. In the next
section, we describe the O'Brien procedure, which is recommended for other distributions (i.e., mesokurtic and platykurtic distributions). In cases where you are unsure of which procedure to use, Algina, Blair, and Combs (1995) recommend using a maximum procedure, where both tests are conducted and the procedure with the maximum test statistic is selected.

Let us now examine in detail the Brown–Forsythe procedure. The null hypothesis is that H0: σ1² = σ2² = … = σJ², and the alternative hypothesis is that not all of the population group variances are the same. The Brown–Forsythe procedure is essentially an ANOVA on the transformed variable:

Zij = |Yij − Mdn.j|

which is computed for each individual by taking their score Yij, subtracting from it the group median Mdn.j, and then taking the absolute value (i.e., by removing the sign). The test statistic is an F and is computed by

F = [Σj nj(Z̄.j − Z̄..)²/(J − 1)] / [Σj Σi (Zij − Z̄.j)²/(N − J)]

where
nj designates the number of observations in group j
J is the number of groups (where j ranges from 1 to J)
Z̄.j is the mean for group j (computed by taking the sum of the observations in group j and dividing by the number of observations in group j, which is nj)
Z̄.. is the overall mean regardless of group membership (computed by taking the sum of all of the observations across all groups and dividing by the total number of observations N)

The test statistic F is compared against a critical value from the F table (Table A.4) with J − 1 degrees of freedom in the numerator and N − J degrees of freedom in the denominator, denoted by αFJ−1,N−J. If the test statistic is greater than the critical value, we reject H0; otherwise, we fail to reject H0.

An example using the Brown–Forsythe procedure is certainly in order now. Three different groups of children, below-average, average, and above-average readers, play a computer game. The scores on the dependent variable Y are their total points from the game. We are interested in whether the variances for the three student groups are equal or not. The example data and computations are given in Table 9.1. First we compute the median for each group, and then compute the deviation from the median for each individual to obtain the transformed Z values. Then the transformed Z values are used to compute the F test statistic.

The test statistic F = 1.6388 is compared against the critical value for α = .05 of .05F2,9 = 4.26. As the test statistic is smaller than the critical value (i.e., 1.6388 < 4.26), we fail to reject the null hypothesis and conclude that the three student groups do not have different variances.

9.4.3 O'Brien Procedure

The final test to consider in this chapter is the O'Brien procedure. While the Brown–Forsythe procedure is recommended for leptokurtic distributions, the O'Brien procedure is recommended for other distributions (i.e., mesokurtic and platykurtic distributions). Let us now examine in detail the O'Brien procedure. The null hypothesis is again that H0: σ1² = σ2² = … = σJ², and the alternative hypothesis is that not all of the population group variances are the same.

Table 9.1 Example Using the Brown–Forsythe and O'Brien Procedures

         Group 1                 Group 2               Group 3
   Y    Z    r             Y    Z    r           Y    Z     r
   6    4    124.2499      9    4    143         10    8    704
   8    2     14.2499     12    1     −7         16    2    −16
  12    2     34.2499     14    1     −7         20    2    −96
  13    3     89.2499     17    4    143         30   12   1104

  Mdn = 10                Mdn = 13              Mdn = 18
  Z̄ = 2.75                Z̄ = 2.50              Z̄ = 6
  r̄ = 65.4999             r̄ = 68                r̄ = 424

  Overall Z̄ = 3.75        Overall r̄ = 185.8333

Computations for the Brown–Forsythe procedure:

F = {[4(2.75 − 3.75)² + 4(2.50 − 3.75)² + 4(6 − 3.75)²]/2} / {[(4 − 2.75)² + (2 − 2.75)² + … + (12 − 6)²]/9} = 1.6388

Computations for the O'Brien procedure:

Sample means: Ȳ1 = 9.75, Ȳ2 = 13.0, Ȳ3 = 19.0
Sample variances: s1² = 10.9167, s2² = 11.3333, s3² = 70.6667

Example computation for rij:

r11 = [(4 − 1.5)(4)(6 − 9.75)² − .5(10.9167)(4 − 1)] / [(4 − 1)(4 − 2)] = 124.2499

Test statistic:

F = {[4(65.4999 − 185.8333)² + 4(68 − 185.8333)² + 4(424 − 185.8333)²]/2} / {[(124.2499 − 65.4999)² + (14.2499 − 65.4999)² + … + (1104 − 424)²]/9} = 1.4799

The O'Brien procedure is an ANOVA on a different transformed variable:

rij = [(nj − 1.5)nj(Yij − Ȳ.j)² − .5sj²(nj − 1)] / [(nj − 1)(nj − 2)]

which is computed for each individual, where
nj is the size of group j
Ȳ.j is the mean for group j
sj² is the sample variance for group j

The test statistic is an F and is computed by

F = [Σj nj(r̄.j − r̄..)²/(J − 1)] / [Σj Σi (rij − r̄.j)²/(N − J)]

where
nj designates the number of observations in group j
J is the number of groups (where j ranges from 1 to J)
r̄.j is the mean for group j (computed by taking the sum of the observations in group j and dividing by the number of observations in group j, which is nj)
r̄.. is the overall mean regardless of group membership (computed by taking the sum of all of the observations across all groups and dividing by the total number of observations N)

The test statistic F is compared against a critical value from the F table (Table A.4) with J − 1 degrees of freedom in the numerator and N − J degrees of freedom in the denominator, denoted by αFJ−1,N−J. If the test statistic is greater than the critical value, then we reject H0; otherwise, we fail to reject H0.

Let us return to the example in Table 9.1 and consider the results of the O'Brien procedure. From the computations shown in the table, the test statistic F = 1.4799 is compared against the critical value for α = .05 of .05F2,9 = 4.26. As the test statistic is smaller than the critical value (i.e., 1.4799 < 4.26), we fail to reject the null hypothesis and conclude that the three student groups do not have different variances.

9.5 SPSS

Unfortunately, there is not much to report on tests of variances for SPSS. There are no tests available for inferences about a single variance or for inferences about two dependent variances. For inferences about independent variances, SPSS does provide Levene's test as part of the "Independent t Test" procedure (previously discussed in Chapter 7), and as part of the
"One-Way ANOVA" and "Univariate ANOVA" procedures (to be discussed in Chapter 11). Given our previous concerns with Levene's test, use it with caution. There is also little information published in the literature on power and effect sizes for tests of variances.

9.6 Template and APA-Style Write-Up

Consider an example paragraph for one of the tests described in this chapter, more specifically, testing inferences about two dependent variances. As you may remember, our graduate research assistant, Marie, was working with Jessica, a classroom teacher, to assist in analyzing the variances of first-grade students. Her task was to assist Jessica with writing her research question (Are the variances of achievement scores for first-grade children the same in the fall as compared to the spring?) and generating the test of inference to answer her question. Marie suggested a dependent variances test as the test of inference. A template for writing a research question for the dependent variances is presented as follows:

Are the Variances of [Variable] the Same in [Time 1] as Compared to [Time 2]?

An example write-up is presented as follows:

A dependent variances test was conducted to determine if variances of achievement scores for first-grade children were the same in the fall as compared to the spring. The test was conducted using an alpha of .05. The null hypothesis was that the variances would be the same. There was a statistically significant difference in variances of achievement scores of first-grade children in the fall as compared to the spring (t = −3.4261, df = 60, p < .05). Thus, the null hypothesis that the variances would be equal at the beginning and end of the first grade was rejected. The variances of achievement test scores significantly increased from September to April.
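The t statistic in the write-up above can be reproduced directly from the chapter's dependent-variances formula using the example's summary statistics. A minimal sketch, with variable names of our choosing:

```python
import math

# First-grade achievement example: fall variance 100 (s1 = 10),
# spring variance 169 (s2 = 13), correlation r12 = .80, nu = n - 2 = 60.
s1_sq, s2_sq = 100.0, 169.0
s1, s2, r12, nu = 10.0, 13.0, 0.80, 60

# t = (s1^2 - s2^2) / [2 * s1 * s2 * sqrt((1 - r12^2) / nu)]
t = (s1_sq - s2_sq) / (2 * s1 * s2 * math.sqrt((1 - r12 ** 2) / nu))
print(round(t, 4))  # -3.4261, beyond the critical value -2.390
```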
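The Brown–Forsythe procedure from Section 9.4.2, an ANOVA on absolute deviations from the group medians, can likewise be sketched as a short function and checked against the Table 9.1 reading-group example. The data and the resulting F = 1.6388 come from the chapter; the function name is ours:

```python
from statistics import mean, median

def brown_forsythe(groups):
    # Transform each score to |Y - group median|, then compute the
    # one-way ANOVA F statistic on the transformed scores.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    n_total = sum(len(g) for g in z)
    n_groups = len(z)
    grand_mean = mean([v for g in z for v in g])
    ms_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in z
    ) / (n_groups - 1)
    ms_within = sum(
        (v - mean(g)) ** 2 for g in z for v in g
    ) / (n_total - n_groups)
    return ms_between / ms_within

# Total game points for below-average, average, and above-average readers.
groups = [[6, 8, 12, 13], [9, 12, 14, 17], [10, 16, 20, 30]]
f_bf = brown_forsythe(groups)
print(round(f_bf, 4))  # 1.6388 -- smaller than .05F2,9 = 4.26, fail to reject
```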
9.7 Summary

In this chapter, we described testing hypotheses about variances. Several inferential tests and new concepts were discussed. The new concepts introduced were the sampling distribution of the variance, the F distribution, and homogeneity of variance tests. The first inferential test discussed was the test of a single variance, followed by a test of two dependent variances. Next we examined several tests of two or more independent variances. Here we considered the following traditional procedures: the F ratio test, Hartley's Fmax test, Cochran's C test, Bartlett's χ² test, and Levene's test. Unfortunately, these tests are not very robust to violation of the normality assumption. We then discussed two newer procedures that are relatively robust to nonnormality, the Brown–Forsythe procedure and the O'Brien procedure. Examples were presented for each of the recommended tests. At this point, you should have met the following objectives: (a) be able to understand the basic concepts underlying tests of variances, (b) be able to select the appropriate test, and (c) be able to determine and interpret the results from the appropriate test. In Chapter 10, we discuss correlation coefficients, as well as inferential tests involving correlations.

Problems

Conceptual problems

9.1 Which of the following tests of homogeneity of variance is most robust to assumption violations?
  a. F ratio test
  b. Bartlett's chi-square test
  c. The O'Brien procedure
  d. Hartley's Fmax test

9.2 Cochran's C test requires equal sample sizes. True or false?

9.3 I assert that if two dependent sample variances are identical, I would not be able to reject the null hypothesis. Am I correct?

9.4 Suppose that I wish to test the following hypotheses at the .01 level of significance:

H0: σ² = 250
H1: σ² > 250
 A sample variance of 233 is observed. I assert that if I compute the χ² test statistic and compare it to the χ² table, it is possible that I could reject the null hypothesis. Am I correct?
9.5 Suppose that I wish to test the following hypotheses at the .05 level of significance:

 H0: σ² = 16
 H1: σ² > 16

 A sample variance of 18 is observed. I assert that if I compute the χ² test statistic and compare it to the χ² table, it is possible that I could reject the null hypothesis. Am I correct?
9.6 If the 90% CI for a single variance extends from 25.7 to 33.6, I assert that the null hypothesis would definitely be rejected at the .10 level. Am I correct?

9.7 If the 95% CI for a single variance ranges from 82.0 to 93.5, I assert that the null hypothesis would definitely be rejected at the .05 level. Am I correct?

9.8 If the mean of the sampling distribution of the difference between two variances equals 0, I assert that both samples probably represent a single population. Am I correct?

9.9 Which of the following is an example of two dependent samples?
 a. Pretest scores of males in one course and posttest scores of females in another course
 b. Husbands and their wives in your neighborhood
 c. Softball players at your school and football players at your school
 d. Professors in education and professors in psychology

9.10 The mean of the F distribution increases as degrees of freedom denominator (ν2) increase. True or false?
Computational problems

9.1 The following random sample of scores on a preschool ability test is obtained from a normally distributed population of 4 year olds:

 20 22 24 30 18 22 29 27
 25 21 19 22 38 26 17 25

 a. Test the following hypotheses at the .10 level of significance:

  H0: σ² = 75
  H1: σ² ≠ 75

 b. Construct a 90% CI.
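As a computational aid, the single-variance test statistic for a problem like this can be checked in a few lines of Python. This is only a sketch (the book itself works through SPSS), assuming the χ² statistic for a single variance introduced earlier in the chapter, χ² = (n − 1)s²/σ0²; the resulting value is compared against χ² table critical values with n − 1 degrees of freedom, which are not computed here.

```python
import statistics

scores = [20, 22, 24, 30, 18, 22, 29, 27,
          25, 21, 19, 22, 38, 26, 17, 25]

n = len(scores)
s_sq = statistics.variance(scores)      # unbiased sample variance (n - 1 divisor)
sigma0_sq = 75                          # hypothesized population variance
chi_sq = (n - 1) * s_sq / sigma0_sq     # test statistic, df = n - 1

print(f"s^2 = {s_sq:.4f}, chi-square = {chi_sq:.4f}, df = {n - 1}")
```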
9.2 The following two independent random samples of number of CDs owned are obtained from two populations of undergraduate (sample 1) and graduate students (sample 2), respectively:

 Sample 1 data: 42 36 47 35 46
                37 52 44 47 51
                56 54 55 50 40
                40 46 41

 Sample 2 data: 45 50 57 58 43
                52 43 60 41 49
                44 51 49 55 56

 Test the following hypotheses at the .05 level of significance using the Brown–Forsythe and O’Brien procedures:

  H0: σ1² − σ2² = 0
  H1: σ1² − σ2² ≠ 0
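For readers curious about the mechanics behind the Brown–Forsythe procedure named in this problem, its core idea can be sketched directly: a one-way ANOVA F test computed on absolute deviations from each group’s median. The Python fragment below is an illustrative implementation under that assumption, not the book’s SPSS route, and the two demo groups are hypothetical data, not the CD counts above.

```python
import statistics

def brown_forsythe(*groups):
    """Brown-Forsythe test of equal variances: one-way ANOVA F computed
    on absolute deviations from each group's median."""
    z = [[abs(x - statistics.median(g)) for x in g] for g in groups]
    k = len(z)                                   # number of groups
    n = sum(len(g) for g in z)                   # total observations
    grand_mean = sum(sum(g) for g in z) / n
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in z)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in z)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)                     # F and its degrees of freedom

# Hypothetical groups with visibly different spreads
a = [10, 11, 9, 10, 12, 8]
b = [5, 15, 2, 18, 1, 19]
f, df = brown_forsythe(a, b)
print(f"F = {f:.2f}, df = {df}")
```

The resulting F is compared against the F table with (k − 1, n − k) degrees of freedom, here (1, 10).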
9.3 The following summary statistics are available for two dependent random samples of brothers and sisters, respectively, on their allowance for the past month: s1² = 49, s2² = 25, n = 32, r12 = .60. Test the following hypotheses at the .05 level of significance:

  H0: σ1² − σ2² = 0
  H1: σ1² − σ2² ≠ 0
9.4 The following summary statistics are available for two dependent random samples of first semester college students who were measured on their high school and first semester college GPAs, respectively: s1² = 1.56, s2² = 4.42, n = 62, r12 = .72. Test the following hypotheses at the .05 level of significance:

  H0: σ1² − σ2² = 0
  H1: σ1² − σ2² ≠ 0
9.5 A random sample of 21 statistics exam scores is collected with a sample mean of 50 and a sample variance of 10. Test the following hypotheses at the .05 level of significance:

  H0: σ² = 25
  H1: σ² ≠ 25
9.6 A random sample of 30 graduate entrance exam scores is collected with a sample mean of 525 and a sample variance of 16,900. Test the following hypotheses at the .05 level of significance:

  H0: σ² = 10,000
  H1: σ² ≠ 10,000
9.7 A pretest was given at the beginning of a history course and a posttest at the end of the course. The pretest variance is 36, the posttest variance is 64, the sample size is 31, and the pretest-posttest correlation is .80. Test the null hypothesis that the two dependent variances are equal against a nondirectional alternative at the .01 level of significance.
Interpretive problems

9.1 Use the survey 1 dataset from the website to determine if there are gender differences among the variances for any items of interest that are at least interval or ratio in scale. Some example items might include the following:
 a. Item #1: height in inches
 b. Item #6: amount spent at last hair appointment
 c. Item #7: number of compact disks owned
 d. Item #9: current GPA
 e. Item #10: amount of exercise per week
 f. Item #15: number of alcoholic drinks per week
 g. Item #21: number of hours studied per week

9.2 Use the survey 1 dataset from the website to determine if there are differences between the variances for left- versus right-handed individuals on any items of interest that are at least interval or ratio in scale. Some example items might include the following:
 a. Item #1: height in inches
 b. Item #6: amount spent at last hair appointment
 c. Item #7: number of compact disks owned
 d. Item #9: current GPA
 e. Item #10: amount of exercise per week
 f. Item #15: number of alcoholic drinks per week
 g. Item #21: number of hours studied per week

10

Bivariate Measures of Association
Chapter Outline

10.1 Scatterplot
10.2 Covariance
10.3 Pearson Product–Moment Correlation Coefficient
10.4 Inferences About the Pearson Product–Moment Correlation Coefficient
 10.4.1 Inferences for a Single Sample
 10.4.2 Inferences for Two Independent Samples
10.5 Assumptions and Issues Regarding Correlations
 10.5.1 Assumptions
 10.5.2 Correlation and Causality
 10.5.3 Restriction of Range
10.6 Other Measures of Association
 10.6.1 Spearman’s Rho
 10.6.2 Kendall’s Tau
 10.6.3 Phi
 10.6.4 Cramer’s Phi
 10.6.5 Other Correlations
10.7 SPSS
10.8 G*Power
10.9 Template and APA-Style Write-Up
Key Concepts

 1. Scatterplot
 2. Strength and direction
 3. Covariance
 4. Correlation coefficient
 5. Fisher’s Z transformation
 6. Linearity assumption, causation, and restriction of range issues

We have considered various inferential tests in the last four chapters, specifically those that deal with tests of means, proportions, and variances. In this chapter, we examine measures of association as well as inferences involving measures of association. Methods for directly determining the relationship between two variables are known as bivariate analysis, in contrast to univariate analysis, which is concerned with only a single variable. The indices used to directly describe the relationship between two variables are known as correlation coefficients (in the old days, known as co-relation) or as measures of association.

These measures of association allow us to determine how two variables are related to one another and can be useful in two applications: (a) as a descriptive statistic by itself and (b) as an inferential test. First, a researcher may want to compute a correlation coefficient for its own sake, simply to tell the researcher precisely how two variables are related or associated. For example, we may want to determine whether there is a relationship between the GRE-Quantitative (GRE-Q) subtest and performance on a statistics exam. Do students who score relatively high on the GRE-Q perform higher on a statistics exam than do students who score relatively low on the GRE-Q? In other words, as scores increase on the GRE-Q, does performance on a statistics exam also correspondingly increase?

Second, we may want to use an inferential test to assess whether (a) a correlation is significantly different from 0 or (b) two correlations are significantly different from one another. For example, is the correlation between GRE-Q and statistics exam performance significantly different from 0? As a second example, is the correlation between GRE-Q and statistics exam performance the same for younger students as it is for older students?

The following topics are covered in this chapter: scatterplot, covariance, Pearson product–moment correlation coefficient, inferences about the Pearson product–moment correlation coefficient, some issues regarding correlations, other measures of association, SPSS, and power. We utilize some of the basic concepts previously covered in Chapters 6 through 9. New concepts to be discussed include the following: scatterplot; strength and direction; covariance; correlation coefficient; Fisher’s Z transformation; and linearity assumption, causation, and restriction of range issues. Our objectives are that by the end of this chapter, you will be able to (a) understand the concepts underlying the correlation coefficient and correlation inferential tests, (b) select the appropriate type of correlation, and (c) determine and interpret the appropriate correlation and inferential test.
10.1 Scatterplot
Marie,�the�graduate�student�pursuing�a�degree�in�educational�research,�continues�to�work�
diligently�on�her�coursework��Additionally,�as�we�will�once�again�see�in�this�chapter,�Marie�
continues�to�assist�her�faculty�advisor�with�various�research�tasks�
Marie’s� faculty� advisor� received� a� telephone� call� from� Matthew,� the� director� of� mar-
keting�for�the�local�animal�shelter��Based�on�the�donor�list,�it�appears�that�the�donors�
who� contribute� the� largest� donations� also� have� children� and� pets�� In� an� effort� to�
attract�more�donors�to�the�animal�shelter,�Matthew�is�targeting�select�groups—one�of�
which�he�believes�may�be�families�that�have�children�at�home�and�who�also�have�pets��
Matthew�believes�if�there�is�a�relationship�between�these�variables,�he�can�more�easily�
reach� the� intended� audience� with� his� marketing� materials� which� will� then� translate�
into� increased� donations� to� the� animal� shelter�� However,� Matthew� wants� to� base� his�

261Bivariate Measures of Association
decision�on�solid�evidence�and�not�just�a�hunch��Having�built�a�good�knowledge�base�
with� previous� consulting� work,� Marie’s� faculty� advisor� puts� Matthew� in� touch� with�
Marie�� After� consulting� with� Matthew,� Marie� suggests� a� Pearson� correlation� as� the�
test�of�inference�to�test�his�research�question:�Is there a correlation between the number of
children in a family and the number of pets?�Marie’s�task�is�then�to�assist�in�generating�the�
test�of�inference�to�answer�Matthew’s�research�question�
This section deals with an important concept underlying the relationship between two variables, the scatterplot. Later sections move us into ways of measuring the relationship between two variables. First, however, we need to set up the situation where we have data on two different variables for each of N individuals in the population. Table 10.1 displays such a situation. The first column is simply an index of the individuals in the population, from i = 1, …, N, where N is the total number of individuals in the population. The second column denotes the values obtained for the first variable X. Thus, X1 = 10 means that the first individual had a score of 10 on variable X. The third column provides the values for the second variable Y. Thus, Y1 = 20 indicates that the first individual had a score of 20 on variable Y. In an actual data table, only the scores would be shown, not the Xi and Yi notation. Thus, we have a tabular method for depicting the data of a two-variable situation in Table 10.1.

Table 10.1: Layout for Correlational Data

 Individual   X          Y
 1            X1 = 10    Y1 = 20
 2            X2 = 12    Y2 = 28
 3            X3 = 20    Y3 = 33
 …            …          …
 N            XN = 44    YN = 65

A graphical method for depicting the relationship between two variables is to plot the pair of scores on X and Y for each individual on a two-dimensional figure known as a scatterplot (or scattergram). Each individual has two scores in a two-dimensional coordinate system, denoted by (X, Y). For example, our individual 1 has the paired scores of (10, 20). An example scatterplot is shown in Figure 10.1.

[Figure 10.1: Scatterplot, with axes X and Y; the circled point is at X = 10, Y = 20.]

The X axis (the horizontal

axis or abscissa) represents the values for variable X, and the Y axis (the vertical axis or ordinate) represents the values for variable Y. Each point on the scatterplot represents a pair of scores (X, Y) for a particular individual. Thus, individual 1 has a point at X = 10 and Y = 20 (the circled point). Points for other individuals are also shown. In essence, the scatterplot is actually a bivariate frequency distribution. When there is a moderate degree of relationship, the points may take the shape of an ellipse (i.e., a football shape where the direction of the relationship, positive or negative, may make the football appear to point up to the right, as with the positive relation depicted in this figure), as in Figure 10.1.

The scatterplot allows the researcher to evaluate both the direction and the strength of the relationship between X and Y. The direction of the relationship has to do with whether the relationship is positive or negative. A positive relationship occurs when, as scores on variable X increase (from left to right), scores on variable Y also increase (from bottom to top). Thus, Figure 10.1 indicates a positive relationship between X and Y. Examples of different scatterplots are shown in Figure 10.2. Figure 10.2a and d displays positive relationships. A negative relationship, sometimes called an inverse relationship, occurs when, as scores on variable X increase (from left to right), scores on variable Y decrease (from top to bottom). Figure 10.2b and e shows examples of negative relationships. There is no relationship between X and Y when, for a large value of X, a large or a small value of Y can occur, and for a small value of X, a large or a small value of Y can also occur. In other words, X and Y are not related, as shown in Figure 10.2c.
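The direction rule just described (scores on X increasing from left to right, scores on Y from bottom to top) can be made concrete with a small sketch. The Python fragment below is purely illustrative and not part of the book’s SPSS workflow; the plotting function and the demo data are our own invention. It renders paired scores on a text grid so that a positive relationship climbs from the lower left to the upper right.

```python
def ascii_scatterplot(pairs, width=30, height=10):
    """Render (X, Y) pairs on a text grid, with Y increasing upward."""
    xs, ys = zip(*pairs)
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        col = round((x - x0) / (x1 - x0) * (width - 1))
        row = round((y - y0) / (y1 - y0) * (height - 1))
        grid[height - 1 - row][col] = "*"   # invert rows so larger Y plots higher
    return "\n".join("".join(r) for r in grid)

# Hypothetical positively related data: points climb toward the upper right
data = [(1, 2), (2, 6), (3, 4), (4, 8), (5, 10)]
plot = ascii_scatterplot(data)
print(plot)
```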
The strength of the relationship between X and Y is determined by the scatter of the points (hence the name scatterplot). First, we draw a straight line through the points which cuts the bivariate distribution in half, as shown in Figures 10.1 and 10.2. In Chapter 17, we note that this line is known as the regression line. If the scatter is such that the points tend to fall close to the line, then this is indicative of a strong relationship between X and Y. Figure 10.2a and b denotes strong relationships. If the scatter is such that the points are widely scattered around the line, then this is indicative of a weak relationship between X and Y. Figure 10.2d and e denotes weak relationships. To summarize Figure 10.2, part (a) represents a strong positive relationship, part (b) a strong negative relationship, part (c) no relationship, part (d) a weak positive relationship, and part (e) a weak negative relationship. Thus, the scatterplot is useful for providing a quick visual indication of the nature of the relationship between variables X and Y.

[Figure 10.2: Examples of possible scatterplots, panels (a) through (e).]
10.2 Covariance

The remainder of this chapter deals with statistical methods for measuring the relationship between variables X and Y. The first such method is known as the covariance. The covariance conceptually is the shared variance (or co-variance) between X and Y. The covariance and correlation share commonalities, as the correlation is simply the standardized covariance. The population covariance is denoted by σXY, and the conceptual formula is given as follows:

 σXY = [Σ (Xi − μX)(Yi − μY)] / N  (summing over i = 1, …, N)
where
 Xi and Yi are the scores for individual i on variables X and Y, respectively
 μX and μY are the population means for variables X and Y, respectively
 N is the population size

This equation looks similar to the computational formula for the variance presented in Chapter 3, where deviation scores from the mean are computed for each individual. The conceptual formula for the covariance is essentially an average of the paired deviation score products. If variables X and Y are positively related, then the deviation scores will tend to be of the same sign, their products will tend to be positive, and the covariance will be a positive value (i.e., σXY > 0). If variables X and Y are negatively related, then the deviation scores will tend to be of opposite signs, their products will tend to be negative, and the covariance will be a negative value (i.e., σXY < 0). Finally, if variables X and Y are not related, then the deviation scores will consist of both the same and opposite signs, their products will be both positive and negative and sum to 0, and the covariance will be a zero value (i.e., σXY = 0).

The sample covariance is denoted by sXY, and the conceptual formula becomes as follows:

 sXY = [Σ (Xi − X̄)(Yi − Ȳ)] / (n − 1)  (summing over i = 1, …, n)

where
 X̄ and Ȳ are the sample means for variables X and Y, respectively
 n is the sample size

Note that the denominator becomes n − 1 so as to yield an unbiased sample estimate of the population covariance (i.e., similar to what we did in the sample variance situation).

The conceptual formula is unwieldy and error prone for other than small samples. Thus, a computational formula for the population covariance has been developed as seen here:

 σXY = [N ΣXiYi − (ΣXi)(ΣYi)] / N²

where the first summation involves the cross product of X multiplied by Y for each individual summed across all N individuals, and the other terms should be familiar. The computational formula for the sample covariance is the following:

 sXY = [n ΣXiYi − (ΣXi)(ΣYi)] / [n(n − 1)]

where the denominator is n(n − 1) so as to yield an unbiased sample estimate of the population covariance.

Table 10.2 gives an example of a population situation where a strong positive relationship is expected because as X (number of children in a family) increases, Y (number of pets in a family) also increases. Here σXY is computed as follows:

 σXY = [N ΣXiYi − (ΣXi)(ΣYi)] / N² = [5(108) − (15)(30)] / 5² = 3.6000

The sign indicates
that the relationship between X and Y is indeed positive. That is, the more children a family has, the more pets they tend to have.

Table 10.2: Example Correlational Data (X = # Children, Y = # Pets)

 Individual   X    Y    XY   X²   Y²    Rank X   Rank Y   (Rank X − Rank Y)²
 1            1    2    2    1    4     1        1        0
 2            2    6    12   4    36    2        3        1
 3            3    4    12   9    16    3        2        1
 4            4    8    32   16   64    4        4        0
 5            5    10   50   25   100   5        5        0
 Sums         15   30   108  55   220                     2

However, like the variance, the value of the covariance depends on the scales of the variables involved. Thus, interpretation of the magnitude of a single covariance is difficult, as it can take on literally any value. We see shortly that the correlation coefficient takes care of this problem. For this reason, you are only likely to see the covariance utilized in the analysis of covariance (Chapter 14) and advanced techniques such as structural equation modeling and multilevel modeling (beyond the scope of this text).

10.3 Pearson Product–Moment Correlation Coefficient

Other methods for measuring the relationship between X and Y have been developed that are easier to interpret than the covariance. We refer to these measures as correlation coefficients. The first correlation coefficient we consider is the Pearson product–moment correlation coefficient, developed by the famous statistician Karl Pearson, and simply referred to as the Pearson here. The Pearson can be considered in several different forms, where the population value is denoted by ρXY (rho) and the sample value by rXY. One conceptual form of the Pearson is a product of standardized z scores (previously described in Chapter 4). This formula for the Pearson is given as follows:

 ρXY = [Σ (zX zY)] / N  (summing over i = 1, …, N)

where zX and zY are the z scores for variables X and Y, respectively, whose product is taken for each individual and then summed across all N individuals.
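The children-pets example lends itself to a quick numerical check of the two population formulas just given: the covariance as the average product of deviation scores, and the Pearson as the average product of z scores. The sketch below is a minimal Python illustration (not part of the book’s SPSS materials) that treats the five families in Table 10.2 as the entire population, so all divisors are N rather than n − 1. The covariance matches the 3.6000 computed above, and the correlation works out to .9000, which is also the standardized covariance cov_xy / (sd_x * sd_y).

```python
import math

# Table 10.2: X = number of children, Y = number of pets (population of N = 5)
X = [1, 2, 3, 4, 5]
Y = [2, 6, 4, 8, 10]
N = len(X)

mu_x, mu_y = sum(X) / N, sum(Y) / N
# Population standard deviations (divide by N, since this is a population)
sd_x = math.sqrt(sum((x - mu_x) ** 2 for x in X) / N)
sd_y = math.sqrt(sum((y - mu_y) ** 2 for y in Y) / N)

# Conceptual formula: average of the paired deviation-score products
cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(X, Y)) / N

# Pearson as the average product of z scores: the standardized covariance
rho_xy = sum(((x - mu_x) / sd_x) * ((y - mu_y) / sd_y) for x, y in zip(X, Y)) / N

print(f"covariance = {cov_xy:.4f}, rho = {rho_xy:.4f}")
```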
Because z scores are standardized versions of raw scores, the Pearson correlation is simply a standardized version of the covariance. The sign of the Pearson denotes the direction of the relationship (e.g., positive or negative), and the value of the Pearson denotes the strength of the relationship. The Pearson falls on a scale from −1.00 to +1.00, where −1.00 indicates a perfect negative relationship, 0 indicates no relationship, and +1.00 indicates a perfect positive relationship. Values near .50 or −.50 are considered as moderate relationships, values near 0 as weak relationships, and values near +1.00 or −1.00 as strong relationships (although these are subjective terms). Cohen (1988) also offers rules of thumb, which are presented later in this chapter, for interpreting the value of the correlation. As you may see as you read more statistics and research methods textbooks, there are other guidelines offered for interpreting the value of the correlation.

There are other forms of the Pearson. A second conceptual form of the Pearson is in terms of the covariance and the standard deviations and is given as follows:

 ρXY = σXY / (σX σY)

This form is useful when the covariance and standard deviations are already known. A final form of the Pearson is the computational formula, written as follows:

 ρXY = [N ΣXiYi − (ΣXi)(ΣYi)] / √{[N ΣXi² − (ΣXi)²][N ΣYi² − (ΣYi)²]}

where all terms should be familiar from the computational formulas of the variance and covariance. This is the formula to use for hand computations, as it is more error-free than the other previously given formulas.
For the example children-pet data given in Table 10.2, we see that the Pearson correlation is computed as follows:

 ρXY = [5(108) − (15)(30)] / √{[5(55) − (15)²][5(220) − (30)²]} = .9000

Thus, there is a very strong positive relationship between variables X (the number of children) and Y (the number of pets).

The sample correlation is denoted by rXY. The formulas are essentially the same for the sample correlation rXY and the population correlation ρXY, except that n is substituted for N. For example, the computational formula for the sample correlation is noted here:

 rXY = [n ΣXiYi − (ΣXi)(ΣYi)] / √{[n ΣXi² − (ΣXi)²][n ΣYi² − (ΣYi)²]}

Unlike the sample variance and covariance, the sample correlation has no correction for bias.

10.4 Inferences About the Pearson Product–Moment Correlation Coefficient

Once a researcher has determined one or more Pearson correlation coefficients, it is often useful to know whether the sample correlations are significantly different from 0. Thus, we need to visit the world of inferential statistics again. In this section, we consider two
significantly� different� from� 0�� For� example,� is� the� correlation� between� the� number� of� years� of� education� and� current�income�significantly�different�from�0?�The�test�of�inference�for�the�Pearson�cor- relation�will�be�conducted�following�the�same�steps�as�those�in�previous�chapters��The� null�hypothesis�is�written�as H0 0: ρ = A�nondirectional�alternative�hypothesis,�where�we�are�willing�to�reject�the�null�if�the�sam- ple�correlation�is�either�significantly�greater�than�or�less�than�0,�is�nearly�always�utilized�� Unfortunately,�the�sampling�distribution�of�the�sample�Pearson�r�is�too�complex�to�be�of� much� value� to� the� applied� researcher�� For� testing� whether� the� correlation� is� different� from�0�(i�e�,�where�the�alternative�hypothesis�is�specified�as�H1:�ρ�≠�0),�a�transformation�of�r� can�be�used�to�generate�a�t-distributed�test�statistic��The�test�statistic�is t r n r = − − 2 1 2 which�is�distributed�as�t�with�ν�=�n�−�2�degrees�of�freedom,�assuming�that�both�X�and�Y� are�normally�distributed�(although�even�if�one�variable�is�normal�and�the�other�is�not,�the� t�distribution�may�still�apply;�see�Hogg�&�Craig,�1970)� There�are�two�assumptions�with�the�Pearson�correlation��First,�the�Pearson�correlation� is� appropriate� only� when� there� is� a� linear� relationship� assumed� between� the� variables� (given�that�both�variables�are�at�least�interval�in�scale)��In�other�words,�when�a�curvilinear� or�some�type�of�polynomial�relationship�is�present,�the�Pearson�correlation�should�not�be� computed�� Testing� for� linearity� can� be� done� by� simply� graphing� a� bivariate� scatterplot� and�reviewing�it�for�a�general�linear�display�of�points��Also,�and�as�we�have�seen�with�the� other�inferential�procedures�discussed�in�previous�chapters,�we�need�to�again�assume�that� the�scores�of�the�individuals�are�independent�of�one�another��For�the�Pearson�correlation,�the� assumption� of� independence� is� met� when� a� random� sample� of� units� have� been� 
selected from the population.

It should be noted for inferential tests of correlations that sample size plays a role in determining statistical significance. For instance, this particular test is based on n − 2 degrees of freedom. If sample size is small (e.g., 10), then it is difficult to reject the null hypothesis except for very strong correlations. If sample size is large (e.g., 200), then it is easy to reject the null hypothesis for all but very weak correlations. Thus, the statistical significance of a correlation is definitely a function of sample size, both for tests of a single correlation and for tests of two correlations.

Effect size and power are always important, particularly here where sample size plays such a large role. Cohen (1988) proposed using r as a measure of effect size, using the subjective standard (ignoring the sign of the correlation) of r = .1 as a weak effect, r = .3 as a moderate effect, and r = .5 as a strong effect. These standards were developed for the behavioral sciences, but other standards may be used in other areas of inquiry. Cohen also has a nice series of power tables in his Chapter 3 for determining power and sample size when planning a correlational study. As for confidence intervals (CIs), Wilcox (1996) notes that “many methods have been proposed for computing CIs for ρ, but it seems that a satisfactory method for applied work has yet to be derived” (p. 303). Thus, a CI procedure is not recommended, even for large samples.

From the example children-pet data, we want to determine whether the sample Pearson correlation is significantly different from 0, with a nondirectional alternative hypothesis and at the .05 level of significance. The test statistic is computed as follows:

 t = r √(n − 2) / √(1 − r²) = .9000 √(5 − 2) / √(1 − .8100) = 3.5762
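This arithmetic is easy to verify with a short script. The following is only an illustrative sketch in Python (the book itself carries out these analyses in SPSS), computing the t statistic for the single-sample correlation test directly from r and n:

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, distributed as t with n - 2 df."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

# Children-pets example: r = .9000 with n = 5 paired observations
t, df = correlation_t(0.90, 5)
print(f"t = {t:.4f}, df = {df}")
```

The computed t is then compared against the t table critical values with n − 2 = 3 degrees of freedom.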
The critical values from Table A.2 are ±3.182 (the α/2 critical values of t with ν = 3). Thus, we would reject the null hypothesis, as the test statistic exceeds the critical value, and conclude that the correlation between variables X and Y is significantly different from 0. In summary, there is a strong, positive, statistically significant correlation between the number of children and the number of pets.

10.4.2 Inferences for Two Independent Samples

In a second situation, the researcher may have collected data from two different independent samples. It can be determined whether the correlations between variables X and Y are equal for these two independent samples of observations. For example, is the correlation between height and weight the same for children and adults? Here the null and alternative hypotheses are written as

 H0: ρ1 − ρ2 = 0
 H1: ρ1 − ρ2 ≠ 0

where ρ1 is the correlation between X and Y for sample 1 and ρ2 is the correlation between X and Y for sample 2. However, because correlations are not normally distributed for every value of ρ, a transformation is necessary. This transformation is known as Fisher’s Z transformation, named after the famous statistician Sir Ronald A. Fisher, which is approximately normally distributed regardless of the value of ρ. Table A.5 is used to convert a sample correlation r to a Fisher’s Z transformed value. Note that Fisher’s Z is a totally different statistic from any z score or z statistic previously covered.

The test statistic for this situation is

 z = (Z1 − Z2) / √[1/(n1 − 3) + 1/(n2 − 3)]

where
 n1 and n2 are the sizes of the two samples
 Z1 and Z2 are the Fisher’s Z transformed values for the two samples
The test statistic is then compared to critical values from the z distribution in Table A.1. For a nondirectional alternative hypothesis, where the two correlations may be different in either direction, the critical values are ±z(α/2). Directional alternative hypotheses, where the correlations are different in a particular direction, can also be tested by looking in the appropriate tail of the z distribution (i.e., either +z(α1) or −z(α1)).

Cohen (1988) proposed a measure of effect size for the difference between two independent correlations as q = Z1 − Z2. The subjective standards proposed (ignoring the sign) are q = .1 as a weak effect, q = .3 as a moderate effect, and q = .5 as a strong effect (these are the standards for the behavioral sciences, although standards vary across disciplines). A nice set of power tables for planning purposes is contained in Chapter 4 of Cohen. Once again, while CI procedures have been developed, none of these have been viewed as acceptable (Marascuilo & Serlin, 1988; Wilcox, 2003).

Consider the following example. Two samples have been independently drawn of 28 children (sample 1) and 28 adults (sample 2). For each sample, the correlations between height and weight were computed to be rchildren = .8 and radults = .4. A nondirectional alternative hypothesis is utilized where the level of significance is set at .05. From Table A.5, we first determine the Fisher’s Z transformed values to be Zchildren = 1.099 and Zadults = .4236. Then the test statistic z is computed as follows:

 z = (Z1 − Z2) / √[1/(n1 − 3) + 1/(n2 − 3)] = (1.099 − .4236) / √(1/25 + 1/25) = 2.3878

From Table A.1, the critical values are ±z(α/2) = ±1.96.
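The same numbers can be checked without the conversion table, since Fisher’s Z transformation has the closed form Z = ½ ln[(1 + r)/(1 − r)]. The sketch below is an illustrative Python check, not the book’s SPSS route; the small difference from the hand-computed 2.3878 arises only from the rounding of the tabled Z values.

```python
import math

def fisher_z(r):
    """Fisher's Z transformation of a correlation r."""
    return 0.5 * math.log((1 + r) / (1 - r))

def two_correlation_z(r1, n1, r2, n2):
    """z test for H0: rho1 - rho2 = 0 using Fisher-transformed correlations."""
    z1, z2 = fisher_z(r1), fisher_z(r2)
    return (z1 - z2) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

# Height-weight correlations: children r = .8 (n = 28), adults r = .4 (n = 28)
z = two_correlation_z(0.8, 28, 0.4, 28)
q = fisher_z(0.8) - fisher_z(0.4)   # Cohen's effect size q
print(f"z = {z:.4f}, q = {q:.4f}")
```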
Our decision then is to reject the null hypothesis and conclude that height and weight do not have the same correlation for children and adults. In other words, there is a statistically significant difference in the height-weight correlation between children and adults, with a strong effect size (q = .6754). This inferential test assumes both variables are normally distributed for each population and that scores are independent across individuals; however, the procedure is not very robust to nonnormality, as the Z transformation assumes normality (Duncan & Layard, 1973; Wilcox, 2003; Yu & Dunn, 1982). Thus, caution should be exercised in using the z test when data are nonnormal (e.g., Yu & Dunn recommend the use of Kendall’s τ, as discussed later in this chapter).

10.5 Assumptions and Issues Regarding Correlations

There are several issues about the Pearson and other types of correlations that you should be aware of. These issues are concerned with the assumption of linearity, correlation and causation, and restriction of range.

10.5.1 Assumptions

First, as mentioned previously, the Pearson correlation assumes that the relationship between X and Y is a linear relationship. In fact, the Pearson correlation, as a measure of relationship, is really a linear measure of relationship. Recall from earlier in the chapter
For example, there is a perfect curvilinear relationship shown by the data in Figure 10.3, where all of the points fall precisely on the curved line. Something like this might occur if you correlate age with time in the mile run, as younger and older folks would take longer to run this distance than those in between. If these data are fit by a straight line, then the correlation will be severely reduced, in this case to a value of 0 (i.e., the horizontal straight line that runs through the curved line). This is another good reason to always examine your data. The computer may determine that the Pearson correlation between variables X and Y is small or around 0. However, on examination of the data, you might find that the relationship is indeed nonlinear; thus, you should get to know your data. We return to the assessment of nonlinear relationships in Chapter 17.

Second, the assumption of independence applies to correlations. This assumption is met when units or cases are randomly sampled from the population.

10.5.2 Correlation and Causality

A second matter to consider is an often-made misinterpretation of a correlation. Many individuals (e.g., researchers, the public, and the media) often infer a causal relationship from a strong correlation. However, a correlation by itself should never be used to infer causation. In particular, a high correlation between variables X and Y does not imply that one variable is causing the other; it simply means that the two variables are related in some fashion. There are many reasons why variables X and Y might be highly correlated. A high correlation could be the result of (a) X causing Y, (b) Y causing X, (c) a third variable Z causing both X and Y, or (d) even many more variables being involved. The only methods that can strictly be used to infer cause are experimental methods that employ random assignment, where one variable is manipulated by the researcher (the cause), a second variable is
subsequently observed (the effect), and all other variables are controlled. [There are, however, some excellent quasi-experimental methods, propensity score analysis and regression discontinuity, that can be used in some situations, mimic random assignment, and increase the likelihood of speaking to causal inference (Shadish, Cook, & Campbell, 2002).]

Figure 10.3: Nonlinear relationship. [figure: a perfect curvilinear pattern of points, with a horizontal fitted straight line running through it]

10.5.3 Restriction of Range

A final issue to consider is the effect of restriction of the range of scores on one or both variables. For example, suppose that we are interested in the relationship between GRE scores and graduate grade point average (GGPA). In the entire population of students, the relationship might be depicted by the scatterplot shown in Figure 10.4. Say the Pearson correlation is found to be .60, as depicted by the entire sample in the full scatterplot. Now we take a more restricted population of students, those students at the highly selective Ivy-Covered University (ICU). ICU only admits students whose GRE scores are above the cutoff score shown in Figure 10.4. Because of restriction of range in the scores of the GRE variable, the strength of the relationship between GRE and GGPA at ICU is reduced to a Pearson correlation of .20, where only the subsample portion of the plot to the right of the cutoff score is involved. Thus, when scores on one or both variables are restricted due to the nature of the sample or population, the magnitude of the correlation will usually be reduced (although see an exception in Figure 6.3 of Wilcox, 2003).

It is difficult for two variables to be highly related when one or both variables have little variability. This is due to the nature of the formula. Recall that one version of the Pearson formula has standard deviations in the denominator. Remember that the standard
deviation measures the distance of the sample scores from the mean. When there is restriction of range, the distance of the individual scores from the mean is minimized. In other words, there is less variation or variability around the mean. This translates to smaller correlations (and smaller covariances). If the size of the standard deviation for one variable is reduced, everything else being equal, then the size of its correlations with other variables will also be reduced. In other words, we need sufficient variation for a relationship to be evidenced through the correlation coefficient value. Otherwise the correlation is likely to be reduced in magnitude, and you may miss an important correlation. If you must use a restrictive subsample, we suggest you choose measures of greater variability for correlational purposes.

Outliers, observations that are different from the bulk of the observations, also reduce the magnitude of correlations. If one observation is quite different from the rest, such that it falls outside of the ellipse, then the correlation will be smaller in magnitude (e.g., closer to 0) than the correlation without the outlier. We discuss outliers in this context in Chapter 17.

Figure 10.4: Restriction of range example. [figure: GGPA plotted against GRE, with a vertical cutoff line on the GRE axis]

10.6 Other Measures of Association

Thus far, we have considered one type of correlation, the Pearson product-moment correlation coefficient. The Pearson is most appropriate when both variables are at least interval level. That is, both variables X and Y are interval- and/or ratio-level variables. The Pearson is considered a parametric procedure given the distributional assumptions associated with it. If both variables are not at least interval level, then other measures of association, considered nonparametric procedures, should be considered, as they do
not have distributional assumptions associated with them. In this section, we examine in detail the Spearman's rho and phi types of correlation coefficients and briefly mention several other types. While a distributional assumption for these correlations is not necessary, the assumption of independence still applies (and thus a random sample from the population is assumed).

10.6.1 Spearman's Rho

Spearman's rho rank correlation coefficient is appropriate when both variables are ordinal level. This type of correlation was developed by Charles Spearman, the famous quantitative psychologist. Recall from Chapter 1 that ordinal data are data in which individuals have been rank-ordered, such as class rank. Thus, for both variables, either the data are already available in ranks, or the researcher (or computer) converts the raw data to ranks prior to the analysis.

The equation for computing Spearman's rho correlation is

ρ_S = 1 − [6 Σ_{i=1}^{N} (X_i − Y_i)²] / [N(N² − 1)]

where ρ_S denotes the population Spearman's rho correlation, and (X_i − Y_i) represents the difference between the ranks on variables X and Y for individual i. The sample Spearman's rho correlation is denoted by r_S, where n replaces N but otherwise the equation remains the same. In case you were wondering where the "6" in the equation comes from, you will find an interesting article by Lamb (1984). Unfortunately, this particular computational formula is only appropriate when there are no ties among the ranks for either variable. An example of a tie in rank would be if two cases scored the same value on either X or Y. With ties, the formula given is only approximate, depending on the number of ties. In the case of ties, particularly when there are more than a few, many researchers recommend using Kendall's τ (tau) as an alternative correlation (e.g., Wilcox, 1996).

As with the Pearson correlation, Spearman's rho ranges from −1.0 to +1.0. The rules of
thumb that we used for interpreting the Pearson correlation (e.g., Cohen, 1988) can be applied to Spearman's rho correlation values as well. The sign of the coefficient can be interpreted as with the Pearson. A negative sign indicates that as the values for one variable increase, the values for the other variable decrease. A positive sign indicates that as one variable increases in value, the value of the second variable also increases.

As an example, consider the children-pets data again in Table 10.2. To the right of the table, you see the last three columns labeled as rank X, rank Y, and (rank X − rank Y)². The raw scores were converted to ranks, where the lowest raw score received a rank of 1. The last column lists the squared rank differences. As there were no ties, the computations are as follows:

ρ_S = 1 − [6 Σ_{i=1}^{N} (X_i − Y_i)²] / [N(N² − 1)] = 1 − 6(2) / [5(24)] = .9000

Thus, again there is a strong positive relationship between variables X and Y. It is a coincidence that ρ = ρ_S for this dataset, but not so for computational problem 1 at the end of this chapter.

To test whether a sample Spearman's rho correlation is significantly different from 0, we examine the following null hypothesis (the alternative hypothesis would be stated as H1: ρ_S ≠ 0):

H0: ρ_S = 0

The test statistic is given as

t = r_S √(n − 2) / √(1 − r_S²)

which is approximately distributed as a t distribution with ν = n − 2 degrees of freedom (Ramsey, 1989). The approximation works best when n is at least 10. A nondirectional alternative hypothesis, where we are willing to reject the null if the sample correlation is either significantly greater than or less than 0, is nearly always utilized. From the example, we want to determine whether the sample Spearman's rho correlation is significantly different from 0 at the .05 level of significance. For a nondirectional alternative hypothesis, the test
statistic is computed as

t = r_S √(n − 2) / √(1 − r_S²) = .9000 √(5 − 2) / √(1 − .81) = 3.5762

where the critical values from Table A.2 are ±t_{α/2, 3} = ±3.182. Thus, we would reject the null hypothesis and conclude that the correlation is significantly different from 0, strong in magnitude (suggested by the value of the correlation coefficient; using Cohen's guidelines for interpretation as an effect size, this would be considered a large effect), and positive in direction (evidenced by the sign of the correlation coefficient). The exact sampling distribution for 3 ≤ n ≤ 18 is given by Ramsey.

10.6.2 Kendall's Tau

Another correlation that can be computed with ordinal data is Kendall's tau, which also uses ranks of the data to calculate the correlation coefficient (and has an adjustment for tied ranks). The ranking for Kendall's tau differs from Spearman's rho in the following way. With Kendall's tau, the values for one variable are rank-ordered, and then the order of the second variable is examined to see how many pairs of values are out of order. A perfect positive correlation (+1.0) is achieved with Kendall's tau when no scores are out of order, and a perfect negative correlation (−1.0) is obtained when all scores are out of order. Values for Kendall's tau range from −1.0 to +1.0. The rules of thumb that we used for interpreting the Pearson correlation (e.g., Cohen, 1988) can be applied to Kendall's tau correlation values as well. The sign of the coefficient can be interpreted as with the Pearson: a negative sign indicates that as the values for one variable increase, the values for the second variable decrease, and a positive sign indicates that as one variable increases in value, the value of the second variable also increases. While similar in some respects, Spearman's rho and
Kendall's tau are based on different calculations, and thus finding different results is not uncommon. While both are appropriate when ordinal data are being correlated, it has been suggested that Kendall's tau provides a better estimate of the population correlation coefficient value given the sample data (Howell, 1997), especially with smaller sample sizes (e.g., n ≤ 10).

10.6.3 Phi

The phi coefficient ϕ is appropriate when both variables are dichotomous in nature (and is statistically equivalent to the Pearson). Recall from Chapter 1 that a dichotomous variable is one consisting of only two categories (i.e., binary), such as gender, pass/fail, or enrolled/dropped out. Thus, the variables being correlated would be either nominal or ordinal in scale. When correlating two dichotomous variables, one can think of a 2 × 2 contingency table, as previously discussed in Chapter 8. For instance, to determine if there is a relationship between gender and whether students are still enrolled since their freshman year, a contingency table like Table 10.3 can be constructed. Here the columns correspond to the two levels of the enrollment status variable, enrolled (coded 1) or dropped out (coded 0), and the rows correspond to the two levels of the gender variable, female (1) or male (0). The cells indicate the frequencies for the particular combinations of the levels of the two variables. If the frequencies in the cells are denoted by letters, then a represents females who dropped out, b represents females who are enrolled, c indicates males who dropped out, and d indicates males who are enrolled.

The equation for computing the phi coefficient is

ρ_ϕ = (bc − ad) / √[(a + c)(b + d)(a + b)(c + d)]

where ρ_ϕ denotes the population phi coefficient (for consistency's sake, although it is typically written simply as ϕ), and r_ϕ denotes the sample phi coefficient, computed from the same equation. Note that

Table 10.3: Contingency Table for Phi Correlation
                   Enrollment Status
Student Gender   Dropped Out (0)   Enrolled (1)   Total
Female (1)       a = 5             b = 20         a + b = 25
Male (0)         c = 15            d = 10         c + d = 25
Total            a + c = 20        b + d = 30     a + b + c + d = 50

the bc product involves the consistent cells, where both values are the same (either both 0 or both 1), and the ad product involves the inconsistent cells, where the two values differ.

Using the example data from Table 10.3, we compute the phi coefficient to be the following:

ρ_ϕ = (bc − ad) / √[(a + c)(b + d)(a + b)(c + d)] = (300 − 50) / √[(20)(30)(25)(25)] = .4082

Thus, there is a moderate, positive relationship between gender and enrollment status. We see from the table that a larger proportion of females than males are still enrolled.

To test whether a sample phi correlation is significantly different from 0, we test the following null hypothesis (the alternative hypothesis would be stated as H1: ρ_ϕ ≠ 0):

H0: ρ_ϕ = 0

The test statistic is given as

χ² = n r_ϕ²

which is distributed as a χ² distribution with one degree of freedom. From the example, we want to determine whether the sample phi correlation is significantly different from 0 at the .05 level of significance. The test statistic is computed as

χ² = n r_ϕ² = 50(.4082)² = 8.3314

and the critical value from Table A.3 is
χ²_{.05, 1} = 3.84. Thus, we would reject the null hypothesis and conclude that the correlation between gender and enrollment status is significantly different from 0.

10.6.4 Cramer's Phi

When the variables being correlated have more than two categories, Cramer's phi (Cramer's V in SPSS) can be computed. Thus, Cramer's phi is appropriate when both variables are nominal (and at least one variable has more than two categories) or when one variable is nominal and the other variable is ordinal (and at least one variable has more than two categories). Unlike the other correlation coefficients that we have discussed, Cramer's phi values range from 0 to +1.0, as direction has no meaning with more than two categories. Cohen's guidelines (1988) for interpreting the correlation in terms of effect size can be applied to Cramer's phi correlations, as they can with any other correlation examined.

10.6.5 Other Correlations

Other types of correlations have been developed for different combinations of types of variables, but these are rarely used in practice and are unavailable in most statistical packages (e.g., rank biserial and point biserial). Table 10.4 provides suggestions for when different types of correlations are most appropriate. We mention briefly the two other types of correlations in the table: the rank biserial correlation is appropriate when one variable is dichotomous and the other variable is ordinal, whereas the point biserial correlation is appropriate when one variable is dichotomous and the other variable is interval or ratio (statistically equivalent to the Pearson; thus, the Pearson correlation can be computed in this situation).

10.7 SPSS

Next let us see what SPSS has to offer in terms of measures of association, using the children-pets example dataset. There are two programs for obtaining measures of association in SPSS, depending on the measurement scale of your variables: the Bivariate
Correlations program (for computing the Pearson, Spearman's rho, and Kendall's tau) and the Crosstabs program (for computing the Pearson, Spearman's rho, Kendall's tau, phi, Cramer's phi, and several other types of measures of association).

Table 10.4: Different Types of Correlation Coefficients
Nominal with nominal: phi (when both variables are dichotomous) or Cramer's V (when one or both variables have more than two categories)
Nominal with ordinal: rank biserial or Cramer's V
Nominal with interval/ratio: point biserial (Pearson in lieu of point biserial)
Ordinal with ordinal: Spearman's rho or Kendall's tau
Ordinal with interval/ratio: Spearman's rho, Kendall's tau, or Pearson
Interval/ratio with interval/ratio: Pearson

Bivariate Correlations

Step 1: To locate the Bivariate Correlations program, go to "Analyze" in the top pulldown menu, then select "Correlate," and then "Bivariate." Following the screenshot (step 1) produces the "Bivariate Correlations" dialog box.

[Screenshot: Bivariate correlations: Step 1]

Step 2: Next, from the main "Bivariate Correlations" dialog box, click the variables to correlate (e.g., number of children and number of pets) and move them into the "Variables" box by clicking on the arrow button. In the bottom half of this dialog box, options are available for selecting the type of correlation, a one- or two-tailed test (i.e., directional or nondirectional test), and whether to flag statistically significant correlations. For illustrative purposes, we will place a checkmark to generate the "Pearson" and "Spearman's rho" correlation coefficients. We will also select the radio button for a "Two-tailed" test of significance, and at the very bottom, we will check "Flag significant correlations" (which simply means an asterisk will be placed next to significant correlations in the output). The screenshot for this step carries callouts summarizing the choices: select the variables of interest from the list on the left
and use the arrow to move them to the "Variables" box on the right; clicking on "Options" will allow you to obtain the means, standard deviations, and/or covariances; place a checkmark in the box that corresponds to the type of correlation to generate, a decision based on the measurement scale of your variables; the "Test of significance" selected reflects a nondirectional (two-tailed) or directional (one-tailed) test; and "Flag significant correlations" will generate asterisks in the output for statistically significant correlations.

Step 3 (optional): To obtain means, standard deviations, and/or covariances, as well as options for dealing with missing data (listwise or pairwise deletion), click on the "Options" button located in the top right corner of the main dialog box.

From the main dialog box, click on "Ok" to run the analysis and generate the output.

Interpreting the output: The output for the Pearson and Spearman's rho bivariate correlations between number of children and number of pets appears in Table 10.5. For illustrative purposes, we asked for both the Pearson and Spearman's rho correlations (although the Pearson is the appropriate correlation given the measurement scales of our variables, we have also generated the Spearman's rho so that the output can be reviewed). Thus, the top Correlations box gives the Pearson results and the bottom Correlations box the Spearman's rho results. In both cases, the output presents the correlation, the sample size (N in SPSS language, although usually denoted as n by everyone else), the observed level of significance, and asterisks denoting statistically significant correlations. In reviewing Table 10.5, we see that SPSS does not provide any output in terms of CIs, power, or effect size. Later in the chapter, we illustrate the use of
G*Power for computing power. Effect size is easily interpreted from the correlation coefficient value utilizing Cohen's (1988) subjective standards previously described, and we have not recommended any CI procedures for correlations.

Table 10.5: SPSS Results for Child-Pet Data

[SPSS output: in the top box, the Pearson correlation of children with pets is .900*, Sig. (two-tailed) = .037, N = 5, with 1 on the diagonal for each variable's correlation with itself; in the bottom box, Spearman's rho for the same data is .900*, Sig. (two-tailed) = .037, N = 5. * Correlation is significant at the 0.05 level (two-tailed).]

The output annotations read as follows. The bivariate Pearson correlations are presented in the top box; the value of 1 on the diagonal is the Pearson correlation of each variable with itself. The correlation of interest (the relationship of number of children to number of pets) is .900. The asterisk indicates the correlation is statistically significant at an alpha of .05: the probability is less than 4% (see "Sig. (two-tailed)") that we would see this relationship by random chance if the relationship between the variables were zero (i.e., if the null hypothesis were really true). N represents the total sample size. The bottom half of the table presents the results for the same data computed with Spearman's rho, which are interpreted similarly.
Using Scatterplots to Examine Linearity for Bivariate Correlations

Step 1: As alluded to earlier in the chapter, understanding the extent to which linearity is a reasonable assumption is an important first step prior to computing a Pearson correlation coefficient. To generate a scatterplot, go to "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Scatter/Dot" (see screenshot for "Scatterplots: Step 1").

Step 2: This will bring up the "Scatter/Dot" dialog box (see screenshot for "Scatterplots: Step 2"). The default selection is "Simple Scatter," and this is the option we will use. Then click "Define."

Step 3: This will bring up the "Simple Scatterplot" dialog box (see screenshot for "Scatterplots: Step 3"). Click the dependent variable (e.g., number of pets) and move it into the "Y Axis" box by clicking on the arrow. Click the independent variable (e.g., number of children) and move it into the "X Axis" box by clicking on the arrow. Then click "Ok."

Interpreting linearity evidence: Scatterplots are often examined to provide visual evidence of linearity prior to computing Pearson correlations. Scatterplots are graphs that depict coordinate values of X and Y. Linearity is suggested by points that fall in a straight line. This line may suggest a positive relation (as scores on X increase, scores on Y increase, and vice versa), a negative relation (as scores on X increase, scores on Y decrease, and vice versa), little or no relation (a relatively random display of points), or a polynomial relation (e.g., curvilinear). In this example, our scatterplot suggests evidence of linearity and, more specifically, a positive relationship between number of children and number of pets.
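Outside SPSS, the same computation is only a few lines of code. In the Python sketch below, the five (children, pets) pairs are hypothetical stand-ins rather than the book's actual Table 10.2 values; the point is the calculation, not the data:

```python
import math
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical stand-in data (NOT the book's Table 10.2 values)
children = [1, 2, 3, 4, 5]
pets = [1, 3, 2, 5, 4]

r = pearson_r(children, pets)
n = len(children)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # t test of H0: rho = 0, df = n - 2

# With tie-free data like these, Spearman's rho is just the Pearson
# correlation of the ranks
def to_ranks(v):
    ordered = sorted(v)
    return [ordered.index(a) + 1 for a in v]

rho_s = pearson_r(to_ranks(children), to_ranks(pets))
```

For these stand-in values, r = rho_s = .80, mirroring the coincidence noted earlier for the actual children-pets data, where both coefficients equal .90.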
Thus, proceeding to compute a bivariate Pearson correlation coefficient is reasonable.

[Scatterplot: number of pets (Y axis, 2.00 to 10.00) plotted against number of children (X axis, 1.00 to 5.00)]

Using Crosstabs to Compute Correlations

The Crosstabs program was discussed in Chapter 8, but it can also be used to obtain many measures of association (specifically Spearman's rho, Kendall's tau, Pearson, phi, and Cramer's phi). We will illustrate the use of Crosstabs for two nominal variables, thus generating phi and Cramer's phi.

Step 1: To compute phi or Cramer's phi correlations, go to "Analyze" in the top pulldown menu, then select "Descriptive Statistics," and then select the "Crosstabs" procedure.

Step 2: Select the dependent variable (if applicable; many times there are no dependent and independent variables, per se, with bivariate correlations, and in those cases, determining which variable is X and which is Y is largely irrelevant) and move it into the "Row(s)" box by clicking on the arrow key [e.g., here we used enrollment status as the dependent variable (1 = enrolled; 0 = not enrolled)]. Then select the independent variable and move it into the "Column(s)" box [in this example, gender is the independent variable (0 = male; 1 = female)]. Clicking on "Statistics" will allow you to select various statistics to generate (including various measures of association). If applicable, the dependent variable should be displayed in the row(s) and the independent variable in the column(s).
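For reference, the phi coefficient and chi-square statistic that Crosstabs will report for the Table 10.3 data can be reproduced directly (a Python sketch; note it carries full precision, so the chi-square comes out 8.33 rather than the 8.3314 obtained earlier from the rounded .4082):

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as in Table 10.3:
    a, b = row 1 (female) cells; c, d = row 2 (male) cells."""
    return (b * c - a * d) / math.sqrt((a + c) * (b + d) * (a + b) * (c + d))

# Cell counts from Table 10.3: a = 5, b = 20, c = 15, d = 10
r_phi = phi_coefficient(5, 20, 15, 10)
n = 5 + 20 + 15 + 10
chi_sq = n * r_phi ** 2   # chi-square test statistic with 1 degree of freedom
print(round(r_phi, 4), round(chi_sq, 2))
```

Both values match the hand computation: phi is about .4082 and the chi-square statistic exceeds the 3.84 critical value.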
[Screenshot: Phi and Cramer's phi: Step 2]

Step 3: In the top right corner of the "Crosstabs" dialog box (see screenshot for step 2), click on the button labeled "Statistics." From here, you can select various measures of association (i.e., types of correlation coefficients). Which correlation is selected should depend on the measurement scales of your variables. With two nominal variables, the appropriate correlation to select is "Phi and Cramer's V." (Clicking on "Correlations" would instead generate Pearson, Spearman's rho, and Kendall's tau correlations.) Click on "Continue" to return to the main "Crosstabs" dialog box. From the main dialog box, click on "Ok" to run the analysis and generate the output.

10.8 G*Power

A priori and post hoc power can again be determined using the specialized software described previously in this text (e.g., G*Power), or you can consult a priori power tables (e.g., Cohen, 1988). As an illustration, we use G*Power to compute the post hoc power of our test.

Post Hoc Power for the Pearson Bivariate Correlation Using G*Power

The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a Pearson correlation. To find the Pearson, select "Tests" in the top pulldown menu, then "Correlations and regression," and then "Correlations: Bivariate normal model." Once that selection is made, the "Test family" automatically changes to "Exact."

The "Type of power analysis" desired then needs to be selected. To compute post hoc power, select "Post hoc: Compute achieved power—given α, sample size, and effect size." The default selection for "Test Family" is "t tests"; following the procedures presented in step 1 will automatically change the test family to "Exact." The default selection for "Statistical Test" is
"Correlation: Point biserial model"; following the procedures presented in step 1 will automatically change the statistical test to "Correlation: Bivariate normal model."

The "Input Parameters" must then be specified. The first parameter is the number of tail(s): for a directional hypothesis, "One" is selected, and for a nondirectional hypothesis, "Two" is selected. In our example, we chose a nondirectional hypothesis and thus will select "Two" tails. We then input the observed correlation coefficient value in the box for "Correlation ρ H1." In this example, our Pearson correlation coefficient value was .90. The alpha level we tested at was .05, the total sample size was 5, and the "Correlation ρ H0" will remain at the default of 0 (this is the correlation value expected if the null hypothesis is true; in other words, there is zero correlation between the variables under the null hypothesis). Once the parameters are specified, simply click on "Calculate" to generate the power results.

To recap, the "Input Parameters" for computing post hoc power are (1) a one- or two-tailed test, (2) the observed correlation coefficient value, (3) the alpha level, (4) the total sample size, and (5) the
hypothesized correlation coefficient value.

Once the parameters are specified, click on "Calculate."

The "Output Parameters" provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power for a Pearson correlation given a two-tailed test, a computed correlation value of .90, an alpha level of .05, a total sample size of 5, and a null hypothesis correlation value of 0.

Based on those criteria, the post hoc power was .67. In other words, with a two-tailed test, an observed Pearson correlation of .90, an alpha level of .05, a sample size of 5, and a null hypothesis correlation value of 0, the power of our test was .67: the probability of rejecting the null hypothesis when it is really false (in this case, the probability of detecting that the correlation between our variables is not zero) was 67%, which is slightly less than what would usually be considered sufficient power (sufficient power is often .80 or above). Keep in mind that conducting a power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters).

10.9 Template and APA-Style Write-Up

Finally, we conclude the chapter with a template and an APA-style paragraph detailing the results from an example dataset.

Pearson Correlation Test

As you may recall, our graduate research assistant, Marie, was working with the marketing director of the local animal shelter, Matthew. Marie's task was to assist Matthew in generating the test of inference to answer his research question, "Is there a relationship between the number of children in a family and the number of pets?" A Pearson correlation was the test of inference suggested by Marie. A template for writing a research question for a
correlation (regardless of which type of correlation coefficient is computed) is presented in the following:

Is There a Correlation Between [Variable 1] and [Variable 2]?

It may be helpful to include in the results information on the extent to which the assumptions were met (recall there are two assumptions: independence and linearity). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference. Recall that the assumption of independence is met when the cases in our sample have been randomly selected from the population. One or two sentences are usually sufficient to indicate whether the assumptions are met. It is also important to address effect size in the write-up. Correlations are unique in that they are already effect size measures, so computing an effect size in addition to the correlation value is not needed. However, it is desirable to interpret the correlation value as an effect size. Effect size is easily interpreted from the correlation coefficient value utilizing Cohen's (1988) subjective standards previously described. Here is an APA-style example paragraph of results for the correlation between number of children and number of pets.

A Pearson correlation coefficient was computed to determine if there is a relationship between the number of children in a family and the number of pets in the family. The test was conducted using an alpha of .05. The null hypothesis was that the relationship would be 0. The assumption of independence was met via random selection. The assumption of linearity was reasonable given a review of a scatterplot of the variables.
The Pearson correlation between children and pets is .90, which is positive, is interpreted as a large effect size (Cohen, 1988), and is statistically different from 0 (r = .90, n = 5, p = .037). Thus, the null hypothesis that the correlation is 0 was rejected at the .05 level of significance. There is a strong, positive correlation between the number of children in a family and the number of pets in the family.

Bivariate Measures of Association

10.10 Summary

In this chapter, we described various measures of the association or correlation between two variables. Several new concepts and descriptive and inferential statistics were discussed. The new concepts covered were as follows: scatterplot; strength and direction; covariance; correlation coefficient; Fisher's Z transformation; and linearity assumption, causation, and restriction of range issues. We began by introducing the scatterplot as a graphical method for visually depicting the association between two variables. Next we examined the covariance as an unstandardized measure of association. Then we considered the Pearson product–moment correlation coefficient, first as a descriptive statistic and then as a method for making inferences when there are either one or two samples of observations. Some important issues about the correlational measures were also discussed. Finally, a few other measures of association were introduced, in particular, the Spearman's rho and Kendall's tau rank-order correlation coefficients and the phi and Cramer's phi coefficients. At this point, you should have met the following objectives: (a) be able to understand the concepts underlying the correlation coefficient and correlation inferential tests, (b) be able to select the appropriate type of correlation, and (c) be able to determine and interpret the appropriate correlation and correlation inferen-
tial test. In Chapter 11, we discuss the one-factor analysis of variance, the logical extension of the independent t test, for assessing mean differences among two or more groups.

Problems

Conceptual problems

10.1 The variance of X is 9, the variance of Y is 4, and the covariance between X and Y is 2. What is rXY?
 a. .039
 b. .056
 c. .233
 d. .333

10.2 The standard deviation of X is 20, the standard deviation of Y is 50, and the covariance between X and Y is 30. What is rXY?
 a. .030
 b. .080
 c. .150
 d. .200

10.3 Which of the following correlation coefficients, each obtained from a sample of 1000 children, indicates the weakest relationship?
 a. −.90
 b. −.30
 c. +.20
 d. +.80

10.4 Which of the following correlation coefficients, each obtained from a sample of 1000 children, indicates the strongest relationship?
 a. −.90
 b. −.30
 c. +.20
 d. +.80

10.5 If the relationship between two variables is linear, which of the following is necessarily true?
 a. The relation can be most accurately represented by a straight line.
 b. All the points will fall on a curved line.
 c. The relationship is best represented by a curved line.
 d. All the points must fall exactly on a straight line.

10.6 In testing the null hypothesis that a correlation is equal to 0, the critical value decreases as α decreases. True or false?

10.7 If the variances of X and Y are increased, but their covariance remains constant, the value of rXY will be unchanged. True or false?

10.8 We compute rXY = .50 for a sample of students on variables X and Y. I assert that if the low-scoring students on variable X are removed, then the new value of rXY would most likely be less than .50. Am I correct?

10.9 Two variables are linearly related such that there is a perfect relationship between X and Y. I assert that rXY must be equal to either +1.00 or −1.00. Am I correct?
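Conceptual problems 10.1 and 10.2 turn on a single identity from this chapter: the Pearson correlation is the covariance standardized by the two standard deviations, rXY = sXY / (sX sY). The following is a minimal sketch of that identity; the function name and the example values are mine, not drawn from the exercises, so it illustrates the formula without giving away any answers.

```python
import math

def corr_from_cov(var_x: float, var_y: float, cov_xy: float) -> float:
    """Pearson r as the covariance divided by the product of the
    two standard deviations (the square roots of the variances)."""
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

# Hypothetical values, chosen only to make the arithmetic transparent:
# sX = 4, sY = 5, so r = 10 / (4 * 5) = .50
r = corr_from_cov(16.0, 25.0, 10.0)
print(round(r, 3))  # 0.5
```

Note that the same function works in reverse for problem 10.7: doubling both variances while holding the covariance constant changes the denominator, so rXY does change.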
10.10 If the number of credit cards owned and the number of cars owned are strongly positively correlated, then those with more credit cards tend to own more cars. True or false?

10.11 If the number of credit cards owned and the number of cars owned are strongly negatively correlated, then those with more credit cards tend to own more cars. True or false?

10.12 A statistical consultant at a rival university found the correlation between GRE-Q scores and statistics grades to be +2.0. I assert that the administration should be advised to congratulate the students and faculty on their great work in the classroom. Am I correct?

10.13 If X correlates significantly with Y, then X is necessarily a cause of Y. True or false?

10.14 A researcher wishes to correlate the grade students earned from a pass/fail course (i.e., pass or fail) with their cumulative GPA. Which is the most appropriate correlation coefficient to examine this relationship?
 a. Pearson
 b. Spearman's rho or Kendall's tau
 c. Phi
 d. None of the above

10.15 If both X and Y are ordinal variables, then the most appropriate measure of association is the Pearson. True or false?
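Problems 10.14 and 10.15 ask which coefficient fits ordinal data; Spearman's rho, the rank-based coefficient discussed in this chapter, is the relevant option. Below is a small sketch of the classic computational formula, rho = 1 − 6Σd² / [n(n² − 1)]. The function name and data are hypothetical, and the shortcut formula assumes no tied scores in either variable; statistical software such as SPSS handles ties by computing a Pearson correlation on the ranks instead.

```python
def spearman_rho(x, y):
    """Spearman's rho via the shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)),
    where d is the difference between paired ranks.
    Assumes no tied scores in either variable."""
    n = len(x)
    rank = lambda v: [sorted(v).index(val) + 1 for val in v]  # rank 1 = smallest
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Perfectly monotone ordinal data: rho = 1.0 even though the raw scores
# do not fall on a straight line (so Pearson r would be below 1).
print(spearman_rho([1, 2, 3, 4, 5], [2, 5, 8, 9, 13]))  # 1.0
```

This also illustrates the distinction behind problem 10.15: rho depends only on the ordering of the scores, which is exactly what ordinal measurement preserves.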
Computational problems

10.1 You are given the following pairs of sample scores on X (number of credit cards in your possession) and Y (number of those credit cards with balances):

X Y
5 4
6 1
4 3
8 7
2 2

 a. Graph a scatterplot of the data.
 b. Compute the covariance.
 c. Determine the Pearson product–moment correlation coefficient.
 d. Determine the Spearman's rho correlation coefficient.

10.2 If rXY = .17 for a random sample of size 84, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .05 level of significance).

10.3 If rXY = .60 for a random sample of size 30, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .05 level of significance).

10.4 The correlation between vocabulary size and mother's age is .50 for 12 rural children and .85 for 17 inner-city children. Does the correlation for rural children differ from that of the inner-city children at the .05 level of significance?

10.5 You are given the following pairs of sample scores on X (number of coins in possession) and Y (number of bills in possession):

X Y
2 1
3 3
4 5
5 5
6 3
7 1

 a. Graph a scatterplot of the data.
 b. Describe the relationship between X and Y.
 c. What do you think the Pearson correlation will be?
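Computational problems 10.2 through 10.4 rely on two inferential procedures from this chapter: the t test of the null hypothesis that a single correlation is 0 (evaluated against a t distribution with n − 2 degrees of freedom) and Fisher's Z transformation for comparing two correlations from independent samples. Here is a hedged Python sketch of both; the function names are mine, and the illustrative values at the bottom are not the exercises' numbers.

```python
import math
from statistics import NormalDist

def t_for_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0; compare against a critical t
    with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def compare_independent_rs(r1: float, n1: int, r2: float, n2: int):
    """Fisher Z test for H0: rho1 = rho2 with two independent samples.
    Each r is transformed by z = atanh(r); the standard error of the
    difference is sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    Returns (z statistic, two-tailed p value)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p

# Illustrative (made-up) values: r = .50 with n = 20 versus r = .70 with n = 25
z, p = compare_independent_rs(0.50, 20, 0.70, 25)
print(round(z, 2))  # -0.98
```

With |z| well below 1.96 and p above .05, these illustrative correlations would not differ at the .05 level; the same machinery applies to the numbers given in problem 10.4.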
10.6 Six adults were assessed on the number of minutes it took to read a government report (X) and the number of items correct on a test of the content of that report (Y). Use the following data to determine the Pearson correlation and the effect size.

X Y
10 17
8 17
15 13
12 16
14 15
16 12

10.7 Ten kindergarten children were observed on the number of letters written in proper form (given 26 letters) (X) and the number of words that the child could read (given 50 words) (Y). Use the following data to determine the Pearson correlation and the effect size.

X Y
10 5
16 8
22 40
8 15
12 28
20 37
17 29
21 30
15 18
9 4

Interpretive problems

10.1 Select two interval/ratio variables from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.

10.2 Select two ordinal variables from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.

10.3 Select one ordinal variable and one interval/ratio variable from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.

10.4 Select one dichotomous variable and one interval/ratio variable from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.

11 One-Factor Analysis of Variance:
Fixed-Effects Model

Chapter Outline

11.1 Characteristics of the One-Factor ANOVA Model
11.2 Layout of Data
11.3 ANOVA Theory
 11.3.1 General Theory and Logic
 11.3.2 Partitioning the Sums of Squares
 11.3.3 ANOVA Summary Table
11.4 ANOVA Model
 11.4.1 Model
 11.4.2 Estimation of the Parameters of the Model
 11.4.3 Effect Size Measures, Confidence Intervals, and Power
 11.4.4 Example
 11.4.5 Expected Mean Squares
11.5 Assumptions and Violation of Assumptions
 11.5.1 Independence
 11.5.2 Homogeneity of Variance
 11.5.3 Normality
11.6 Unequal n's or Unbalanced Design
11.7 Alternative ANOVA Procedures
 11.7.1 Kruskal–Wallis Test
 11.7.2 Welch, Brown–Forsythe, and James Procedures
11.8 SPSS and G*Power
11.9 Template and APA-Style Write-Up

Key Concepts

1. Between- and within-groups variability
2. Sources of variation
3. Partitioning the sums of squares
4. The ANOVA model
5. Expected mean squares

In the last five chapters, our discussion has dealt with various inferential statistics, including inferences about means. The next six chapters are concerned with different analysis of variance (ANOVA) models. In this chapter, we consider the most basic ANOVA model, known as the one-factor ANOVA model. Recall the independent t test from Chapter 7 where the means from two independent samples were compared. What if you wish to compare more than two means? The answer is to use the analysis of variance. At this point, you may be wondering why the procedure is called the analysis of variance rather than the analysis of means, because the intent is to study possible mean differences. One way of comparing a set of means is to think in terms of the variability among those means. If the sample means are all the same, then the variability of those means would be 0. If the
sample means are not all the same, then the variability of those means would be somewhat greater than 0. In general, the greater the mean differences are, the greater is the variability of the means. Thus, mean differences are studied by looking at the variability of the means; hence, the term analysis of variance is appropriate rather than analysis of means (further discussed in this chapter).

We use X to denote our single independent variable, which we typically refer to as a factor, and Y to denote our dependent (or criterion) variable. Thus, the one-factor ANOVA is a bivariate, or two-variable, procedure. Our interest here is in determining whether mean differences exist on the dependent variable. Stated another way, the researcher is interested in the influence of the independent variable on the dependent variable. For example, a researcher may want to determine the influence that method of instruction has on statistics achievement. The independent variable, or factor, would be method of instruction, and the dependent variable would be statistics achievement. Three different methods of instruction that might be compared are large lecture hall instruction, small-group instruction, and computer-assisted instruction. Students would be randomly assigned to one of the three methods of instruction and at the end of the semester evaluated as to their level of achievement in statistics. These results would be of interest to a statistics instructor in determining the most effective method of instruction (where "effective" is measured by student performance in statistics). Thus, the instructor may opt for the method of instruction that yields the highest mean achievement.

There are a number of new concepts introduced in this chapter as well as a refresher of concepts that have been covered in previous chapters. The concepts addressed in
this chapter include the following: independent and dependent variables; between- and within-groups variability; fixed and random effects; the linear model; partitioning of the sums of squares; degrees of freedom, mean square terms, and F ratios; the ANOVA summary table; expected mean squares; balanced and unbalanced models; and alternative ANOVA procedures. Our objectives are that by the end of this chapter, you will be able to (a) understand the characteristics and concepts underlying a one-factor ANOVA, (b) generate and interpret the results of a one-factor ANOVA, and (c) understand and evaluate the assumptions of the one-factor ANOVA.

11.1 Characteristics of the One-Factor ANOVA Model

We have been following Marie, our very capable educational research graduate student, as she develops her statistical skills. As we will see, Marie is embarking on a very exciting research adventure of her own.

Marie is enrolled in an independent study class. As part of the course requirement, she has to complete a research study. In collaboration with the statistics faculty in her program, Marie designs an experimental study to determine if there is a mean difference in student attendance in the statistics lab based on the attractiveness of the statistics lab instructor. Marie's research question is: Is there a mean difference in the number of statistics labs attended by students based on the attractiveness of the lab instructor? Marie determined that a one-way ANOVA was the best statistical procedure to use to answer her question. Her next task is to collect and analyze the data to address her research question.

This section describes the distinguishing characteristics of the one-factor ANOVA model. Suppose you are interested in comparing the means of two independent samples. Here the
independent t test would be the method of choice (or perhaps the Welch t′ test). What if your interest is in comparing the means of more than two independent samples? One possibility is to conduct multiple independent t tests on each pair of means