Richard G. Lomax
The Ohio State University
Debbie L. Hahs-Vaughn
University of Central Florida
Routledge
Taylor & Francis Group
711 Third Avenue
New York, NY 10017
Routledge
Taylor & Francis Group
27 Church Road
Hove, East Sussex BN3 2FA
© 2012 by Taylor & Francis Group, LLC
Routledge is an imprint of Taylor & Francis Group, an Informa business
Printed in the United States of America on acid-free paper
Version Date: 20111003
International Standard Book Number: 978-0-415-88005-3 (Hardback)
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
Lomax, Richard G.
An introduction to statistical concepts / Richard G. Lomax, Debbie L. Hahs-Vaughn. — 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-415-88005-3
1. Statistics. 2. Mathematical statistics. I. Hahs-Vaughn, Debbie L. II. Title.
QA276.12.L67 2012
519.5–dc23 2011035052
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the Psychology Press Web site at
http://www.psypress.com
This book is dedicated to our families
and to all of our former students.
vii
Contents
Preface .... xiii
Acknowledgments .... xvii
1. Introduction .... 1
    1.1 What Is the Value of Statistics? .... 3
    1.2 Brief Introduction to History of Statistics .... 4
    1.3 General Statistical Definitions .... 5
    1.4 Types of Variables .... 7
    1.5 Scales of Measurement .... 8
    1.6 Summary .... 13
    Problems .... 14
2. Data Representation .... 17
    2.1 Tabular Display of Distributions .... 18
    2.2 Graphical Display of Distributions .... 23
    2.3 Percentiles .... 29
    2.4 SPSS .... 33
    2.5 Templates for Research Questions and APA-Style Paragraph .... 41
    2.6 Summary .... 42
    Problems .... 43
3. Univariate Population Parameters and Sample Statistics .... 49
    3.1 Summation Notation .... 50
    3.2 Measures of Central Tendency .... 51
    3.3 Measures of Dispersion .... 56
    3.4 SPSS .... 65
    3.5 Templates for Research Questions and APA-Style Paragraph .... 69
    3.6 Summary .... 70
    Problems .... 71
4. Normal Distribution and Standard Scores .... 77
    4.1 Normal Distribution .... 78
    4.2 Standard Scores .... 84
    4.3 Skewness and Kurtosis Statistics .... 87
    4.4 SPSS .... 91
    4.5 Templates for Research Questions and APA-Style Paragraph .... 98
    4.6 Summary .... 99
    Problems .... 99
5. Introduction to Probability and Sample Statistics .... 105
    5.1 Brief Introduction to Probability .... 106
    5.2 Sampling and Estimation .... 109
    5.3 Summary .... 117
    Appendix: Probability That at Least Two Individuals Have the Same Birthday .... 117
    Problems .... 118
6. Introduction to Hypothesis Testing: Inferences About a Single Mean .... 121
    6.1 Types of Hypotheses .... 122
    6.2 Types of Decision Errors .... 124
    6.3 Level of Significance (α) .... 127
    6.4 Overview of Steps in Decision-Making Process .... 129
    6.5 Inferences About μ When σ Is Known .... 130
    6.6 Type II Error (β) and Power (1 − β) .... 134
    6.7 Statistical Versus Practical Significance .... 138
    6.8 Inferences About μ When σ Is Unknown .... 139
    6.9 SPSS .... 145
    6.10 G*Power .... 149
    6.11 Template and APA-Style Write-Up .... 155
    6.12 Summary .... 156
    Problems .... 157
7. Inferences About the Difference Between Two Means .... 163
    7.1 New Concepts .... 164
    7.2 Inferences About Two Independent Means .... 166
    7.3 Inferences About Two Dependent Means .... 176
    7.4 SPSS .... 180
    7.5 G*Power .... 192
    7.6 Template and APA-Style Write-Up .... 195
    7.7 Summary .... 198
    Problems .... 198
8. Inferences About Proportions .... 205
    8.1 Inferences About Proportions Involving Normal Distribution .... 206
    8.2 Inferences About Proportions Involving Chi-Square Distribution .... 217
    8.3 SPSS .... 224
    8.4 G*Power .... 231
    8.5 Template and APA-Style Write-Up .... 234
    8.6 Summary .... 236
    Problems .... 237
9. Inferences About Variances .... 241
    9.1 New Concepts .... 242
    9.2 Inferences About Single Variance .... 244
    9.3 Inferences About Two Dependent Variances .... 246
    9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests) .... 248
    9.5 SPSS .... 252
    9.6 Template and APA-Style Write-Up .... 253
    9.7 Summary .... 253
    Problems .... 254
10. Bivariate Measures of Association .... 259
    10.1 Scatterplot .... 260
    10.2 Covariance .... 263
    10.3 Pearson Product–Moment Correlation Coefficient .... 265
    10.4 Inferences About Pearson Product–Moment Correlation Coefficient .... 266
    10.5 Assumptions and Issues Regarding Correlations .... 269
    10.6 Other Measures of Association .... 272
    10.7 SPSS .... 276
    10.8 G*Power .... 283
    10.9 Template and APA-Style Write-Up .... 286
    10.10 Summary .... 287
    Problems .... 287
11. One-Factor Analysis of Variance: Fixed-Effects Model .... 291
    11.1 Characteristics of One-Factor ANOVA Model .... 292
    11.2 Layout of Data .... 296
    11.3 ANOVA Theory .... 296
    11.4 ANOVA Model .... 302
    11.5 Assumptions and Violation of Assumptions .... 309
    11.6 Unequal n's or Unbalanced Procedure .... 312
    11.7 Alternative ANOVA Procedures .... 312
    11.8 SPSS and G*Power .... 313
    11.9 Template and APA-Style Write-Up .... 334
    11.10 Summary .... 336
    Problems .... 336
12. Multiple Comparison Procedures .... 341
    12.1 Concepts of Multiple Comparison Procedures .... 342
    12.2 Selected Multiple Comparison Procedures .... 348
    12.3 SPSS .... 362
    12.4 Template and APA-Style Write-Up .... 366
    12.5 Summary .... 366
    Problems .... 367
13. Factorial Analysis of Variance: Fixed-Effects Model .... 371
    13.1 Two-Factor ANOVA Model .... 372
    13.2 Three-Factor and Higher-Order ANOVA .... 390
    13.3 Factorial ANOVA With Unequal n's .... 393
    13.4 SPSS and G*Power .... 395
    13.5 Template and APA-Style Write-Up .... 417
    13.6 Summary .... 419
    Problems .... 420
14. Introduction to Analysis of Covariance: One-Factor Fixed-Effects Model With Single Covariate .... 427
    14.1 Characteristics of the Model .... 428
    14.2 Layout of Data .... 431
    14.3 ANCOVA Model .... 431
    14.4 ANCOVA Summary Table .... 432
    14.5 Partitioning the Sums of Squares .... 433
    14.6 Adjusted Means and Related Procedures .... 434
    14.7 Assumptions and Violation of Assumptions .... 436
    14.8 Example .... 441
    14.9 ANCOVA Without Randomization .... 443
    14.10 More Complex ANCOVA Models .... 444
    14.11 Nonparametric ANCOVA Procedures .... 444
    14.12 SPSS and G*Power .... 445
    14.13 Template and APA-Style Paragraph .... 469
    14.14 Summary .... 471
    Problems .... 471
15. Random- and Mixed-Effects Analysis of Variance Models .... 477
    15.1 One-Factor Random-Effects Model .... 478
    15.2 Two-Factor Random-Effects Model .... 483
    15.3 Two-Factor Mixed-Effects Model .... 488
    15.4 One-Factor Repeated Measures Design .... 493
    15.5 Two-Factor Split-Plot or Mixed Design .... 500
    15.6 SPSS and G*Power .... 508
    15.7 Template and APA-Style Write-Up .... 548
    15.8 Summary .... 551
    Problems .... 551
16. Hierarchical and Randomized Block Analysis of Variance Models .... 557
    16.1 Two-Factor Hierarchical Model .... 558
    16.2 Two-Factor Randomized Block Design for n = 1 .... 566
    16.3 Two-Factor Randomized Block Design for n > 1 .... 574
    16.4 Friedman Test .... 574
    16.5 Comparison of Various ANOVA Models .... 575
    16.6 SPSS .... 576
    16.7 Template and APA-Style Write-Up .... 603
    16.8 Summary .... 605
    Problems .... 605
17. Simple Linear Regression .... 611
    17.1 Concepts of Simple Linear Regression .... 612
    17.2 Population Simple Linear Regression Model .... 614
    17.3 Sample Simple Linear Regression Model .... 615
    17.4 SPSS .... 634
    17.5 G*Power .... 647
    17.6 Template and APA-Style Write-Up .... 650
    17.7 Summary .... 652
    Problems .... 652
18. Multiple Regression .... 657
    18.1 Partial and Semipartial Correlations .... 658
    18.2 Multiple Linear Regression .... 661
    18.3 Methods of Entering Predictors .... 676
    18.4 Nonlinear Relationships .... 679
    18.5 Interactions .... 680
    18.6 Categorical Predictors .... 680
    18.7 SPSS .... 682
    18.8 G*Power .... 698
    18.9 Template and APA-Style Write-Up .... 701
    18.10 Summary .... 703
    Problems .... 704
19. Logistic Regression .... 709
    19.1 How Logistic Regression Works .... 710
    19.2 Logistic Regression Equation .... 711
    19.3 Estimation and Model Fit .... 715
    19.4 Significance Tests .... 716
    19.5 Assumptions and Conditions .... 721
    19.6 Effect Size .... 725
    19.7 Methods of Predictor Entry .... 726
    19.8 SPSS .... 727
    19.9 G*Power .... 746
    19.10 Template and APA-Style Write-Up .... 749
    19.11 What Is Next? .... 751
    19.12 Summary .... 752
    Problems .... 752
Appendix: Tables .... 757
References .... 783
Odd-Numbered Answers to Problems .... 793
Author Index .... 809
Subject Index .... 813
Preface
Approach
We know, we know! We've heard it a million times before. When you hear someone at a party mention the word statistics or statistician, you probably say "I hate statistics" and turn the other cheek. In the many years that we have been in the field of statistics, it is extremely rare when someone did not have that reaction. Enough is enough. With the help of this text, we hope that "statistics hating" will become a distant figment of your imagination.

As the title suggests, this text is designed for a course in statistics for students in education and the behavioral sciences. We begin with the most basic introduction to statistics in the first chapter and proceed through intermediate statistics. The text is designed for you to become a better prepared researcher and a more intelligent consumer of research. We do not assume that you have extensive or recent training in mathematics. Many of you have only had algebra, perhaps some time ago. We also do not assume that you have ever had a statistics course. Rest assured; you will do fine.
We believe that a text should serve as an effective instructional tool. You should find this text to be more than a reference book; you might actually use it to learn statistics. (What an oxymoron that a statistics book can actually teach you something!) This text is not a theoretical statistics book, nor is it a cookbook on computing statistics or a statistical software manual. Recipes have to be memorized; consequently, you tend not to understand how or why you obtain the desired product. As well, knowing how to run a statistics package without understanding the concepts or the output is not particularly useful. Thus, concepts drive the field of statistics.
Goals and Content Coverage
Our goals for this text are lofty, but the effort and its effects will be worthwhile. First, the text provides a comprehensive coverage of topics that could be included in an undergraduate or graduate one- or two-course sequence in statistics. The text is flexible enough so that instructors can select those topics that they desire to cover as they deem relevant in their particular discipline. In other words, chapters and sections of chapters from this text can be included in a statistics course as the instructor sees fit. Most of the popular as well as many of the lesser-known procedures and models are described in the text. A particular feature is a thorough and up-to-date discussion of assumptions, the effects of their violation, and how to deal with their violation.

The first five chapters of the text cover basic descriptive statistics, including ways of representing data graphically, statistical measures which describe a set of data, the normal distribution and other types of standard scores, and an introduction to probability and sampling.
The remainder of the text covers different inferential statistics. In Chapters 6 through 10, we deal with different inferential tests involving means (e.g., t tests), proportions, variances, and correlations. In Chapters 11 through 16, all of the basic analysis of variance (ANOVA) models are considered. Finally, in Chapters 17 through 19, we examine various regression models.

Second, the text communicates a conceptual, intuitive understanding of statistics, which requires only a rudimentary knowledge of basic algebra and emphasizes the important concepts in statistics. The most effective way to learn statistics is through the conceptual approach. Statistical concepts tend to be easy to learn because (a) concepts can be simply stated, (b) concepts can be made relevant through the use of real-life examples, (c) the same concepts are shared by many procedures, and (d) concepts can be related to one another.

This text will help you to reach these goals. The following indicators will provide some feedback as to how you are doing. First, there will be a noticeable change in your attitude toward statistics. Thus, one outcome is for you to feel that "statistics is not half bad," or "this stuff is OK." Second, you will feel comfortable using statistics in your own work. Finally, you will begin to "see the light." You will know when you have reached this highest stage of statistics development when suddenly, in the middle of the night, you wake up from a dream and say, "Now I get it!" In other words, you will begin to think statistics rather than think of ways to get out of doing statistics.
Pedagogical Tools
The text contains several important pedagogical features to allow you to attain these goals. First, each chapter begins with an outline (so you can anticipate what will be covered) and a list of key concepts (which you will need in order to understand what you are doing). Second, realistic examples from education and the behavioral sciences are used to illustrate the concepts and procedures covered in each chapter. Each of these examples includes an initial vignette, an examination of the relevant procedures and necessary assumptions, how to run SPSS and develop an APA-style write-up, as well as tables, figures, and annotated SPSS output to assist you. Third, the text is based on the conceptual approach. That is, material is covered so that you obtain a good understanding of statistical concepts. If you know the concepts, then you know statistics. Finally, each chapter ends with three sets of problems: computational, conceptual, and interpretive. Pay particular attention to the conceptual problems, as they provide the best assessment of your understanding of the concepts in the chapter. We strongly suggest using the example data sets and the computational and interpretive problems for additional practice through available statistics software. This will serve to reinforce the concepts covered. Answers to the odd-numbered problems are given at the end of the text.
New to This Edition
A number of changes have been made in the third edition based on the suggestions of reviewers, instructors, teaching assistants, and students. These improvements have been made in order to better achieve the goals of the text. You will note the addition of a coauthor to this edition, Debbie Hahs-Vaughn, who has contributed greatly to the further development of this text. The changes include the following: (a) additional end-of-chapter problems have been included; (b) more information on power has been added, particularly use of the G*Power software with screenshots; (c) content has been updated and numerous additional references have been provided; (d) a final chapter on logistic regression has been added for a more complete presentation of regression models; (e) numerous SPSS (version 19) screenshots on statistical techniques and their assumptions have been included to assist in the generation and interpretation of output; (f) more information on SPSS has been added to most chapters; (g) research vignettes and templates have been added to the beginning and end of each chapter, respectively; (h) a discussion of expected mean squares has been folded into the analysis of variance chapters to provide a rationale for the formation of proper F ratios; and (i) a website for the text provides students and instructors access to detailed solutions to the book’s odd-numbered problems; chapter outlines; lists of key terms for each chapter; and SPSS datasets that correspond to the chapter examples and end-of-chapter problems and that can be used in SPSS and other packages such as SAS, HLM, STATA, and LISREL. Only instructors are granted access to the PowerPoint slides for each chapter that include examples and APA style write-ups, chapter outlines, and key terms; multiple-choice (approximately 25 for each chapter) and short answer (approximately 5 for each chapter) test questions; and answers to the even-numbered problems. This material is available at: http://www.psypress.com/an-introduction-to-statistical-concepts-9780415880053
Acknowledgments
There are many individuals whose assistance enabled the completion of this book. We would like to thank the following individuals with whom we studied in school: Jamie Algina, Lloyd Bond, Amy Broeseker, Jim Carlson, Bill Cooley, Judy Giesen, Brian Gray, Harry Hsu, Mary Nell McNeese, Camille Ogden, Lou Pingel, Rod Roth, Charles Stegman, and Neil Timm. Next, numerous colleagues have played an important role in our personal and professional lives as statisticians. Rather than include an admittedly incomplete listing, we just say “thank you” to all of you. You know who you are.

Thanks also to all of the wonderful people at Lawrence Erlbaum Associates (LEA), in particular to Ray O’Connell for inspiring this project back in 1986, and to Debra Riegert (formerly at LEA and now at Routledge) for supporting the development of subsequent texts and editions. We are most appreciative of the insightful suggestions provided by the reviewers of this text over the years, and in particular the reviewers of this edition: Robert P. Conti, Sr. (Mount Saint Mary College), Feifei Ye (University of Pittsburgh), Nan Thornton (Capella University), and one anonymous reviewer. A special thank you to all of the terrific students that we have had the pleasure of teaching at the University of Pittsburgh, the University of Illinois–Chicago, Louisiana State University, Boston College, Northern Illinois University, the University of Alabama, The Ohio State University, and the University of Central Florida. For all of your efforts, and the many lights that you have seen and shared with us, this book is for you. We are most grateful to our families, in particular to Lea and Kristen, and to Mark and Malani. It is because of your love and understanding that we were able to cope with such a major project. Thank you one and all.
Richard G. Lomax
Debbie L. Hahs-Vaughn
1
Introduction
Chapter Outline
1.1 What Is the Value of Statistics?
1.2 Brief Introduction to History of Statistics
1.3 General Statistical Definitions
1.4 Types of Variables
1.5 Scales of Measurement
1.5.1 Nominal Measurement Scale
1.5.2 Ordinal Measurement Scale
1.5.3 Interval Measurement Scale
1.5.4 Ratio Measurement Scale
Key Concepts
1. General statistical concepts
Population
Parameter
Sample
Statistic
Descriptive statistics
Inferential statistics
2. Variable-related concepts
Variable
Constant
Categorical
Dichotomous variables
Numerical
Discrete variables
Continuous variables
3. Measurement scale concepts
Measurement
Nominal
Ordinal
Interval
Ratio
We want to welcome you to the wonderful world of statistics. More than ever, statistics are everywhere. Listen to the weather report and you hear about the measurement of variables such as temperature, rainfall, barometric pressure, and humidity. Watch a sporting event and you hear about batting averages, percentage of free throws completed, and total rushing yardage. Read the financial page and you can track the Dow Jones average, the gross national product, and bank interest rates. Turn to the entertainment section to see movie ratings, movie revenue, or the top 10 best-selling novels. These are just a few examples of statistics that surround you in every aspect of your life.

Although you may be thinking that statistics is not the most enjoyable subject on the planet, by the end of this text, you will (a) have a more positive attitude about statistics, (b) feel more comfortable using statistics, and thus be more likely to perform your own quantitative data analyses, and (c) certainly know much more about statistics than you do now. In other words, our goal is to equip you with the skills you need to be both a better consumer and producer of research. But be forewarned: the road to statistical independence is not easy. However, we will serve as your guides along the way. When the going gets tough, we will be there to help you with advice and numerous examples and problems. Using the powers of logic, mathematical reasoning, and statistical concept knowledge, we will help you arrive at an appropriate solution to the statistical problem at hand.
Some students begin their first statistics class with some anxiety. This could be caused by not having had a quantitative course for some time, apprehension built up by delaying taking statistics, a poor past instructor or course, or less than adequate past success. Let us offer a few suggestions along these lines. First, this is not a math class or text. If you want one of those, then you need to walk over to the math department. This is a course and text on the application of statistics to education and the behavioral sciences. Second, the philosophy of the text is on the understanding of concepts rather than on the derivation of statistical formulas. It is more important to understand concepts than to derive or memorize various and sundry formulas. If you understand the concepts, you can always look up the formulas if need be. If you do not understand the concepts, then knowing the formulas will only allow you to operate in a cookbook mode without really understanding what you are doing. Third, the calculator and computer are your friends. These devices are tools that allow you to complete the necessary computations and obtain the results of interest. If you are performing hand computations, find a calculator that you are comfortable with; it need not have 800 functions, as the four basic operations and sum and square root functions are sufficient (one of our personal calculators is one of those little credit card calculators, although we often use the calculator on our computers). If you are using a statistical software program, find one that you are comfortable with (most instructors will have you using a program such as SPSS, SAS, or Statistica). In this text, we use SPSS to illustrate statistical applications. Finally, this text will take you from raw data to results using realistic examples. These can then be followed up using the problems at the end of each chapter. Thus, you will not be on your own but will have the text, a computer/calculator, as well as your course and instructor, to help guide you.
The intent and philosophy of this text is to be conceptual and intuitive in nature. Thus, the text does not require a high level of mathematics but rather emphasizes the important concepts in statistics. Most statistical concepts really are fairly easy to learn because (a) concepts can be simply stated, (b) concepts can be related to real-life examples, (c) many of the same concepts run through much of statistics, and therefore, (d) many concepts can be related.

In this introductory chapter, we describe the most basic statistical concepts. We begin with the question, “What is the value of statistics?” We then look at a brief history of statistics by mentioning a few of the more important and interesting statisticians. Then we consider the concepts of population, parameter, sample and statistic, descriptive and inferential statistics, types of variables, and scales of measurement. Our objectives are that by the end of this chapter, you will (a) have a better sense of why statistics are necessary, (b) see that statisticians are an interesting group of people, and (c) have an understanding of several basic statistical concepts.
1.1 What Is the Value of Statistics?
Let us start off with a reasonable rhetorical question: why do we need statistics? In other words, what is the value of statistics, either in your research or in your everyday life? As a way of thinking about these questions, consider the following headlines, which have probably appeared in your local newspaper.
Cigarette Smoking Causes Cancer—Tobacco Industry Denies Charges
A study conducted at Ivy-Covered University Medical School, recently published in the New England Journal of Medicine, has definitively shown that cigarette smoking causes cancer. In interviews with 100 randomly selected smokers and nonsmokers over 50 years of age, 30% of the smokers have developed some form of cancer, while only 10% of the nonsmokers have cancer. “The higher percentage of smokers with cancer in our study clearly indicates that cigarettes cause cancer,” said Dr. Jason P. Smythe. On the contrary, “this study doesn’t even suggest that cigarettes cause cancer,” said tobacco lobbyist Cecil B. Hacker. “Who knows how these folks got cancer; maybe it is caused by the aging process or by the method in which individuals were selected for the interviews,” Mr. Hacker went on to say.
North Carolina Congressional Districts
Gerrymandered—African-Americans Slighted
A study conducted at the National Center for Legal Research indicates that congressional districts in the state of North Carolina have been gerrymandered to minimize the impact of the African-American vote. “From our research, it is clear that the districts are apportioned in a racially biased fashion. Otherwise, how could there be no single district in the entire state which has a majority of African-American citizens when over 50% of the state’s population is African-American? The districting system absolutely has to be changed,” said Dr. I. M. Researcher. A spokesman for the American Bar Association countered with the statement, “According to a decision rendered by the United States Supreme Court in 1999 (No. 98-85), intent or motive must be shown for racial bias to be shown in the creation of congressional districts. The decision states a ‘facially neutral law … warrants strict scrutiny only if it can be proved that the law was motivated by a racial purpose or object.’ The data in this study do not show intent or motive. To imply that these data indicate racial bias is preposterous.”
Global Warming—Myth According to the President
Research conducted at the National Center for Global Warming (NCGW) has shown the negative consequences of global warming on the planet Earth. As summarized by Dr. Noble Pryze, “our studies at NCGW clearly demonstrate that if global warming is not halted in the next 20 years, the effects on all aspects of our environment and climatology will be catastrophic.” A different view is held by U.S. President Harold W. Tree. He stated in a recent address that “the scientific community has not convinced him that global warming even exists. Why should our administration spend millions of dollars on an issue that has not been shown to be a real concern?”
How is one to make sense of the studies described by these headlines? How is one to decide which side of the issue these data support, so as to take an intellectual stand? In other words, do the interview data clearly indicate that cigarette smoking causes cancer? Do the congressional district percentages of African-Americans necessarily imply that there is racial bias? Have scientists convinced us that global warming is a problem? These studies are examples of situations where the appropriate use of statistics is clearly necessary. Statistics will provide us with an intellectually acceptable method for making decisions in such matters. For instance, a certain type of research, statistical analysis, and set of results are all necessary to make causal inferences about cigarette smoking. Another type of research, statistical analysis, and set of results are all necessary to lead one to confidently state that the districting system is racially biased or not, or that global warming needs to be dealt with. The bottom line is that the purpose of statistics, and thus of this text, is to provide you with the tools to make important decisions in an appropriate and confident manner. You will not have to trust a statement made by some so-called expert on an issue, which may or may not have any empirical basis or validity; you can make your own judgments based on the statistical analyses of data. For you, the value of statistics can include (a) the ability to read and critique articles in both professional journals and the popular press and (b) the ability to conduct statistical analyses for your own research (e.g., thesis or dissertation).
1.2 Brief Introduction to History of Statistics
As a way of getting to know the topic of statistics, we want to briefly introduce you to a few famous statisticians. The purpose of this section is not to provide a comprehensive history of statistics, as those already exist (e.g., Heyde, Seneta, Crepel, Fienberg, & Gani, 2001; Pearson, 1978; Stigler, 1986). Rather, the purpose of this section is to show that famous statisticians not only are interesting but are human beings just like you and me.
One of the fathers of probability (see Chapter 5) is acknowledged to be Blaise Pascal, working in the mid-1600s. One of Pascal’s contributions was that he worked out the probabilities for each dice roll in the game of craps, enabling his friend, a member of royalty, to become a consistent winner. He also developed Pascal’s triangle, which you may remember from your early mathematics education. The statistical development of the normal or bell-shaped curve (see Chapter 4) is interesting. For many years, this development was attributed to Karl Friedrich Gauss (early 1800s), and the curve was actually known for some time as the Gaussian curve. Later historians found that Abraham DeMoivre actually developed the normal curve in the 1730s. As statistics was not thought of as a true academic discipline until the late 1800s, people like Pascal and DeMoivre were consulted by the wealthy on odds about games of chance and by insurance underwriters to determine mortality rates.
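Pascal’s craps tabulation is easy to reproduce today. The following Python sketch (our illustration, not part of the text) enumerates the 36 equally likely outcomes of rolling two fair dice and tabulates the probability of each possible sum:

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice and
# count how often each sum occurs -- the kind of tabulation Pascal
# worked out by hand for games of chance.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
prob = {total: Fraction(n, 36) for total, n in counts.items()}

# A sum of 7 is the most likely roll: 6 of the 36 outcomes produce it.
print(prob[7])  # -> 1/6
```

A sum of 7, the most common roll, occurs with probability 6/36 = 1/6, which is part of why 7 plays such a central role in craps.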
Karl Pearson is one of the most famous statisticians to date (late 1800s to early 1900s). Among his many accomplishments is the Pearson product–moment correlation coefficient, still in use today (see Chapter 10). You may know of Florence Nightingale (1820–1910) as an important figure in the field of nursing. However, you may not know of her importance in the field of statistics. Nightingale believed that statistics and theology were linked and that by studying statistics we might come to understand God’s laws.
A quite interesting statistical personality is William Sealy Gossett, who was employed by the Guinness Brewery in Ireland. The brewery wanted to select a sample of people from Dublin in 1906 for purposes of taste testing. Gossett was asked how large a sample was needed in order to make an accurate inference about the entire population (see next section). The brewery would not let Gossett publish any of his findings under his own name, so he used the pseudonym of Student. Today, the t distribution is still known as Student’s t distribution. Sir Ronald A. Fisher is another of the most famous statisticians of all time. Working in the early 1900s, Fisher introduced the analysis of variance (ANOVA) (see Chapters 11 through 16) and Fisher’s z transformation for correlations (see Chapter 10). In fact, the major statistic in the ANOVA is referred to as the F ratio in honor of Fisher. These individuals represent only a fraction of the many famous and interesting statisticians over the years. For further information about these and other statisticians, we suggest you consult references such as Pearson (1978), Stigler (1986), and Heyde et al. (2001), which consist of many interesting stories about statisticians.
1.3 General Statistical Definitions
In this section, we define some of the most basic concepts in statistics. Included here are definitions and examples of the following concepts: population, parameter, sample, statistic, descriptive statistics, and inferential statistics.

The first four concepts are tied together, so we discuss them together. A population is defined as consisting of all members of a well-defined group. A population may be large in scope, such as when a population is defined as all of the employees of IBM worldwide. A population may be small in scope, such as when a population is defined as all of the IBM employees at the building on Main Street in Atlanta. Thus, a population could be large or small in scope. The key is that the population is well defined such that one could determine specifically who all of the members of the group are, and then information or data could be collected from all such members. Thus, if our population is defined as all members working in a particular office building, then our study would consist of collecting data from all employees in that building. It is also important to remember that you, the researcher, define the population.
A parameter is defined as a characteristic of a population. For instance, parameters of our office building example might be the number of individuals who work in that building (e.g., 154), the average salary of those individuals (e.g., $49,569), and the range of ages of those individuals (e.g., 21–68 years of age). When we think about characteristics of a population, we are thinking about population parameters. Those two terms are often linked together.

A sample is defined as consisting of a subset of a population. A sample may be large in scope, such as when a population is defined as all of the employees of IBM worldwide and 20% of those individuals are included in the sample. A sample may be small in scope, such as when a population is defined as all of the IBM employees at the building on Main Street in Atlanta and 10% of those individuals are included in the sample. Thus, a sample could be large or small in scope and consist of any portion of the population. The key is that the sample consists of some, but not all, of the members of the population; that is, anywhere from one individual to all but one individual from the population is included in the sample. Thus, if our population is defined as all members working in the IBM building on Main Street in Atlanta, then our study would consist of collecting data from a sample of some of the employees in that building. It follows that if we, the researcher, define the population, then we also determine what the sample will be.

A statistic is defined as a characteristic of a sample. For instance, statistics of our office building example might be the number of individuals who work in the building that we sampled (e.g., 77), the average salary of those individuals (e.g., $54,090), and the range of ages of those individuals (e.g., 25–62 years of age). Notice that the statistics of a sample need not be equal to the parameters of a population (more about this in Chapter 5). When we think about characteristics of a sample, we are thinking about sample statistics. Those two terms are often linked together. Thus, we have population parameters and sample statistics, but no other combinations of those terms exist. The field has become known as statistics simply because we are almost always dealing with sample statistics, as population data are rarely obtained.
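The parameter/statistic distinction can be made concrete with a short Python sketch (ours; the salary values are invented and are not the $49,569 and $54,090 figures of the example above). We treat a list of 154 salaries as the full population and 77 of them as a sample:

```python
import random
import statistics

random.seed(1)  # make the sketch reproducible

# The population: salaries of all 154 employees in the building
# (values invented for illustration).
population = [random.randint(30_000, 90_000) for _ in range(154)]

# A parameter is a characteristic of the population.
population_mean = statistics.mean(population)

# A statistic is the same characteristic computed on a sample,
# here 77 employees drawn at random.
sample = random.sample(population, 77)
sample_mean = statistics.mean(sample)

# The sample statistic need not equal the population parameter.
print(population_mean, sample_mean)
```

Running this, the two means are close but typically not identical, exactly as the text notes (more about why in Chapter 5).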
The final two concepts are also tied together and thus considered together. The field of statistics is generally divided into two types of statistics: descriptive statistics and inferential statistics. Descriptive statistics are defined as techniques which allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. In other words, the purpose of descriptive statistics is to allow us to talk about (or describe) a collection of data without having to look at the entire collection. For example, say we have just collected a set of data from 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). If you were to ask us about the data, we could do one of two things. On the one hand, we could carry around the entire collection of data everywhere we go, and when someone asks us about the data, simply say, “Here are the data; take a look at them yourself.” On the other hand, we could summarize the data in an abbreviated fashion, and when someone asks us about the data, simply say, “Here are a table and a graph about the data; they summarize the entire collection.” So, rather than viewing 100,000 sheets of paper, perhaps we would only have to view two sheets of paper. Since statistics is largely a system of communicating information, descriptive statistics are considerably more useful to a consumer than an entire collection of data. Descriptive statistics are discussed in Chapters 2 through 4.
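The “two sheets of paper” idea can be illustrated in miniature. This Python sketch (ours, with made-up grade point averages) reduces a list of raw scores to a handful of descriptive numbers:

```python
import statistics

# Ten invented grade point averages standing in for a much larger
# collection of raw data.
gpa = [3.1, 2.8, 3.9, 3.5, 2.4, 3.7, 3.0, 3.3, 2.9, 3.6]

# Instead of handing someone all the raw scores, report a few
# descriptive statistics that summarize the whole collection.
summary = {
    "n": len(gpa),
    "mean": round(statistics.mean(gpa), 2),
    "stdev": round(statistics.stdev(gpa), 2),
    "min": min(gpa),
    "max": max(gpa),
}
print(summary)
```

With 100,000 real records the list would be enormous, but the summary dictionary would stay the same size; that compression is the whole point of descriptive statistics.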
Inferential statistics are defined as techniques which allow us to employ inductive reasoning to infer the properties of an entire group or collection of individuals, a population, from a small number of those individuals, a sample. In other words, the purpose of inferential statistics is to allow us to collect data from a sample of individuals and then infer the properties of that sample back to the population of individuals. In case you have forgotten about logic, inductive reasoning is where you infer from the specific (here the sample) to the general (here the population). For example, say we have just collected a set of sample data from 5,000 of the population of 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). If you were to ask us about the data, we could compute various sample statistics and then infer with some confidence that these would be similar to the population parameters. In other words, this allows us to collect data from a subset of the population yet still make inferential statements about the population without collecting data from the entire population. So, rather than collecting data from all 100,000 graduate students in the population, we could collect data on a sample of, say, 5,000 students.

As another example, Gossett (aka Student) was asked to conduct a taste test of Guinness beer for a sample of Dublin residents. Because the brewery could not afford to do this with the entire population of Dublin, Gossett collected data from a sample of Dublin residents and was able to make an inference from these sample results back to the population. A discussion of inferential statistics begins in Chapter 5. In summary, the field of statistics is roughly divided into descriptive statistics and inferential statistics. Note, however, that many further distinctions are made among the types of statistics, but more about that later.
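As a sketch of the inferential idea (ours; the heights are simulated, not real data), we can generate a population of 100,000 values, draw a sample of 5,000, and see how close the sample mean comes to the population mean that, in practice, we would never observe:

```python
import random
import statistics

random.seed(42)  # reproducible simulation

# Simulated population of 100,000 graduate-student heights in cm.
population = [random.gauss(170, 10) for _ in range(100_000)]

# In practice we would observe only a sample of 5,000...
sample = random.sample(population, 5_000)
sample_mean = statistics.mean(sample)

# ...and use that sample statistic to infer the population parameter.
# We can compute the parameter here only because we simulated the data.
population_mean = statistics.mean(population)
print(round(sample_mean, 2), round(population_mean, 2))
```

The two printed values land within a fraction of a unit of each other, which is the inductive step in action: the specific (sample) telling us about the general (population).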
1.4 Types of Variables
There are several terms we need to define about variables. First, it might be useful to define the term variable. A variable is defined as any characteristic of persons or things that is observed to take on different values. In other words, the values for a particular characteristic vary across the individuals observed. For example, the annual salary of the families in your neighborhood varies because not every family earns the same annual salary. One family might earn $50,000 while the family right next door might earn $65,000. Thus, the annual family salary is a variable because it varies across families.

In contrast, a constant is defined as any characteristic of persons or things that is observed to take on only a single value. In other words, the values for a particular characteristic are the same for all individuals observed. For example, say every family in your neighborhood has a lawn. Although the nature of the lawns may vary, everyone has a lawn. Thus, whether a family has a lawn in your neighborhood is a constant and therefore would not be a very interesting characteristic to study. When designing a study, you (i.e., the researcher) can determine what is a constant. This is part of the process of delimiting, or narrowing the scope of, your study. As an example, you may be interested in studying career paths of girls who complete AP science courses. In designing your study, you are only interested in girls, and thus, gender would be a constant. This is not to say that the researcher wholly determines when a characteristic is a constant. It is sometimes the case that we find that a characteristic is a constant after we conduct the study. In other words, one of the measures has no variation—everyone or everything scored or remained the same on that particular characteristic.
There are different typologies for describing variables. One typology is categorical (or qualitative) versus numerical (or quantitative), and, within numerical, discrete versus continuous. A categorical variable is a qualitative variable that describes categories of a characteristic or attribute. Examples of categorical variables include political party affiliation (Republican = 1, Democrat = 2, Independent = 3), religious affiliation (e.g., Methodist = 1, Baptist = 2, Roman Catholic = 3), and course letter grade (A = 4, B = 3, C = 2, D = 1, F = 0).

A dichotomous variable is a special, restricted type of categorical variable and is defined as a variable that can take on only one of two values. For example, biologically determined gender is a variable that can only take on the values of male or female and is often coded numerically as 0 (e.g., for males) or 1 (e.g., for females). Other dichotomous variables include pass/fail, true/false, living/dead, and smoker/nonsmoker. Dichotomous variables will take on special importance as we study binary logistic regression (Chapter 19).
A numerical variable is a quantitative variable. Numerical variables can further be classified as either discrete or continuous. A discrete variable is defined as a variable that can only take on certain values. For example, the number of children in a family can only take on certain values. Many values are not possible, such as negative values (e.g., the Joneses cannot have −2 children) or decimal values (e.g., the Smiths cannot have 2.2 children). In contrast, a continuous variable is defined as a variable that can take on any value within a certain range given a precise enough measurement instrument. For example, the distance between two cities can be measured in miles, with miles estimated in whole numbers. However, given a more precise instrument with which to measure, distance can even be measured down to the inch or millimeter. When considering the difference between a discrete and continuous variable, keep in mind that discrete variables arise from the counting process and continuous variables arise from the measuring process. For example, the number of students enrolled in your statistics class is a discrete variable. If we were to measure (i.e., count) the number of students in the class, it would not matter if we counted first names alphabetically from A to Z or if we counted beginning with whoever sat in the front row to the last person in the back row—either way, we would arrive at the same value. In other words, how we “measure” (again, count) the students in the class does not matter—we will always arrive at the same result. In comparison, the value of a continuous variable is dependent on how precise the measuring instrument is. Weighing yourself on a scale that rounds to whole numbers will give us one measure of weight. However, weighing on another, more precise, scale that rounds to three decimal places will provide a more precise measure of weight.

Here are a few additional examples of discrete and continuous variables. Other discrete variables include the number of CDs owned, number of credit hours enrolled, and number of teachers employed at a school. Other continuous variables include salary (from zero to billions in dollars and cents), age (from zero up, in millisecond increments), height (from zero up, in increments of fractions of millimeters), weight (from zero up, in increments of fractions of ounces), and time (from zero up, in millisecond increments). Variable type is often important in terms of selecting an appropriate statistic, as shown later.
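A tiny Python sketch (ours, with invented names and weights) captures the counting-versus-measuring distinction: counting a discrete variable is order-independent, while the value recorded for a continuous variable depends on the precision of the instrument:

```python
# Counting a discrete variable: the order of counting never matters.
students = ["Ana", "Ben", "Chloe", "Dev", "Eli"]
count_alphabetical = len(sorted(students))   # count names A to Z
count_back_to_front = len(students[::-1])    # count from the back row
assert count_alphabetical == count_back_to_front == 5

# Measuring a continuous variable: precision depends on the instrument.
true_weight = 68.4193                  # kg, a hypothetical "true" value
coarse_scale = round(true_weight)      # a scale that rounds to whole kg
precise_scale = round(true_weight, 3)  # a scale that reads to thousandths
print(coarse_scale, precise_scale)
```

Both counting orders return the same count, but the two “scales” report different values for the same underlying weight, which is the sense in which a continuous measurement is only as good as its instrument.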
1.5 Scales of Measurement
Another concept useful for selecting an appropriate statistic is the scale of measurement of the variables. First, however, we define measurement as the assignment of numerical values to persons or things according to explicit rules. For example, how do we measure a person's weight? Well, there are rules that individuals commonly follow. Currently, weight is measured on some sort of balance or scale in pounds or grams. In the old days, weight was measured by different rules, such as the number of stones or gold coins. These explicit rules were developed so that there was a standardized and generally agreed-upon method of measuring weight. Thus, if you weighed 10 stones in Coventry, England, then that meant the same as 10 stones in Liverpool, England.
In 1951, the psychologist S. S. Stevens developed four types of measurement scales that could be used for assigning these numerical values. In other words, the type of rule used was related to the measurement scale. The four types of measurement scales are the nominal, ordinal, interval, and ratio scales. They are presented in order of increasing complexity and of increasing information (remembering the acronym NOIR might be helpful). It is worth restating the importance of understanding the measurement scales of variables, as the measurement scales will dictate what statistical procedures can be performed with the data.
1.5.1 Nominal Measurement Scale
The simplest scale of measurement is the nominal scale. Here individuals or objects are classified into categories so that all of those in a single category are equivalent with respect to the characteristic being measured. For example, the country of birth of an individual is a nominally scaled variable. Everyone born in France is equivalent with respect to this variable, whereas two people born in different countries (e.g., France and Australia) are not equivalent with respect to this variable. The categories are truly qualitative in nature, not quantitative. Categories are typically given names or numbers. For our example, the country name would be an obvious choice for categories, although numbers could also be assigned to each country (e.g., Brazil = 5, India = 34). The numbers do not represent the amount of the attribute possessed. An individual born in India does not possess any more of the "country of birth origin" attribute than an individual born in Brazil (which would not make sense anyway). The numbers merely identify to which category an individual or object belongs. The categories are also mutually exclusive. That is, an individual can belong to one and only one category, such as a person being born in only one country.
The statistics of a nominal scale variable are quite simple, as they can only be based on the frequencies that occur within each of the categories. For example, we may be studying characteristics of various countries in the world. A nominally scaled variable could be the hemisphere in which the country is located (northern, southern, eastern, and western). While it is possible to count the number of countries that belong to each hemisphere, that is all that we can do. The only mathematical property that the nominal scale possesses is that of equality versus inequality. In other words, two individuals or objects are either in the same category (equal) or in different categories (unequal). For the hemisphere variable, we can either use the hemisphere name or assign numerical values to each hemisphere. We might perhaps assign each hemisphere a number alphabetically from 1 to 4. Countries that are in the same hemisphere are equal with respect to this characteristic. Countries that are in different hemispheres are unequal with respect to this characteristic. Again, these particular numerical values are meaningless and could arbitrarily be any values. The numerical values assigned only serve to keep the categories distinct from one another. Many other numerical values could be assigned for the hemispheres and still maintain the equality versus inequality property. For example, the northern hemisphere could easily be categorized as 1000 and the southern hemisphere as 2000 with no change in information. Other examples of nominal scale variables include hair color, eye color, neighborhood, gender, ethnic background, religious affiliation, political party affiliation, type of life insurance owned (e.g., term, whole life), blood type, psychological clinical diagnosis, Social Security number, and type of headache medication prescribed. The term nominal is derived from "giving a name." Nominal variables are considered categorical or qualitative.
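Because nominal statistics reduce to category frequencies, they are easy to sketch in code. The following Python snippet uses hypothetical hemisphere data (the labels and counts are made up for illustration):

```python
from collections import Counter

# Hypothetical nominal data: hemisphere category for ten countries.
# The labels are arbitrary; recoding them as numbers (e.g., 1-4)
# would carry exactly the same information.
hemispheres = ["northern", "southern", "northern", "eastern",
               "northern", "western", "southern", "northern",
               "eastern", "southern"]

counts = Counter(hemispheres)
print(counts["northern"])  # 4
print(counts["southern"])  # 3

# Equality versus inequality is the only meaningful comparison:
print(hemispheres[0] == hemispheres[2])  # True  (same category)
print(hemispheres[0] == hemispheres[3])  # False (different categories)
```

This mirrors what a frequency table of a nominal variable gives you: counts per category and nothing more.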
1.5.2 Ordinal Measurement Scale
The next most complex scale of measurement is the ordinal scale. Ordinal measurement is determined by the relative size or position of individuals or objects with respect to the characteristic being measured. That is, the individuals or objects are rank-ordered according to the amount of the characteristic that they possess. For example, say a high school graduating class had 250 students. Students could then be assigned class ranks according to their academic performance (e.g., grade point average) in high school. The student ranked 1 in the class had the highest relative performance, and the student ranked 250 had the lowest relative performance.
However, equal differences between the ranks do not imply equal distance in terms of the characteristic being measured. For example, the students ranked 1 and 2 in the class may have a different distance in terms of actual academic performance than the students ranked 249 and 250, even though both pairs of students differ by a rank of 1. In other words, here a rank difference of 1 does not imply the same actual performance distance. The pairs of students may be very, very close or quite distant from one another. As a result of equal differences not implying equal distances, the statistics that we can use are limited due to these unequal intervals. The ordinal scale then consists of two mathematical properties: equality versus inequality again; and, if two individuals or objects are unequal, then we can determine greater than or less than. That is, if two individuals have different class ranks, then we can determine which student had a greater or lesser class rank. Although the greater than or less than property is evident, an ordinal scale cannot tell us how much greater than or less than because of the unequal intervals. Thus, the student ranked 250 could be farther away from student 249 than the student ranked 2 is from student 1.
When we have untied ranks, as shown on the left side of Table 1.1, assigning ranks is straightforward. What do we do if there are tied ranks? For example, suppose there are two students with the same grade point average of 3.8, as given on the right side of Table 1.1. How do we assign them class ranks? It is clear that they have to be assigned the same rank, as that would be the only fair method. However, there are at least two methods for dealing with tied ranks. One method would be to assign each of them a rank of 2, as that is the next available rank. However, there are two problems with that method. First, the sum of the ranks for the same number of scores would be different depending on whether there were ties or not. Statistically, this is not a satisfactory solution. Second, what rank would the next student, having the 3.6 grade point average, be given: a rank of 3 or 4?

Table 1.1
Untied Ranks and Tied Ranks for Ordinal Data

    Untied Ranks              Tied Ranks
Grade Point                Grade Point
  Average       Rank         Average       Rank
    4.0           1            4.0           1
    3.9           2            3.8           2.5
    3.8           3            3.8           2.5
    3.6           4            3.6           4
    3.2           5            3.0           6
    3.0           6            3.0           6
    2.7           7            3.0           6
             Sum = 28                    Sum = 28

The second and preferred method is to take the average of the available ranks and assign that value to each of the tied individuals. Thus, the two persons tied at a grade point average of 3.8 have as available ranks 2 and 3. Both would then be assigned the average rank of 2.5. Also, the three persons tied at a grade point average of 3.0 have as available ranks 5, 6, and 7. These all would be assigned the average rank of 6. You also see in the table that with this method the sum of the ranks for 7 scores is always equal to 28, regardless of the number of ties. Statistically, this is a satisfactory solution and the one we prefer, whether we are using a statistical software package or hand computations. Other examples of ordinal scale variables include course letter grades, order of finish in the Boston Marathon, socioeconomic status, hardness of minerals (1 = softest to 10 = hardest), faculty rank (assistant, associate, and full professor), student class (freshman, sophomore, junior, senior, graduate student), ranking on a personality trait (e.g., extreme intrinsic to extreme extrinsic motivation), and military rank. The term ordinal is derived from "ordering" individuals or objects. Ordinal variables are most often considered categorical or qualitative.
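The average-rank method for ties can be sketched as a short Python function (a minimal illustration of the same logic statistical packages apply; here rank 1 goes to the largest value, as in Table 1.1):

```python
def average_ranks(scores):
    """Rank scores with 1 = largest, assigning tied scores
    the average of the ranks they jointly occupy."""
    ordered = sorted(scores, reverse=True)  # position i holds rank i + 1
    ranks = []
    for s in scores:
        # All rank positions this score occupies among the sorted values.
        positions = [i + 1 for i, v in enumerate(ordered) if v == s]
        ranks.append(sum(positions) / len(positions))
    return ranks

# Grade point averages from the tied-ranks side of Table 1.1.
gpas = [4.0, 3.8, 3.8, 3.6, 3.0, 3.0, 3.0]
print(average_ranks(gpas))       # [1.0, 2.5, 2.5, 4.0, 6.0, 6.0, 6.0]
print(sum(average_ranks(gpas)))  # 28.0 -- the sum is unaffected by ties
```

Running the untied column of Table 1.1 through the same function returns the plain ranks 1 through 7, and the sum is again 28.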
1.5.3 Interval Measurement Scale
The next most complex scale of measurement is the interval scale. An interval scale is one where individuals or objects can be ordered, and equal differences between the values do imply equal distance in terms of the characteristic being measured. That is, order and distance relationships are meaningful. However, there is no absolute zero point. Absolute zero, if it exists, implies the total absence of the property being measured. The zero point of an interval scale, if it exists, is arbitrary and does not reflect the total absence of the property being measured. Here the zero point merely serves as a placeholder. For example, suppose that we gave you the final exam in advanced statistics right now. If you were so unlucky as to obtain a score of 0, this score does not imply a total lack of knowledge of statistics. It would merely reflect the fact that your statistics knowledge is not that advanced yet (or perhaps the questions posed on the exam just did not capture those concepts that you do understand). You do have some knowledge of statistics, but just at an introductory level in terms of the topics covered so far.
Take as an example the Fahrenheit temperature scale, which has a freezing point of 32 degrees. A temperature of zero is not the total absence of heat, just a point slightly colder than 1 degree and slightly warmer than −1 degree. In terms of the equal distance notion, consider the following example. Say that we have two pairs of Fahrenheit temperatures, the first pair being 55 and 60 degrees and the second pair being 25 and 30 degrees. The difference of 5 degrees is the same for both pairs and is also the same everywhere along the Fahrenheit scale. Thus, every 5-degree interval is an equal interval. However, we cannot say that 60 degrees is twice as warm as 30 degrees, as there is no absolute zero. In other words, we cannot form true ratios of values (i.e., 60/30 = 2). This property only exists for the ratio scale of measurement. The interval scale has as mathematical properties equality versus inequality, greater than or less than if unequal, and equal intervals. Other examples of interval scale variables include the Centigrade temperature scale, calendar time, restaurant ratings by the health department (on a 100-point scale), year (since 1 AD), and arguably, many educational and psychological assessment devices (although statisticians have been debating this one for many years; e.g., on occasion there is a fine line between whether an assessment is measured along the ordinal or the interval scale). Interval variables are considered numerical and primarily continuous.
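The Fahrenheit example can be checked directly. The small Python sketch below shows that equal intervals survive a change of (arbitrary) zero point, while ratios do not:

```python
def f_to_c(f):
    # Celsius places its own (also arbitrary) zero at a different point.
    return (f - 32) * 5 / 9

# Equal 5-degree intervals remain equal after conversion:
d1 = f_to_c(60) - f_to_c(55)
d2 = f_to_c(30) - f_to_c(25)
print(abs(d1 - d2) < 1e-9)  # True

# But the "ratio" 60/30 = 2 is an accident of where zero sits;
# in Celsius the same two temperatures do not even share a sign.
print(60 / 30)                      # 2.0
print(f_to_c(60) / f_to_c(30) > 0)  # False
```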
1.5.4 Ratio Measurement Scale
The most complex scale of measurement is the ratio scale. A ratio scale has all of the properties of the interval scale, plus an absolute zero point exists. Here a measurement of 0 indicates a total absence of the property being measured. Because an absolute zero point exists, true ratios of values can be formed which actually reflect ratios in the amounts of the characteristic being measured. Thus, if concepts such as "one-half as big" or "twice as large" make sense, then that may be a good indication that the variable is ratio in scale.
For example, the height of individuals is a ratio scale variable. There is an absolute zero point of zero height. We can also form ratios, such that 6′0″ Sam is twice as tall as his 3′0″ daughter Samantha. The ratio scale of measurement is not observed frequently in education and the behavioral sciences, with certain exceptions. Motor performance variables (e.g., speed in the 100 meter dash, distance driven in 24 hours), elapsed time, calorie consumption, and physiological characteristics (e.g., weight, height, age, pulse rate, blood pressure) are ratio scale measures (and are all also examples of continuous variables). Discrete variables, those that arise from the counting process, are also examples of ratio variables, since zero indicates an absence of what is measured (e.g., the number of children in a family or the number of trees in a park). A summary of the measurement scales, their characteristics, and some examples is given in Table 1.2. Ratio variables are considered numerical and can be either discrete or continuous.
Table 1.2
Summary of the Scales of Measurement

Nominal
  Characteristics: Classify into categories; categories are given names or numbers, but the numbers are arbitrary. Mathematical property: (1) equal versus unequal.
  Examples: Hair or eye color, ethnic background, neighborhood, gender, country of birth, Social Security number, type of life insurance, religious or political affiliation, blood type, clinical diagnosis.

Ordinal
  Characteristics: Rank-ordered according to relative size or position. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than.
  Examples: Letter grades, order of finish in race, class rank, SES, hardness of minerals, faculty rank, student class, military rank, rank on personality trait.

Interval
  Characteristics: Rank-ordered, and equal differences between values imply equal distances in the attribute. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than; (3) equal intervals.
  Examples: Temperature, calendar time, most assessment devices, year, restaurant ratings.

Ratio
  Characteristics: Rank-ordered, equal intervals, and an absolute zero allows ratios to be formed. Mathematical properties: (1) equal versus unequal; (2) if unequal, then greater than or less than; (3) equal intervals; (4) absolute zero.
  Examples: Speed in 100 meter dash, height, weight, age, distance driven, elapsed time, pulse rate, blood pressure, calorie consumption.
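By contrast with the interval case, ratios on a true ratio scale do not depend on the unit chosen, because zero means the same thing in every unit. A brief Python check using the height example from the text:

```python
# Heights of Sam (6'0") and his daughter Samantha (3'0"), in inches.
sam, samantha = 72, 36

print(sam / samantha)  # 2.0 -- Sam is twice as tall

# Converting units rescales every value but leaves ratios unchanged.
cm_per_inch = 2.54
print((sam * cm_per_inch) / (samantha * cm_per_inch))  # 2.0
```

Try the same unit change on the Fahrenheit temperatures discussed earlier and the "twice as warm" claim falls apart, which is exactly the interval/ratio distinction.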
1.6 Summary
In� this� chapter,� an� introduction� to� statistics� was� given�� First,� we� discussed� the� value� and�
need�for�knowledge�about�statistics�and�how�it�assists�in�decision�making��Next,�a�few�of�
the�more�colorful�and�interesting�statisticians�of�the�past�were�mentioned��Then,�we�defined�
the�following�general�statistical�terms:�population,�parameter,�sample,�statistic,�descriptive�
statistics,�and�inferential�statistics��We�then�defined�variable-related�terms�including�vari-
ables,� constants,� categorical� variables,� and� continuous� variables�� For� a� summary� of� these�
definitions,�see�Box�1�1��Finally,�we�examined�the�four�classic�types�of�measurement�scales,�
nominal,� ordinal,� interval,� and� ratio�� By� now,� you� should� have� met� the� following� objec-
tives:�(a) have�a�better�sense�of�why�statistics�are�necessary;�(b)�see�that�statisticians�are�an�
interesting�group�of�people;�and�(c)�have�an�understanding�of�the�basic�statistical�concepts�
of�population,�parameter,�sample,�and�statistic,�descriptive�and�inferential�statistics,�types�
of� variables,� and� scales� of� measurement�� The� next� chapter� begins� to� address� some� of� the�
details�of�descriptive�statistics�when�we�consider�how�to�represent�data�in�terms�of�tables�
and�graphs��In�other�words,�rather�than�carrying�our�data�around�with�us�everywhere�we�go,�
we�examine�ways�to�display�data�in�tabular�and�graphical�forms�to�foster�communication�
Stop and Think Box 1.1
Summary of Definitions

Population: All members of a well-defined group. Example: All employees of IBM Atlanta.
Parameter: A characteristic of a population. Example: Average salary of a population.
Sample: A subset of a population. Example: Some employees of IBM Atlanta.
Statistic: A characteristic of a sample. Example: Average salary of a sample.
Descriptive statistics: Techniques which allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. Example: Table or graph summarizing data.
Inferential statistics: Techniques which allow us to employ inductive reasoning to infer the properties of a population from a sample. Example: Taste test statistics from a sample of Dublin residents.
Variable: Any characteristic of persons or things that is observed to take on different values. Example: Salary of the families in your neighborhood.
Constant: Any characteristic of persons or things that is observed to take on only a single value. Example: Every family has a lawn in your neighborhood.
Categorical variable: A qualitative variable. Example: Political party affiliation.
Dichotomous variable: A categorical variable that can take on only one of two values. Example: Biologically determined gender.
Numerical variable: A quantitative variable that is either discrete or continuous. Examples: Number of children in a family; the distance between two cities.
Discrete variable: A numerical variable that arises from the counting process and can take on only certain values. Example: Number of children in a family.
Continuous variable: A numerical variable that can take on any value within a certain range given a precise enough measurement instrument. Example: Distance between two cities.
Problems
Conceptual problems
1.1  A mental health counselor is conducting a research study on the satisfaction that married couples have with their marriage. "Marital status" (e.g., single, married, divorced, widowed), in this scenario, is which one of the following?
     a. Constant
     b. Variable

1.2  Belle randomly samples 100 library patrons and gathers data on the genre of the "first book" that they checked out from the library. She finds that 85 library patrons checked out a fiction book and 15 library patrons checked out a nonfiction book. Which of the following best characterizes the type of "first book" checked out in this study?
     a. Constant
     b. Variable

1.3  For interval level variables, which of the following properties does not apply?
     a. Jim is two units greater than Sally.
     b. Jim is greater than Sally.
     c. Jim is twice as good as Sally.
     d. Jim differs from Sally.

1.4  Which of the following properties is appropriate for ordinal but not for nominal variables?
     a. Sue differs from John.
     b. Sue is greater than John.
     c. Sue is 10 units greater than John.
     d. Sue is twice as good as John.

1.5  Which scale of measurement is implied by the following statement: "Jill's score is three times greater than Eric's score"?
     a. Nominal
     b. Ordinal
     c. Interval
     d. Ratio

1.6  Which scale of measurement is implied by the following statement: "Bubba had the highest score"?
     a. Nominal
     b. Ordinal
     c. Interval
     d. Ratio

1.7  A band director collects data on the number of years in which students in the band have played a musical instrument. Which scale of measurement is implied by this scenario?
     a. Nominal
     b. Ordinal
     c. Interval
     d. Ratio

1.8  Kristen has an IQ of 120. I assert that Kristen is 20% more intelligent than the average person having an IQ of 100. Am I correct?

1.9  Population is to parameter as sample is to statistic. True or false?

1.10 Every characteristic of a sample of 100 persons constitutes a variable. True or false?

1.11 A dichotomous variable is also a categorical variable. True or false?

1.12 The amount of time spent studying in 1 week for a population of students is an inferential statistic. True or false?

1.13 For ordinal level variables, which of the following properties does not apply?
     a. IBM differs from Apple.
     b. IBM is greater than Apple.
     c. IBM is two units greater than Apple.
     d. All of the aforementioned properties apply.

1.14 A sample of 50 students takes an exam, and the instructor decides to give the top 5 scores a bonus of 5 points. Compared to the original set of scores (no bonus), I assert that the ranks of the new set of scores (including bonus) will be exactly the same. Am I correct?

1.15 Johnny and Buffy have class ranks of 5 and 6. Ingrid and Toomas have class ranks of 55 and 56. I assert that the GPAs of Johnny and Buffy are the same distance apart as are the GPAs of Ingrid and Toomas. Am I correct?
Computational problems
1.1  Rank the following values of the number of CDs owned, assigning rank 1 to the largest value:
     10  15  12  8  20  17  5  21  3  19

1.2  Rank the following values of the number of credits earned, assigning rank 1 to the largest value:
     10  16  10  8  19  16  5  21  3  19

1.3  Rank the following values of the number of pairs of shoes owned, assigning rank 1 to the largest value:
     8  6  3  12  19  7  10  25  4  42

Interpretive problems

Consider the following class survey:

1.1  What is your gender?
1.2  What is your height in inches?
1.3  What is your shoe size (length)?
1.4  Do you smoke?
1.5  Are you left- or right-handed? Your mother? Your father?
1.6  How much did you spend at your last hair appointment (including tip)?
1.7  How many CDs do you own?
1.8  What was your quantitative GRE score?
1.9  What is your current GPA?
1.10 On average, how much exercise do you get per week (in hours)?
1.11 On a 5-point scale, what is your political view (1 = very liberal, 3 = moderate, 5 = very conservative)?
1.12 On average, how many hours of TV do you watch per week?
1.13 How many cups of coffee did you drink yesterday?
1.14 How many hours did you sleep last night?
1.15 On average, how many alcoholic drinks do you have per week?
1.16 Can you tell the difference between Pepsi and Coke?
1.17 What is the natural color of your hair (black, blonde, brown, red, other)?
1.18 What is the natural color of your eyes (black, blue, brown, green, other)?
1.19 How far do you live from this campus (in miles)?
1.20 On average, how many books do you read for pleasure each month?
1.21 On average, how many hours do you study per week?
1.22 Which question on this survey is the most interesting to you? The least interesting?

Possible Activities

1. For each item, determine the most likely scale of measurement (nominal, ordinal, interval, or ratio) and the type of variable [categorical or numerical (if numerical, discrete or continuous)].

2. Create scenarios in which one or more of the variables in this survey would be a constant, given the delimitations that you define for your study. For example, we are designing a study to measure study habits (as measured by Question 1.21) for students who do not exercise (Question 1.10). In this sample study, our constant is the number of hours per week that a student exercises (in this case, we are delimiting that to be zero, and thus Question 1.10 will be a constant; all students in our study will have answered Question 1.10 as "zero").

3. Collect data from a sample of individuals. In subsequent chapters, you will be asked to analyze these data for different procedures.

NOTE: An actual sample dataset using this survey is contained on the website (SPSS file: survey1) and is utilized in later chapters.
2
Data Representation
Chapter Outline
2.1 Tabular Display of Distributions
    2.1.1 Frequency Distributions
    2.1.2 Cumulative Frequency Distributions
    2.1.3 Relative Frequency Distributions
    2.1.4 Cumulative Relative Frequency Distributions
2.2 Graphical Display of Distributions
    2.2.1 Bar Graph
    2.2.2 Histogram
    2.2.3 Frequency Polygon
    2.2.4 Cumulative Frequency Polygon
    2.2.5 Shapes of Frequency Distributions
    2.2.6 Stem-and-Leaf Display
2.3 Percentiles
    2.3.1 Percentiles
    2.3.2 Quartiles
    2.3.3 Percentile Ranks
    2.3.4 Box-and-Whisker Plot
2.4 SPSS
2.5 Templates for Research Questions and APA-Style Paragraph

Key Concepts
1. Frequencies, cumulative frequencies, relative frequencies, and cumulative relative frequencies
2. Ungrouped and grouped frequency distributions
3. Sample size
4. Real limits and intervals
5. Frequency polygons
6. Normal, symmetric, and skewed frequency distributions
7. Percentiles, quartiles, and percentile ranks
In Chapter 1, we introduced the wonderful world of statistics. There, we discussed the value of statistics, met a few of the more interesting statisticians, and defined several basic statistical concepts. The concepts included population, parameter, sample and statistic, descriptive and inferential statistics, types of variables, and scales of measurement. In this chapter, we begin our examination of descriptive statistics, which we previously defined as techniques that allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. We used the example of collecting data from 100,000 graduate students on various characteristics (e.g., height, weight, gender, grade point average, aptitude test scores). Rather than having to carry around the entire collection of data in order to respond to questions, we mentioned that you could summarize the data in an abbreviated fashion through the use of tables and graphs. This way, we could communicate features of the data through a few tables or figures without having to carry around the entire dataset.
This chapter deals with the details of the construction of tables and figures for purposes of describing data. Specifically, we first consider the following types of tables: frequency distributions (ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions. Next we look at the following types of figures: bar graph, histogram, frequency polygon, cumulative frequency polygon, and stem-and-leaf display. We also discuss common shapes of frequency distributions. Then we examine the use of percentiles, quartiles, percentile ranks, and box-and-whisker plots. Finally, we look at the use of SPSS and develop an APA-style paragraph of results. Concepts to be discussed include frequencies, cumulative frequencies, relative frequencies, and cumulative relative frequencies; ungrouped and grouped frequency distributions; sample size; real limits and intervals; frequency polygons; normal, symmetric, and skewed frequency distributions; and percentiles, quartiles, and percentile ranks. Our objectives are that by the end of this chapter, you will be able to (1) construct and interpret statistical tables, (2) construct and interpret statistical graphs, and (3) determine and interpret percentile-related information.
2.1 Tabular Display of Distributions
Consider the following research scenario:

Marie, a graduate student pursuing a master's degree in educational research, has been assigned to her first task as a research assistant. Her faculty mentor has given Marie quiz data collected from 25 students enrolled in an introductory statistics course and has asked Marie to summarize the data. In addition to the data, the faculty mentor has shared the following research questions that should guide Marie in her analysis of the data: How can the quiz scores of students enrolled in an introductory statistics class be represented in a table? In a figure? What is the distributional shape of the statistics quiz scores? What is the 50th percentile of the quiz scores?

In this section, we consider ways in which data can be represented in the form of tables. More specifically, we are interested in how the data for a single variable can be represented (the representation of data for multiple variables is covered in later chapters). The methods described here include frequency distributions (both ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions.
2.1.1 Frequency Distributions

Let us use an example set of data in this chapter to illustrate ways in which data can be represented. We have selected a small dataset for purposes of simplicity, although datasets are typically larger in size. Note that there is a larger dataset (based on the survey from the Chapter 1 interpretive problem) utilized in the end-of-chapter problems and available on our website as "survey1." As shown in Table 2.1, the smaller dataset consists of a sample of 25 student scores on a statistics quiz, where the maximum score is 20 points. If a colleague asked a question about these data, again a response could be, "take a look at the data yourself." This would not be very satisfactory to the colleague, as the person would have to eyeball the data to answer his or her question. Alternatively, one could present the data in the form of a table so that questions could be more easily answered. One question might be: which score occurred most frequently? In other words, what score occurred more than any other score? Other questions might be: which scores were the highest and lowest scores in the class, and where do most of the scores tend to fall? In other words, how well did the students tend to do as a class? These and other questions can be easily answered by looking at a frequency distribution.
Let us first look at how an ungrouped frequency distribution can be constructed for these and other data. By following these steps, we develop the ungrouped frequency distribution shown in Table 2.2. The first step is to arrange the unique scores on a list from the lowest score to the highest score. The lowest score is 9 and the highest score is 20. Even though scores such as 15 were observed more than once, the value of 15 is only entered in this column once. This is what we mean by unique. Note that if the score of 15 had not been observed, it could still be entered as a value in the table to serve as a placeholder within
Table 2.1
Statistics Quiz Data

 9  11  20  15  19  10  19  18  14  12  17  11  13
16  17  19  18  17  13  17  15  18  17  19  15
Table 2.2
Ungrouped Frequency Distribution of Statistics Quiz Data

 X     f     cf    rf                  crf
 9     1      1    f/n = 1/25 = .04    .04
10     1      2    .04                 .08
11     2      4    .08                 .16
12     1      5    .04                 .20
13     2      7    .08                 .28
14     1      8    .04                 .32
15     3     11    .12                 .44
16     1     12    .04                 .48
17     5     17    .20                 .68
18     3     20    .12                 .80
19     4     24    .16                 .96
20     1     25    .04                1.00
    n = 25         1.00
the distribution of scores observed. We label this column "raw score" or "X," as shown by the first column in the table. Raw scores are a set of scores in their original form; that is, the scores have not been altered or transformed in any way. X is often used in statistics to denote a variable, so you see X quite a bit in this text. (As a side note, whenever upper- or lowercase letters are used to denote statistical notation, the letter is always italicized.)
The second step is to determine for each unique score the number of times it was observed. We label this second column "frequency," or by the abbreviation "f." The frequency column tells us how many times, or how frequently, each unique score was observed. For instance, the score of 20 was only observed one time, whereas the score of 17 was observed five times. Now we have some information with which to answer the questions of our colleague. The most frequently observed score is 17, the lowest score is 9, and the highest score is 20. We can also see that scores tended to be closer to 20 (the highest score) than to 9 (the lowest score).
Two�other�concepts�need�to�be�introduced�that�are�included�in�Table�2�2��The�first�concept�
is�sample size��At�the�bottom�of�the�second�column,�you�see�n�=�25��From�now�on,�n�will�
be�used�to�denote�sample�size,�that�is,�the�total�number�of�scores�obtained�for�the�sample��
Thus,�because�25�scores�were�obtained�here,�then�n�=�25�
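The two steps above can be sketched in code. The book works in SPSS; the following Python snippet is only an illustrative translation that rebuilds the unique-score (X) and frequency (f) columns of Table 2.2 from the raw quiz data:

```python
from collections import Counter

# Statistics quiz scores from Table 2.1
scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

n = len(scores)          # sample size, n = 25
f = Counter(scores)      # frequency of each unique score

# Step 1: list the unique scores from lowest to highest;
# Step 2: report how many times each was observed.
for x in sorted(f):
    print(x, f[x])
```

Running this reproduces the first two columns of Table 2.2: the score 17 appears five times, while 9 and 20 each appear once.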
The second concept is related to real limits and intervals. Although the scores obtained for this dataset happened to be whole numbers, not fractions or decimals, we still need a system that will cover that possibility. For example, what would we do if a student obtained a score of 18.25? One option would be to list that as another unique score, which would probably be more confusing than useful. A second option would be to include it with one of the other unique scores somehow; this is our option of choice. The system that all researchers use to cover the possibility of any score being obtained is the concepts of real limits and intervals. Each value of X in Table 2.2 can be thought of as the midpoint of an interval. Each interval has an upper and a lower real limit. The upper real limit of an interval is halfway between the midpoint of the interval under consideration and the midpoint of the next larger interval. For example, the value of 18 represents the midpoint of an interval. The next larger interval has a midpoint of 19. Therefore, the upper real limit of the interval containing 18 would be 18.5, halfway between 18 and 19. The lower real limit of an interval is halfway between the midpoint of the interval under consideration and the midpoint of the next smaller interval. Following the example interval of 18 again, the next smaller interval has a midpoint of 17. Therefore, the lower real limit of the interval containing 18 would be 17.5, halfway between 18 and 17. Thus, the interval of 18 has 18.5 as an upper real limit and 17.5 as a lower real limit. Other intervals have their upper and lower real limits as well.

Notice that adjacent intervals (i.e., those next to one another) touch at their respective real limits. For example, the 18 interval has 18.5 as its upper real limit and the 19 interval has 18.5 as its lower real limit. This implies that any possible score that occurs can be placed into some interval and no score can fall between two intervals. If someone obtains a score of 18.25, that will be covered in the 18 interval. The only limitation to this procedure is that, because adjacent intervals must touch in order to deal with every possible score, what do we do when a score falls precisely where two intervals touch at their real limits (e.g., at 18.5)? There are two possible solutions. The first solution is to assign the score to one interval or another based on some rule. For instance, we could randomly assign such scores to one interval or the other by flipping a coin. Alternatively, we could arbitrarily assign such scores always into either the larger or smaller of the two intervals. The second solution is to construct intervals such that the number of values falling at the real limits is minimized. For example, say that most of the scores occur at .5 (e.g., 15.5, 16.5, 17.5). We could construct the intervals with .5 as the midpoint and .0 as the real limits. Thus, the 15.5
interval would have 15.5 as the midpoint, 16.0 as the upper real limit, and 15.0 as the lower real limit. It should also be noted that, strictly speaking, real limits are only appropriate for continuous variables, not for discrete variables. That is, since discrete variables can only take on limited values, we probably do not need to worry about real limits (e.g., there is not really an interval for two children).

Finally, the width of an interval is defined as the difference between the upper and lower real limits of an interval. We can denote this as w = URL − LRL, where w is the interval width, and URL and LRL are the upper and lower real limits, respectively. In the case of our example interval again, we see that w = URL − LRL = 18.5 − 17.5 = 1.0. For Table 2.2, then, all intervals have the same interval width of 1.0. For each interval, we have a midpoint, a lower real limit that is one-half unit below the midpoint, and an upper real limit that is one-half unit above the midpoint. In general, we want all of the intervals to have the same width for consistency as well as for equal interval reasons. The only exception might be if the largest or smallest intervals were above a certain value (e.g., greater than 20) or below a certain value (e.g., less than 9), respectively.
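Under these definitions, the real limits follow mechanically from a midpoint and a width. A small Python sketch (our illustration; the helper name is ours, not from the text):

```python
def real_limits(midpoint, w=1.0):
    """Return (LRL, URL) for the interval centered at `midpoint`,
    where the width is w = URL - LRL."""
    return midpoint - w / 2.0, midpoint + w / 2.0

lrl, url = real_limits(18)   # the 18 interval: (17.5, 18.5)
```

Note that adjacent intervals touch: the URL of the 18 interval equals the LRL of the 19 interval, and the intervals in the book's .5-midpoint example (e.g., 15.5) run from .0 to .0 (here, 15.0 to 16.0).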
A frequency distribution with an interval width of 1.0 is often referred to as an ungrouped frequency distribution, as the intervals have not been grouped together. Does the interval width always have to be equal to 1.0? The answer, of course, is no. We could group intervals together and form what is often referred to as a grouped frequency distribution. For our example data, we can construct a grouped frequency distribution with an interval width of 2.0, as shown in Table 2.3. The largest interval now contains the scores of 19 and 20, the second largest interval the scores of 17 and 18, and so on down to the smallest interval with the scores of 9 and 10. Correspondingly, the largest interval contains a frequency of 5, the second largest interval a frequency of 8, and the smallest interval a frequency of 2. All we have really done is collapse the intervals from Table 2.2, where the interval width was 1.0, into the intervals of width 2.0 shown in Table 2.3. If we take, for example, the interval containing the scores of 17 and 18, then the midpoint of the interval is 17.5, the URL is 18.5, the LRL is 16.5, and thus w = 2.0. The interval width could actually be any value, including .20 or 100, depending on what best suits the data.
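Collapsing the width-1.0 intervals of Table 2.2 into width-2.0 intervals can be expressed directly. A Python sketch (ours, not the authors' SPSS procedure) that reproduces the frequencies in Table 2.3:

```python
from collections import Counter

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

w = 2                                      # interval width
start = 9                                  # lowest score of the smallest interval
grouped = Counter((x - start) // w for x in scores)

for k in sorted(grouped):
    lo = start + k * w                     # prints 9-10: 2, 11-12: 3, ..., 19-20: 5
    print(f"{lo}-{lo + w - 1}: {grouped[k]}")
```

Changing `w` (to .20 or 100, say) regroups the same data at a different grain, which is exactly the trade-off discussed next.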
How does one determine what the proper interval width should be? If there are many frequencies for each score and fewer than 15 or 20 intervals, then an ungrouped frequency distribution with an interval width of 1 is appropriate (and this is the default in SPSS for determining frequency distributions). If there are either minimal frequencies per score (say 1 or 2) or a large number of unique scores (say more than 20), then a grouped frequency distribution with some other interval width is appropriate. For a first example, say
Table 2.3
Grouped Frequency Distribution of Statistics Quiz Data

X        f
9–10     2
11–12    3
13–14    3
15–16    4
17–18    8
19–20    5
      n = 25
that there are 100 unique scores ranging from 0 to 200. An ungrouped frequency distribution would not really summarize the data very well, as the table would be quite large. The reader would have to eyeball the table and actually do some quick grouping in his or her head so as to gain any information about the data. An interval width of perhaps 10–15 would be more useful. In a second example, say that there are only 20 unique scores ranging from 0 to 30, but each score occurs only once or twice. An ungrouped frequency distribution would not be very useful here either, as the reader would again have to collapse intervals in his or her head. Here an interval width of perhaps 2–5 would be appropriate.

Ultimately, deciding on the interval width, and thus the number of intervals, becomes a trade-off between good communication of the data and the amount of information contained in the table. As interval width increases, more and more information is lost from the original data. For the example where scores range from 0 to 200, using an interval width of 10, some precision in the 15 scores contained in the 30–39 interval is lost. In other words, the reader would not know from the frequency distribution where in that interval the 15 scores actually fall. If you want that information (you may not), you would need to return to the original data. At the same time, an ungrouped frequency distribution for those data would not have much of a message for the reader. Ultimately, the decisive factor is the adequacy with which information is communicated to the reader. The nature of the interval grouping comes down to whatever form best represents the data. With today's powerful statistical software, it is easy for the researcher to try several different interval widths before deciding which one works best for a particular set of data. Note also that the frequency distribution can be used with variables of any measurement scale, from nominal (e.g., the frequencies for eye color of a group of children) to ratio (e.g., the frequencies for the height of a group of adults).
2.1.2 Cumulative Frequency Distributions

A second type of frequency distribution is known as the cumulative frequency distribution. For the example data, this is depicted in the third column of Table 2.2 and labeled as "cf." To put it simply, the cumulative frequency for a particular interval is the number of scores contained in that interval and all of the smaller intervals. Thus, the 9 interval contains one frequency, and there are no frequencies smaller than that interval, so the cumulative frequency is simply 1. The 10 interval contains one frequency, and there is one frequency in a smaller interval, so the cumulative frequency is 2. The 11 interval contains two frequencies, and there are two frequencies in smaller intervals; thus, the cumulative frequency is 4. In other words, four people had scores in the 11 interval and smaller intervals. One way to think about determining the cumulative frequency column is to take the frequency column and accumulate downward (i.e., from the top down, yielding 1; 1 + 1 = 2; 1 + 1 + 2 = 4; etc.). Just as a check, the cf in the largest interval (i.e., the interval largest in value) should be equal to n, the number of scores in the sample, 25 in this case. Note also that the cumulative frequency distribution can be used with variables of measurement scales from ordinal (e.g., the number of students receiving a B or less) to ratio (e.g., the number of adults who are 5′7″ or less), but cannot be used with nominal data, as there is not at least rank order to nominal data (and thus accumulating information from one nominal category to another does not make sense).
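Accumulating the frequency column downward is just a running sum. A Python sketch (illustrative, not the book's SPSS output) that rebuilds the cf column of Table 2.2:

```python
from collections import Counter
from itertools import accumulate

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

f = Counter(scores)
xs = sorted(f)                                     # intervals from smallest to largest
cf = dict(zip(xs, accumulate(f[x] for x in xs)))   # running sum, top down

# cf[11] == 4: four people scored in the 11 interval or smaller;
# cf[20] == 25 == n, the check described in the text.
```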
2.1.3 Relative Frequency Distributions

A third type of frequency distribution is known as the relative frequency distribution. For the example data, this is shown in the fourth column of Table 2.2 and labeled as "rf." Relative frequency is simply the percentage of scores contained in an interval. Computationally,
rf = f/n. For example, the percentage of scores occurring in the 17 interval is computed as rf = 5/25 = .20. Relative frequencies take sample size into account, allowing us to make statements about the number of individuals in an interval relative to the total sample. Thus, rather than stating that 5 individuals had scores in the 17 interval, we could say that 20% of the scores were in that interval. In the popular press, relative frequencies (which they call percentages) are quite often reported in tables without the frequencies. Note that the sum of the relative frequencies should be 1.00 (or 100%) within rounding error. Also note that the relative frequency distribution can be used with variables of any measurement scale, from nominal (e.g., the percent of children with blue eye color) to ratio (e.g., the percent of adults who are 5′7″).
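The computation rf = f/n is a one-liner in code. A Python sketch (ours) that rebuilds the rf column of Table 2.2 and verifies that it sums to 1.00:

```python
from collections import Counter

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

n = len(scores)
rf = {x: c / n for x, c in Counter(scores).items()}   # rf = f/n for each interval

# rf[17] == 5/25 == .20: 20% of the scores fell in the 17 interval.
assert abs(sum(rf.values()) - 1.0) < 1e-9             # rf column sums to 1.00
```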
2.1.4 Cumulative Relative Frequency Distributions

A fourth and final type of frequency distribution is known as the cumulative relative frequency distribution. For the example data, this is depicted in the fifth column of Table 2.2 and labeled as "crf." The cumulative relative frequency for a particular interval is the percentage of scores in that interval and smaller. Thus, the 9 interval has a relative frequency of .04, and there are no relative frequencies smaller than that interval, so the cumulative relative frequency is simply .04. The 10 interval has a relative frequency of .04, and the relative frequencies less than that interval are .04, so the cumulative relative frequency is .08. The 11 interval has a relative frequency of .08, and the relative frequencies less than that interval total .08, so the cumulative relative frequency is .16. Thus, 16% of the people had scores in the 11 interval and smaller. In other words, 16% of people scored 11 or less. One way to think about determining the cumulative relative frequency column is to take the relative frequency column and accumulate downward (i.e., from the top down, yielding .04; .04 + .04 = .08; .04 + .04 + .08 = .16; etc.). Just as a check, the crf in the largest interval should be equal to 1.0, within rounding error, just as the sum of the relative frequencies is equal to 1.0. Also note that the cumulative relative frequency distribution can be used with variables of measurement scales from ordinal (e.g., the percent of students receiving a B or less) to ratio (e.g., the percent of adults who are 5′7″ or less). As with cumulative frequency distributions, cumulative relative frequency distributions cannot be used with nominal data.
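The crf column is simply the rf column accumulated downward. A Python sketch (illustrative) that rebuilds it and applies the checks described in the text:

```python
from collections import Counter
from itertools import accumulate

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

n = len(scores)
f = Counter(scores)
xs = sorted(f)
crf = dict(zip(xs, accumulate(f[x] / n for x in xs)))   # running sum of rf = f/n

# crf[11] is .16: 16% of people scored 11 or less;
# crf[20] is 1.0 within rounding, as a check.
```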
2.2 Graphical Display of Distributions

In this section, we consider several types of graphs for viewing a distribution of scores. Again, we are still interested in how the data for a single variable can be represented, but now in a graphical display rather than a tabular display. The methods described here include the bar graph; the histogram; the frequency, relative frequency, cumulative frequency, and cumulative relative frequency polygons; and the stem-and-leaf display. Common shapes of distributions will also be discussed.
2.2.1 Bar Graph

A popular method used for displaying nominal scale data in graphical form is the bar graph. As an example, say that we have data on the eye color of a sample of 20 children. Ten children are blue eyed, six are brown eyed, three are green eyed, and one
is black eyed. A bar graph for these data is shown in Figure 2.1 (SPSS generated). The horizontal axis, going from left to right on the page, is often referred to in statistics as the X axis (for variable X; in this example our variable is eye color). On the X axis of Figure 2.1, we have labeled the different eye colors that were observed from individuals in our sample. The order of the colors is not relevant (remember, this is nominal data, so order or rank is irrelevant). The vertical axis, going from bottom to top on the page, is often referred to in statistics as the Y axis (the Y label will be more relevant in later chapters when we have a second variable Y). On the Y axis of Figure 2.1, we have labeled the frequencies. Finally, a bar is drawn for each eye color where the height of the bar denotes the number of frequencies for that particular eye color (i.e., the number of times that particular eye color was observed in our sample). For example, the height of the bar for the blue-eyed category is 10 frequencies. Thus, we see in the graph which eye color is most popular in this sample (i.e., blue) and which eye color occurs least (i.e., black).

Note that the bars are separated by some space and do not touch one another, reflecting the nature of nominal data. As there are no intervals or real limits here, we do not want the bars to touch one another. One could also plot relative frequencies on the Y axis to reflect the percentage of children in the sample who belong to each category of eye color. Here we would see that 50% of the children had blue eyes, 30% brown eyes, 15% green eyes, and 5% black eyes. Another method for displaying nominal data graphically is the pie chart, where the pie is divided into slices whose sizes correspond to the frequencies or relative frequencies of each category. However, for numerous reasons (e.g., it contains little information when there are few categories; it is unreadable when there are many categories; visually assessing the sizes of each slice is difficult at best), the pie chart is statistically problematic, such that Tufte (1992) states, "the only worse design than a pie chart is several of them" (p. 178). The bar graph is the recommended graphic for nominal data.
Figure 2.1 Bar graph of eye-color data. (X axis: eye color, with categories Black, Blue, Brown, and Green; Y axis: frequency.)
2.2.2 Histogram

A method somewhat similar to the bar graph that is appropriate for data that are at least ordinal (i.e., ordinal, interval, or ratio) is the histogram. Because the data are at least theoretically continuous (even though they may be measured in whole numbers), the main difference in the histogram (as compared to the bar graph) is that the bars touch one another, much like intervals touching one another at their real limits. An example of a histogram for the statistics quiz data is shown in Figure 2.2 (SPSS generated). As you can see, along the X axis we plot the values of the variable X and along the Y axis the frequencies for each interval. The height of the bar again corresponds to the number of frequencies for a particular value of X. This figure represents an ungrouped histogram, as the interval size is 1. That is, along the X axis the midpoint of each bar is the midpoint of the interval, the bar begins on the left at the lower real limit of the interval, the bar ends on the right at the upper real limit, and the bar is one unit wide. If we wanted to use an interval size of 2, for example, using the grouped frequency distribution in Table 2.3, then we could construct a grouped histogram in the same way; the differences would be that the bars would be two units wide, and the height of the bars would obviously change. Try this one on your own for practice.

One could also plot relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval. In reality, all that we have to change is the scale of the Y axis. The height of the bars would remain the same. For this particular dataset, each frequency corresponds to a relative frequency of .04.
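Because each bar runs from the lower to the upper real limit, the bin edges of an ungrouped histogram sit at the half-units. A Python sketch (ours; the text's figures are SPSS generated) of the bar positions and heights behind Figure 2.2:

```python
from collections import Counter

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

# Bin edges at the real limits: 8.5, 9.5, ..., 20.5.  Each bar is one unit
# wide, centered on its interval midpoint, and adjacent bars touch.
edges = [x + 0.5 for x in range(8, 21)]

heights = Counter(scores)
bars = [(mid, heights[mid]) for mid in range(9, 21)]   # (midpoint, bar height)
```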
2.2.3 Frequency Polygon

Another graphical method appropriate for data that have at least some rank order (i.e., ordinal, interval, or ratio) is the frequency polygon (a line graph in SPSS terminology). A polygon is defined simply as a many-sided figure. The frequency polygon is set up in a fashion
Figure 2.2 Histogram of statistics quiz data. (X axis: quiz score, 9 to 20; Y axis: frequency.)
similar to the histogram. However, rather than plotting a bar for each interval, points are plotted for each interval and then connected together, as shown in Figure 2.3 (SPSS generated). The axes are the same as with the histogram. A point is plotted at the intersection (or coordinates) of the midpoint of each interval along the X axis and the frequency for that interval along the Y axis. Thus, for the 15 interval, a point is plotted at the midpoint of the interval, 15.0, and at three frequencies. Once the points are plotted for each interval, we "connect the dots."

One could also plot relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval. This is known as the relative frequency polygon. As with the histogram, all we have to change is the scale of the Y axis. The position of the polygon would remain the same. For this particular dataset, each frequency corresponds to a relative frequency of .04.

Note also that because the histogram and frequency polygon contain exactly the same information, Figures 2.2 and 2.3 can be superimposed on one another. If you did this, you would see that the points of the frequency polygon are plotted at the top of each bar of the histogram. There is no advantage of the histogram or the frequency polygon over the other; however, the histogram is more frequently used due to its availability in all statistical software.
2.2.4 Cumulative Frequency Polygon

Cumulative frequencies of data that have at least some rank order (i.e., ordinal, interval, or ratio) can be displayed as a cumulative frequency polygon (sometimes referred to as the ogive curve). As shown in Figure 2.4 (SPSS generated), the differences between the frequency polygon and the cumulative frequency polygon are that (a) the cumulative frequency polygon involves plotting cumulative frequencies along the Y axis, (b) the points should be plotted at the upper real limit of each interval (although SPSS plots the points at the interval midpoints by default), and (c) the polygon cannot be closed on the right-hand side.
Figure 2.3 Frequency polygon of statistics quiz data. (X axis: quiz score, 9 to 20; Y axis: frequency; markers/lines show count.)
Let us discuss each of these differences. First, the Y axis represents the cumulative frequencies from the cumulative frequency distribution. The X axis is the usual set of raw scores. Second, to reflect the cumulative nature of this type of frequency distribution, the points must be plotted at the upper real limit of each interval. For example, the cumulative frequency for the 16 interval is 12, indicating that there are 12 scores in that interval and smaller. Finally, the polygon cannot be closed on the right-hand side. Notice that as you move from left to right in the cumulative frequency polygon, the height of the points always increases or stays the same. Because of the nature of accumulating information, there will never be a decrease in the accumulation of the frequencies. For example, there is an increase in cumulative frequency from the 16 to the 17 interval, as five new frequencies are included. Beyond the 20 interval, the number of cumulative frequencies remains at 25, as no new frequencies are included.

One could also plot cumulative relative frequencies on the Y axis to reflect the percentage of students in the sample whose scores fell into a particular interval and smaller. This is known as the cumulative relative frequency polygon. All we have to change is the scale of the Y axis to cumulative relative frequency. The position of the polygon would remain the same. For this particular dataset, each cumulative frequency corresponds to a cumulative relative frequency of .04. Thus, a cumulative relative frequency polygon of the example data would look exactly like Figure 2.4, except on the Y axis we plot cumulative relative frequencies ranging from 0 to 1.

2.2.5 Shapes of Frequency Distributions

There are several common shapes of frequency distributions that you are likely to encounter, as shown in Figure 2.5. These are briefly described here and more fully in later chapters. Figure 2.5a is a normal distribution (or bell-shaped curve) where most of the scores are in the center of the distribution, with fewer higher and lower scores. The normal distribution plays a large role in statistics, both for descriptive statistics (as we show beginning in Chapter 4) and particularly as an assumption for many inferential statistics (as we show beginning in Chapter 6). This distribution is also known as symmetric because if we divide the distribution into two equal halves vertically, the left half is a mirror image of the right half (see Chapter 4). Figure 2.5b is a positively skewed distribution where most of the scores are fairly low and there are a few higher scores (see Chapter 4). Figure 2.5c is
Figure 2.4 Cumulative frequency polygon of statistics quiz data. (X axis: quiz score, 9 to 20; Y axis: cumulative frequency, 0 to 25.)
a negatively skewed distribution where most of the scores are fairly high and there are a few lower scores (see Chapter 4). Skewed distributions are not symmetric, as the left half is not a mirror image of the right half.

2.2.6 Stem-and-Leaf Display

A refined form of the grouped frequency distribution is the stem-and-leaf display, developed by John Tukey (1977). This is shown in Figure 2.6 (SPSS generated) for the example statistics quiz data. The stem-and-leaf display was originally developed to be constructed on a typewriter using lines and numbers in a minimal amount of space. In a way, the
Figure 2.5 Common shapes of frequency distributions: (a) normal, (b) positively skewed, and (c) negatively skewed. (Each panel plots frequency f against score X.)
Figure 2.6 Stem-and-leaf display of statistics quiz data.

Quiz Stem-and-Leaf Plot

Frequency    Stem and Leaf
 1.00        0 . 9
 7.00        1 . 0112334
16.00        1 . 5556777778889999
 1.00        2 . 0

Stem width: 10.0
Each leaf: 1 case(s)
stem-and-leaf display looks like a grouped type of histogram on its side. The vertical value on the left is the stem and, in this example, represents all but the last digit (i.e., the tens digit). The leaf represents, in this example, the remaining digit of each score (i.e., the units digit). Note that SPSS has grouped values in increments of five. For example, the second row ("1 . 0112334") indicates that there are 7 scores from 10 to 14; thus, the leaf of "0" means that there is one frequency for the score of 10. The fact that two leaves of "1" occur in that stem indicates that the score of 11 occurred twice. Interpreting the rest of this stem, we see that 12 occurred once (i.e., there is only one 2 in the stem), 13 occurred twice (i.e., there are two 3s in the stem), and 14 occurred once (i.e., only one 4 in the stem). From the stem-and-leaf display, one can determine every one of the raw scores; this is not possible with a typical grouped frequency distribution (i.e., no information is lost in a stem-and-leaf display). However, with a large sample the display can become rather unwieldy. Consider what a stem-and-leaf display would look like for 100,000 GRE scores!
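The SPSS display in Figure 2.6 can be imitated in a few lines. A Python sketch (ours, not the SPSS algorithm) that splits each tens stem into two rows of five, as SPSS did here:

```python
from collections import defaultdict

scores = [9, 11, 20, 15, 19, 10, 19, 18, 14, 12, 17, 11, 13,
          16, 17, 19, 18, 17, 13, 17, 15, 18, 17, 19, 15]

rows = defaultdict(list)
for x in sorted(scores):
    # stem = tens digit; each stem is split into leaf rows 0-4 and 5-9
    rows[(x // 10, (x % 10) // 5)].append(str(x % 10))

for (stem, _), leaves in sorted(rows.items()):
    print(f"{len(leaves):5d}.00  {stem} . {''.join(leaves)}")
```

Because every leaf is an actual digit of a raw score, the full dataset can be read back from the display, which is the "no information is lost" property noted above.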
In summary, this section included the most basic types of statistical graphics, although more advanced graphics are described in later chapters. Note, however, that there are a number of publications on how to properly display graphics, that is, "how to do graphics right." While a detailed discussion of statistical graphics is beyond the scope of this text, the following publications are recommended: Chambers, Cleveland, Kleiner, and Tukey (1983), Schmid (1983), Wainer (e.g., 1984, 1992, 2000), Tufte (1992), Cleveland (1993), Wallgren, Wallgren, Persson, Jorner, and Haaland (1996), Robbins (2004), and Wilkinson (2005).
2.3 Percentiles

In this section, we consider several concepts, and the necessary computations, in the area of percentiles, including percentiles, quartiles, percentile ranks, and the box-and-whisker plot. For instance, you might be interested in determining what percentage of the distribution of the GRE-Quantitative subtest fell below a score of 600, or what score divides the distribution of the GRE-Quantitative subtest into two equal halves.
2.3.1 Percentiles

Let us define a percentile as that score below which a certain percentage of the distribution lies. For instance, you may be interested in that score below which 50% of the distribution of the GRE-Quantitative subscale lies. Say that this score is computed as 480; this would mean that 50% of the scores fell below a score of 480. Because percentiles are scores, they are continuous values and can take on any value of those possible. The 30th percentile could be, for example, the score of 387.6750. For notational purposes, a percentile will be known as Pi, where the i subscript denotes the particular percentile of interest, between 0 and 100. Thus, the 30th percentile for the previous example would be denoted as P30 = 387.6750.
Let us now consider how percentiles are computed. The formula for computing the Pi percentile is

    Pi = LRL + [(i%(n) − cf) / f] (w)    (2.1)
where
LRL is the lower real limit of the interval containing Pi
i% is the percentile desired (expressed as a proportion from 0 to 1)
n is the sample size
cf is the cumulative frequency less than but not including the interval containing Pi (known as cf below)
f is the frequency of the interval containing Pi
w is the interval width
As an example, consider computing the 25th percentile of our statistics quiz data. This would correspond to that score below which 25% of the distribution falls. For the example data in the form presented in Table 2.2, using Equation 2.1, we compute P25 as follows:

    P25 = LRL + [(i%(n) − cf) / f] (w) = 12.5 + [(25%(25) − 5) / 2] (1) = 12.5 + 0.625 = 13.125
Conceptually, let us discuss how the equation works. First, we have to determine which interval contains the percentile of interest. This is easily done by looking in the crf column of the frequency distribution for the interval that contains a crf of .25 somewhere within the interval. We see that for the 13 interval the crf = .28, which means that the interval spans a crf of .20 (the URL of the 12 interval) up to .28 (the URL of the 13 interval) and thus contains .25. The next larger interval of 14 takes us from a crf of .28 up to a crf of .32 and thus is too large for this particular percentile. The next smaller interval of 12 takes us from a crf of .16 up to a crf of .20 and thus is too small. The LRL of 12.5 indicates that P25 is at least 12.5. The rest of the equation adds some positive amount to the LRL.

Next we have to determine how far into that interval we need to go in order to reach the desired percentile. We take i percent of n, or in this case 25% of the sample size of 25, which is 6.25. So we need to go one-fourth of the way into the distribution, or 6.25 scores, to reach the 25th percentile. Another way to think about this is, because the scores have been rank-ordered from lowest or smallest (top of the frequency distribution) to highest or largest (bottom of the frequency distribution), we need to go 25%, or 6.25 scores, into the distribution from the top (or smallest value) to reach the 25th percentile. We then subtract out all cumulative frequencies smaller than (or below) the interval we are looking in, where cf below = 5. Again we just want to determine how far into this interval we need to go, and thus we subtract out all of the frequencies smaller than this interval, or cf below. The numerator then becomes 6.25 − 5 = 1.25. Then we divide by the number of frequencies in the interval containing the percentile we are looking for. This forms the ratio of how far into the interval we go. In this case, we needed to go 1.25 scores into the interval and the interval contains 2 scores; thus, the ratio is 1.25/2 = .625. In other words, we need to go .625 unit into the interval to reach the desired percentile. Now that we know how far into the interval to go, we need to weight this by the width of the interval. Here we need to go 1.25 scores into an interval containing 2 scores that is 1 unit wide, and thus we go .625 unit into the interval [(1.25/2)(1) = .625]. If the interval width was instead 10, then 1.25 scores into the interval would be equal to 6.25 units.
Consider two more worked examples to try on your own, either through statistical software or by hand. The 50th percentile, P50, is

\[ P_{50} = 16.5 + \left(\frac{50\%(25) - 12}{5}\right)(1) = 16.5 + 0.100 = 16.600 \]
31Data Representation
while the 75th percentile, P75, is

\[ P_{75} = 17.5 + \left(\frac{75\%(25) - 17}{3}\right)(1) = 17.5 + 0.583 = 18.083 \]
We have only examined a few example percentiles of the many possibilities that exist. For example, we could also have determined P55.5 or even P99.5. Thus, we could determine any percentile, in whole numbers or decimals, between 0 and 100. Next we examine three particular percentiles that are often of interest, the quartiles.
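The worked examples above can be collected into a short function. The following Python sketch is our own translation of Equation 2.1 (the function and argument names are assumptions, not from the text); it reproduces the three hand computations:

```python
def percentile(p, lrl, cf_below, f, n, w=1):
    """Equation 2.1: P_p = LRL + ((p% of n - cf below) / f) * w.

    p        : desired percentile (0-100)
    lrl      : lower real limit of the interval containing the percentile
    cf_below : cumulative frequency below that interval
    f        : frequency within that interval
    n        : sample size
    w        : interval width
    """
    return lrl + ((p / 100) * n - cf_below) / f * w

# Values read from the statistics quiz frequency distribution (Table 2.2)
p25 = percentile(25, lrl=12.5, cf_below=5, f=2, n=25)   # 13.125
p50 = percentile(50, lrl=16.5, cf_below=12, f=5, n=25)  # 16.600
p75 = percentile(75, lrl=17.5, cf_below=17, f=3, n=25)  # 18.083 (rounded)
```

Note that the function only needs the handful of quantities read off the frequency distribution, which mirrors how the computation is done by hand.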
2.3.2 Quartiles
One common way of dividing a distribution of scores into equal groups of scores is known as quartiles. This is done by dividing a distribution into fourths, or quartiles, where there are four equal groups, each containing 25% of the scores. In the previous examples, we determined P25, P50, and P75, which divided the distribution into four equal groups: from 0 to 25, from 25 to 50, from 50 to 75, and from 75 to 100. Thus, the quartiles are special cases of percentiles. A different notation, however, is often used for these particular percentiles, where we denote P25 as Q1, P50 as Q2, and P75 as Q3. Thus, the Qs represent the quartiles.
An interesting aspect of quartiles is that they can be used to determine whether a distribution of scores is positively or negatively skewed. This is done by comparing the values of the quartiles as follows. If (Q3 − Q2) > (Q2 − Q1), then the distribution of scores is positively skewed, as the scores are more spread out at the high end of the distribution and more bunched up at the low end of the distribution (remember the shapes of the distributions from Figure 2.5). If (Q3 − Q2) < (Q2 − Q1), then the distribution of scores is negatively skewed, as the scores are more spread out at the low end of the distribution and more bunched up at the high end of the distribution. If (Q3 − Q2) = (Q2 − Q1), then the distribution of scores is obviously not skewed, but is symmetric (see Chapter 4). For the example statistics quiz data, (Q3 − Q2) = 1.4833 and (Q2 − Q1) = 3.4750; thus, (Q3 − Q2) < (Q2 − Q1), and we know that the distribution is negatively skewed. This should already have been evident from examining the frequency distribution in Figure 2.3, as scores are more spread out at the low end of the distribution and more bunched up at the high end. Examining the quartiles is a simple method for getting a general sense of the skewness of a distribution of scores.
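This quartile comparison is simple enough to automate. A small sketch (the helper below is our own, not the text's) applies the rule to the quiz quartiles computed earlier:

```python
def skew_direction(q1, q2, q3):
    """Compare (Q3 - Q2) with (Q2 - Q1) for a rough sense of skewness."""
    upper, lower = q3 - q2, q2 - q1
    if upper > lower:
        return "positively skewed"
    if upper < lower:
        return "negatively skewed"
    return "symmetric"

# Quartiles of the statistics quiz data computed earlier
shape = skew_direction(13.125, 16.600, 18.083)  # "negatively skewed"
```

Here (Q3 − Q2) = 1.4833 is smaller than (Q2 − Q1) = 3.4750, so the function reports negative skewness, matching the hand comparison.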
2.3.3 Percentile Ranks
Let us define a percentile rank as the percentage of a distribution of scores that falls below (or is less than) a certain score. For instance, you may be interested in the percentage of scores on the GRE-Quantitative subscale that falls below the score of 480. Say that the percentile rank for the score of 480 is computed to be 50; then this would mean that 50% of the scores fell below a score of 480. If this sounds familiar, it should. The 50th percentile was previously stated to be 480. Thus, we have logically determined that the percentile rank of 480 is 50. This is because percentile and percentile rank are actually opposite sides of the same coin. Many are confused by this and equate percentiles and percentile ranks; however, they are related but different concepts. Recall earlier we said that percentiles were scores. Percentile ranks are percentages, as they are continuous values and can take on any value from 0 to 100. The score of 400 can have a percentile rank of 42.6750. For notational purposes, a percentile rank will be known as PR(Pi), where Pi is the particular score whose percentile rank, PR, you wish to determine. Thus, the percentile rank of the score 400 would be denoted as PR(400) = 42.6750. In other words, about 43% of the distribution falls below the score of 400.
Let us now consider how percentile ranks are computed. The formula for computing the PR(Pi) percentile rank is

\[ PR(P_i) = \left(\frac{cf + \dfrac{f(P_i - LRL)}{w}}{n}\right)100\% \tag{2.2} \]
where
PR(Pi) indicates that we are looking for the percentile rank PR of the score Pi
cf is the cumulative frequency up to but not including the interval containing PR(Pi) (again known as cf below)
f is the frequency of the interval containing PR(Pi)
LRL is the lower real limit of the interval containing PR(Pi)
w is the interval width
n is the sample size, and finally we multiply by 100% to place the percentile rank on a scale from 0 to 100 (and also to remind us that the percentile rank is a percentage)
As an example, consider computing the percentile rank for the score of 17. This would correspond to the percentage of the distribution that falls below a score of 17. For the example data again, using Equation 2.2, we compute PR(17) as follows:

\[ PR(17) = \left(\frac{12 + \dfrac{5(17 - 16.5)}{1}}{25}\right)100\% = \left(\frac{12 + 2.5}{25}\right)100\% = 58.00\% \]
Conceptually, let us discuss how the equation works. First, we have to determine what interval contains the percentile rank of interest. This is easily done because we already know the score is 17, and we simply look in the interval containing 17. The cf below the 17 interval is 12, and n is 25. Thus, we know that we need to go at least 12/25, or 48%, of the way into the distribution to obtain the desired percentile rank. We know that Pi = 17 and the LRL of that interval is 16.5. There are 5 frequencies in that interval, so we need to go 2.5 scores into the interval to obtain the proper percentile rank. In other words, because 17 is the midpoint of an interval with a width of 1, we need to go halfway, or 2.5/5 of the way, into the interval to obtain the percentile rank. In the end, we need to go 14.5/25 (or .58) of the way into the distribution to obtain our percentile rank, which translates to 58%.
As another example, we have already determined that P50 = 16.6000. Therefore, you should be able to determine on your own that PR(16.6000) = 50%. This verifies that percentiles and percentile ranks are two sides of the same coin: the computation of a percentile starts with a percentage and identifies a specific score, whereas the computation of a percentile rank starts with the score and determines its percentage. You can further verify this by determining that PR(13.1250) = 25.00% and PR(18.0833) = 75.00%. Next we consider the box-and-whisker plot, where quartiles and percentiles are used graphically to depict a distribution of scores.
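Equation 2.2 can likewise be sketched in a few lines of code. This is our own minimal translation (the function and argument names are ours, not the text's); it reproduces the hand computations and illustrates the "two sides of the same coin" point:

```python
def percentile_rank(score, lrl, cf_below, f, n, w=1):
    """Equation 2.2: PR(P_i) = ((cf + f*(P_i - LRL)/w) / n) * 100%."""
    return (cf_below + f * (score - lrl) / w) / n * 100

# Interval containing 17: LRL = 16.5, f = 5, cf below = 12, n = 25
pr17 = percentile_rank(17, lrl=16.5, cf_below=12, f=5, n=25)      # 58.0
# Two sides of the same coin: the rank of P50 = 16.6 comes back as 50%
pr_p50 = percentile_rank(16.6, lrl=16.5, cf_below=12, f=5, n=25)  # 50.0
```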
2.3.4 Box-and-Whisker Plot
A simplified form of the frequency distribution is the box-and-whisker plot (often referred to simply as a "box plot"), developed by John Tukey (1977). This is shown in Figure 2.7 (SPSS generated) for the example data. The box-and-whisker plot was originally developed to be constructed on a typewriter using lines in a minimal amount of space. The box in the center of the figure displays the middle 50% of the distribution of scores. The left-hand edge or hinge of the box represents the 25th percentile (or Q1). The right-hand edge or hinge of the box represents the 75th percentile (or Q3). The middle vertical line in the box represents the 50th percentile (or Q2). The lines extending from the box are known as the whiskers. The purpose of the whiskers is to display data outside of the middle 50%. The left-hand whisker can extend down to the lowest score (as is the case with SPSS), or to the 5th or the 10th percentile (by other means), to display more extreme low scores, and the right-hand whisker correspondingly can extend up to the highest score (SPSS), or to the 95th or 90th percentile (elsewhere), to display more extreme high scores. The choice of where to extend the whiskers is the preference of the researcher and/or the software. Scores that fall beyond the end of the whiskers, known as outliers due to their extremeness relative to the bulk of the distribution, are often displayed by dots and/or asterisks. Box-and-whisker plots can be used to examine such things as skewness (through the quartiles), outliers, and where most of the scores tend to fall.
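One common fence rule for flagging the outliers mentioned above is Tukey's 1.5 × IQR convention. Using it here is an assumption on our part, since the text notes that whisker and outlier rules vary by researcher and software; the sketch below simply applies that convention to the quiz data hinges:

```python
def tukey_fences(q1, q3, k=1.5):
    """Outlier fences k * IQR beyond the hinges (Tukey's common convention;
    as the text notes, whisker rules vary by researcher and software)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hinges (Q1 and Q3) of the statistics quiz data
low, high = tukey_fences(13.125, 18.083)
# Quiz scores ranged from 9 to 20, so no score falls beyond these fences
no_outliers = (9 >= low) and (20 <= high)  # True
```

Under this convention, none of the quiz scores would be flagged, which is consistent with a box plot of these data showing no dots or asterisks beyond the whiskers.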
2.4 SPSS
The purpose of this section is to briefly consider applications of SPSS for the topics covered in this chapter (including important screenshots). The following SPSS procedures will be illustrated: "Frequencies" and "Graphs."
FIGURE 2.7
Box-and-whisker plot of statistics quiz data.
Frequencies

Frequencies: Step 1. For the types of tables discussed in this chapter, in SPSS go to "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then select "Frequencies." Following the screenshot for "Frequencies: Step 1" will produce the "Frequencies" dialog box.
Frequencies: Step 1 (screenshot). Stem-and-leaf plots (and many other statistics) can be generated using the "Explore" program.
Frequencies: Step 2. The "Frequencies" dialog box will open (see screenshot for "Frequencies: Step 2"). From this main "Frequencies" dialog box, click the variable of interest from the list on the left (e.g., quiz) and move it into the "Variables" box by clicking on the arrow button. By default, there is a checkmark in the box for "Display frequency tables," and we will keep this checked. This (i.e., selecting "Display frequency tables") will generate a table of frequencies, relative frequencies, and cumulative relative frequencies. There are three buttons on the right side of the "Frequencies" dialog box ("Statistics," "Charts," and "Format"). Let us first cover the options available through "Statistics."
Frequencies: Step 2 (screenshot). Select the variable of interest from the list on the left and use the arrow to move it to the "Variable(s)" box on the right. "Display frequency tables" is checked by default and will produce a frequency distribution table in the output. Clicking on the "Statistics," "Charts," and "Format" buttons will allow you to select various statistics and graphs.
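The frequency table that "Display frequency tables" produces (frequencies, relative frequencies, and cumulative relative frequencies) can be imitated outside SPSS in a few lines. The sketch below uses Python's standard library on a small made-up dataset (the data are purely illustrative):

```python
from collections import Counter

def frequency_table(scores):
    """Rows of (score, f, rf, crf): frequency, relative frequency,
    and cumulative relative frequency, as in an SPSS frequency table."""
    n = len(scores)
    freq = Counter(scores)
    rows, cum = [], 0
    for score in sorted(freq):
        cum += freq[score]
        rows.append((score, freq[score], freq[score] / n, cum / n))
    return rows

# Hypothetical mini-dataset, for illustration only
rows = frequency_table([1, 2, 2, 3, 3, 3, 4, 4])
```

Each row accumulates the running total, so the last row's cumulative relative frequency is always 1.0, just as the final crf in a frequency distribution table is always 1.00.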
Frequencies: Step 3a. If you click on the "Statistics" button from the main "Frequencies" dialog box (see "Frequencies: Step 2"), a new box labeled "Frequencies: Statistics" will appear (see screenshot for "Frequencies: Step 3a"). From here, you can obtain quartiles and selected percentiles as well as numerous other descriptive statistics simply by placing a checkmark in the boxes for the statistics that you want to generate. For better accuracy when generating the median, quartiles, and percentiles, check the box for "Values are group midpoints." However, it should be noted that these values are not always as precise as those from the formula given earlier in this chapter.
Frequencies: Step 3a (screenshot). Options available when clicking on "Statistics" from the main dialog box for "Frequencies"; placing a checkmark will generate the respective statistic in the output. Check "Values are group midpoints" for better accuracy with the median, quartiles, and percentiles.
Frequencies: Step 3b. If you click on the "Charts" button from the main "Frequencies" dialog box (see screenshot for "Frequencies: Step 2"), a new box labeled "Frequencies: Charts" will appear (see screenshot for "Frequencies: Step 3b"). From here, you can select options to generate bar graphs, pie charts, or histograms. If you select bar graphs or pie charts, you can plot either frequencies or percentages (relative frequencies). Thus, the "Frequencies" program enables you to do much of what this chapter has covered. In addition, stem-and-leaf plots are available in the "Explore" program (see "Frequencies: Step 1" for a screenshot of where the "Explore" program can be accessed).
Frequencies: Step 3b (screenshot). Options available when clicking on "Charts" from the main dialog box for "Frequencies."
Graphs

There are multiple graphs that can be generated in SPSS. We will examine how to generate histograms, boxplots, bar graphs, and more using the "Graphs" procedure in SPSS.
Histograms

Histograms: Step 1. For other ways to generate the types of graphical displays covered in this chapter, go to "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Histogram" (see screenshot for "Graphs: Step 1"). Another option for creating a histogram, although not shown here, starts again from the "Graphs" option in the top pulldown menu, where you select "Graphboard Template Chooser," and finally "Histogram."
Graphs: Step 1 (screenshot). Options available when clicking on "Legacy Dialogs" from the main pulldown menu for graphs.
Histograms: Step 2. This will bring up the "Histogram" dialog box (see screenshot for "Histograms: Step 2"). Click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box by clicking on the arrow. Place a checkmark in "Display normal curve," and then click "OK." This will generate the same histogram as was produced through the "Frequencies" program already mentioned.
Histograms: Step 2
Boxplots

Boxplots: Step 1. To produce a boxplot for individual variables, click on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Boxplot" (see "Graphs: Step 1" for a screenshot of this step). Another option for creating a boxplot (although not shown here) starts again from the "Graphs" option in the top pulldown menu, where you select "Graphboard Template Chooser," then "Boxplots."

Boxplots: Step 2. This will bring up the "Boxplot" dialog box (see screenshot for "Boxplots: Step 2"). Select the "Simple" option (by default, this will already be selected). To generate a separate boxplot for individual variables, click on the "Summaries of separate variables" radio button. Then click "Define."
Boxplots: Step 2
Boxplots: Step 3. This will bring up the "Define Simple Boxplot: Summaries of Separate Variables" dialog box (see screenshot for "Boxplots: Step 3"). Click the variable of interest (e.g., quiz) into the "Variable(s)" box. Then click "OK." This will generate a boxplot.
Boxplots: Step 3
Bar Graphs

Bar Graphs: Step 1. To produce a bar graph for individual variables, click on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Bar" (see "Graphs: Step 1" for a screenshot of this step).

Bar Graphs: Step 2. From the main "Bar Chart" dialog box, select "Simple" (which will be selected by default) and click on the "Summaries for groups of cases" radio button (see screenshot for "Bar Graphs: Step 2").
Bar graphs: Step 2
Bar Graphs: Step 3. A new box labeled "Define Simple Bar: Summaries for Groups of Cases" will appear. Click the variable of interest (e.g., eye color) and move it into the "Variable" box by clicking the arrow button. Then a decision must be made for how the bars will be displayed. Several types of displays for bar graph data are available, including "N of cases" for frequencies, "Cum. N" for cumulative frequencies, "% of cases" for relative frequencies, and "Cum. %" for cumulative relative frequencies (see screenshot for "Bar Graphs: Step 3"). Additionally, other statistics can be selected through the "Other statistic (e.g., mean)" option. The most common bar graph is one which simply displays the frequencies (i.e., selecting the radio button for "N of cases"). Once your selections are made, click "OK." This will generate a bar graph.
Bar Graphs: Step 3 (screenshot). When "Other statistic (e.g., mean)" is selected, a dialog box (shown here as "Statistic") will appear, listing all other statistics that can be represented by the bars in the graph. Clicking on a radio button will select the statistic. Once the selection is made, click on "Continue" to return to the "Define Simple Bar: Summaries for Groups of Cases" dialog box.
Frequency Polygons

Frequency Polygons: Step 1. Frequency polygons can be generated by clicking on "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Line" (see "Graphs: Step 1" for a screenshot of this step).

Frequency Polygons: Step 2. From the main "Line Charts" dialog box, select "Simple" (which will be selected by default) and click on the "Summaries for groups of cases" radio button (which will also be selected by default; see screenshot for "Frequency Polygons: Step 2").
Frequency Polygons: Step 2 (screenshot).
Frequency Polygons: Step 3. A new box labeled "Define Simple Line: Summaries for Groups of Cases" will appear. Click the variable of interest (e.g., quiz) and move it into the "Variable" box by clicking the arrow button. Then a decision must be made for how the lines will be displayed. Several types of displays for line graph (i.e., frequency polygon) data are available, including "N of cases" for frequencies, "Cum. N" for cumulative frequencies, "% of cases" for relative frequencies, and "Cum. %" for cumulative relative frequencies (see screenshot for "Frequency Polygons: Step 3"). Additionally, other statistics can be selected through the "Other statistic (e.g., mean)" option. The most common frequency polygon is one which simply displays the frequencies (i.e., selecting the radio button for "N of cases"). Once your selections are made, click "OK." This will generate a frequency polygon.
Frequency Polygons: Step 3 (screenshot). When "Other statistic (e.g., mean)" is selected, a dialog box (shown here as "Statistic") will appear, listing all other statistics that can be represented by the lines in the graph. Clicking on a radio button will select the statistic. Once the selection is made, click on "Continue" to return to the "Define Simple Line: Summaries for Groups of Cases" dialog box.
Editing Graphs

Once a graph or table is created, double clicking on the table or graph produced in the output will allow the user to make changes, such as changing the X and/or Y axis, colors, and more. An illustration of the options available in the chart editor is presented here.
Chart editor (screenshot): histogram of the statistics quiz scores (Mean = 15.56, Std. Dev. = 3.163, N = 25).
2.5 Templates for Research Questions and APA-Style Paragraph
Depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions, and thus no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary. At this time, let us revisit our graduate research assistant, Marie, who was introduced at the beginning of the chapter. As you may recall, her task was to summarize data from 25 students enrolled in a statistics course. The questions that Marie's faculty mentor shared with her were as follows: How can the quiz scores of students enrolled in an introductory statistics class be graphically represented in a table? In a figure? What is the distributional shape of the statistics quiz scores? What is the 50th percentile of the quiz scores? A template for writing descriptive research questions for summarizing data may be as follows. Please note that these are just a few examples. Given the multitude of descriptive statistics that can be generated, these are not meant to be exhaustive.

How can [variable] be graphically represented in a table? In a figure? What is the distributional shape of the [variable]? What is the 50th percentile of [variable]?
Next, we present an APA-like paragraph summarizing the results of the statistics quiz data example.
As shown in Table 2.2 and Figure 2.2, scores ranged from 9 to 20,
with more students achieving a score of 17 than any other score
(20%). From Figure 2.2, we also know that the distribution of
scores was negatively skewed, with the bulk of the scores being
at the high end of the distribution. Skewness was also evident
as the quartiles were not equally spaced, as shown in Figure
2.7. Thus, overall the sample of students tended to do rather
well on this particular quiz (must have been the awesome teach-
ing), although a few low scores should be troubling (as 20% did
not pass the quiz and need some remediation).
2.6 Summary
In this chapter, we considered both tabular and graphical methods for representing data. First, we discussed the tabular display of distributions in terms of frequency distributions (ungrouped and grouped), cumulative frequency distributions, relative frequency distributions, and cumulative relative frequency distributions. Next, we examined various methods for depicting data graphically, including bar graphs, histograms (ungrouped and grouped), frequency polygons, cumulative frequency polygons, shapes of distributions, and stem-and-leaf displays. Then, concepts and procedures related to percentiles were covered, including percentiles, quartiles, percentile ranks, and box-and-whisker plots. Finally, an overview of SPSS for these procedures was included, as well as a summary APA-style paragraph of the quiz dataset. We include Box 2.1 as a summary of which data representation techniques are most appropriate for each type of measurement scale. At this point, you should have met the following objectives: (a) be able to construct and interpret statistical tables, (b) be able to construct and interpret statistical graphs, and (c) be able to determine and interpret percentile-related information. In the next chapter, we address the major population parameters and sample statistics useful for looking at a single variable. In particular, we are concerned with measures of central tendency and measures of dispersion.
STOP AND THINK BOX 2.1
Appropriate Data Representation Techniques

Measurement scale: Nominal
  Tables: Frequency distribution; Relative frequency distribution
  Figures: Bar graph

Measurement scale: Ordinal, interval, or ratio
  Tables: Frequency distribution; Cumulative frequency distribution; Relative frequency distribution; Cumulative relative frequency distribution
  Figures: Histogram; Frequency polygon; Relative frequency polygon; Cumulative frequency polygon; Cumulative relative frequency polygon; Stem-and-leaf display; Box-and-whisker plot
Problems

Conceptual Problems

2.1 For a distribution where the 50th percentile is 100, what is the percentile rank of 100?
  a. 0
  b. .50
  c. 50
  d. 100

2.2 Which of the following frequency distributions will generate the same relative frequency distribution?

X f    Y f    Z f
100 2  100 6  100 8
99 5   99 15  99 18
98 8   98 24  98 28
97 5   97 15  97 18
96 2   96 6   96 8

  a. X and Y only
  b. X and Z only
  c. Y and Z only
  d. X, Y, and Z
  e. None of the above
2.3 Which of the following frequency distributions will generate the same cumulative relative frequency distribution?

X f    Y f    Z f
100 2  100 6  100 8
99 5   99 15  99 18
98 8   98 24  98 28
97 5   97 15  97 18
96 2   96 6   96 8

  a. X and Y only
  b. X and Z only
  c. Y and Z only
  d. X, Y, and Z
  e. None of the above

2.4 In a histogram, 48% of the area lies below the score whose percentile rank is 52. True or false?

2.5 Among the following, the preferred method of graphing data pertaining to the ethnicity of a sample would be
  a. A histogram
  b. A frequency polygon
  c. A cumulative frequency polygon
  d. A bar graph

2.6 The proportion of scores between Q1 and Q3 may be less than .50. True or false?

2.7 The values of Q1, Q2, and Q3 in a positively skewed population distribution are calculated. What is the expected relationship between (Q2 − Q1) and (Q3 − Q2)?
  a. (Q2 − Q1) is greater than (Q3 − Q2).
  b. (Q2 − Q1) is equal to (Q3 − Q2).
  c. (Q2 − Q1) is less than (Q3 − Q2).
  d. Cannot be determined without examining the data.

2.8 If the percentile rank of a score of 72 is 65, we may say that 35% of the scores exceed 72. True or false?

2.9 In a negatively skewed distribution, the proportion of scores between Q1 and Q2 is less than .25. True or false?

2.10 A group of 200 sixth-grade students was given a standardized test and obtained scores ranging from 42 to 88. If the scores tended to "bunch up" in the low 80s, the shape of the distribution would be which one of the following?
  a. Symmetrical
  b. Positively skewed
  c. Negatively skewed
  d. Normal
2.11 The preferred method of graphing data on the eye color of a sample is which one of the following?
  a. Bar graph
  b. Frequency polygon
  c. Cumulative frequency polygon
  d. Relative frequency polygon

2.12 If Q2 = 60, then what is P50?
  a. 50
  b. 60
  c. 95
  d. Cannot be determined with the information provided

2.13 With the same data and using an interval width of 1, the frequency polygon and histogram will display the same information. True or false?

2.14 A researcher develops a histogram based on an interval width of 2. Can she reconstruct the raw scores using only this histogram? Yes or no?

2.15 Q2 = 50 for a positively skewed variable, and Q2 = 50 for a negatively skewed variable. I assert that Q1 will not necessarily be the same for both variables. Am I correct? True or false?

2.16 Which of the following statements is correct for a continuous variable?
  a. The proportion of the distribution below the 25th percentile is 75%.
  b. The proportion of the distribution below the 50th percentile is 25%.
  c. The proportion of the distribution above the third quartile is 25%.
  d. The proportion of the distribution between the 25th and 75th percentiles is 25%.

2.17 For a dataset with four unique values (55, 70, 80, and 90), the relative frequency for the value 55 is 20%, the relative frequency for 70 is 30%, the relative frequency for 80 is 20%, and the relative frequency for 90 is 30%. What is the cumulative relative frequency for the value 70?
  a. 20%
  b. 30%
  c. 50%
  d. 100%

2.18 In examining data collected over the past 10 years, researchers at a theme park find the following for 5000 first-time guests: 2250 visited during the summer months; 675 visited during the fall; 1300 visited during the winter; and 775 visited during the spring. What is the relative frequency for guests who visited during the spring?
  a. .135
  b. .155
  c. .260
  d. .450
Computational Problems

2.1 The following scores were obtained from a statistics exam:

47 50 47 49 46 41 47 46 48 44
46 47 45 48 45 46 50 47 43 48
47 45 43 46 47 47 43 46 42 47
49 44 44 50 41 45 47 44 46 45
42 47 44 48 49 43 45 49 49 46

Using an interval size of 1, construct or compute each of the following:
  a. Frequency distribution
  b. Cumulative frequency distribution
  c. Relative frequency distribution
  d. Cumulative relative frequency distribution
  e. Histogram and frequency polygon
  f. Cumulative frequency polygon
  g. Quartiles
  h. P10 and P90
  i. PR(41) and PR(49.5)
  j. Box-and-whisker plot
  k. Stem-and-leaf display

2.2 The following data were obtained from classroom observations and reflect the number of incidences that preschool children shared during an 8-hour period.

4 8 10 5 12 10 14 5
10 14 12 14 8 5 0 8
12 8 12 5 4 10 8 5

Using an interval size of 1, construct or compute each of the following:
  a. Frequency distribution
  b. Cumulative frequency distribution
  c. Relative frequency distribution
  d. Cumulative relative frequency distribution
  e. Histogram and frequency polygon
  f. Cumulative frequency polygon
  g. Quartiles
  h. P10 and P90
  i. PR(10)
  j. Box-and-whisker plot
  k. Stem-and-leaf display
2.3 A sample distribution of variable X is as follows:
X f
2 1
3 2
4 5
5 8
6 4
7 3
8 4
9 1
10 2
Calculate or draw each of the following for the sample distribution of X:
  a. Q1
  b. Q2
  c. Q3
  d. P44.5
  e. PR(7.0)
  f. Box-and-whisker plot
  g. Histogram (ungrouped)
2.4 A sample distribution of classroom test scores is as follows:
X f
70 1
75 2
77 3
79 2
80 6
82 5
85 4
90 4
96 3
Calculate or draw each of the following for the sample distribution of X:
  a. Q1
  b. Q2
  c. Q3
  d. P44.5
  e. PR(82)
  f. Box-and-whisker plot
  g. Histogram (ungrouped)
Interpretive Problems

Select two variables from the survey1 dataset on the website, one that is nominal and one that is not.

2.1 Write research questions that will be answered from these data using descriptive statistics (you may want to review the research question template in this chapter).
2.2 Construct the relevant tables and figures to answer the questions you posed.
2.3 Write a paragraph which summarizes the findings for each variable (you may want to review the writing template in this chapter).
3
Univariate Population Parameters and Sample Statistics
Chapter Outline
3.1 Summation Notation
3.2 Measures of Central Tendency
  3.2.1 Mode
  3.2.2 Median
  3.2.3 Mean
  3.2.4 Summary of Measures of Central Tendency
3.3 Measures of Dispersion
  3.3.1 Range
  3.3.2 H Spread
  3.3.3 Deviational Measures
  3.3.4 Summary of Measures of Dispersion
3.4 SPSS
3.5 Templates for Research Questions and APA-Style Paragraph

Key Concepts
1. Summation
2. Central tendency
3. Outliers
4. Dispersion
5. Exclusive versus inclusive range
6. Deviation scores
7. Bias
In the second chapter, we began our discussion of descriptive statistics, previously defined as techniques which allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. There we considered various methods for representing data for purposes of communicating something to the reader or audience. In particular, we were concerned with ways of representing data in an abbreviated fashion through both tables and figures.
In this chapter, we delve more into the field of descriptive statistics in terms of three general topics. First, we examine summation notation, which is important for much of the chapter and, to some extent, the remainder of the text. Second, measures of central tendency allow us to boil down a set of scores into a single value, a point estimate, which somehow represents the entire set. The most commonly used measures of central tendency are the mode, median, and mean. Finally, measures of dispersion provide us with information about the extent to which the set of scores varies; in other words, whether the scores are spread out quite a bit or are pretty much the same. The most commonly used measures of dispersion are the range (exclusive and inclusive ranges), H spread, and variance and standard deviation. In summary, concepts to be discussed in this chapter include summation, central tendency, and dispersion. Within this discussion, we also address outliers and bias. Our objectives are that by the end of this chapter, you will be able to do the following: (a) understand and utilize summation notation, (b) determine and interpret the three commonly used measures of central tendency, and (c) determine and interpret different measures of dispersion.
3.1 Summation Notation
We were introduced to the following research scenario in Chapter 2 and revisit Marie in this chapter.
Marie, a graduate student pursuing a master's degree in educational research, has been assigned to her first task as a research assistant. Her faculty mentor has given Marie quiz data collected from 25 students enrolled in an introductory statistics course and has asked Marie to summarize the data. The faculty member was pleased with the descriptive analysis and presentation of results previously shared, and has asked Marie to conduct additional analysis related to the following research questions: How can quiz scores of students enrolled in an introductory statistics class be summarized using measures of central tendency? Measures of dispersion?
Many areas of statistics, including many methods of descriptive and inferential statistics, require the use of summation notation. Say we have collected heart rate scores from 100 students. Many statistics require us to develop "sums" or "totals" in different ways. For example, what is the simple sum or total of all 100 heart rate scores? Summation (i.e., addition) is not only quite tedious to do computationally by hand, but we also need a system of notation to communicate how we have conducted this summation process. This section describes such a notational system.
For simplicity, let us utilize a small set of scores, keeping in mind that this system can be used for a set of numerical values of any size. In other words, while we speak in terms of "scores," this could just as easily be a set of heights, distances, ages, or other measures. Specifically in this example, we have a set of five ages: 7, 11, 18, 20, and 24. Recall from Chapter 2 the use of X to denote a variable. Here we define Xi as the score for variable X (in this example, age) for a particular individual or object i. The subscript i serves to identify one individual or object from another. These scores would then be denoted as follows: X1 = 7, X2 = 11, X3 = 18, X4 = 20, and X5 = 24. To interpret, X1 = 7 means that for variable X and individual 1, the value of the variable age is 7. In other words, individual 1 is 7 years of age. With five individuals measured on age, then i = 1, 2, 3, 4, 5. However, with a large set of values, this notation can become quite unwieldy, so as shorthand we abbreviate this as i = 1, …, 5, meaning that the index i ranges or goes from 1 to 5.
Next we need a system of notation to denote the summation or total of a set of scores. The standard notation used is $\sum_{i=a}^{b} X_i$, where $\Sigma$ is the Greek capital letter sigma and merely means "the sum of," $X_i$ is the variable we are summing across for each of the $i$ individuals, $i = a$ indicates that $a$ is the lower limit (or beginning) of the summation (i.e., the first value with which we begin our addition), and $b$ indicates the upper limit (or end) of the summation (i.e., the last value added). For our example set of ages, the sum of all of the ages would be denoted as $\sum_{i=1}^{5} X_i$ in shorthand version and as $\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$ in longhand version. For the example data, the sum of all of the ages is computed as follows:

$$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 7 + 11 + 18 + 20 + 24 = 80$$
Thus, the sum of the age variable across all five individuals is 80.
For large sets of values, the longhand version is rather tedious, and, thus, the shorthand version is almost exclusively used. A general form of the longhand version is as follows:

$$\sum_{i=a}^{b} X_i = X_a + X_{a+1} + \cdots + X_{b-1} + X_b$$

The ellipse notation (i.e., …) indicates that there are as many values in between the two values on either side of the ellipse as are necessary. The ellipse notation is then just shorthand for "there are some values in between here." The most frequently used values for $a$ and $b$ with sample data are $a = 1$ and $b = n$ (as you may recall, $n$ is the notation used to represent our sample size). Thus, the most frequently used summation notation for sample data is $\sum_{i=1}^{n} X_i$.
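Summation notation maps directly onto code. As a quick illustration (a Python sketch, not part of the original text), the shorthand $\sum_{i=1}^{n} X_i$ is simply a loop over the scores:

```python
# Ages from the running example: X1 = 7, X2 = 11, X3 = 18, X4 = 20, X5 = 24.
X = [7, 11, 18, 20, 24]

# Longhand: accumulate each score X_i from i = 1 to i = n.
total = 0
for x in X:
    total += x

print(total)    # 80
print(sum(X))   # the built-in sum() plays the role of the shorthand; also 80
```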
3.2 Measures of Central Tendency
One method for summarizing a set of scores is to construct a single index or value that can somehow be used to represent the entire collection of scores. In this section, we consider the three most popular indices, known as measures of central tendency. Although other indices exist, the most popular ones are the mode, the median, and the mean.
3.2.1 Mode
The simplest method to use for measuring central tendency is the mode. The mode is defined as that value in a distribution of scores that occurs most frequently. Consider the example frequency distributions of the number of hours of TV watched per week, as shown in Table 3.1. In distribution (a), the mode is easy to determine, as the interval for value 8 contains the most scores, 3 (i.e., the mode number of hours of TV watched is 8). In distribution (b), the mode is a bit more complicated, as two adjacent intervals each contain the most scores; that is, the 8 and 9 hour intervals each contain three scores. Strictly speaking, this distribution is bimodal, that is, containing two modes, one at 8 and one at 9. This is our personal preference for reporting this particular situation. However, because the two modes are in adjacent intervals, some individuals make an arbitrary decision to average these intervals and report the mode as 8.5.
Distribution (c) is also bimodal; however, here the two modes at 7 and 11 hours are not in adjacent intervals. Thus, one cannot justify taking the average of these intervals, as the average of 9 hours [i.e., (7 + 11)/2] is not representative of the most frequently occurring score. The score of 9 occurs less than any other score observed. We recommend reporting both modes here as well. Obviously, there are other possible situations for the mode (e.g., trimodal distribution), but these examples cover the basics. As one further example, the example data on the statistics quiz from Chapter 2 are shown in Table 3.2 and are used to illustrate the methods in this chapter. The mode is equal to 17 because that interval contains more scores (5) than any other interval. Note also that the mode is determined in
Table 3.2
Frequency Distribution of Statistics Quiz Data

X    f    cf    rf    crf
9    1     1   .04    .04
10   1     2   .04    .08
11   2     4   .08    .16
12   1     5   .04    .20
13   2     7   .08    .28
14   1     8   .04    .32
15   3    11   .12    .44
16   1    12   .04    .48
17   5    17   .20    .68
18   3    20   .12    .80
19   4    24   .16    .96
20   1    25   .04   1.00
   n = 25        1.00
Table 3.1
Example Frequency Distributions

X    f(a)   f(b)   f(c)
6     1      1      2
7     2      2      3
8     3      3      2
9     2      3      1
10    1      2      2
11    0      1      3
12    0      0      2
precisely the same way whether we are talking about the population mode (i.e., the population parameter) or the sample mode (i.e., the sample statistic).
Let us turn to a discussion of the general characteristics of the mode, as well as whether a particular characteristic is an advantage or a disadvantage in a statistical sense. The first characteristic of the mode is that it is simple to obtain. The mode is often used as a quick-and-dirty method for reporting central tendency. This is an obvious advantage. The second characteristic is that the mode does not always have a unique value. We saw this in distributions (b) and (c) of Table 3.1. This is generally a disadvantage, as we initially stated we wanted a single index that could be used to represent the collection of scores. The mode cannot guarantee a single index.
Third, the mode is not a function of all of the scores in the distribution, and this is generally a disadvantage. The mode is strictly determined by which score or interval contains the most frequencies. In distribution (a), as long as the other intervals have fewer frequencies than the interval for value 8, then the mode will always be 8. That is, if the interval for value 8 contains three scores and all of the other intervals contain fewer than three scores, then the mode will be 8. The number of frequencies for the remaining intervals is not relevant as long as it is less than 3. Also, the location or value of the other scores is not taken into account.
The fourth characteristic of the mode is that it is difficult to deal with mathematically. For example, the mode is not very stable from one sample to another, especially with small samples. We could have two nearly identical samples except for one score, which can alter the mode. For example, in distribution (a), if a second similar sample contains the same scores except that an 8 is replaced with a 7, then the mode is changed from 8 to 7. Thus, changing a single score can change the mode, and this is considered to be a disadvantage. A fifth and final characteristic is that the mode can be used with any type of measurement scale, from nominal to ratio, and is the only measure of central tendency appropriate for nominal data.
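A minimal Python sketch of the mode as defined here, returning every tied value so that bimodal distributions such as (b) and (c) report both modes (the function name and the expanded score lists are our construction, reproduced from the frequencies in Table 3.1):

```python
from collections import Counter

def modes(scores):
    """Return all values tied for the highest frequency,
    so bimodal distributions report both modes."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, f in counts.items() if f == top)

# Distribution (a) from Table 3.1: value 8 occurs most often (3 times).
dist_a = [6, 7, 7, 8, 8, 8, 9, 9, 10]
print(modes(dist_a))   # [8]

# Distribution (c): two non-adjacent modes, at 7 and at 11.
dist_c = [6, 6, 7, 7, 7, 8, 8, 9, 10, 10, 11, 11, 11, 12, 12]
print(modes(dist_c))   # [7, 11]
```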
3.2.2 Median
A second measure of central tendency represents a concept that you are already familiar with. The median is that score which divides a distribution of scores into two equal parts. In other words, one-half of the scores fall below the median, and one-half of the scores fall above the median. We already know this from Chapter 2 as the 50th percentile or Q2. In other words, the 50th percentile, or Q2, represents the median value. The formula for computing the median is

$$\text{Median} = LRL + \frac{(50\%)(n) - cf}{f}(w) \qquad (3.1)$$

where the notation is the same as previously described in Chapter 2. Just as a reminder, LRL is the lower real limit of the interval containing the median, 50% is the percentile desired, n is the sample size, cf is the cumulative frequency of all intervals less than but not including the interval containing the median (cf below), f is the frequency of the interval containing the median, and w is the interval width. For the example quiz data, the median is computed as follows:

$$\text{Median} = 16.5 + \frac{(50\%)(25) - 12}{5}(1) = 16.5 + 0.1000 = 16.6000$$
Occasionally, you will run into simple distributions of scores where the median is easy to identify. If you have an odd number of untied scores, then the median is the middle-ranked score. For an example, say we have measured individuals on the number of CDs owned and find values of 1, 3, 7, 11, and 21. For these data, the median is 7 (i.e., 7 CDs is the middle-ranked value or score). If you have an even number of untied scores, then the median is the average of the two middle-ranked scores. For example, a different sample reveals the following number of CDs owned: 1, 3, 5, 11, 21, and 32. The two middle scores are 5 and 11, and, thus, the median is their average, 8 CDs owned [i.e., (5 + 11)/2]. In most other situations where there are tied scores, the median is not as simple to locate, and Equation 3.1 is necessary. Note also that the median is computed in precisely the same way whether we are talking about the population median (i.e., the population parameter) or the sample median (i.e., the sample statistic).
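These rules, and Equation 3.1 for tied scores, can be sketched in Python as follows (the function names are ours, and grouped_median assumes unit-width intervals by default):

```python
def simple_median(scores):
    """Median for untied scores: the middle-ranked score (odd n),
    or the average of the two middle-ranked scores (even n)."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def grouped_median(scores, w=1.0):
    """Equation 3.1: Median = LRL + ((50% * n - cf below) / f) * w."""
    s = sorted(scores)
    target = 0.5 * len(s)
    for value in sorted(set(s)):
        cf_below = sum(1 for x in s if x < value)
        f = s.count(value)
        if cf_below + f >= target:            # median falls in this interval
            return (value - w / 2) + (target - cf_below) / f * w

print(simple_median([1, 3, 7, 11, 21]))       # 7
print(simple_median([1, 3, 5, 11, 21, 32]))   # 8.0
```

Applied to the statistics quiz scores of Table 3.2, grouped_median returns 16.6, matching the hand computation above.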
The general characteristics of the median are as follows. First, the median is not influenced by extreme scores (scores far away from the middle of the distribution are known as outliers). Because the median is defined conceptually as the middle score, the actual size of an extreme score is not relevant. For the example statistics quiz data, imagine that the extreme score of 9 was somehow actually 0 (e.g., incorrectly scored). The median would still be 16.6, as half of the scores are still above this value and half below. Because the extreme score under consideration here still remained below the 50th percentile, the median was not altered. This characteristic is an advantage, particularly when extreme scores are observed. As another example using salary data, say that all but one of the individual salaries are below $100,000 and the median is $50,000. The remaining extreme observation has a salary of $5,000,000. The median is not affected by this millionaire; the extreme individual is simply treated as every other observation above the median, no more or no less than, say, the salary of $65,000.
A second characteristic is that the median is not a function of all of the scores. Because we already know that the median is not influenced by extreme scores, we know that the median does not take such scores into account. Another way to think about this is to examine Equation 3.1 for the median. The equation only deals with information for the interval containing the median. The specific information for the remaining intervals is not relevant so long as we are looking in the interval containing the median. We could, for instance, take the top 25% of the scores and make them even more extreme (say we add 10 bonus points to the top quiz scores). The median would remain unchanged. As you probably surmised, this characteristic is generally thought to be a disadvantage. If you really think about the first two characteristics, no measure could possibly possess both. That is, if a measure is a function of all of the scores, then extreme scores must also be taken into account. If a measure does not take extreme scores into account, like the median, then it cannot be a function of all of the scores.
A third characteristic is that the median is difficult to deal with mathematically, a disadvantage, as with the mode. The median is somewhat unstable from sample to sample, especially with small samples. As a fourth characteristic, the median always has a unique value, another advantage. This is unlike the mode, which does not always have a unique value. Finally, the fifth characteristic of the median is that it can be used with all types of measurement scales except the nominal. Nominal data cannot be ranked, and, thus, percentiles and the median are inappropriate.
3.2.3 Mean
The final measure of central tendency to be considered is the mean, sometimes known as the arithmetic mean or "average" (although the term average is used rather loosely by laypeople). Statistically, we define the mean as the sum of all of the scores divided by the number of scores. Thought of in those terms, you may have been computing the mean for many years, and may not have even known it.
The population mean is denoted by μ (Greek letter mu) and computed as follows:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

For sample data, the sample mean is denoted by $\bar{X}$ (read "X bar") and computed as follows:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

For the example quiz data, the sample mean is computed as follows:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{389}{25} = 15.5600$$
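As a quick numerical check in Python (the quiz scores below are expanded from the frequency column of Table 3.2):

```python
# Statistics quiz scores, reconstructed from the frequencies in Table 3.2.
quiz = [9, 10, 11, 11, 12, 13, 13, 14, 15, 15, 15, 16, 17,
        17, 17, 17, 17, 18, 18, 18, 19, 19, 19, 19, 20]

mean = sum(quiz) / len(quiz)   # X-bar = (sum of X_i, i = 1..n) / n
print(sum(quiz))               # 389
print(mean)                    # 15.56
```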
Here are the general characteristics of the mean. First, the mean is a function of every score, a definite advantage in terms of a measure of central tendency representing all of the data. If you look at the numerator of the mean, you see that all of the scores are clearly taken into account in the sum. The second characteristic of the mean is that it is influenced by extreme scores. Because the numerator sum takes all of the scores into account, it also includes the extreme scores, which is a disadvantage. Let us return for a moment to a previous example of salary data where all but one of the individuals have an annual salary under $100,000, and the one outlier is making $5,000,000. Because this one outlying value is so extreme, the mean will be greatly influenced. In fact, the mean could easily fall somewhere between the second highest salary and the millionaire, which does not represent the collection of scores well.
Third, the mean always has a unique value, another advantage. Fourth, the mean is easy to deal with mathematically. The mean is the most stable measure of central tendency from sample to sample, and because of that is the measure most often used in inferential statistics (as we show in later chapters). Finally, the fifth characteristic of the mean is that it is only appropriate for interval and ratio measurement scales. This is because the mean implicitly assumes equal intervals, which of course the nominal and ordinal scales do not possess.
3.2.4 Summary of Measures of Central Tendency
To summarize the measures of central tendency then,
1. The mode is the only appropriate measure for nominal data.
2. The median and mode are both appropriate for ordinal data (and conceptually the median fits the ordinal scale, as both deal with ranked scores).
3. All three measures are appropriate for interval and ratio data.
A summary of the advantages and disadvantages of each measure is presented in Box 3.1.
Stop and Think Box 3.1
Advantages and Disadvantages of Measures of Central Tendency

Mode
  Advantages:
  • Quick and easy method for reporting central tendency
  • Can be used with any measurement scale of variable
  Disadvantages:
  • Does not always have a unique value
  • Not a function of all scores in the distribution
  • Difficult to deal with mathematically due to its instability

Median
  Advantages:
  • Not influenced by extreme scores
  • Has a unique value
  • Can be used with ordinal, interval, and ratio measurement scales of variables
  Disadvantages:
  • Not a function of all scores in the distribution
  • Difficult to deal with mathematically due to its instability
  • Cannot be used with nominal data

Mean
  Advantages:
  • Function of all scores in the distribution
  • Has a unique value
  • Easy to deal with mathematically
  • Can be used with interval and ratio measurement scales of variables
  Disadvantages:
  • Influenced by extreme scores
  • Cannot be used with nominal or ordinal variables
3.3 Measures of Dispersion
In the previous section, we discussed one method for summarizing a collection of scores, the measures of central tendency. Central tendency measures are useful for describing a collection of scores in terms of a single index or value (with one exception: the mode for distributions that are not unimodal). However, what do they tell us about the distribution of scores? Consider the following example. If we know that a sample has a mean of 50, what do we know about the distribution of scores? Can we infer from the mean what the distribution looks like? Are most of the scores fairly close to the mean of 50, or are they spread out quite a bit? Perhaps most of the scores are within two points of the mean. Perhaps most are within 10 points of the mean. Perhaps most are within 50 points of the mean. Do we know? The answer, of course, is that the mean provides us with no information about what the distribution of scores looks like, and any of the possibilities mentioned, and many others, can occur. The same goes if we only know the mode or the median.
Another method for summarizing a set of scores is to construct an index or value that can be used to describe the amount of spread among the collection of scores. In other words, we need measures that can be used to determine whether the scores fall fairly close to the central tendency measure, are fairly well spread out, or are somewhere in between. In this section, we consider the four most popular such indices, which are known as measures of dispersion (i.e., the extent to which the scores are dispersed or spread out). Although other indices exist, the most popular ones are the range (exclusive and inclusive), H spread, the variance, and the standard deviation.
3.3.1 Range
The simplest measure of dispersion is the range. The term range is one that is in common use outside of statistical circles, so you have some familiarity with it already. For instance, say you are at the mall shopping for a new pair of shoes. You find six stores have the same pair of shoes that you really like, but the prices vary somewhat. At this point, you might actually make the statement "the price for these shoes ranges from $59 to $75." In a way, you are talking about the range.
Let us be more specific as to how the range is measured. In fact, there are actually two different definitions of the range, exclusive and inclusive, which we consider now. The exclusive range is defined as the difference between the largest and smallest scores in a collection of scores. For notational purposes, the exclusive range (ER) is shown as ER = Xmax − Xmin, where Xmax is the largest or maximum score obtained, and Xmin is the smallest or minimum score obtained. For the shoe example then, ER = Xmax − Xmin = 75 − 59 = 16. In other words, the actual exclusive range of the scores is 16 because the price varies from 59 to 75 (in dollar units).
A limitation of the exclusive range is that it fails to account for the width of the intervals being used. For example, if we use an interval width of 1 dollar, then the 59 interval really has 59.5 as the upper real limit and 58.5 as the lower real limit. If the least expensive shoe is $58.95, then the exclusive range covering from $59 to $75 actually excludes the least expensive shoe. Hence the term exclusive range means that scores can be excluded from this range. The same would go for a shoe priced at $75.25, as it would fall outside of the exclusive range at the high end of the distribution.
Because of this limitation, a second definition of the range was developed, known as the inclusive range. As you might surmise, the inclusive range takes into account the interval width so that all scores are included in the range. The inclusive range is defined as the difference between the upper real limit of the interval containing the largest score and the lower real limit of the interval containing the smallest score in a collection of scores. For notational purposes, the inclusive range (IR) is shown as IR = URL of Xmax − LRL of Xmin. If you think about it, what we are actually doing is extending the range by one-half of an interval width at each extreme: one-half an interval width at the maximum value, and one-half an interval width at the minimum value. In notational form, IR = ER + w. For the shoe example, using an interval width of 1, then IR = URL of Xmax − LRL of Xmin = 75.5 − 58.5 = 17. In other words, the actual inclusive range of the scores is 17 (in dollar units). If the interval width was instead 2, then we would add 1 unit to each extreme rather than the .5 unit that we previously added to each extreme. The inclusive range would instead be 18. For the example quiz data (presented in Table 3.2), note that the exclusive range is 11 and the inclusive range is 12 (as the interval width is 1).
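Both definitions are one-liners in code. In the Python sketch below, the six individual shoe prices are hypothetical (the text only fixes the minimum of $59 and the maximum of $75), which does not matter here because only the endpoints enter either range:

```python
def exclusive_range(scores):
    """ER = Xmax - Xmin."""
    return max(scores) - min(scores)

def inclusive_range(scores, w=1.0):
    """IR = URL of Xmax - LRL of Xmin, which simplifies to ER + w."""
    return exclusive_range(scores) + w

shoes = [59, 62, 64, 68, 71, 75]    # hypothetical prices, in dollars
print(exclusive_range(shoes))       # 16
print(inclusive_range(shoes))       # 17.0
print(inclusive_range(shoes, w=2))  # 18 (with an interval width of 2)
```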
Finally, we need to examine the general characteristics of the range (they are the same for both definitions of the range). First, the range is simple to compute, which is a definite advantage. One can look at a collection of data and almost immediately, even without a computer or calculator, determine the range.
The second characteristic is that the range is influenced by extreme scores, a disadvantage. Because the range is computed from the two most extreme scores, this characteristic is quite obvious. This might be a problem, for instance, if all of the salary data range from $10,000 to $95,000 except for one individual with a salary of $5,000,000. Without this outlier, the exclusive range is $85,000. With the outlier, the exclusive range is $4,990,000. Thus, the millionaire's salary has a drastic impact on the range.
Third, the range is only a function of two scores, another disadvantage. Obviously, the range is computed from the largest and smallest scores and thus is only a function of those two scores. The spread of the distribution of scores between those two extreme scores is not at all taken into account. In other words, for the same maximum ($5,000,000) and minimum ($10,000) salaries, the range is the same whether the salaries are mostly near the maximum salary, mostly near the minimum salary, or spread out evenly. The fourth characteristic is that the range is unstable from sample to sample, another disadvantage. Say a second sample of salary data yielded the exact same data except for the maximum salary now being a less extreme $100,000. The range is now dramatically different. Also, in statistics we tend to worry about measures that are not stable from sample to sample, as that implies the results are not very reliable. Finally, the range is appropriate for data that are ordinal, interval, or ratio in measurement scale.
3.3.2 H Spread
The next measure of dispersion is H spread, a variation on the range measure with one major exception. Although the range relies upon the two extreme scores, resulting in certain disadvantages, H spread relies upon the difference between the third and first quartiles. To be more specific, H spread is defined as Q3 − Q1, the simple difference between the third and first quartiles. The term H spread was developed by Tukey (1977), H being short for hinge from the box-and-whisker plot, and is also known as the interquartile range.
For the example statistics quiz data (presented in Table 3.2), we already determined in Chapter 2 that Q3 = 18.0833 and Q1 = 13.1250. Therefore, H = Q3 − Q1 = 18.0833 − 13.1250 = 4.9583. H measures the range of the middle 50% of the distribution. The larger the value, the greater the spread in the middle of the distribution. The size or magnitude of any of the range measures takes on more meaning when making comparisons across samples. For example, you might find with salary data that the range of salaries for middle management is smaller than the range of salaries for upper management. As another example, we might expect the salary range to increase over time.
What are the characteristics of H spread? The first characteristic is that H is unaffected by extreme scores, an advantage. Because we are looking at the difference between the third and first quartiles, extreme observations will be outside of this range. Second, H is not a function of every score, a disadvantage. The precise placement of where scores fall above Q3, below Q1, and between Q3 and Q1 is not relevant. All that matters is that 25% of the scores fall above Q3, 25% fall below Q1, and 50% fall between Q3 and Q1. Thus, H is not a function of very many of the scores at all, just those around Q3 and Q1. Third, H is not very stable from sample to sample, another disadvantage, especially in terms of inferential statistics and one's ability to be confident about a sample estimate of a population parameter. Finally, H is appropriate for all scales of measurement except for nominal.
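H spread can be sketched in Python using the interpolated percentile formula of Chapter 2. The helper below is our construction (it assumes unit-width intervals by default), and on the quiz data it reproduces the Q1, Q3, and H values quoted above:

```python
def grouped_percentile(scores, p, w=1.0):
    """Percentile by interpolation: LRL + ((p * n - cf below) / f) * w."""
    s = sorted(scores)
    target = p * len(s)
    for value in sorted(set(s)):
        cf_below = sum(1 for x in s if x < value)
        f = s.count(value)
        if cf_below + f >= target:
            return (value - w / 2) + (target - cf_below) / f * w

# Statistics quiz scores, expanded from Table 3.2.
quiz = [9, 10, 11, 11, 12, 13, 13, 14, 15, 15, 15, 16, 17,
        17, 17, 17, 17, 18, 18, 18, 19, 19, 19, 19, 20]

q1 = grouped_percentile(quiz, 0.25)   # 13.1250
q3 = grouped_percentile(quiz, 0.75)   # 18.0833...
print(round(q3 - q1, 4))              # H = Q3 - Q1 = 4.9583
```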
3.3.3 Deviational Measures
In this section, we examine deviation scores, population variance and standard deviation, and sample variance and standard deviation, all methods that deal with deviations from the mean.
3.3.3.1 Deviation Scores
In the last category of measures of dispersion are those that utilize deviations from the mean. Let us define a deviation score as the difference between a particular raw score and the mean of the collection of scores (population or sample; either will work). For population data, we define a deviation as di = Xi − μ. In other words, we can compute the deviation from the mean for each individual or object. Consider the credit card dataset as shown in Table 3.3. To make matters simple, we only have a small population of data, five values to be exact. The first column lists the raw scores, which are in this example the number of credit cards owned for five individuals, and, at the bottom of the first column, indicates the sum (Σ = 30), population size (N = 5), and population mean (μ = 6.0). The second column provides the deviation scores for each observation from the population mean and, at the bottom of the second column, indicates the sum of the deviation scores, denoted by

$$\sum_{i=1}^{N} (X_i - \mu)$$
From the second column, we see that two of the observations have positive deviation scores, as their raw score is above the mean; one observation has a zero deviation score, as that raw score is at the mean; and two other observations have negative deviation scores, as their raw score is below the mean. However, when we sum the deviation scores, we obtain a value of zero. This will always be the case as follows:

$$\sum_{i=1}^{N} (X_i - \mu) = 0$$
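This zero-sum property is easy to verify numerically (a quick Python sketch using the credit card data of Table 3.3):

```python
X = [1, 5, 6, 8, 10]    # credit card data from Table 3.3
mu = sum(X) / len(X)    # population mean, 6.0

deviations = [x - mu for x in X]
print(deviations)       # [-5.0, -1.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0 -- the positives exactly offset the negatives
```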
The positive deviation scores will exactly offset the negative deviation scores. Thus any measure involving simple deviation scores will be useless in that the sum of the deviation scores will always be zero, regardless of the spread of the scores.
What other alternatives are there for developing a deviational measure that will yield a sum other than zero? One alternative is to take the absolute value of the deviation scores (i.e., where the sign is ignored). Unfortunately, however, this is not very useful mathematically in terms of deriving other statistics, such as inferential statistics. As a result, this deviational measure is rarely used in statistics.
3.3.3.2 Population Variance and Standard Deviation
So far, we found the sum of the deviations and the sum of the absolute deviations not to be very useful in describing the spread of the scores from the mean. What other alternative
Table 3.3
Credit Card Data

X       X − μ    (X − μ)²
1        −5        25
5        −1         1
6         0         0
8         2         4
10        4        16
Σ = 30   Σ = 0    Σ = 46
N = 5
μ = 6
might be useful? As shown in the third column of Table 3.3, one could square the deviation scores to remove the sign problem. The sum of the squared deviations is shown at the bottom of the column as Σ = 46 and denoted as

$$\sum_{i=1}^{N} (X_i - \mu)^2$$
As you might suspect, with more scores, the sum of the squared deviations will increase. So we have to weigh the sum by the number of observations in the population. This yields a deviational measure known as the population variance, which is denoted as σ² (lowercase Greek letter sigma, squared) and computed by the following formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
For the credit card example, the population variance σ² = 46/5 = 9.2. We refer to this particular formula for the population variance as the definitional formula, as conceptually that is how we define the variance. Conceptually, the variance is a measure of the area of a distribution. That is, the more spread out the scores, the more area or space the distribution takes up and the larger the variance. The variance may also be thought of as an average squared distance from the mean. The variance has nice mathematical properties and is useful for deriving other statistics, such as inferential statistics.
The�computational formula�for�the�population�variance�is
\[ \sigma^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left( \sum_{i=1}^{N} X_i \right)^2}{N^2} \]
This method is computationally easier to deal with than the definitional formula. Imagine if you had a population of 100 scores. Using hand computations, the definitional formula would take considerably more time than the computational formula. With the computer, this is a moot point, obviously. But if you do have to compute the population variance by hand, then the easiest formula to use is the computational one.
Exactly how does this formula work? For the first summation in the numerator, we square each score first, then sum all of the squared scores. This value is then multiplied by the population size. For the second summation in the numerator, we sum all of the scores first, then square the summed scores. After subtracting the second quantity from the first in the numerator, we divide by the squared population size.
The two quantities derived by the summation operations in the numerator are computed in much different ways and generally yield different values.
Let us return to the credit card dataset and see if the computational formula actually yields the same value for σ² as the definitional formula did earlier (σ² = 9.2). The computational formula shows σ² to be as follows:
\[ \sigma^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left( \sum_{i=1}^{N} X_i \right)^2}{N^2} = \frac{5(226) - (30)^2}{5^2} = \frac{1130 - 900}{25} = 9.2000 \]
which is precisely the value we computed previously.
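The agreement between the two formulas is easy to verify numerically; the following is a minimal Python sketch (illustrative only) using the credit card data:

```python
# Sketch: the definitional and computational formulas for the
# population variance give the same result (credit card data).
X = [1, 5, 6, 8, 10]
N = len(X)
mu = sum(X) / N                                      # 6.0

# Definitional formula: sum of squared deviations, divided by N
var_def = sum((x - mu) ** 2 for x in X) / N

# Computational formula: [N * sum(X^2) - (sum X)^2] / N^2
var_comp = (N * sum(x * x for x in X) - sum(X) ** 2) / N ** 2

print(var_def, var_comp)    # both 9.2
sigma = var_comp ** 0.5     # population standard deviation, about 3.0332
```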
A few individuals (none of us, of course) are a bit bothered by the variance for the following reason. Say you are measuring the height of children in inches. The raw scores are measured in terms of inches, and the mean is measured in terms of inches, but the variance is measured in terms of inches squared. Squaring the scale is bothersome to some, as the scale is no longer in the original unit of measure but rather a squared unit of measure, making interpretation a bit difficult. To generate a deviational measure in the original scale of inches, we can take the square root of the variance. This is known as the standard deviation and is the final measure of dispersion we discuss. The population standard deviation is defined as the positive square root of the population variance and is denoted by σ (i.e., \( \sigma = +\sqrt{\sigma^2} \)). The standard deviation, then, is measured in the original scale of inches. For the credit card data, the standard deviation is computed as follows:
\[ \sigma = +\sqrt{\sigma^2} = +\sqrt{9.2} = 3.0332 \]
What are the major characteristics of the population variance and standard deviation? First, the variance and standard deviation are a function of every score, an advantage. An examination of either the definitional or computational formula for the variance (and standard deviation as well) indicates that all of the scores are taken into account, unlike the range or H spread. Second, therefore, the variance and standard deviation are affected by extreme scores, a disadvantage. As we said earlier, if a measure takes all of the scores into account, then it must take into account the extreme scores as well. Thus, a child much taller than all of the rest of the children will dramatically increase the variance, as the area or size of the distribution will be much more spread out. Another way to think about this is that the deviation score for such an outlier will be large, and then it will be squared, and then summed with the rest of the deviation scores. Thus, an outlier can really increase the variance. Also, it goes without saying that it is always a good idea when using the computer to verify your data. A data entry error can cause an outlier and therefore a larger variance (e.g., that child coded as 700 inches tall instead of 70 will surely inflate your variance).
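The effect of such a data entry error is easy to demonstrate with a short Python sketch; the height values here are hypothetical:

```python
# Sketch: a single data-entry error (70 keyed as 700) inflates the
# population variance dramatically. Heights (inches) are hypothetical.
def pop_variance(x):
    mu = sum(x) / len(x)
    return sum((v - mu) ** 2 for v in x) / len(x)

heights = [48, 50, 52, 55, 70]       # includes one genuinely tall child
mistyped = [48, 50, 52, 55, 700]     # same data with 70 keyed as 700

print(pop_variance(heights))     # modest spread
print(pop_variance(mistyped))    # enormous spread from one bad value
```

One miscoded value increases the variance here by roughly a factor of a thousand, which is why verifying your data before computing dispersion measures matters.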
Third,� the� variance� and� standard� deviation� are� only� appropriate� for� interval� and� ratio�
measurement�scales��Like�the�mean,�this�is�due�to�the�implicit�requirement�of�equal�intervals��
A� fourth� and� final� characteristic� of� the� variance� and� standard� deviation� is� they� are� quite�
useful�for�deriving�other�statistics,�particularly�in�inferential�statistics,�another�advantage��
In�fact,�Chapter�9�is�all�about�making�inferences�about�variances,�and�many�other�inferential�
statistics�make�assumptions�about�the�variance��Thus,�the�variance�is�quite�important�as�a�
measure� of� dispersion�� It� is� also� interesting� to� compare� the� measures� of� central� tendency�
with�the�measures�of�dispersion,�as�they�do�share�some�important�characteristics��The�mode�
62 An Introduction to Statistical Concepts
and� the� range� share� certain� characteristics�� Both� only� take� some� of� the� data� into� account,�
are�simple�to�compute,�and�are�unstable�from�sample�to�sample��The�median�shares�certain�
characteristics�with�H�spread��These�are�not�influenced�by�extreme�scores,�are�not�a�function�
of�every�score,�are�difficult�to�deal�with�mathematically�due�to�their�instability�from�sample�
to�sample,�and�can�be�used�with�all�measurement�scales�except�the�nominal�scale��The�mean�
shares�many�characteristics�with�the�variance�and�standard�deviation��These�all�are�a�func-
tion�of�every�score,�are�influenced�by�extreme�scores,�are�useful�for�deriving�other�statistics,�
and�are�only�appropriate�for�interval�and�ratio�measurement�scales�
To�complete�this�section�of�the�chapter,�we�take�a�look�at�the�sample�variance�and�stan-
dard�deviation�and�how�they�are�computed�for�large�samples�of�data�(i�e�,�larger�than�our�
credit�card�dataset)�
3.3.3.3 Sample Variance and Standard Deviation
Most of the time, we are interested in computing the sample variance and standard deviation; we also often have large samples of data with multiple frequencies for many of the scores. Here we consider these last aspects of the measures of dispersion. Recall when we computed the sample statistics of central tendency. The computations were exactly the same as with the population parameters (although the notation for the population and sample means was different). There are also no differences between the sample and population values for the range or H spread. However, there is a difference between the sample and population values for the variance and standard deviation, as we see next.
Recall the definitional formula for the population variance as follows:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
Why� not� just� take� this� equation� and� convert� everything� to� sample� statistics?� In� other�
words,�we�could�simply�change�N�to�n�and�μ�to�X
–
��What�could�be�wrong�with�that?�The�
answer�is�that�there�is�a�problem�which�prevents�us�from�simply�changing�the�notation�in�
the�formula�from�population�notation�to�sample�notation�
Here�is�the�problem��First,�the�sample�mean,�X
–
,�may�not�be�exactly�equal�to�the�popu-
lation� mean,� � In� fact,� for� most� samples,� the� sample� mean� will� be� somewhat� different�
from� the� population� mean�� Second,� we� cannot� use� the� population� mean� anyway� as� it� is�
unknown� (in� most� instances� anyway)�� Instead,� we� have� to� substitute� the� sample� mean�
into�the�equation�(i�e�,�the�sample�mean,�X
–
,�is�the�sample�estimate�for�the�population�mean,�μ)��
Because� the� sample� mean� is� different� from� the� population� mean,� the� deviations� will� all�
be� affected�� Also,� the� sample� variance� that� would� be� obtained� in� this� fashion� would� be�
a� biased� estimate� of� the� population� variance�� In� statistics,� bias� means� that� something� is�
systematically� off�� In� this� case,� the� sample� variance� obtained� in� this� manner� would� be�
systematically�too�small�
In�order�to�obtain�an�unbiased�sample�estimate�of�the�population�variance,�the�following�
adjustments�have�to�be�made�in�the�definitional�and�computational�formulas,�respectively:
\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

\[ s^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)} \]
In terms of the notation,
s² is the sample variance
n has been substituted for N
X̄ has been substituted for μ
These changes are relatively minor and expected. The major change is in the denominator, where instead of N for the definitional formula we have n − 1, and instead of N² for the computational formula we have n(n − 1). This turns out to be the correction that early statisticians discovered was necessary to obtain an unbiased estimate of the population variance.
It should be noted that (a) when the sample size is relatively large (e.g., n = 1000), the correction will be quite small, and (b) when the sample size is relatively small (e.g., n = 5), the correction will be quite a bit larger. One suggestion is that when computing the variance on a calculator or computer, you might want to be aware of whether the sample or the population variance is being computed, as it can make a difference (typically the sample variance is computed). The sample standard deviation is denoted by s and computed as the positive square root of the sample variance s² (i.e., \( s = +\sqrt{s^2} \)).
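The effect of the n − 1 correction can be illustrated with a short Python sketch (illustrative only), treating the credit card values as if they were a sample:

```python
# Sketch: dividing by n systematically underestimates the population
# variance; dividing by n - 1 gives the unbiased sample estimate.
def variance(x, unbiased=True):
    n = len(x)
    xbar = sum(x) / n
    ss = sum((v - xbar) ** 2 for v in x)   # sum of squared deviations
    return ss / (n - 1) if unbiased else ss / n

sample = [1, 5, 6, 8, 10]
print(variance(sample, unbiased=False))   # 9.2  (divides by n)
print(variance(sample))                   # 11.5 (divides by n - 1)
```

With n = 5 the two values differ noticeably; with n = 1000 they would be nearly identical, consistent with point (a) above.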
For our example statistics quiz data (presented in Table 3.2), we have multiple frequencies for many of the raw scores which need to be taken into account. A simple procedure for dealing with this situation when using hand computations is shown in Table 3.4. Here we see that in the third and fifth columns, the scores and squared scores are multiplied by their respective frequencies. This allows us to take into account, for example, that the score of 19 occurred four times. Note for the fifth column that the frequencies are not squared; only the scores are squared. At the bottom of the third and fifth columns are the sums we need to compute the statistics of interest.
Table 3.4
Sums for Statistics Quiz Data

X     f    fX    X²     fX²
9     1    9     81     81
10    1    10    100    100
11    2    22    121    242
12    1    12    144    144
13    2    26    169    338
14    1    14    196    196
15    3    45    225    675
16    1    16    256    256
17    5    85    289    1445
18    3    54    324    972
19    4    76    361    1444
20    1    20    400    400
n = 25     Σ = 389      Σ = 6293
The computations are as follows. We compute the sample mean to be
\[ \bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{n} = \frac{389}{25} = 15.5600 \]
The�sample�variance�is�computed�to�be�as�follows:
s
n fX fX
n n
i
i
n
i
i
n
2
2
1 1
2
2
1
25 6 293 389
25 24
=
−
−
=
−= =
∑ ∑
( )
( , ) ( )
( )
==
−
= =
157 325 151 321
600
6 004
600
10 0067
, , ,
.
Therefore, the sample standard deviation is
\[ s = +\sqrt{s^2} = +\sqrt{10.0067} = 3.1633 \]
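The frequency-table computations above can be replicated in a few lines of Python (a sketch, not part of the original text), using the quiz score frequencies from Table 3.4:

```python
# Sketch: sample mean, variance, and standard deviation computed from
# the frequency table of quiz scores (Table 3.4).
freq = {9: 1, 10: 1, 11: 2, 12: 1, 13: 2, 14: 1, 15: 3,
        16: 1, 17: 5, 18: 3, 19: 4, 20: 1}

n = sum(freq.values())                              # 25
sum_fx = sum(f * x for x, f in freq.items())        # 389
sum_fx2 = sum(f * x * x for x, f in freq.items())   # 6293

mean = sum_fx / n                                   # 15.56
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))    # about 10.0067
s = s2 ** 0.5                                       # about 3.1633
```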
3.3.4 Summary of Measures of Dispersion
To summarize the measures of dispersion, then:
 1. The range is the only appropriate measure for ordinal data. The H spread, variance, and standard deviation can be used with interval or ratio measurement scales.
 2. There are no measures of dispersion appropriate for nominal data.
A summary of the advantages and disadvantages of each measure is presented in Box 3.2.
STOP AND THINK BOX 3.2
Advantages and Disadvantages of Measures of Dispersion

Range
  Advantages: Simple to compute; can be used with ordinal, interval, and ratio measurement scales of variables.
  Disadvantages: Influenced by extreme scores; a function of only two scores; unstable from sample to sample; cannot be used with nominal data.

H spread
  Advantages: Unaffected by extreme scores; can be used with ordinal, interval, and ratio measurement scales of variables.
  Disadvantages: Not a function of all scores in the distribution; difficult to deal with mathematically due to its instability; cannot be used with nominal data.

Variance and standard deviation
  Advantages: A function of all scores in the distribution; useful for deriving other statistics; can be used with interval and ratio measurement scales of variables.
  Disadvantages: Influenced by extreme scores; cannot be used with nominal or ordinal variables.
3.4 SPSS
The purpose of this section is to see what SPSS has to offer in terms of computing measures of central tendency and dispersion. In fact, SPSS provides us with many different ways to obtain such measures. The three programs that we have found to be most useful for generating the descriptive statistics covered in this chapter are "Explore," "Descriptives," and "Frequencies." Instructions for using each are provided as follows.
Explore
Explore: Step 1. The first program, "Explore," can be invoked by clicking on "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then "Explore." Following the screenshot, as follows, will produce the "Explore" dialog box. For brevity, we have not reproduced this initial screenshot when we discuss the "Descriptives" and "Frequencies" programs; however, you can see here where they can be found from the pulldown menus.
[Screenshot: "Explore: Step 1." Note that "Descriptives" and "Frequencies" can also be invoked from this menu.]
Explore: Step 2. Next, from the main "Explore" dialog box, click the variable of interest from the list on the left (e.g., quiz), and move it into the "Dependent List" box by clicking on the arrow button (see screenshot for "Explore: Step 2"). Then click on the "OK" button.
[Screenshot: "Explore: Step 2." Select the variable of interest from the list on the left and use the arrow to move it to the "Dependent List" box on the right.]
This will automatically generate the mean, median (approximate), variance, standard deviation, minimum, maximum, exclusive range, and interquartile range (H) (plus skewness and kurtosis, to be covered in Chapter 4). The SPSS output from "Explore" is shown in the top panel of Table 3.5.
Table 3.5
Select SPSS Output for Statistics Quiz Data Using "Explore," "Descriptives," and "Frequencies"

Descriptives
Quiz                                        Statistic    Std. Error
Mean                                        15.5600      .63267
95% Confidence interval for mean
  Lower bound                               14.2542
  Upper bound                               16.8658
5% Trimmed mean                             15.6778
Median                                      17.0000
Variance                                    10.007
Std. deviation                              3.16333
Minimum                                     9.00
Maximum                                     20.00
Range                                       11.00
Interquartile range                         5.00
Skewness                                    –.598        .464
Kurtosis                                    –.741        .902

This is an example of the output generated using the "Explore" procedure in SPSS. By default, a stem-and-leaf plot and boxplot are also generated from "Explore" (but are not presented here).

Descriptive Statistics
                    N    Range    Minimum    Maximum    Mean       Std. Deviation    Variance
Quiz                25   11.00    9.00       20.00      15.5600    3.16333           10.007
Valid N (listwise)  25

This is an example of the output generated using the "Descriptives" procedure in SPSS.
Table 3.5 (continued)
Select SPSS Output for Statistics Quiz Data Using "Explore," "Descriptives," and "Frequencies"

Statistics
Quiz
N Valid             25
N Missing           0
Mean                15.5600
Median              16.3333 a
Mode                17.00
Std. deviation      3.16333
Variance            10.007
Range               11.00
Minimum             9.00
Maximum             20.00
a Calculated from grouped data.

This is an example of the output generated using the "Frequencies" procedure in SPSS. By default, a frequency table is generated from "Frequencies" (but is not presented here).
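The order-based statistics reported in Table 3.5 (raw median, minimum, maximum, and exclusive range) can be checked by expanding the frequency table into raw scores; a Python sketch (illustrative only):

```python
# Sketch: reproducing the order-based statistics for the quiz data
# by expanding the Table 3.4 frequencies into 25 raw scores.
freq = {9: 1, 10: 1, 11: 2, 12: 1, 13: 2, 14: 1, 15: 3,
        16: 1, 17: 5, 18: 3, 19: 4, 20: 1}
scores = sorted(x for x, f in freq.items() for _ in range(f))

median = scores[len(scores) // 2]    # middle (13th) of 25 sorted scores: 17
print(min(scores), max(scores))      # 9 and 20
print(max(scores) - min(scores))     # exclusive range: 11
```

Note that the 16.3333 median reported by "Frequencies" is the grouped-data approximation; the raw median of the 25 scores is 17, matching the "Explore" output.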
Descriptives
Descriptives: Step 1. The second program to consider is "Descriptives." It can also be accessed by going to "Analyze" in the top pulldown menu, then selecting "Descriptive Statistics," and then "Descriptives" (see "Explore: Step 1" for a screenshot of this step).
Descriptives: Step 2. This will bring up the "Descriptives" dialog box (see "Descriptives: Step 2" screenshot). From the main "Descriptives" dialog box, click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box by clicking on the arrow. Next, click on the "Options" button.
[Screenshot: "Descriptives: Step 2." Select the variable of interest from the list on the left and use the arrow to move it to the "Variable(s)" box on the right. Clicking on "Options" will allow you to select various statistics to be generated.]
Descriptives: Step 3. A new box called "Descriptives: Options" will appear (see "Descriptives: Step 3" screenshot), and you can simply place a checkmark in the boxes for the statistics that you want to generate. From here, you can obtain the mean, variance, standard deviation, minimum, maximum, and exclusive range (among others available). The SPSS output from "Descriptives" is shown in the middle panel of Table 3.5. After making your selections, click on "Continue." You will then be returned to the main "Descriptives" dialog box. From there, click "OK."
[Screenshot: "Descriptives: Step 3." Statistics available when clicking on "Options" from the main dialog box for "Descriptives." Placing a checkmark will generate the respective statistic in the output.]
Frequencies
Frequencies: Step 1. The final program to consider is "Frequencies." Go to "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then select "Frequencies." See "Explore: Step 1" for a screenshot of this step.
Frequencies: Step 2. The "Frequencies" dialog box will open (see screenshot for "Frequencies: Step 2"). From this main "Frequencies" dialog box, click the variable of interest from the list on the left (e.g., quiz) and move it into the "Variables" box by clicking on the arrow button. By default, there is a checkmark in the box for "Display frequency tables," and we will keep this checked. This (i.e., selecting "Display frequency tables") will generate a table of frequencies, relative frequencies, and cumulative relative frequencies. Then click on "Statistics" located in the top right corner.
[Screenshot: "Frequencies: Step 2." Select the variable of interest from the list on the left and use the arrow to move it to the "Variables" box on the right. Clicking on "Statistics" will allow you to select various statistics to be generated.]
Frequencies: Step 3. A new dialog box labeled "Frequencies: Statistics" will appear (see screenshot for "Frequencies: Step 3"). Here you can obtain the mean, median (approximate), mode, variance, standard deviation, minimum, maximum, and exclusive range (among others). In order to obtain the closest approximation to the median, check the "Values are group midpoints" box, as shown. However, it should be noted that these values are not always as precise as those from the formula given earlier in this chapter. The SPSS output from "Frequencies" is shown in the bottom panel of Table 3.5. After making your selections, click on "Continue." You will then be returned to the main "Frequencies" dialog box. From there, click "OK."
[Screenshot: "Frequencies: Step 3." Options available when clicking on "Statistics" from the main dialog box for "Frequencies." Placing a checkmark will generate the respective statistic in the output. Check "Values are group midpoints" for better accuracy with quartiles and percentiles (e.g., the median).]
3.5 Templates for Research Questions and APA-Style Paragraph
As we stated in Chapter 2, depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions and thus no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary. At this time, let us revisit our graduate research assistant, Marie, who was introduced at the beginning of the chapter. As you may recall, her task was to summarize data from 25 students enrolled in a statistics course. The questions that Marie's faculty mentor shared with her were as follows: How can quiz scores of students enrolled in an introductory statistics class be summarized using measures of central tendency? Measures of dispersion? A template for writing descriptive research questions for summarizing data with measures of central tendency and dispersion is presented as follows:
How can [variable] be summarized using measures of central tendency? Measures of dispersion?
Next, we present an APA-like paragraph summarizing the results of the statistics quiz data example, answering the questions posed to Marie:
As shown in Table 3.5, scores ranged from 9 to 20. The mean was
15.56, the approximate median was 17.00 (or 16.33 when calculated from
grouped data), and the mode was 17.00. Thus, the scores tended to
lump together at the high end of the scale. A negatively skewed dis-
tribution is suggested given that the mean was less than the median
and mode. The exclusive range was 11, H spread (interquartile range)
was 5.0, variance was 10.007, and standard deviation was 3.1633. From
this, we can tell that the scores tended to be quite variable. For
example, the middle 50% of the scores had a range of 5 (H spread)
indicating that there was a reasonable spread of scores around the
median. Thus, despite a high “average” score, there were some low
performing students as well. These results are consistent with those
described in Section 2.4.
3.6 Summary
In this chapter, we continued our exploration of descriptive statistics by considering some basic univariate population parameters and sample statistics. First, we examined summation notation, which is necessary in many areas of statistics. Then we looked at the most commonly used measures of central tendency: the mode, the median, and the mean. The next section of the chapter dealt with the most commonly used measures of dispersion. Here we discussed the range (both exclusive and inclusive ranges), H spread, and the population variance and standard deviation, as well as the sample variance and standard deviation. We concluded the chapter with a look at SPSS, a template for writing research questions for summarizing data using measures of central tendency and dispersion, and then developed an APA-style paragraph of results. At this point, you should have met the following objectives: (a) be able to understand and utilize summation notation, (b) be able to determine and interpret the three commonly used measures of central tendency, and (c) be able to determine and interpret different measures of dispersion. A summary of when these descriptive statistics are most appropriate for each of the scales of measurement is shown in Box 3.3. In the next chapter, we will have a more extended discussion of the normal distribution (previously introduced in Chapter 2), as well as the use of standard scores as an alternative to raw scores.
STOP AND THINK BOX 3.3
Appropriate Descriptive Statistics

Measurement Scale    Measure of Central Tendency    Measure of Dispersion
Nominal              Mode                           —
Ordinal              Mode, Median                   Range, H spread
Interval/ratio       Mode, Median, Mean             Range, H spread, Variance and standard deviation
Problems
Conceptual problems
3.1 Adding just one or two extreme scores to the low end of a large distribution of scores will have a greater effect on which one of the following?
 a. Q than the variance.
 b. The variance than Q.
 c. The mode than the median.
 d. None of the above will be affected.
3.2 The variance of a distribution of scores is which one of the following?
 a. Always 1.
 b. May be any number, negative, 0, or positive.
 c. May be any number greater than 0.
 d. May be any number equal to or greater than 0.
3.3 A 20-item statistics test was graded using the following procedure: a correct response is scored +1, a blank response is scored 0, and an incorrect response is scored −1. The highest possible score is +20; the lowest score possible is −20. Because the variance of the test scores for the class was −3, we conclude which one of the following?
 a. The class did very poorly on the test.
 b. The test was too difficult for the class.
 c. Some students received negative scores.
 d. A computational error certainly was made.
3.4 Adding just one or two extreme scores to the high end of a large distribution of scores will have a greater effect on which one of the following?
 a. The mode than the median.
 b. The median than the mode.
 c. The mean than the median.
 d. None of the above will be affected.
3.5 In a negatively skewed distribution, the proportion of scores between Q1 and the median is less than .25. True or false?
3.6 Median is to ordinal as mode is to nominal. True or false?
3.7 I assert that it is appropriate to utilize the mean in dealing with class rank data. Am I correct?
3.8 For a perfectly symmetrical distribution of data, the mean, median, and mode are calculated. I assert that the values of all three measures are necessarily equal. Am I correct?
3.9 In a distribution of 100 scores, the top 10 examinees received an additional bonus of 5 points. Compared to the original median, I assert that the median of the new (revised) distribution will be the same value. Am I correct?
3.10 A set of eight scores was collected, and the variance was found to be 0. I assert that a computational error must have been made. Am I correct?
3.11 Researcher A and Researcher B are using the same dataset (n = 10), where Researcher A computes the sample variance, and Researcher B computes the population variance. The values are found to differ by more than rounding error. I assert that a computational error must have been made. Am I correct?
3.12 For a set of 10 test scores, which of the following values will be different for the sample statistic and population parameter?
 a. Mean
 b. H
 c. Range
 d. Variance
3.13 Median is to H as mean is to standard deviation. True or false?
3.14 The inclusive range will be greater than the exclusive range for any data. True or false?
3.15 For a set of IQ test scores, the median was computed to be 95 and Q1 to be 100. I assert that the statistician is to be commended for their work. Am I correct?
3.16 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that the most frequently occurring number of minutes required for children to participate in physical education classes is 22.00 minutes. Which measure of central tendency does this statement represent?
 a. Mean
 b. Median
 c. Mode
 d. Range
 e. Standard deviation
3.17 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that the fewest number of minutes required per week is 15 minutes and the maximum number of minutes is 45. Which measure of dispersion do these values reflect?
 a. Mean
 b. Median
 c. Mode
 d. Range
 e. Standard deviation
3.18 A physical education teacher is conducting research related to elementary children's time spent in physical activity. As part of his research, he collects data from schools related to the number of minutes that they require children to participate in physical education classes. He finds that 50% of schools required 20 or more minutes of participation in physical education classes. Which measure of central tendency does this statement represent?
 a. Mean
 b. Median
 c. Mode
 d. Range
 e. Standard deviation
3.19 One item on a survey of recent college graduates asks students to indicate if they plan to live within a 50 mile radius of the university. Responses to the question include "yes" or "no." The researcher who gathers these data computes the variance of this variable. Is this appropriate given the measurement scale of this variable?
3.20 A marriage and family counselor randomly samples 250 clients and collects data on the number of hours they spent in counseling during the past year. What is the most stable measure of central tendency to compute given the measurement scale of this variable?
 a. Mean
 b. Median
 c. Mode
 d. Range
 e. Standard deviation
Computational problems
3.1 For the population data in Computational Problem 2.1, and again assuming an interval width of 1, compute the following:
 a. Mode
 b. Median
 c. Mean
 d. Exclusive and inclusive range
 e. H spread
 f. Variance and standard deviation
3.2 Given a negatively skewed distribution with a mean of 10, a variance of 81, and N = 500, what is the numerical value of the following?
\[ \sum_{i=1}^{N} (X_i - \mu) \]
3.3 For the sample data in Computational Problem 2.2, and again assuming an interval width of 1, compute the following:
 a. Mode
 b. Median
 c. Mean
 d. Exclusive and inclusive range
 e. H spread
 f. Variance and standard deviation
3.4 For the sample data in Computational Problem 4 (classroom test scores) of Chapter 2, and again assuming an interval width of 1, compute the following:
 a. Mode
 b. Median
 c. Mean
 d. Exclusive and inclusive range
 e. H spread
 f. Variance and standard deviation
3.5 A sample of 30 test scores is as follows:
X    f
8 1
9 4
10 3
11 7
12 9
13 0
14 0
15 3
16 0
17 0
18 2
19 0
20 1
Compute each of the following statistics:
 a. Mode
 b. Median
 c. Mean
 d. Exclusive and inclusive range
 e. H spread
 f. Variance and standard deviation
3.6 Without doing any computations, which of the following distributions has the largest variance?
X f Y f Z f
15 6 15 4 15 2
16 7 16 7 16 7
17 9 17 11 17 13
18 9 18 11 18 13
19 7 19 7 19 7
20 6 20 4 20 2
3.7 Without doing any computations, which of the following distributions has the largest variance?
X f Y f Z f
5 3 5 1 5 6
6 2 6 0 6 2
7 4 7 4 7 3
8 3 8 3 8 1
9 5 9 2 9 0
10 2 10 1 10 7
Interpretive problems
3.1 Select one interval or ratio variable from the survey1 sample dataset on the website.
 a. Calculate all of the measures of central tendency and dispersion discussed in this chapter that are appropriate for this measurement scale.
 b. Write an APA-style paragraph which summarizes the findings.
3.2 Select one ordinal variable from the survey1 sample dataset on the website.
 a. Calculate the measures of central tendency and dispersion discussed in this chapter that are appropriate for this measurement scale.
 b. Write an APA-style paragraph which summarizes the findings.
4
Normal Distribution and Standard Scores
Chapter Outline
4.1 Normal Distribution
 4.1.1 History
 4.1.2 Characteristics
4.2 Standard Scores
 4.2.1 z Scores
 4.2.2 Other Types of Standard Scores
4.3 Skewness and Kurtosis Statistics
 4.3.1 Symmetry
 4.3.2 Skewness
 4.3.3 Kurtosis
4.4 SPSS
4.5 Templates for Research Questions and APA-Style Paragraph
Key Concepts
 1. Normal distribution (family of distributions, unit normal distribution, area under the curve, points of inflection, asymptotic curve)
 2. Standard scores [z, College Entrance Examination Board (CEEB), T, IQ]
 3. Symmetry
 4. Skewness (positively skewed, negatively skewed)
 5. Kurtosis (leptokurtic, platykurtic, mesokurtic)
 6. Moments around the mean
In Chapter 3, we continued our discussion of descriptive statistics, previously defined as techniques that allow us to tabulate, summarize, and depict a collection of data in an abbreviated fashion. There we considered the following three topics: summation notation (a method for summing a set of scores), measures of central tendency (measures for boiling down a set of scores into a single value used to represent the data), and measures of dispersion (measures dealing with the extent to which a collection of scores vary).
An Introduction to Statistical Concepts
In this chapter, we delve more into the field of descriptive statistics in terms of three additional topics. First, we consider the most commonly used distributional shape, the normal distribution. Although in this chapter we discuss the major characteristics of the normal distribution and how it is used descriptively, in later chapters we see how the normal distribution is used inferentially as an assumption for certain statistical tests. Second, several types of standard scores are considered. To this point, we have looked at raw scores and deviation scores. Here we consider scores that are often easier to interpret, known as standard scores. Then we examine two other measures useful for describing a collection of data, namely, skewness and kurtosis. As we show shortly, skewness refers to the lack of symmetry of a distribution of scores, and kurtosis refers to the peakedness of a distribution of scores. Finally, we provide a template for writing research questions, develop an APA-style paragraph of results for an example dataset, and also illustrate the use of SPSS. Concepts to be discussed include the normal distribution (i.e., family of distributions, unit normal distribution, area under the curve, points of inflection, asymptotic curve), standard scores (e.g., z, CEEB, T, IQ), symmetry, skewness (positively skewed, negatively skewed), kurtosis (leptokurtic, platykurtic, mesokurtic), and moments around the mean. Our objectives are that by the end of this chapter, you will be able to (a) understand the normal distribution and utilize the normal table, (b) determine and interpret different types of standard scores, particularly z scores, and (c) understand and interpret skewness and kurtosis statistics.
4.1 Normal Distribution
You may remember the following research scenario that was first introduced in Chapter 2. We will revisit Marie in this chapter.
Marie, a graduate student pursuing a master's degree in educational research, has been assigned to her first task as a research assistant. Her faculty mentor has given Marie quiz data collected from 25 students enrolled in an introductory statistics course and has asked Marie to summarize the data. The faculty member, who continues to be pleased with the descriptive analysis and presentation of results previously shared, has asked Marie to revisit the following research question related to distributional shape: What is the distributional shape of the statistics quiz score? Additionally, Marie's faculty mentor has asked Marie to standardize the quiz score and compare student 1 to student 3 relative to the mean. The corresponding research question that Marie is provided for this analysis is as follows: In standard deviation units, what is the relative standing to the mean of student 1 compared to student 3?
Recall from Chapter 2 that there are several commonly seen distributions. The most commonly observed and used distribution is the normal distribution. It has many uses in both descriptive and inferential statistics, as we show. In this section, we discuss the history of the normal distribution and its major characteristics.
4.1.1 History
Let us first consider a brief history of the normal distribution. From the time that data were collected and distributions examined, a particular bell-shaped distribution occurred quite often for many variables in many disciplines (e.g., many physical, cognitive, physiological, and motor attributes). This has come to be known as the normal distribution. Back in the 1700s, mathematicians were called on to develop an equation that could be used to approximate the normal distribution. If such an equation could be found, then the probability associated with any point on the curve could be determined, and the amount of space or area under any portion of the curve could also be determined. For example, one might want to know what the probability of being taller than 6′2″ would be for a male, given that height is normally distributed for each gender. Until the 1920s, the development of this equation was commonly attributed to Carl Friedrich Gauss, and until that time, this distribution was known as the Gaussian curve. However, in the 1920s, Karl Pearson found this equation in an earlier article written by Abraham DeMoivre in 1733 and renamed the curve the normal distribution. Today the normal distribution is attributed to DeMoivre.
4.1.2 Characteristics
There are seven important characteristics of the normal distribution. Because the normal distribution occurs frequently, features of the distribution are standard across all normal distributions. This "standard curve" allows us to make comparisons across two or more normal distributions as well as look at areas under the curve, as becomes evident.
4.1.2.1 Standard Curve
First, the normal distribution is a standard curve because it is always (a) symmetric around the mean, (b) unimodal, and (c) bell-shaped. As shown in Figure 4.1, if we split the distribution in one-half at the mean (μ), the left-hand half (below the mean) is the mirror image of the right-hand half (above the mean). Also, the normal distribution has only one mode, and the general shape of the distribution is bell-shaped (some even call it the bell-shaped curve). Given these conditions, the mean, median, and mode will always be equal to one another for any normal distribution.
FIGURE 4.1
The normal distribution. (The figure shows areas under the curve: 34.13% between the mean and one standard deviation on either side, 13.59% between one and two standard deviations, and 2.14% between two and three standard deviations.)
4.1.2.2 Family of Curves
Second, there is no single normal distribution; rather, the normal distribution is a family of curves. For instance, one particular normal curve has a mean of 100 and a variance of 225 (recall that the standard deviation is the square root of the variance; thus, the standard deviation in this instance is 15). This normal curve is exemplified by the Wechsler intelligence scales. Another specific normal curve has a mean of 50 and a variance of 100 (standard deviation of 10). This normal curve is used with most behavior rating scales. In fact, there are an infinite number of normal curves, one for every distinct pair of values for the mean and variance. Every member of the family of normal curves has the same characteristics; however, the scale of X, the mean of X, and the variance (and standard deviation) of X can differ across different variables and/or populations.
To keep the members of the family distinct, we use the following notation. If the variable X is normally distributed, we write X ∼ N(μ, σ²). This is read as "X is distributed normally with population mean μ and population variance σ²." This is the general notation; for notation specific to a particular normal distribution, the mean and variance values are given. For our examples, the Wechsler intelligence scales are denoted by X ∼ N(100, 225), whereas the behavior rating scales are denoted by X ∼ N(50, 100). Narratively speaking, therefore, the Wechsler intelligence scale is distributed normally with a population mean of 100 and a population variance of 225. A similar interpretation can be made for the behavior rating scale.
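The two family members just described can also be represented in code. As an illustrative sketch (my choice of tool, not the book's), Python's standard library provides a `NormalDist` class; note that it is parameterized by the standard deviation rather than the variance, so N(100, 225) becomes `NormalDist(100, 15)`:

```python
from statistics import NormalDist

# X ~ N(100, 225): Wechsler intelligence scales (sd = sqrt(225) = 15)
wechsler = NormalDist(mu=100, sigma=15)

# X ~ N(50, 100): behavior rating scales (sd = sqrt(100) = 10)
behavior = NormalDist(mu=50, sigma=10)

# Every member of the family shares the same shape: half the area
# always lies below the mean, whatever the mean and variance are.
half_wechsler = wechsler.cdf(100)  # 0.5
half_behavior = behavior.cdf(50)   # 0.5
```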
4.1.2.3 Unit Normal Distribution
Third, there is one particular member of the family of normal curves that deserves additional attention. This member has a mean of 0 and a variance (and standard deviation) of 1 and thus is denoted by X ∼ N(0, 1). This is known as the unit normal distribution (unit referring to the variance of 1) or as the standard unit normal distribution. On a related matter, let us define a z score as follows:

z_i = (X_i − μ) / σ

The numerator of this equation is actually a deviation score, previously described in Chapter 3, and indicates how far above or below the mean an individual's score falls. When we divide the deviation from the mean (i.e., the numerator) by the standard deviation (i.e., the denominator), the value derived indicates how many standard deviations above or below the mean an individual's score falls. If one individual has a z score of +1.00, then the person falls one standard deviation above the mean. If another individual has a z score of −2.00, then that person falls two standard deviations below the mean. There is more to say about this as we move along in this section.
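The definition above is a one-line computation. In this sketch (the function name and example values are mine, chosen for illustration), a Wechsler IQ of 130 with μ = 100 and σ = 15 falls two standard deviations above the mean:

```python
def z_score(x, mu, sigma):
    """Standardize a raw score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

z_above = z_score(130, 100, 15)  # +2.0: two standard deviations above the mean
z_below = z_score(70, 100, 15)   # -2.0: two standard deviations below the mean
```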
4.1.2.4 Area
The fourth characteristic of the normal distribution is the ability to determine any area under the curve. Specifically, we can determine the area above any value, the area below any value, or the area between any two values under the curve. Let us chat about what we mean by area. If you return to Figure 4.1, areas for different portions of the curve are listed. Here area is defined as the percentage or amount of space of a distribution, either above a certain score, below a certain score, or between two different scores. For example, we see that the area between the mean and one standard deviation above the mean is 34.13%. In other words, roughly a third of the entire distribution falls into that region. The entire area under the curve represents 100%, and smaller portions of the curve represent somewhat less than that.
For example, say you wanted to know what percentage of adults had an IQ score over 120, what percentage of adults had an IQ score under 107, or what percentage of adults had an IQ score between 107 and 120. How can we compute these areas under the curve? A table of the unit normal distribution has been developed for this purpose. Although similar tables could also be developed for every member of the normal family of curves, these are unnecessary, as any normal distribution can be converted to a unit normal distribution. The unit normal table is given in Table A.1.
Turn to Table A.1 now and familiarize yourself with its contents. To help illustrate, a portion of the table is presented in Figure 4.2. The first column simply lists the values of z. These are standardized scores on the X axis. Note that the values of z only range from 0 to 4.0. There are two reasons for this. First, values above 4.0 are rather unlikely, as the area under that portion of the curve is negligible (less than .003%). Second, values below 0 (i.e., negative z scores) are not really necessary to present in the table, as the normal distribution is symmetric around the mean of 0. Thus, that portion of the table would be redundant and is not shown here (we show how to deal with this situation in some example problems in a bit).
The second column, labeled P(z), gives the area below the respective value of z, in other words, the area between that value of z and the most extreme left-hand portion of the curve [i.e., −∞ (negative infinity) on the far negative or left-hand side of 0]. So if we wanted to know what the area was below z = +1.00, we would look in the first column under z = 1.00 and then look in the second column, P(z), to find the area of .8413. This value, .8413, represents the proportion of the distribution that is smaller than a z of +1.00. It also represents the probability that a score will be smaller than a z of +1.00. In other words, about 84% of the distribution is less than a z of +1.00, and the probability that a value will be less than a z of +1.00 is about 84%. More examples are considered later in this section.
 z    P(z)         z    P(z)         z     P(z)        z     P(z)
.00  .5000000     .50  .6914625    1.00  .8413447    1.50  .9331928
.01  .5039894     .51  .6949743    1.01  .8437524    1.51  .9344783
.02  .5079783     .52  .6984682    1.02  .8461358    1.52  .9357445
.03  .5119665     .53  .7019440    1.03  .8484950    1.53  .9369916
.04  .5159534     .54  .7054015    1.04  .8508300    1.54  .9382198
.05  .5199388     .55  .7088403    1.05  .8531409    1.55  .9394292
FIGURE 4.2
Portion of z table. z scores are standardized scores on the X axis. P(z) values indicate the proportion of the z distribution that is smaller than the respective z value; P(z) also represents the probability that a value will be less than that respective z value.
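If Table A.1 is not at hand, the same P(z) values can be generated from the unit normal cumulative distribution function. This sketch uses Python's `statistics.NormalDist` (my choice of tool, not the book's) to reproduce two of the table entries quoted in the text:

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0, sigma=1)  # z ~ N(0, 1)

p_below_1 = unit_normal.cdf(1.00)     # area below z = +1.00, about .8413
p_below_half = unit_normal.cdf(0.50)  # area below z = .50, about .6915
```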
4.1.2.5 Transformation to Unit Normal Distribution
A fifth characteristic is that any normally distributed variable, regardless of the mean and variance, can be converted into a unit normally distributed variable. Thus, our Wechsler intelligence scales, denoted by X ∼ N(100, 225), can be converted into z ∼ N(0, 1). Conceptually, this transformation is done by moving the curve along the X axis until it is centered at a mean of 0 (by subtracting out the original mean) and then by stretching or compressing the distribution until it has a variance of 1 (remember, however, that the shape of the distribution does not change during the standardization process; only the values on the X axis change). This allows us to make the same interpretation about any individual's score on any normally distributed variable. If z = +1.00, then for any variable, this implies that the individual falls one standard deviation above the mean.
This also allows us to make comparisons between two different individuals or across two different variables. If we wanted to make comparisons between two different individuals on the same variable X, then rather than comparing their individual raw scores, X1 and X2, we could compare their individual z scores, z1 and z2, where

z_1 = (X_1 − μ) / σ  and  z_2 = (X_2 − μ) / σ

This is the reason we only need the unit normal distribution table to determine areas under the curve rather than a table for every member of the normal distribution family. In another situation, we may want to compare scores on the Wechsler intelligence scales [X ∼ N(100, 225)] to scores on the behavior rating scales [X ∼ N(50, 100)] for the same individual. We would again convert both variables to z scores, and then direct comparisons could be made.
It is important to note that in standardizing a variable, only the values on the X axis change. The shape of the distribution (e.g., skewness and kurtosis) remains the same.
4.1.2.6 Constant Relationship with Standard Deviation
The sixth characteristic is that the normal distribution has a constant relationship with the standard deviation. Consider Figure 4.1 again. Along the X axis, we see values represented in standard deviation increments. In particular, from left to right, the values shown are three, two, and one standard deviation units below the mean and one, two, and three standard deviation units above the mean. Under the curve, we see the percentage of scores that fall under different portions of the curve. For example, the area between the mean and one standard deviation above or below the mean is 34.13%. The area between one standard deviation and two standard deviations on the same side of the mean is 13.59%, the area between two and three standard deviations on the same side is 2.14%, and the area beyond three standard deviations is .13%.
In addition, three other areas are often of interest. The area within one standard deviation of the mean, from one standard deviation below the mean to one standard deviation above the mean, is approximately 68% (or roughly two-thirds of the distribution). The area within two standard deviations of the mean is approximately 95%, and the area within three standard deviations of the mean is approximately 99%. In other words, nearly all of the scores will be within two or three standard deviations of the mean for any normal curve.
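These three areas can be verified numerically. The sketch below (again using `statistics.NormalDist`, my choice of tool) computes the area within one, two, and three standard deviations of the mean as the difference of two cumulative areas:

```python
from statistics import NormalDist

unit = NormalDist()  # defaults to the unit normal, N(0, 1)

def area_within(k):
    """Area between k standard deviations below and k above the mean."""
    return unit.cdf(k) - unit.cdf(-k)

within_1 = area_within(1)  # about .68
within_2 = area_within(2)  # about .95
within_3 = area_within(3)  # about .997
```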
4.1.2.7 Points of Inflection and Asymptotic Curve
The seventh and final characteristic of the normal distribution is as follows. The points of inflection are where the curve changes from sloping down (concave) to sloping up (convex). These points occur precisely at one standard deviation unit above and below the mean. This is more a matter of mathematical elegance than of statistical application. The curve also never touches the X axis. This is because with the theoretical normal curve, all values from negative infinity to positive infinity have a nonzero probability of occurring. Thus, while the curve continues to slope ever downward toward more extreme scores, it approaches, but never quite touches, the X axis. The curve is therefore referred to as asymptotic. This allows for the possibility of extreme scores.
Examples: Now for the long-awaited examples of finding area using the unit normal distribution. These examples require the use of Table A.1. Our personal preference is to draw a picture of the normal curve so that the proper area is determined. Let us consider four examples: (1) the area below z = −2.50, (2) the area below z = 0, (3) the area below z = 1.00, and (4) the area between z = −2.50 and z = 1.00.
To determine the area below z = −2.50, we draw a picture as shown in Figure 4.3a. We draw a vertical line at the value of z and then shade in the area we want to find. Because the shaded region is relatively small, we know the area must be considerably smaller than .50. We already know that negative values of z are not included in the unit normal table. However, because the normal distribution is symmetric, we know the area below −2.50 is the same as the area above +2.50. Thus, we look up the area below +2.50 and find the value of .9938. We subtract this from 1.0000 and obtain .0062, or .62%, a very small area indeed.
How do we determine the area below z = 0 (i.e., the mean)? As shown in Figure 4.3b, we already know from reading this section that the area has to be .5000, or one-half of the total area under the curve. Looking in the table for the area below z = 0 confirms that the area is .5000. How do we determine the area below z = 1.00? As shown in Figure 4.3c, this region exists on both sides of 0 and actually constitutes two smaller areas, the first area below 0 and the second area between 0 and 1. For this example, we use the table directly and find the value of .8413. We leave you with two other problems to solve on your own. First, what is the area below z = .50 (answer: .6915)? Second, what is the area below z = 1.96 (answer: .9750)?
Because the unit normal distribution is symmetric, finding the area above a certain value of z is solved in a similar fashion to finding the area below a certain value of z, so we need not devote any further attention to that situation. However, how do we determine the area between two values of z? This is a little different and needs some additional discussion. Consider as an example finding the area between z = −2.50 and z = 1.00, as depicted in Figure 4.3d. Here we see that the shaded region consists of two smaller areas, the area between the mean and −2.50 and the area between the mean (z = 0) and 1.00. Using the table again, we find the area below 1.00 is .8413 and the area below −2.50 is .0062. Thus, the shaded region is the difference, computed as .8413 − .0062 = .8351. On your own, determine the area between z = −1.27 and z = .50 (answer: .5895).
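The four worked examples can be checked against the cumulative distribution function directly; in this sketch (once more `statistics.NormalDist`, my choice of tool), the symmetry trick for negative z is unnecessary because `cdf` accepts negative arguments:

```python
from statistics import NormalDist

unit = NormalDist()  # unit normal, N(0, 1)

below_neg_2_5 = unit.cdf(-2.50)             # about .0062
below_0 = unit.cdf(0.0)                     # .5000 exactly
below_1 = unit.cdf(1.00)                    # about .8413
between = unit.cdf(1.00) - unit.cdf(-2.50)  # about .8351
```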
Finally, what if we wanted to determine areas under the curve for values of X rather than z? The answer here is simple, as you might have guessed. First we convert the value of X to a z score; then we use the unit normal table to determine the area. Because the normal curve is standard for all members of the family of normal curves, the scale of the variable, X or z, is irrelevant in terms of determining such areas. In the next section, we deal more with such transformations.
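As an illustration of converting X to z and then finding area, the IQ question posed earlier (what percentage of adults score over 120, under 107, or between 107 and 120) can be sketched as follows, taking IQ as N(100, 225) so that σ = 15; the numeric answers in the comments are computed for this sketch, not taken from the book:

```python
from statistics import NormalDist

unit = NormalDist()  # unit normal, N(0, 1)
MU, SIGMA = 100, 15  # IQ ~ N(100, 225), so the standard deviation is 15

def area_below(x):
    """Convert a raw IQ score to z, then take the cumulative area below it."""
    return unit.cdf((x - MU) / SIGMA)

def area_above(x):
    return 1 - area_below(x)

over_120 = area_above(120)                  # about 9% of adults
under_107 = area_below(107)                 # about 68% of adults
between = area_below(120) - area_below(107) # about 23% of adults
```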
4.2 Standard Scores
We have already devoted considerable attention to z scores, which are one type of standard score. In this section, we describe an application of z scores leading up to a discussion of other types of standard scores. As we show, the major purpose of standard scores is to place scores on the same standard scale so that comparisons can be made across individuals and/or variables. Without some standard scale, such comparisons would be difficult to make. Examples are coming right up.
4.2.1 z Scores
A child comes home from school with the results of two tests taken that day. On the math test, she receives a score of 75, and on the social studies test, she receives a score of 60. As a parent, the natural question to ask is, "Which performance was the stronger one?"
FIGURE 4.3
Examples of area under the unit normal distribution: (a) Area below z = −2.5 (.0062). (b) Area below z = 0 (.5000). (c) Area below z = 1.0 (.8413). (d) Area between z = −2.5 and z = 1.0 (.8351).
No information about any of the following is available: the maximum score possible, the mean of the class (or any other central tendency measure), or the standard deviation of the class (or any other dispersion measure). It is possible that the two tests had a different number of possible points, different means, and/or different standard deviations. How can we possibly answer our question?
The answer, of course, is to use z scores, assuming the data are normally distributed, once the relevant information is obtained. Let us take a minor digression before we return to answer our question in more detail. Recall the formula for standardizing variable X into a z score:

z_i = (X_i − μ_X) / σ_X

where the X subscript has been added to the mean and standard deviation for purposes of clarifying which variable is being considered. If variable X is the number of items correct on a test, then the numerator is the deviation of a student's raw score from the class mean (i.e., the numerator is a deviation score as previously defined in Chapter 3), measured in terms of items correct, and the denominator is the standard deviation of the class, also measured in terms of items correct. Because both the numerator and denominator are measured in terms of items correct, the resultant z score has no units (the units of the numerator and denominator essentially cancel out). As z scores have no units (i.e., the z score is interpreted as the number of standard deviation units above or below the mean), this allows us to compare two different raw score variables with different scales, means, and/or standard deviations. By converting our two variables to z scores, the transformed variables are now on the same z score scale with a mean of 0 and a variance and standard deviation of 1.
Let us return to our previous situation where the math test score is 75 and the social studies test score is 60. In addition, we are provided with the information that the standard deviation for the math test is 15 and the standard deviation for the social studies test is 10. Consider the following three examples. In the first example, the means are 60 for the math test and 50 for the social studies test. The z scores are then computed as follows:

z_math = (75 − 60) / 15 = 1.0    z_ss = (60 − 50) / 10 = 1.0

The conclusion for the first example is that the performance on both tests is the same; that is, the child scored one standard deviation above the mean on both tests.
In the second example, the means are 60 for the math test and 40 for the social studies test. The z scores are then computed as follows:

z_math = (75 − 60) / 15 = 1.0    z_ss = (60 − 40) / 10 = 2.0

The conclusion for the second example is that performance is better on the social studies test; that is, the child scored two standard deviations above the mean on the social studies test and only one standard deviation above the mean on the math test.
In the third example, the means are 60 for the math test and 70 for the social studies test. The z scores are then computed as follows:

z_math = (75 − 60) / 15 = 1.0    z_ss = (60 − 70) / 10 = −1.0

The conclusion for the third example is that performance is better on the math test; that is, the child scored one standard deviation above the mean on the math test and one standard deviation below the mean on the social studies test. These examples serve to illustrate a few of the many possibilities, depending on the particular combinations of raw score, mean, and standard deviation for each variable.
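The three comparisons can be scripted compactly; this sketch (the variable names are mine) reproduces the z scores for all three examples:

```python
def z_score(x, mu, sigma):
    return (x - mu) / sigma

math_score, ss_score = 75, 60  # raw scores, constant across the three examples
sd_math, sd_ss = 15, 10        # standard deviations, also constant

# (math mean, social studies mean) for examples 1-3
example_means = [(60, 50), (60, 40), (60, 70)]

results = [(z_score(math_score, mu_m, sd_math), z_score(ss_score, mu_s, sd_ss))
           for mu_m, mu_s in example_means]
# results: [(1.0, 1.0), (1.0, 2.0), (1.0, -1.0)]
```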
Let us conclude this section by mentioning the major characteristics of z scores. The first characteristic is that z scores provide us with comparable distributions, as we just saw in the previous examples. Second, z scores take into account the entire distribution of raw scores. All raw scores can be converted to z scores such that every raw score will have a corresponding z score. Third, we can evaluate an individual's performance relative to the scores in the distribution. For example, saying that an individual's score is one standard deviation above the mean is a measure of relative performance; it implies that approximately 84% of the scores fall below the performance of that individual. Finally, negative values (i.e., below 0) and decimal values (e.g., z = 1.55) are obviously possible (and will most certainly occur) with z scores. On average, about one-half of the z scores for any distribution will be negative, and some decimal values are quite likely. This last characteristic is bothersome to some individuals and has led to the development of other types of standard scores, as described in the next section.
4.2.2 Other Types of Standard Scores
Over the years, other standard scores besides z scores have been developed, either to alleviate the concern over negative and/or decimal values associated with z scores, or to obtain a particular mean and standard deviation. Let us examine three common examples. The first additional standard score is known as the College Entrance Examination Board (CEEB) score. This standard score is used in exams such as the SAT and the GRE. The subtests for these exams all have a mean of 500 and a standard deviation of 100. A second additional standard score is known as the T score and is used in tests such as most behavior rating scales, as previously mentioned. T scores have a mean of 50 and a standard deviation of 10. A third additional standard score is known as the IQ score and is used in the Wechsler intelligence scales. The IQ score has a mean of 100 and a standard deviation of 15 (the Stanford–Binet intelligence scales have a mean of 100 and a standard deviation of 16).
Say we want to develop our own type of standard score, where we determine in advance the mean and standard deviation that we would like to have. How would that be done? As the equation for z scores is

z_i = (X_i − μ_X) / σ_X

then algebraically the following can be shown:

X_i = μ_X + σ_X z_i

If, for example, we want to develop our own "stat" standardized score, then the following equation would be used:

stat_i = μ_stat + σ_stat z_i

where
 stat_i is the "stat" standardized score for a particular individual
 μ_stat is the desired mean of the "stat" distribution
 σ_stat is the desired standard deviation of the "stat" distribution
If we want to have a mean of 10 and a standard deviation of 2, then our equation becomes

stat_i = 10 + 2 z_i

We would then have the computer simply plug in a z score and compute an individual's "stat" score. Thus, a z score of 1.0 would yield a "stat" standardized score of 12.0.
Consider a realistic example where we have a raw score variable we want to transform into a standard score, and we want to control the mean and standard deviation. For example, we have statistics midterm raw scores with 225 points possible. We want to develop a standard score with a mean of 50 and a standard deviation of 5. We also have scores on other variables that are on different scales with different means and different standard deviations (e.g., statistics final exam scores worth 175 points, a set of 20 lab assignments worth a total of 200 points, a statistics performance assessment worth 100 points). We can standardize each of those variables by placing them on the same scale with the same mean and same standard deviation, thereby allowing comparisons across variables. This is precisely the rationale used by testing companies and researchers when they develop standard scores. In short, from z scores, we can develop a CEEB, T, IQ, "stat," or any other type of standard score.
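The rescaling identity X_i = μ_X + σ_X z_i is all a custom standard score needs. This sketch (the function names are mine) converts a z score onto any target scale, such as the "stat" scale with mean 10 and standard deviation 2 described above:

```python
def z_score(x, mu, sigma):
    return (x - mu) / sigma

def to_standard(z, target_mean, target_sd):
    """Rescale a z score onto a scale with the desired mean and standard deviation."""
    return target_mean + target_sd * z

stat_score = to_standard(1.0, 10, 2)      # z = 1.0 on the "stat" scale -> 12.0
t_score = to_standard(1.0, 50, 10)        # the same z as a T score -> 60.0
ceeb_score = to_standard(-0.5, 500, 100)  # z = -0.5 as a CEEB score -> 450.0
```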
4.3 Skewness and Kurtosis Statistics
In previous chapters, we discussed the distributional concepts of symmetry, skewness, central tendency, and dispersion. In this section, we more closely define symmetry as well as the statistics commonly used to measure skewness and kurtosis.
4.3.1 Symmetry
Conceptually, we define a distribution as being symmetric if, when we divide the distribution precisely in one-half, the left-hand half is a mirror image of the right-hand half. That is, the distribution above the mean is a mirror image of the distribution below the mean. To put it another way, a distribution is symmetric around the mean if for every score q units below the mean, there is a corresponding score q units above the mean.
Two examples of symmetric distributions are shown in Figure 4.4. In Figure 4.4a, we have a normal distribution, which is clearly symmetric around the mean. In Figure 4.4b, we have a symmetric distribution that is bimodal, unlike the previous example. From these and other numerous examples, we can draw the following two conclusions. First, if a distribution is symmetric, then the mean is equal to the median. Second, if a distribution is symmetric and unimodal, then the mean, median, and mode are all equal. This indicates that we can determine whether a distribution is symmetric by simply comparing the measures of central tendency.
4.3.2 Skewness
We define skewness as the extent to which a distribution of scores deviates from perfect symmetry. This is important as perfectly symmetrical distributions rarely occur with actual sample data (i.e., "real" data). A skewed distribution is known as being asymmetrical. As shown in Figure 4.5, there are two general types of skewness: distributions that are negatively skewed, as in Figure 4.5a, and those that are positively skewed, as in Figure 4.5b. Negatively skewed distributions, which are skewed to the left, occur when most of the scores are toward the high end of the distribution and only a few scores are toward the low end. If you make a fist with your thumb pointing to the left (skewed to the left), you have graphically defined a negatively skewed distribution. For a negatively skewed distribution, we also find the following: mode > median > mean. This indicates that we can determine whether a distribution is negatively skewed by simply comparing the measures of central tendency.

FIGURE 4.4
Symmetric distributions: (a) Normal distribution. (b) Bimodal distribution.

FIGURE 4.5
Skewed distributions: (a) Negatively skewed distribution. (b) Positively skewed distribution.

89 Normal Distribution and Standard Scores
Positively skewed distributions, which are skewed to the right, occur when most of the scores are toward the low end of the distribution and only a few scores are toward the high end. If you make a fist with your thumb pointing to the right (skewed to the right), you have graphically defined a positively skewed distribution. For a positively skewed distribution, we also find the following: mode < median < mean. This indicates that we can determine whether a distribution is positively skewed by simply comparing the measures of central tendency.
The most commonly used measure of skewness is known as γ1 (Greek letter gamma), which is mathematically defined as follows:

$$\gamma_1 = \frac{\sum_{i=1}^{N} z_i^3}{N}$$
where we take the z score for each individual, cube it, sum across all N individuals, and then divide by the number of individuals N. This measure is available in nearly all computer packages, so hand computations are not necessary. The characteristics of this measure of skewness are as follows: (a) a perfectly symmetrical distribution has a skewness value of 0, (b) the range of values for the skewness statistic is approximately from −3 to +3, (c) negatively skewed distributions have negative skewness values, and (d) positively skewed distributions have positive skewness values.
There are different rules of thumb for determining how extreme skewness can be and still retain a relatively normal distribution. One simple rule of thumb is that skewness values within ±2.0 are considered relatively normal, with more conservative researchers applying a ±3.0 guideline, and more stringent researchers using ±1.0. Another rule of thumb for determining how extreme a skewness value must be for the distribution to be considered nonnormal is as follows: Skewness values outside the range of ± two standard errors of skewness suggest a distribution that is nonnormal. Applying this rule of thumb, if the standard error of skewness is .85, then anything outside of −2(.85) to +2(.85), or −1.7 to +1.7, would be considered nonnormal. It is important to note that this second rule of thumb is sensitive to small sample sizes and should only be considered as a general guide.
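The γ1 formula can be computed directly from raw scores. Here is a hedged sketch in Python with hypothetical data (in practice, the value would simply be read from SPSS output):

```python
import math

def skewness(scores):
    """gamma_1: the mean of the cubed z scores, with z computed using the
    population standard deviation (dividing by N, matching the formula above)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return sum(((x - mean) / sd) ** 3 for x in scores) / n

# Hypothetical scores bunched at the high end with a few low scores:
data = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8]
g1 = skewness(data)
print(round(g1, 3))  # negative value: negatively skewed
```

A perfectly symmetric set of scores, such as [1, 2, 3], returns a skewness of 0.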
4.3.3 Kurtosis
Kurtosis is the fourth and final property of a distribution (these properties are often referred to as the moments around the mean). The four properties are central tendency (first moment), dispersion (second moment), skewness (third moment), and kurtosis (fourth moment). Kurtosis is conceptually defined as the "peakedness" of a distribution (kurtosis is Greek for peakedness). Some distributions are rather flat, and others have a rather sharp peak. Specifically, there are three general types of peakedness, as shown in Figure 4.6. A distribution that is very peaked is known as leptokurtic ("lepto" meaning slender or narrow) (Figure 4.6a). A distribution that is relatively flat is known as platykurtic ("platy" meaning flat or broad) (Figure 4.6b). A distribution that is somewhere in between is known as mesokurtic ("meso" meaning intermediate) (Figure 4.6c).
The most commonly used measure of kurtosis is known as γ2, which is mathematically defined as

$$\gamma_2 = \frac{\sum_{i=1}^{N} z_i^4}{N} - 3$$
where we take the z score for each individual, take it to the fourth power (being the fourth moment), sum across all N individuals, divide by the number of individuals N, and then subtract 3. This measure is available in nearly all computer packages, so hand computations are not necessary. The characteristics of this measure of kurtosis are as follows: (a) a perfectly mesokurtic distribution, which would be a normal distribution, has a kurtosis value of 0; (b) platykurtic distributions have negative kurtosis values (being flat rather than peaked); and (c) leptokurtic distributions have positive kurtosis values (being peaked). Kurtosis values can range from negative to positive infinity.
There are different rules of thumb for determining how extreme kurtosis can be and still retain a relatively normal distribution. One simple rule of thumb is that kurtosis values within ±2.0 are considered relatively normal, with more conservative researchers applying a ±3.0 guideline, and more stringent researchers using ±1.0. A rule of thumb for determining how extreme a kurtosis value may be for the distribution to be considered nonnormal is as follows: Kurtosis values outside the range of ± two standard errors of kurtosis suggest a distribution that is nonnormal. Applying this rule of thumb, if the standard error of kurtosis is 1.20, then anything outside of −2(1.20) to +2(1.20), or −2.40 to +2.40, would be considered nonnormal. It is important to note that this second rule of thumb is sensitive to small sample sizes and should only be considered as a general guide.

FIGURE 4.6
Distributions of different kurtosis: (a) Leptokurtic distribution. (b) Platykurtic distribution. (c) Mesokurtic distribution.
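As with skewness, the γ2 statistic can be sketched directly from its definition. The frequency data below are hypothetical (one roughly uniform set and one with a heavy center); in practice the value would be read from SPSS output:

```python
import math

def kurtosis(scores):
    """gamma_2: the mean of the z scores raised to the fourth power, minus 3,
    with z computed using the population standard deviation (dividing by N)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return sum(((x - mean) / sd) ** 4 for x in scores) / n - 3

flat = [11, 12, 13, 14, 15] * 4                          # roughly uniform: platykurtic
peaked = [11] + [12] * 3 + [13] * 12 + [14] * 3 + [15]   # heavy center: leptokurtic
print(round(kurtosis(flat), 3), round(kurtosis(peaked), 3))
```

The flat set yields a negative γ2 and the peaked set a positive one, matching characteristics (b) and (c) listed for this measure.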
Skewness and kurtosis statistics are useful for the following two reasons: (a) as descriptive statistics used to describe the shape of a distribution of scores and (b) in inferential statistics, which often assume a normal distribution, so that the researcher has some indication of whether the assumption has been met (more about this beginning in Chapter 6).
4.4 SPSS
Here we review what SPSS has to offer for examining distributional shape and computing standard scores. The following programs have proven to be quite useful for these purposes: "Explore," "Descriptives," "Frequencies," "Graphs," and "Transform." Instructions for using each are provided as follows.
Explore
Explore: Step 1. The first program, "Explore," can be invoked by clicking on "Analyze" in the top pulldown menu, then "Descriptive Statistics," and then "Explore." Following the screenshot (step 1) produces the "Explore" dialog box. For brevity, we have not reproduced this initial screenshot when we discuss the "Descriptives" and "Frequencies" programs; however, you see here where they can be found from the pulldown menus.
Explore: Step 1 (screenshot). Note that "Frequencies" and "Descriptives" can also be invoked from this menu.
Explore: Step 2. Next, from the main "Explore" dialog box, click the variable of interest from the list on the left (e.g., quiz), and move it into the "Dependent List" box by clicking on the arrow button. Next, click on the "Statistics" button located in the top right corner of the main dialog box.
Explore: Step 2 (screenshot). Select the variable of interest from the list on the left and use the arrow to move it to the "Dependent List" box on the right. Clicking on "Statistics" will allow you to select descriptive statistics.
Explore: Step 3. A new box labeled "Explore: Statistics" will appear. Simply place a checkmark in the "Descriptives" box. Next click "Continue." You will then be returned to the main "Explore" dialog box. From there, click "OK." This will automatically generate the skewness and kurtosis values, as well as the measures of central tendency and dispersion which were covered in Chapter 3. The output from this was previously shown in the top panel of Table 3.5.
Explore: Step 3 (screenshot).
Descriptives
Descriptives: Step 1. The second program to consider is "Descriptives." It can also be accessed by going to "Analyze" in the top pulldown menu, then selecting "Descriptive Statistics," and then "Descriptives" (see "Explore: Step 1" for screenshots of these steps).
Descriptives: Step 2. This will bring up the "Descriptives" dialog box (see screenshot, step 2). From the main "Descriptives" dialog box, click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box by clicking on the arrow. If you want to obtain z scores for this variable for each case (e.g., person or object that was measured, that is, your unit of analysis), check the "Save standardized values as variables" box located in the bottom left corner of the main "Descriptives" dialog box. This will insert a new variable into your dataset for subsequent analysis (see screenshot for how this will appear in "Data View"). Next, click on the "Options" button.
Descriptives: Step 2 (screenshot). Select the variable of interest from the list on the left and use the arrow to move it to the "Variable(s)" box on the right. Placing a checkmark on "Save standardized values as variables" will generate a new, standardized variable in your datafile for each variable selected. Clicking on "Options" will allow you to select various statistics to be generated.
Descriptives: Step 3. A new box called "Descriptives: Options" will appear (see screenshot, step 3), and you can simply place a checkmark in the boxes for the statistics that you want to generate. This will allow you to obtain the skewness and kurtosis values, as well as the measures of central tendency and dispersion discussed in Chapter 3. After making your selections, click on "Continue." You will then be returned to the main "Descriptives" dialog box. From there, click "OK."
Descriptives: Step 3 (screenshot). Statistics available when clicking on "Options" from the main dialog box for Descriptives. Placing a checkmark will generate the respective statistic in the output.
Descriptives: Saving standardized variable (screenshot). If "Save standardized values as variables" was checked on the main "Descriptives" dialog box, a new standardized variable will be created. By default, this variable name is the name of the original variable prefixed with a "Z" (denoting its standardization). It is computed using the unit normal formula:

$$z = \frac{X - \mu}{\sigma}$$
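What happens when the standardized variable is saved can be sketched as follows. This applies the unit normal formula shown in the callout, treating the observed scores as the full population; the quiz scores are hypothetical:

```python
import math

def z_scores(scores):
    """Standardize with the unit normal formula z = (X - mu) / sigma,
    treating the observed scores as the full population (divide by N)."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return [(x - mu) / sigma for x in scores]

quiz = [9, 11, 13, 15, 17]            # hypothetical quiz scores
zquiz = z_scores(quiz)                # analogous to the "Z"-prefixed variable
print([round(z, 2) for z in zquiz])   # [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Note that the standardized variable always has a mean of 0, so the z values sum to 0.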
Frequencies
Frequencies: Step 1. The third program to consider is "Frequencies," which is also accessible by clicking on "Analyze" in the top pulldown menu, then clicking on "Descriptive Statistics," and then selecting "Frequencies" (see "Explore: Step 1" for screenshots of these steps).
Frequencies: Step 2. This will bring up the "Frequencies" dialog box. Click the variable of interest (e.g., quiz) and move it into the "Variable(s)" box, then click on the "Statistics" button.
Frequencies: Step 2 (screenshot). Select the variable of interest from the list on the left and use the arrow to move it to the "Variable(s)" box on the right. Clicking on "Charts" will allow you to generate a histogram with normal curve (and other types of graphs). Clicking on "Statistics" will allow you to select various statistics to be generated.
Frequencies: Step 3. A new box labeled "Frequencies: Statistics" will appear. Again, you can simply place a checkmark in the boxes for the statistics that you want to generate. Here you can obtain the skewness and kurtosis values, as well as the measures of central tendency and dispersion from Chapter 3. If you click on the "Charts" button, you can also obtain a histogram with a normal curve overlay by clicking the "Histogram" radio button and checking the "With normal curve" box. This histogram output is shown in Figure 4.7. After making your selections, click on "Continue." You will then be returned to the main "Frequencies" dialog box. From there, click "OK."
FIGURE 4.7
SPSS histogram of statistics quiz data with normal distribution overlay (x axis: quiz scores 9–20; y axis: frequency, 1–5).
Frequencies: Step 3 (screenshot). Options available when clicking on "Statistics" from the main dialog box for Frequencies. Placing a checkmark will generate the respective statistic in the output. A checkbox is also available that provides better accuracy with quartiles and percentiles (i.e., the median).
Graphs
Graphs: Two other programs also yield a histogram with a normal curve overlay. Both can be accessed by first going to "Graphs" in the top pulldown menu. From there, select "Legacy Dialogs," then "Histogram." Another option for creating a histogram, starting again from the "Graphs" option in the top pulldown menu, is to select "Legacy Dialogs," then "Interactive," and finally "Histogram." From there, both work similarly to the "Frequencies" program described earlier.
Graphs: Step 1 (screenshot).
Transform
Transform: Step 1. A final program that comes in handy is for transforming variables, such as creating a standardized version of a variable (most notably a standardization other than the application of the unit normal formula, since the unit normal standardization can be easily performed, as seen previously, by using "Descriptives"). Go to "Transform" from the top pulldown menu, and then select "Compute Variables." A dialog box labeled "Compute Variables" will appear.
Transform: Step 1 (screenshot).
Transform: Step 2. The "Target Variable" is the name of the new variable you are creating, and the "Numeric Expression" box is where you insert the commands specifying which original variable to transform and how to transform it (e.g., the stat variable). When you are done defining the formula, simply click "OK" to generate the new variable in the data file.
Transform: Step 2 (screenshot). The name specified in "Target Variable" becomes the column header in "Data View." This name must begin with a letter, and no spaces can be included. "Numeric Expression" is where you enter the formula for your new variable. For the user's convenience, a number of formulas are already defined within SPSS and accessible through the "Function group" list.
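The "Compute Variable" step amounts to evaluating a formula column-wise to build a new column from an existing one. A hedged sketch (the variable names "quiz," "Zquiz," and "Tquiz" are hypothetical; the formula T = 50 + 10z is one standardization other than the unit normal formula, using the T-score mean of 50 and SD of 10 from Box 4.1):

```python
import math

# Hypothetical dataset with one column, "quiz".
data = {"quiz": [9, 11, 13, 15, 17]}

# First standardize (as "Descriptives" would), then apply the
# "Numeric Expression" T = 50 + 10*z as the Compute Variable formula.
n = len(data["quiz"])
mu = sum(data["quiz"]) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in data["quiz"]) / n)
data["Zquiz"] = [(x - mu) / sigma for x in data["quiz"]]
data["Tquiz"] = [50 + 10 * z for z in data["Zquiz"]]  # the new target variable
print([round(t, 1) for t in data["Tquiz"]])  # [35.9, 42.9, 50.0, 57.1, 64.1]
```

The new "Tquiz" column has a mean of 50 and an SD of 10, while preserving each case's relative standing.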
4.5 Templates for Research Questions and APA-Style Paragraph
As stated in the previous chapter, depending on the purpose of your research study, you may or may not write a research question that corresponds to your descriptive statistics. If the end result of your research paper is to present results from inferential statistics, it may be that your research questions correspond only to those inferential questions, and, thus, no question is presented to represent the descriptive statistics. That is quite common. On the other hand, if the ultimate purpose of your research study is purely descriptive in nature, then writing one or more research questions that correspond to the descriptive statistics is not only entirely appropriate but (in most cases) absolutely necessary.
It is time again to revisit our graduate research assistant, Marie, who was reintroduced at the beginning of the chapter. As a reminder, her task was to continue to summarize data from 25 students enrolled in a statistics course, this time paying particular attention to distributional shape and standardization. The questions posed this time by Marie's faculty mentor were as follows: What is the distributional shape of the statistics quiz score? In standard deviation units, what is the relative standing to the mean of student 1 compared to student 3? A template for writing a descriptive research question for summarizing distributional shape is presented as follows (this may sound familiar as this was first presented in Chapter 2 when we initially discussed distributional shape). This is followed by a template for writing a research question related to standardization:
What is the distributional shape of the [variable]? In standard devi-
ation units, what is the relative standing to the mean of [unit 1]
compared to [unit 3]?
Next, we present an APA-style paragraph summarizing the results of the statistics quiz data example, answering the questions posed to Marie:
As shown in the top panel of Table 3.5, the skewness value is −.598
(SE = .464) and the kurtosis value is −.741 (SE = .902). Skewness and
kurtosis values within the range of +/−2(SE) are generally considered
normal. Given our values, skewness is within the range of −.928 to
+.928 and kurtosis is within the range of −1.804 to +1.804, and these
would be considered normal. Another rule of thumb is that the skew-
ness and kurtosis values should fall within an absolute value of 2.0
to be considered normal. Applying this rule, normality is still evi-
dent. The histogram with a normal curve overlay is depicted in Figure
4.7. Taken with the skewness and kurtosis statistics, these results
indicate that the quiz scores are reasonably normally distributed.
There is a slight negative skew such that there are more scores at
the high end of the distribution than a typical normal distribu-
tion. There is also a slight negative kurtosis indicating that the
distribution is slightly flatter than a normal distribution, with a
few more extreme scores at the low end of the distribution. Again,
however, the values are within the range of what is considered a
reasonable approximation to the normal curve.
The quiz score data were standardized using the unit normal formula.
After standardization, student 1’s score was −2.07 and student 3’s score
was 1.40. This suggests that student 1 was slightly more than two stan-
dard deviation units below the mean on the statistics quiz score, while
student 3 was nearly 1.5 standard deviation units above the mean.
4.6 Summary
In this chapter, we continued our exploration of descriptive statistics by considering an important distribution, the normal distribution, standard scores, and other characteristics of a distribution of scores. First we discussed the normal distribution, with its history and important characteristics. In addition, the unit normal table was introduced and used to determine various areas under the curve. Next we examined different types of standard scores, in particular z scores, as well as CEEB scores, T scores, and IQ scores. Examples of types of standard scores are summarized in Box 4.1. The next section of the chapter included a detailed description of symmetry, skewness, and kurtosis. The different types of skewness and kurtosis were defined and depicted. We finished the chapter by examining SPSS for these statistics as well as how to write up an example set of results. At this point, you should have met the following objectives: (a) understand the normal distribution and utilize the normal table; (b) determine and interpret different types of standard scores, particularly z scores; and (c) understand and interpret skewness and kurtosis statistics. In the next chapter, we move toward inferential statistics through an introductory discussion of probability as well as a more detailed discussion of sampling and estimation.
STOP AND THINK BOX 4.1
Examples of Types of Standard Scores

Standard Score                       Distribution (a)
Z (unit normal)                      N(0, 1)
CEEB score                           N(500, 10,000)
T score                              N(50, 100)
Wechsler intelligence scale          N(100, 225)
Stanford–Binet intelligence scale    N(100, 256)

(a) N(μ, σ²).
Problems
Conceptual problems
4.1 For which of the following distributions will the skewness value be 0?
 a. N(0, 1)
 b. N(0, 2)
 c. N(10, 50)
 d. All of the above
4.2 For which of the following distributions will the kurtosis value be 0?
 a. N(0, 1)
 b. N(0, 2)
 c. N(10, 50)
 d. All of the above
4.3 A set of 400 scores is approximately normally distributed with a mean of 65 and a standard deviation of 4.5. Approximately 95% of the scores would fall within which range of scores?
 a. 60.5 and 69.5
 b. 56 and 74
 c. 51.5 and 78.5
 d. 64.775 and 65.225
4.4 What is the percentile rank of 60 in the distribution of N(60, 100)?
 a. 10
 b. 50
 c. 60
 d. 100
4.5 Which of the following parameters can be found on the X axis for a frequency polygon of a population distribution?
 a. Skewness
 b. Median
 c. Kurtosis
 d. Q
4.6 The skewness value is calculated for a set of data and is found to be equal to +2.75. This indicates that the distribution of scores is which one of the following?
 a. Highly negatively skewed
 b. Slightly negatively skewed
 c. Symmetrical
 d. Slightly positively skewed
 e. Highly positively skewed
4.7 The kurtosis value is calculated for a set of data and is found to be equal to +2.75. This indicates that the distribution of scores is which one of the following?
 a. Mesokurtic
 b. Platykurtic
 c. Leptokurtic
 d. Cannot be determined
4.8 For a normal distribution, all percentiles above the 50th must yield positive z scores. True or false?
4.9 If one knows the raw score, the mean, and the z score, then one can calculate the value of the standard deviation. True or false?
4.10 In a normal distribution, a z score of 1.0 has a percentile rank of 34. True or false?
4.11 The mean of a normal distribution of scores is always 1. True or false?
4.12 If in a distribution of 200 IQ scores, the mean is considerably above the median, then the distribution is which one of the following?
 a. Negatively skewed
 b. Symmetrical
 c. Positively skewed
 d. Bimodal
4.13 Which of the following is indicative of a distribution that has a skewness value of −3.98 and a kurtosis value of −6.72?
 a. A left tail that is pulled to the left and a very flat distribution
 b. A left tail that is pulled to the left and a distribution that is neither very peaked nor very flat
 c. A right tail that is pulled to the right and a very peaked distribution
 d. A right tail that is pulled to the right and a very flat distribution
4.14 Which of the following is indicative of a distribution that has a kurtosis value of +4.09?
 a. Leptokurtic distribution
 b. Mesokurtic distribution
 c. Platykurtic distribution
 d. Positive skewness
 e. Negative skewness
4.15 For which of the following distributions will the kurtosis value be greatest?

 A   f     B   f     C   f     D   f
 11  3     11  4     11  1     11  1
 12  4     12  4     12  3     12  5
 13  6     13  4     13  12    13  8
 14  4     14  4     14  3     14  5
 15  3     15  4     15  1     15  1

 a. Distribution A
 b. Distribution B
 c. Distribution C
 d. Distribution D
4.16 The distribution of variable X has a mean of 10 and is positively skewed. The distribution of variable Y has the same mean of 10 and is negatively skewed. I assert that the medians for the two variables must also be the same. Am I correct?
4.17 The variance of z scores is always equal to the variance of the raw scores for the same variable. True or false?
4.18 The mode has the largest value of the central tendency measures in a positively skewed distribution. True or false?
4.19 Which of the following represents the highest performance in a normal distribution?
 a. P90
 b. z = +1.00
 c. Q3
 d. IQ = 115
4.20 Suzie Smith came home with two test scores, z = +1 in math and z = −1 in biology. For which test did Suzie perform better?
4.21 A psychologist analyzing data from creative intelligence scores finds a relatively normal distribution with a population mean of 100 and population standard deviation of 10. When standardized into a unit normal distribution, what is the mean of the (standardized) creative intelligence scores?
 a. 0
 b. 70
 c. 100
 d. Cannot be determined from the information provided
Computational problems
4.1 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = −1.66
 b. The proportion of the area between z = −1.03 and z = +1.03
 c. The fifth percentile of N(20, 36)
 d. The 99th percentile of N(30, 49)
 e. The percentile rank of the score 25 in N(20, 36)
 f. The percentile rank of the score 24.5 in N(30, 49)
 g. The proportion of the area in N(36, 64) between the scores of 18 and 42
4.2 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = −.80
 b. The proportion of the area between z = −1.49 and z = +1.49
 c. The 2.5th percentile of N(50, 81)
 d. The 50th percentile of N(40, 64)
 e. The percentile rank of the score 45 in N(50, 81)
 f. The percentile rank of the score 53 in N(50, 81)
 g. The proportion of the area in N(36, 64) between the scores of 19.7 and 45.1
4.3 Give the numerical value for each of the following descriptions concerning normal distributions by referring to the table for N(0, 1).
 a. The proportion of the area below z = +1.50
 b. The proportion of the area between z = −.75 and z = +2.25
 c. The 15th percentile of N(12, 9)
 d. The 80th percentile of N(100,000, 5,000)
 e. The percentile rank of the score 300 in N(200, 2500)
 f. The percentile rank of the score 61 in N(60, 9)
 g. The proportion of the area in N(500, 1600) between the scores of 350 and 550
Interpretive problems
4.1 Select one interval or ratio variable from the survey 1 dataset on the website (e.g., one idea is to select the same variable you selected for the interpretive problem from Chapter 3).
 a. Determine the measures of central tendency, dispersion, skewness, and kurtosis.
 b. Write a paragraph which summarizes the findings, particularly commenting on the distributional shape.
4.2 Using the same variable selected in the previous problem, standardize it using SPSS.
 a. Determine the measures of central tendency, dispersion, skewness, and kurtosis for the standardized variable.
 b. Determine the measures of central tendency, dispersion, skewness, and kurtosis for the variable in its original scale (i.e., the unstandardized variable).
 c. Compare and contrast the differences between the standardized and unstandardized variables.
5
Introduction to Probability and Sample Statistics
Chapter Outline
5.1 Brief Introduction to Probability
 5.1.1 Importance of Probability
 5.1.2 Definition of Probability
 5.1.3 Intuition Versus Probability
5.2 Sampling and Estimation
 5.2.1 Simple Random Sampling
 5.2.2 Estimation of Population Parameters and Sampling Distributions
Key Concepts
 1. Probability
 2. Inferential statistics
 3. Simple random sampling (with and without replacement)
 4. Sampling distribution of the mean
 5. Variance and standard error of the mean (sampling error)
 6. Confidence intervals (CIs) (point vs. interval estimation)
 7. Central limit theorem
In Chapter 4, we extended our discussion of descriptive statistics. There we considered the following three general topics: the normal distribution, standard scores, and skewness and kurtosis. In this chapter, we begin to move from descriptive statistics into inferential statistics (in which normally distributed data play a major role). The two basic topics described in this chapter are probability, and sampling and estimation. First, as a brief introduction to probability, we discuss the importance of probability in statistics, define probability in a conceptual and computational sense, and discuss the notion of intuition versus probability. Second, under sampling and estimation, we formally move into inferential statistics by considering the following topics: simple random sampling (and briefly other types of sampling), and estimation of population parameters and sampling distributions. Concepts to be discussed include probability, inferential statistics, simple random sampling (with and without replacement), the sampling distribution of the mean, the variance and standard error of the mean (sampling error), CIs (point vs. interval estimation), and the central limit theorem. Our objectives are that by the end of this chapter, you will be able to (a) understand the most basic concepts of probability; (b) understand and conduct simple random sampling; and (c) understand, determine, and interpret the results from the estimation of population parameters via a sample.
5.1 Brief Introduction to Probability
The area of probability became important and began to be developed during the seventeenth and eighteenth centuries, when royalty and other well-to-do gamblers consulted with mathematicians for advice on games of chance. For example, in poker, if you hold two jacks, what are your chances of drawing a third jack? Or in craps, what is the chance of rolling a "7" with two dice? During that time, probability was also used for more practical purposes, such as to help determine life expectancy to underwrite life insurance policies. Considerable development in probability has obviously taken place since that time. In this section, we discuss the importance of probability, provide a definition of probability, and consider the notion of intuition versus probability. Although there is much more to the topic of probability, here we simply discuss those aspects of probability necessary for the remainder of the text. For additional information on probability, take a look at texts by Rudas (2004) or Tijms (2004).
5.1.1 Importance of Probability
Let us first consider why probability is important in statistics. A researcher is out collecting some sample data from a group of individuals (e.g., students, parents, teachers, voters, corporations, animals). Some descriptive statistics are generated from the sample data. Say the sample mean, X̄, is computed for several variables (e.g., number of hours of study time per week, grade point average, confidence in a political candidate, widget sales, animal food consumption). To what extent can we generalize from these sample statistics to their corresponding population parameters? For example, if the mean amount of study time per week for a given sample of graduate students is X̄ = 10 hours, to what extent are we able to generalize to the population of graduate students on the value of the population mean μ?

As we see, beginning in this chapter, inferential statistics involve making an inference about population parameters from sample statistics. We would like to know (a) how much uncertainty exists in our sample statistics as well as (b) how much confidence to place in our sample statistics. These questions can be addressed by assigning a probability value to an inference. As we show beginning in Chapter 6, probability can also be used to make statements about areas under a distribution of scores (e.g., the normal distribution). First, however, we need to provide a definition of probability.
5.1.2 Definition of Probability
In order to more easily define probability, consider a simple example of rolling a six-sided die (as there are dice with different numbers of sides). Each of the six sides, of course, has anywhere from one to six dots, and each side has a different number of dots. What is the probability of rolling a "4"? Technically, there are six possible outcomes or events that can occur. One can also determine how many times a specific outcome or event actually can occur. These two concepts are used to define and compute the probability of a particular outcome or event by

p(A) = S / T

where
p(A) is the probability that outcome or event A will occur
S is the number of times that the specific outcome or event A can occur
T is the total number of outcomes or events possible
Let us revisit our example, the probability of rolling a "4." A "4" can occur only once, thus S = 1. There are six possible values that can be rolled, thus T = 6. Therefore the probability of rolling a "4" is determined by

p(4) = S / T = 1/6

This assumes, however, that the die is unbiased, which means that the die is fair and that the probability of obtaining any of the six outcomes is the same. For a fair, unbiased die, the probability of obtaining any outcome is 1/6. Gamblers have been known to possess an unfair, biased die such that the probability of obtaining a particular outcome is different from 1/6 (e.g., to cheat their opponent by shaving one side of the die).
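The ratio p(A) = S/T can also be checked empirically. The following Python sketch (our illustration, not part of the text's materials; the seed and number of rolls are arbitrary choices) simulates 100,000 rolls of a fair die, and the relative frequency of a "4" settles near the theoretical 1/6:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

n_rolls = 100_000
rolls = [random.randint(1, 6) for _ in range(n_rolls)]

# Relative frequency of the event "roll a 4"
rel_freq = rolls.count(4) / n_rolls

# Theoretical probability from p(A) = S/T, with S = 1 and T = 6
p_four = 1 / 6

print(round(rel_freq, 4), round(p_four, 4))
```

With more rolls, the relative frequency drifts ever closer to 1/6, which is the long-run (frequentist) reading of the definition above.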
Consider one other classic probability example. Imagine you have an urn (or other container). Inside of the urn and out of view are a total of nine balls (thus T = 9), six of the balls being red (event A; S = 6) and the other three balls being green (event B; S = 3). Your task is to draw one ball out of the urn (without looking) and then observe its color. The probability of each of these two events occurring on the first draw is as follows:

p(A) = S / T = 6/9 = 2/3

p(B) = S / T = 3/9 = 1/3

Thus the probability of drawing a red ball is 2/3, and the probability of drawing a green ball is 1/3.
Two notions become evident in thinking about these examples. First, the sum of the probabilities for all distinct or independent events is precisely 1. In other words, if we take each distinct event and compute its probability, then the sum of those probabilities must be equal to one so as to account for all possible outcomes. Second, the probability of any given event (a) cannot exceed one and (b) cannot be less than zero. Part (a) should be obvious in that the sum of the probabilities for all events cannot exceed one, and therefore the probability of any one event cannot exceed one either (it makes no sense to talk about an event occurring more than all of the time). An event would have a probability of one if no other event can possibly occur, such as the probability that you are currently breathing. For part (b), no event can have a negative probability (it makes no sense to talk about an event occurring less than never); however, an event could have a zero probability if the event can never occur. For instance, in our urn example, one could never draw a purple ball.
5.1.3 Intuition Versus Probability
At this point, you are probably thinking that probability is an interesting topic. However, without extensive training to think in a probabilistic fashion, people tend to let their intuition guide them. This is all well and good, except that intuition can often guide you to a different conclusion than probability. Let us examine two classic examples to illustrate this dilemma. The first classic example is known as the "birthday problem." Imagine you are in a room of 23 people. You ask each person to write down their birthday (month and day) on a piece of paper. What do you think is the probability that in a room of 23 people at least two will have the same birthday?
Assume first that we are dealing with 365 different possible birthdays, where leap year (February 29) is not considered. Also assume the sample of 23 people is randomly drawn from some population of people. Taken together, this implies that each of the 365 different possible birthdays has the same probability (i.e., 1/365). An intuitive thinker might have the following thought process: "There are 365 different birthdays in a year and there are 23 people in the sample. Therefore the probability of two people having the same birthday must be close to zero." We try this on our introductory students each year, and their guesses are usually around zero.
Intuition has led us astray, and we have not used the proper thought process. True, there are 365 days and 23 people. However, the question really deals with pairs of people. There is a fairly large number of different possible pairs of people [i.e., person 1 with 2, 1 with 3, etc., where the total number of different pairs of people is equal to n(n − 1)/2 = 23(22)/2 = 253]. All we need is for one pair to have the same birthday. While the probability computations are a little complex (see Appendix), the probability that at least two individuals will have the same birthday in a group of 23 is equal to .507. That is right, about one-half of the time a group of 23 people will have two or more with the same birthday. Our introductory classes typically have between 20 and 40 students. More often than not, we are able to find two students with the same birthday. One year one of us wrote each birthday on the board so that students could see the data. The first two students selected actually had the same birthday, so our point was very quickly made. What was the probability of that event occurring?
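For readers who would like to see the birthday problem in action before working through the Appendix, here is a small Python simulation (our illustration; the trial count and seed are arbitrary). It repeatedly generates 23 random birthdays and records how often at least two coincide:

```python
import random

random.seed(1)

def shared_birthday_rate(n_people, trials=20_000):
    """Estimate P(at least two of n_people share a birthday) by simulation."""
    hits = 0
    for _ in range(trials):
        birthdays = [random.randrange(365) for _ in range(n_people)]
        if len(set(birthdays)) < n_people:  # any duplicate collapses the set
            hits += 1
    return hits / trials

estimate = shared_birthday_rate(23)
print(round(estimate, 2))
```

The estimate lands close to the .507 quoted above, well away from the near-zero answer intuition suggests.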
The second classic example is the "gambler's fallacy," sometimes referred to as the "law of averages." This works for any game of chance, so imagine you are flipping a coin. Obviously there are two possible outcomes from a coin flip, heads and tails. Assume the coin is fair and unbiased such that the probability of flipping a head is the same as flipping a tail, that is, .5. After flipping the coin nine times, you have observed a tail every time. What is the probability of obtaining a head on the next flip?

An intuitive thinker might have the following thought process: "I have just observed a tail on each of the last nine flips. According to the law of averages, the probability of observing a head on the next flip must be near certainty. The probability must be nearly one." We also try this on our introductory students every year, and their guesses are almost always near one.
Intuition has led us astray once again, as we have not used the proper thought process. True, we have just observed nine consecutive tails. However, the question really deals with the probability of the 10th flip being a head, not the probability of obtaining 10 consecutive tails. The probability of a head is always .5 with a fair, unbiased coin. The coin has no memory; thus the probability of tossing a head after nine consecutive tails is the same as the probability of tossing a head after nine consecutive heads, .5. In technical terms, the probabilities of each event (each toss) are independent of one another. In other words, the probability of flipping a head is the same regardless of the preceding flips. This is not the same as the probability of tossing 10 consecutive heads, which is rather small (approximately .0010). So when you are gambling at the casino and have lost the last nine games, do not believe that you are guaranteed to win the next game. You can just as easily lose game 10 as you did game 1. The same goes if you have won a number of games. You can just as easily win the next game as you did game 1. To some extent, the casinos count on their customers playing the gambler's fallacy to make a profit.
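The coin-flipping argument can likewise be checked by simulation. This Python sketch (our illustration; the trial count is an arbitrary choice) generates many 10-flip sequences, keeps those that open with nine tails, and examines the tenth flip. The conditional proportion of heads should come out near .5, while 10 straight tails occurs in roughly .0010 of all sequences:

```python
import random

random.seed(7)

trials = 300_000
nine_tails_runs = 0   # sequences whose first nine flips are all tails
head_on_tenth = 0     # ...and whose tenth flip is a head
ten_tails_runs = 0    # sequences of ten straight tails

for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(10)]  # True = head
    if not any(flips[:9]):          # first nine flips all tails
        nine_tails_runs += 1
        if flips[9]:
            head_on_tenth += 1
        else:
            ten_tails_runs += 1

p_head_given_nine_tails = head_on_tenth / nine_tails_runs
p_ten_tails = ten_tails_runs / trials
print(round(p_head_given_nine_tails, 2), round(p_ten_tails, 4))
```

The simulation separates the two questions the fallacy confuses: the tenth flip in isolation (about .5) versus the whole 10-tail run (about .0010).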
5.2 Sampling and Estimation
In Chapter 3, we spent some time discussing sample statistics, including the measures of central tendency and dispersion. In this section, we expand upon that discussion by defining inferential statistics, describing different types of sampling, and then moving into the implications of such sampling in terms of estimation and sampling distributions.

Consider the situation where we have a population of graduate students. Population parameters (characteristics of a population) could be determined, such as the population size N, the population mean μ, the population variance σ², and the population standard deviation σ. Through some method of sampling, we then take a sample of students from this population. Sample statistics (characteristics of a sample) could be determined, such as the sample size n, the sample mean X̄, the sample variance s², and the sample standard deviation s.
How often do we actually ever deal with population data? Except when dealing with very small, well-defined populations, we almost never deal with population data. The main reason for this is cost, in terms of time, personnel, and economics. This means then that we are almost always dealing with sample data. With descriptive statistics, dealing with sample data is very straightforward, and we only need to make sure we are using the appropriate sample statistic equation. However, what if we want to take a sample statistic and make some generalization about its relevant population parameter? For example, you have computed a sample mean on grade point average (GPA) of X̄ = 3.25 for a sample of 25 graduate students at State University. You would like to make some generalization from this sample mean to the population mean μ at State University. How do we do this? To what extent can we make such a generalization? How confident are we that this sample mean actually represents the population mean?
This brings us to the field of inferential statistics. We define inferential statistics as statistics that allow us to make an inference or generalization from a sample to the population. In terms of reasoning, inductive reasoning is used to infer from the specific (the sample) to the general (the population). Thus inferential statistics is the answer to all of our preceding questions about generalizing from sample statistics to population parameters. How the sample is derived, however, is important in determining to what extent the statistical results we derive can be inferred from the sample back to the population. Thus, it is important to spend a little time talking about simple random sampling, the only sampling procedure that allows generalizations to be made from the sample to the population. (Although there are statistical means to correct for non-simple random samples, they are beyond the scope of this textbook.) In the remainder of this section, and in much of the remainder of this text, we take up the details of inferential statistics for many different procedures.
5.2.1 Simple Random Sampling
There are several different ways in which a sample can be drawn from a population. In this section we introduce simple random sampling, which is a commonly used type of sampling and which is also assumed for many inferential statistics (beginning in Chapter 6). Simple random sampling is defined as the process of selecting sample observations from a population so that each observation has an equal and independent probability of being selected. If the sampling process is truly random, then (a) each observation in the population has an equal chance of being included in the sample, and (b) each observation selected into the sample is independent of (or not affected by) every other selection. Thus a volunteer or "street-corner" sample would not meet the first condition because members of the population who do not frequent that particular street corner have no chance of being included in the sample.
In addition, if the selection of spouses required the corresponding selection of their respective mates, then the second condition would not be met. For example, if the selection of Mr. Joe Smith III also required the selection of his wife, then these two selections are not independent of one another. Because we selected Mr. Joe Smith III, we must also therefore select his wife. Note that through independent sampling it is possible for Mr. Smith and his wife to both be sampled, but it is not required. Thus, independence implies that each observation is selected without regard to any other observation sampled.
We also would fail to have equal and independent probability of selection if the sampling procedure employed was something other than a simple random sample—because it is only with a simple random sample that we have met conditions (a) and (b) presented earlier in the paragraph. (Although there are statistical means to correct for non-simple random samples, they are beyond the scope of this textbook.) This concept of independence is an important assumption that we will become better acquainted with in the remaining chapters. If we have independence, then generalizations from the sample back to the population can be made (you may remember this as external validity, which was likely introduced in your research methods course) (see Figure 5.1). Because of the connection between simple random sampling and independence, let us expand our discussion on the two types of simple random sampling.
5.2.1.1 Simple Random Sampling With Replacement
There are two specific types of simple random sampling. Simple random sampling with replacement is conducted as follows. The first observation is selected from the population into the sample, and that observation is then replaced back into the population. The second observation is selected and then replaced in the population. This continues until a sample of the desired size is obtained. The key here is that each observation sampled is placed back into the population and could be selected again.
This scenario makes sense in certain applications and not in others. For example, return to our coin flipping example, where we now want to flip a coin 100 times (i.e., a sample size of 100). How does this operate in the context of sampling? We flip the coin (e.g., heads) and record the result. This "head" becomes the first observation in our sample. This observation is then placed back into the population. Then a second observation is made and is placed back into the population. This continues until our sample size requirement of 100 is reached. In this particular scenario we always sample with replacement, and we automatically do so even if we have never heard of sampling with replacement. If no replacement took place, then we could only ever have a sample size of two, one "head" and one "tail."
5.2.1.2 Simple Random Sampling Without Replacement
In other scenarios, sampling with replacement does not make sense. For example, say we are conducting a poll for the next major election by randomly selecting 100 students (the sample) at a local university (the population). As each student is selected into the sample, they are removed and cannot be sampled again. It simply would make no sense if our sample of 100 students only contained 78 different students due to replacement (as some students were polled more than once). Our polling example represents the other type of simple random sampling, this time without replacement. Simple random sampling without replacement is conducted in a similar fashion except that once an observation is selected for inclusion in the sample, it is not replaced and cannot be selected a second time.
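The two sampling schemes map directly onto Python's standard library, should you wish to experiment outside a statistics package. In this sketch the population of 5,000 ID numbers is hypothetical: random.choices draws with replacement (duplicates possible), while random.sample draws without replacement (all 100 IDs distinct):

```python
import random

random.seed(3)

# Hypothetical sampling frame: ID numbers for a population of 5,000 students
population = list(range(1, 5001))

# Simple random sampling WITH replacement: an ID may be drawn more than once
with_repl = random.choices(population, k=100)

# Simple random sampling WITHOUT replacement: all 100 IDs are distinct
without_repl = random.sample(population, k=100)

print(len(set(with_repl)), len(set(without_repl)))
```

The second count is always 100; the first can fall below 100 whenever an ID happens to repeat, which is exactly the situation the polling example rules out.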
5.2.1.3 Other Types of Sampling
There are several other types of sampling. These other types include convenience sampling (i.e., the volunteer or "street-corner" sampling previously mentioned), systematic sampling (e.g., select every 10th observation from the population into the sample), cluster sampling (i.e., sample groups or clusters of observations and include all members of the selected clusters in the sample), stratified sampling (i.e., sampling within subgroups or strata to ensure adequate representation of each stratum), and multistage sampling (e.g., stratify at one stage and randomly sample at another stage). These types of sampling are beyond the scope of this text, and the interested reader is referred to sampling texts such as Sudman (1976), Kalton (1983), Jaeger (1984), Fink (1995), or Levy and Lemeshow (1999).
Figure 5.1 Cycle of inference: Step 1, population; Step 2, draw simple random sample; Step 3, compute sample statistics; Step 4, make inference back to the population.
5.2.2 Estimation of Population Parameters and Sampling Distributions
Take as an example the situation where we select one random sample of n females (e.g., n = 20), measure their weight, and then compute the mean weight of the sample. We find the mean of this first sample to be 102 pounds and denote it by X̄₁ = 102, where the subscript identifies the first sample. This one sample mean is known as a point estimate of the population mean μ, as it is simply one value or point. We can then proceed to collect weight data from a second sample of n females and find that X̄₂ = 110. Next we collect weight data from a third sample of n females and find that X̄₃ = 119. Imagine that we go on to collect such data from many other samples of size n and compute a sample mean for each of those samples.
5.2.2.1 Sampling Distribution of the Mean
At this point, we have a collection of sample means, which we can use to construct a frequency distribution of sample means. This frequency distribution is formally known as the sampling distribution of the mean. To better illustrate this new distribution, let us take a very small population from which we can take many samples. Here we define our population of observations as follows: 1, 2, 3, 5, 9 (in other words, we have five values in our population). As the entire population is known here, we can better illustrate the important underlying concepts. We can determine that the population mean μX = 4 and the population variance σX² = 8, where the subscript X indicates the variable we are referring to. Let us first take all possible samples from this population of size 2 (i.e., n = 2) with replacement. As there are only five observations, there will be 25 possible samples, as shown in the upper portion of Table 5.1, called "Samples." Each entry represents the two observations for a particular sample. For instance, in row 1 and column 4, we see 1,5. This indicates that the first observation is a 1 and the second observation is a 5. If sampling was done without replacement, then the diagonal of the table from upper left to lower right would not exist. For instance, a 1,1 sample could not be selected if sampling without replacement.
Now that we have all possible samples of size 2, let us compute the sample means for each of the 25 samples. The sample means are shown in the middle portion of Table 5.1, called "Sample means." Just eyeballing the table, we see the means range from 1 to 9 with numerous different values in between. We then compute the mean of the 25 sample means to be 4, as shown in the bottom portion of Table 5.1, called "Mean of the sample means."
This is a matter for some discussion, so consider the following three points. First, the distribution of X̄ for all possible samples of size n is known as the sampling distribution of the mean. In other words, if we were to take all of the "sample mean" values in Table 5.1 and construct a histogram of those values, then that is what is referred to as a "sampling distribution of the mean." It is simply the distribution (i.e., histogram) of all the "sample mean" values. Second, the mean of the sampling distribution of the mean for all possible samples of size n is equal to μX̄. As the mean of the sampling distribution of the mean is denoted by μX̄ (the mean of the X̄s), we see for the example that μX̄ = μX = 4. In other words, the mean of the sampling distribution of the mean is simply the average of all of the "sample means" in Table 5.1. The mean of the sampling distribution of the mean will always be equal to the population mean.
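Because the population here has only five values, the sampling distribution of the mean can be enumerated exhaustively. The following Python sketch (our illustration) lists all 25 samples of size 2 drawn with replacement and confirms that the mean of the sample means equals the population mean of 4:

```python
from itertools import product
from statistics import mean

population = [1, 2, 3, 5, 9]

# All 25 possible samples of size n = 2, drawn with replacement
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu_X = mean(population)       # population mean: 4
mu_Xbar = mean(sample_means)  # mean of the sampling distribution of the mean

print(len(samples), mu_X, mu_Xbar)
```

itertools.product generates exactly the grid laid out in Table 5.1, including the with-replacement diagonal (1,1), (2,2), and so on.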
Third, we define sampling error in this context as the difference (or deviation) between a particular sample mean and the population mean, denoted as X̄ − μX. A positive sampling error indicates a sample mean greater than the population mean, where the sample mean is known as an overestimate of the population mean. A zero sampling error indicates a sample mean exactly equal to the population mean. A negative sampling error indicates a sample mean less than the population mean, where the sample mean is known as an underestimate of the population mean. As researchers, we want the sampling error to be as close to zero as possible to suggest that the sample reflects the population well.
5.2.2.2 Variance Error of the Mean
Now that we have a measure of the mean of the sampling distribution of the mean, let us consider the variance of this distribution. We define the variance of the sampling distribution of the mean, known as the variance error of the mean, as σX̄². This will provide us with a dispersion measure of the extent to which the sample means vary and will also provide some indication of the confidence we can place in a particular sample mean. The variance error of the mean is computed as

σX̄² = σX² / n

where
σX² is the population variance of X
n is the sample size
Table 5.1
All Possible Samples and Sample Means for n = 2 from the Population of 1, 2, 3, 5, 9

                           Second Observation
First Observation       1      2      3      5      9
Samples
  1                   1,1    1,2    1,3    1,5    1,9
  2                   2,1    2,2    2,3    2,5    2,9
  3                   3,1    3,2    3,3    3,5    3,9
  5                   5,1    5,2    5,3    5,5    5,9
  9                   9,1    9,2    9,3    9,5    9,9
Sample means
  1                   1.0    1.5    2.0    3.0    5.0
  2                   1.5    2.0    2.5    3.5    5.5
  3                   2.0    2.5    3.0    4.0    6.0
  5                   3.0    3.5    4.0    5.0    7.0
  9                   5.0    5.5    6.0    7.0    9.0
Column sums ΣX̄      12.5   15.0   17.5   22.5   32.5

Mean of the sample means:
μX̄ = ΣX̄ / (number of samples) = 100/25 = 4.0

Variance of the sample means:
σX̄² = [(number of samples)(ΣX̄²) − (ΣX̄)²] / (number of samples)²
    = [25(500) − (100)²] / (25)² = (12,500 − 10,000)/625 = 4.0
For the example, we have already determined that σX² = 8 and that n = 2; therefore,

σX̄² = σX² / n = 8/2 = 4

This is verified in the bottom portion of Table 5.1, called "Variance of the sample means," where the variance error is computed from the collection of sample means.
What will happen if we increase the size of the sample? If we increase the sample size to n = 4, then the variance error is reduced to 2. Thus we see that as the size of the sample n increases, the magnitude of the sampling error decreases. Why? Conceptually, as sample size increases, we are sampling a larger portion of the population. In doing so, we are also obtaining a sample that is likely more representative of the population. In addition, the larger the sample size, the less likely it is to obtain a sample mean that is far from the population mean. Thus, as sample size increases, we home in closer and closer to the population mean and have less and less sampling error.
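The relationship between sample size and the variance error can also be confirmed by enumeration. This Python sketch (our illustration) builds the full with-replacement sampling distribution of the mean for n = 2 and n = 4 and checks that its variance equals σX²/n in each case:

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3, 5, 9]
var_X = pvariance(population)  # population variance: 8

results = {}
for n in (2, 4):
    # Enumerate all 5**n equally likely with-replacement samples of size n
    means = [mean(s) for s in product(population, repeat=n)]
    results[n] = pvariance(means)  # variance error of the mean

print(var_X, results)  # variance error is sigma_X^2 / n: 4 for n = 2, 2 for n = 4
```

Doubling the sample size from 2 to 4 halves the variance error, just as the formula predicts.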
For example, say we are sampling from a voting district with a population of 5,000 voters. A survey is developed to assess how satisfied the district voters are with their local state representative. Assume the survey generates a 100-point satisfaction scale. First we determine that the population mean of satisfaction is 75. Next we take samples of different sizes. For a sample size of 1, we find sample means that range from 0 to 100 (i.e., each mean really only represents a single observation). For a sample size of 10, we find sample means that range from 50 to 95. For a sample size of 100, we find sample means that range from 70 to 80. We see then that as sample size increases, our sample means become closer and closer to the population mean, and the variability of those sample means becomes smaller and smaller.
5.2.2.3 Standard Error of the Mean
We can also compute the standard deviation of the sampling distribution of the mean, known as the standard error of the mean, by

σX̄ = σX / √n

Thus for the example we have

σX̄ = σX / √n = 2.8284/√2 = 2

Because the applied researcher typically does not know the population variance, the population variance error of the mean and the population standard error of the mean can be estimated by the following, respectively:

sX̄² = sX² / n

and

sX̄ = sX / √n
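With sample data in hand, the estimated standard error is a one-line computation. The weights below are hypothetical values we supply purely for illustration; only the formula sX̄ = sX/√n comes from the text:

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample of n = 8 weights (pounds)
weights = [98, 102, 105, 110, 112, 115, 119, 123]
n = len(weights)

s_X = stdev(weights)     # sample standard deviation (n - 1 in the denominator)
se_mean = s_X / sqrt(n)  # estimated standard error of the mean

print(round(se_mean, 2))
```

Note that statistics.stdev already uses the n − 1 (sample) denominator, which is what the estimated standard error calls for.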
5.2.2.4 Confidence Intervals
Thus far we have illustrated how a sample mean is a point estimate of the population mean and how a variance error gives us some sense of the variability among the sample means. Putting these concepts together, we can also build an interval estimate for the population mean to give us a sense of how confident we are in our particular sample mean. We can form a confidence interval (CI) around a particular sample mean as follows. As we learned in Chapter 4, for a normal distribution, 68% of the distribution falls within one standard deviation of the mean. A 68% CI of a sample mean can be formed as follows:

68% CI = X̄ ± σX̄
Conceptually, this means that if we form 68% CIs for 100 sample means, then 68 of those 100 intervals would contain or include the population mean (it does not mean that there is a 68% probability of the interval containing the population mean—the interval either contains it or does not). Because the applied researcher typically only has one sample mean and does not know the population mean, he or she has no way of knowing if this one CI actually contains the population mean or not. If one wanted to be more confident in a sample mean, then a 90% CI, a 95% CI, or a 99% CI could be formed as follows:

90% CI = X̄ ± 1.645σX̄

95% CI = X̄ ± 1.96σX̄

99% CI = X̄ ± 2.5758σX̄
Thus for the 90% CI, the population mean will be contained in 90 out of 100 CIs; for the 95% CI, the population mean will be contained in 95 out of 100 CIs; and for the 99% CI, the population mean will be contained in 99 out of 100 CIs. The critical values of 1.645, 1.96, and 2.5758 come from the standard unit normal distribution table (Table A.1) and indicate the width of the CI. Wider CIs, such as the 99% CI, enable greater confidence. For example, with a sample mean of 70 and a standard error of the mean of 3, the following CIs result: 68% CI = (67, 73) [i.e., ranging from 67 to 73]; 90% CI = (65.065, 74.935); 95% CI = (64.12, 75.88); and 99% CI = (62.2726, 77.7274). We can see here that to be assured that 99% of the CIs contain the population mean, our interval must be wider (i.e., ranging from about 62.27 to 77.73, or a range of about 15) than the narrower CIs (e.g., the 95% CI ranges from 64.12 to 75.88, or a range of about 11).
In general, a CI for any level of confidence (i.e., XX% CI) can be computed by the following general formula:

XX% CI = X̄ ± zcv σX̄

where zcv is the critical value taken from the standard unit normal distribution table for that particular level of confidence, and the other values are as before.
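The general formula lends itself to a short function. In this Python sketch (our illustration), the table of critical values is limited to the four levels discussed above, and the example reproduces the chapter's intervals for a mean of 70 and a standard error of 3:

```python
# z critical values for common confidence levels (standard unit normal table)
Z_CV = {68: 1.0, 90: 1.645, 95: 1.96, 99: 2.5758}

def confidence_interval(xbar, se, level):
    """XX% CI = xbar +/- z_cv * (standard error of the mean)."""
    z = Z_CV[level]
    return (xbar - z * se, xbar + z * se)

# Reproducing the chapter's example: sample mean 70, standard error 3
for level in (68, 90, 95, 99):
    lo, hi = confidence_interval(70, 3, level)
    print(level, round(lo, 4), round(hi, 4))  # e.g., the 99% CI is (62.2726, 77.7274)
```

Running the loop shows directly how each step up in confidence widens the interval around the same sample mean.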
5.2.2.5 Central Limit Theorem
In our discussion of CIs, we used the normal distribution to help determine the width of the intervals. Many inferential statistics assume the population distribution is normal in shape. Because we are looking at sampling distributions in this chapter, does the shape of the original population distribution have any relationship to the sampling distribution of the mean we obtain? For example, if the population distribution is nonnormal, what form does the sampling distribution of the mean take (i.e., is the sampling distribution of the mean also nonnormal)? There is a nice concept, known as the central limit theorem, to assist us here. The central limit theorem states that as sample size n increases, the sampling distribution of the mean from a random sample of size n more closely approximates a normal distribution. If the population distribution is normal in shape, then the sampling distribution of the mean is also normal in shape. If the population distribution is not normal in shape, then the sampling distribution of the mean becomes more nearly normal as sample size increases. This concept is graphically depicted in Figure 5.2.
The�top�row�of�the�figure�depicts�two�population�distributions,�the�left�one�being�normal�
and�the�right�one�being�positively�skewed��The�remaining�rows�are�for�the�various�sam-
pling� distributions,� depending� on� the� sample� size�� The� second� row� shows� the� sampling�
distributions�of�the�mean�for�n�=�1��Note�that�these�sampling�distributions�look�precisely�
like�the�population�distributions,�as�each�observation�is�literally�a�sample�mean��The�next�
row�gives�the�sampling�distributions�for�n�=�2;�here�we�see�for�the�skewed�population�that�
the�sampling�distribution�is�slightly�less�skewed��This�is�because�the�more�extreme�obser-
vations�are�now�being�averaged�in�with�less�extreme�observations,�yielding�less�extreme�
Normal Positively skewed
Population
------------------------------------------------------------------
n = 1
n = 2
n = 4
n = 25
FIGuRe 5.2
Central�limit�theorem�for�normal�and�positively�skewed�population�distributions�
117Introduction to Probability and Sample Statistics
means��For�n�=�4,�the�sampling�distribution�in�the�skewed�case�is�even�less�skewed�than�for�
n = 2��Eventually�we�reach�the�n�=�25�sampling�distribution,�where�the�sampling�distribu-
tion� for� the� skewed� case� is� nearly� normal� and� nearly� matches� the� sampling� distribution�
for�the�normal�case��This�phenomenon�will�occur�for�other�nonnormal�population�distri-
butions�as�well�(e�g�,�negatively�skewed)��The�morale�of�the�story�here�is�a�good�one��If�the�
population�distribution�is�nonnormal,�then�this�will�have�minimal�effect�on�the�sampling�
distribution� of� the� mean� except� for� rather� small� samples�� This� can� come� into� play� with�
inferential�statistics�when�the�assumption�of�normality�is�not�satisfied,�as�we�see�in�later�
chapters�
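This convergence is easy to check by simulation. The sketch below is our own illustration, not part of the text: it draws repeated samples from a positively skewed (exponential) population, mirroring the right-hand column of Figure 5.2, and shows the spread of the sample means shrinking toward σ/√n as n grows.

```python
import random
import statistics

random.seed(42)

def sampling_distribution(n, reps=5000):
    """Means of `reps` random samples of size n drawn from a
    positively skewed population (exponential with sigma = 1)."""
    return [statistics.mean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

for n in (1, 2, 4, 25):
    means = sampling_distribution(n)
    print(n, round(statistics.stdev(means), 3))
# The printed standard deviations of the sample means shrink roughly
# as 1/sqrt(n), and a histogram of the n = 25 means looks nearly normal.
```

A histogram of each list of means reproduces the rows of Figure 5.2: skewed at n = 1, nearly normal by n = 25.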
5.3 Summary

In this chapter, we began to move from descriptive statistics to the realm of inferential statistics. The two main topics we considered were probability, and sampling and estimation. First we briefly introduced probability by looking at the importance of probability in statistics, defining probability, and comparing conclusions often reached by intuition versus probability. The second topic involved sampling and estimation, a topic we return to in most of the remaining chapters. In the sampling section, we defined and described simple random sampling, both with and without replacement, and briefly outlined other types of sampling. In the estimation section, we examined the sampling distribution of the mean, the variance and standard error of the mean, CIs around the mean, and the central limit theorem. At this point you should have met the following objectives: (a) be able to understand the most basic concepts of probability, (b) be able to understand and conduct simple random sampling, and (c) be able to understand, determine, and interpret the results from the estimation of population parameters via a sample. In the next chapter we formally discuss our first inferential statistics situation, testing hypotheses about a single mean.
Appendix: Probability That at Least Two Individuals Have the Same Birthday

This probability can be shown by either of the following equations. Note that there are n = 23 individuals in the room. One method is as follows:

1 − [365 × 364 × 363 × … × (365 − n + 1)] / 365^n = 1 − [365 × 364 × 363 × … × 343] / 365^23 = .507

An equivalent method is as follows:

1 − [(365/365) × (364/365) × (363/365) × … × ((365 − n + 1)/365)] = 1 − [(365/365) × (364/365) × (363/365) × … × (343/365)] = .507
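The appendix's arithmetic can be checked numerically; the short sketch below is our own, using exact rational arithmetic from Python's standard library.

```python
from fractions import Fraction

def prob_shared_birthday(n):
    """Probability that at least two of n people share a birthday,
    assuming 365 equally likely birthdays (as in the appendix)."""
    p_all_distinct = Fraction(1)
    for k in range(n):
        # Multiply in the chance that person k+1 avoids the k birthdays taken.
        p_all_distinct *= Fraction(365 - k, 365)
    return 1 - p_all_distinct

p = float(prob_shared_birthday(23))
print(round(p, 3))  # approximately .507, matching the appendix
```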
Problems

Conceptual problems

5.1 The standard error of the mean is which one of the following?
 a. Standard deviation of a sample distribution
 b. Standard deviation of the population distribution
 c. Standard deviation of the sampling distribution of the mean
 d. Mean of the sampling distribution of the standard deviation

5.2 An unbiased six-sided die is tossed on two consecutive trials, and the first toss results in a "2." What is the probability that a "2" will result on the second toss?
 a. Less than 1/6
 b. 1/6
 c. More than 1/6
 d. Cannot be determined

5.3 An urn contains 9 balls: 3 green, 4 red, and 2 blue. The probability that a ball selected at random is blue is equal to which one of the following?
 a. 2/9
 b. 5/9
 c. 6/9
 d. 7/9

5.4 Sampling error is which one of the following?
 a. The amount by which a sample mean is greater than the population mean
 b. The amount of difference between a sample statistic and a population parameter
 c. The standard deviation divided by the square root of n
 d. When the sample is not drawn randomly

5.5 What does the central limit theorem state?
 a. The means of many random samples from a population will be normally distributed.
 b. The raw scores of many natural events will be normally distributed.
 c. z scores will be normally distributed.
 d. None of the above.

5.6 For a normal population, the variance of the sampling distribution of the mean increases as sample size increases. True or false?

5.7 All other things being equal, as the sample size increases, the standard error of a statistic decreases. True or false?

5.8 I assert that the 95% CI has a larger (or wider) range than the 99% CI for the same parameter using the same data. Am I correct?

5.9 I assert that the 90% CI has a smaller (or more narrow) range than the 68% CI for the same parameter using the same data. Am I correct?

5.10 I assert that the mean and median of any random sample drawn from a symmetric population distribution will be equal. Am I correct?

5.11 A random sample is to be drawn from a symmetric population with mean 100 and variance 225. I assert that the sample mean is more likely to have a value larger than 105 if the sample size is 16 than if the sample size is 25. Am I correct?

5.12 A gambler is playing a card game where the known probability of winning is .40 (win 40% of the time). The gambler has just lost 10 consecutive hands. What is the probability of the gambler winning the next hand?
 a. Less than .40
 b. Equal to .40
 c. Greater than .40
 d. Cannot be determined without observing the gambler

5.13 On the evening news, the anchorwoman announces that the state's lottery has reached $72 billion and reminds the viewing audience that there has not been a winner in over 5 years. In researching lottery facts, you find a report that states the probability of winning the lottery is 1 in 2 million (i.e., a very, very small probability). What is the probability that you will win the lottery?
 a. Less than 1 in 2 million
 b. Equal to 1 in 2 million
 c. Greater than 1 in 2 million
 d. Cannot be determined without additional statistics

5.14 The probability of being selected into a sample is the same for every individual in the population for the convenient method of sampling. True or false?

5.15 Malani is conducting research on elementary teacher attitudes toward changes in mathematics standards. Malani's population consists of all elementary teachers within one district in the state. Malani wants her sampling method to be such that every teacher in the population has an equal and independent probability of selection. Which of the following is the most appropriate sampling method?
 a. Convenient sampling
 b. Simple random sampling with replacement
 c. Simple random sampling without replacement
 d. Systematic sampling

5.16 Sampling error increases with larger samples. True or false?

5.17 If a population distribution is highly positively skewed, then the distribution of the sample means for samples of size 500 will be
 a. Highly negatively skewed
 b. Highly positively skewed
 c. Approximately normally distributed
 d. Cannot be determined without further information
Computational problems

5.1 The population distribution of variable X, the number of pets owned, consists of the five values of 1, 4, 5, 7, and 8.
 a. Calculate the values of the population mean and variance.
 b. List all possible samples of size 2 where samples are drawn with replacement.
 c. Calculate the values of the mean and variance of the sampling distribution of the mean.

5.2 The following is a random sampling distribution of the mean number of children for samples of size 3, where samples are drawn with replacement.

Sample Mean   f
     1        1
     2        2
     3        4
     4        2
     5        1

 a. What is the population mean?
 b. What is the population variance?
 c. What is the mean of the sampling distribution of the mean?
 d. What is the variance error of the mean?

5.3 In a study of the entire student body of a large university, if the standard error of the mean is 20 for n = 16, what must the sample size be to reduce the standard error to 5?

5.4 A random sample of 13 statistics texts had a mean number of pages of 685 and a standard deviation of 42. First calculate the standard error of the mean. Then calculate the 95% CI for the mean length of statistics texts.

5.5 A random sample of 10 high schools employed a mean number of guidance counselors of 3 and a standard deviation of 2. First calculate the standard error of the mean. Then calculate the 90% CI for the mean number of guidance counselors.
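As a worked illustration of the kind of computation Problems 5.4 and 5.5 call for (with made-up numbers, so the exercises themselves stay intact), the standard error is s/√n, and a large-sample 95% CI attaches the two-tailed critical value 1.96:

```python
import math
from statistics import NormalDist

# Hypothetical sample (not from the problems): n = 36, mean = 50, sd = 12.
n, ybar, s = 36, 50.0, 12.0

se = s / math.sqrt(n)                # standard error of the mean
z = NormalDist().inv_cdf(0.975)      # two-tailed 95% critical value, about 1.96
ci = (ybar - z * se, ybar + z * se)
print(round(se, 2), tuple(round(b, 2) for b in ci))  # → 2.0 (46.08, 53.92)
```

For the small samples in Problems 5.4 and 5.5, where σ is unknown, the t distribution replaces z, as Chapter 6 discusses.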
Interpretive problems

5.1 Take a six-sided die, where the population values are obviously 1, 2, 3, 4, 5, and 6. Take 20 samples, each of size 2 (e.g., every two rolls is one sample). For each sample, calculate the mean. Then determine the mean of the sampling distribution of the mean and the variance error of the mean. Compare your results to those of your colleagues.
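Interpretive Problem 5.1 can also be simulated in software; the sketch below is our own, and it uses many more than 20 samples so the results settle near the theoretical values (population mean μ = 3.5, variance σ² = 35/12, variance error σ²/n = 35/24 ≈ 1.46 for n = 2):

```python
import random
import statistics

random.seed(7)

def die_sample_means(num_samples, n=2):
    """Mean of each of `num_samples` samples of n fair die rolls."""
    return [statistics.mean(random.randint(1, 6) for _ in range(n))
            for _ in range(num_samples)]

means = die_sample_means(10_000)
print(round(statistics.mean(means), 2))      # near the population mean 3.5
print(round(statistics.variance(means), 2))  # near sigma^2 / n = 35/24, about 1.46
```

With only 20 samples, as the problem asks, your own results will bounce around these values, which is exactly the sampling error the problem is meant to demonstrate.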
5.2 You will need 20 plain M&M candy pieces and one cup. Put the candy pieces in the cup and toss them onto a flat surface. Count the number of candy pieces that land with the "M" facing up. Write down that number. Repeat these steps five times. These steps will constitute one sample. Next, generate four additional samples (i.e., repeat the process of tossing the candy pieces, counting the "Ms," and writing down that number). Then determine the mean of the sampling distribution of the mean and the variance error of the mean. Compare your results to those of your colleagues.
6
Introduction to Hypothesis Testing:
Inferences About a Single Mean

Chapter Outline
6.1 Types of Hypotheses
6.2 Types of Decision Errors
 6.2.1 Example Decision-Making Situation
 6.2.2 Decision-Making Table
6.3 Level of Significance (α)
6.4 Overview of Steps in Decision-Making Process
6.5 Inferences About μ When σ Is Known
 6.5.1 z Test
 6.5.2 Example
 6.5.3 Constructing Confidence Intervals Around the Mean
6.6 Type II Error (β) and Power (1 − β)
 6.6.1 Full Decision-Making Context
 6.6.2 Power Determinants
6.7 Statistical Versus Practical Significance
6.8 Inferences About μ When σ Is Unknown
 6.8.1 New Test Statistic t
 6.8.2 t Distribution
 6.8.3 t Test
 6.8.4 Example
6.9 SPSS
6.10 G*Power
6.11 Template and APA-Style Write-Up

Key Concepts
 1. Null or statistical hypothesis versus scientific or research hypothesis
 2. Type I error (α), Type II error (β), and power (1 − β)
 3. Two-tailed versus one-tailed alternative hypotheses
 4. Critical regions and critical values
 5. z test statistic
 6. Confidence interval (CI) around the mean
 7. t test statistic
 8. t distribution, degrees of freedom, and table of t distributions
In Chapter 5, we began to move into the realm of inferential statistics. There we considered the following general topics: probability, sampling, and estimation. In this chapter, we move totally into the domain of inferential statistics, where the concepts involved in probability, sampling, and estimation can be implemented. The overarching theme of the chapter is the use of a statistical test to make inferences about a single mean. In order to properly cover this inferential test, a number of basic foundational concepts are described in this chapter. Many of these concepts are utilized throughout the remainder of this text. The topics described include the following: types of hypotheses, types of decision errors, level of significance (α), overview of steps in the decision-making process, inferences about μ when σ is known, Type II error (β) and power (1 − β), statistical versus practical significance, and inferences about μ when σ is unknown. Concepts to be discussed include the following: null or statistical hypothesis versus scientific or research hypothesis; Type I error (α), Type II error (β), and power (1 − β); two-tailed versus one-tailed alternative hypotheses; critical regions and critical values; z test statistic; confidence interval (CI) around the mean; t test statistic; and t distribution, degrees of freedom, and table of t distributions. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts of hypothesis testing; (b) utilize the normal and t tables; and (c) understand, determine, and interpret the results from the z test, t test, and CI procedures.
6.1 Types of Hypotheses

You may remember Marie from previous chapters. We now revisit Marie in this chapter.

Marie, a graduate student pursuing a master's degree in educational research, has completed her first tasks as a research assistant: determining a number of descriptive statistics on data provided to her by her faculty mentor. The faculty member was so pleased with the descriptive analyses and presentation of results previously shared that she has asked Marie to consult with a local hockey coach, Oscar, who is interested in examining team skating performance. Based on Oscar's research question (Is the mean skating speed of a hockey team different from the league mean speed of 12 seconds?), Marie suggests a one-sample test of means as the test of inference. Her task is to assist Oscar in generating the test of inference to answer his research question.

Hypothesis testing is a decision-making process where two possible decisions are weighed in a statistical fashion. In a way, this is much like any other decision involving two possibilities, such as whether to carry an umbrella with you today or not. In statistical decision-making, the two possible decisions are known as hypotheses. Sample data are then used to help us select one of these decisions. The two types of hypotheses competing against one another are known as the null or statistical hypothesis, denoted by H0, and the scientific, alternative, or research hypothesis, denoted by H1.
The null or statistical hypothesis is a statement about the value of an unknown population parameter. Considering the procedure we are discussing in this chapter, the one-sample mean test, one example H0 might be that the population mean IQ score is 100, which we denote as

H0: μ = 100 or H0: μ − 100 = 0

Mathematically, both equations say the same thing. The version on the left is the more traditional form of the null hypothesis involving a single mean. However, the version on the right makes clear to the reader why the term "null" is appropriate. That is, there is no difference or a "null" difference between the population mean and the hypothesized mean value of 100. In general, the hypothesized mean value is denoted by μ0 (here μ0 = 100). Another H0 might be that the statistics exam population means are the same for male and female students, which we denote as

H0: μ1 − μ2 = 0

where
μ1 is the population mean for males
μ2 is the population mean for females

Here there is no difference or a "null" difference between the two population means. The test of the difference between two means is presented in Chapter 7. As we move through subsequent chapters, we become familiar with null hypotheses that involve other population parameters such as proportions, variances, and correlations.
The null hypothesis is basically set up by the researcher in an attempt to reject it in favor of our own personal scientific, alternative, or research hypothesis. In other words, the scientific hypothesis is what we believe the outcome of the study will be, based on previous theory and research. Thus, we are trying to reject the null hypothesis and find evidence in favor of our scientific hypothesis. The scientific hypotheses H1 for our two examples are

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

and

H1: μ1 − μ2 ≠ 0

Based on the sample data, hypothesis testing involves making a decision as to whether the null or the research hypothesis is supported. Because we are dealing with sample statistics in our decision-making process, and trying to make an inference back to the population parameter(s), there is always some risk of making an incorrect decision. In other words, the sample data might lead us to make a decision that is not consistent with the population. We might decide to take an umbrella and it does not rain, or we might decide to leave the umbrella at home and it rains. Thus, as in any decision, the possibility always exists that an incorrect decision may be made. This uncertainty is due to sampling error, which, as we will see, can be described by a probability statement. That is, because the decision is made based on sample data, the sample may not be very representative of the population and therefore leads us to an incorrect decision. If we had population data, we would always make the correct decision about a population parameter. Because we usually do not, we use inferential statistics to help make decisions from sample data and infer those results back to the population. The nature of such decision errors and the probabilities we can attribute to them are described in the next section.
6.2 Types of Decision Errors

In this section, we consider more specifically the types of decision errors that might be made in the decision-making process. First an example decision-making situation is presented. This is followed by a decision-making table whereby the types of decision errors are easily depicted.

6.2.1 Example Decision-Making Situation

Let us propose an example decision-making situation using an adult intelligence instrument. It is known somehow that the population standard deviation of the instrument is 15 (i.e., σ² = 225, σ = 15). (In the real world, it is rare that the population standard deviation is known, and we return to reality later in the chapter when the basic concepts have been covered. But for now, assume that we know the population standard deviation.) Our null and alternative hypotheses, respectively, are as follows:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

Thus, we are interested in testing whether the population mean for the intelligence instrument is equal to 100, our hypothesized mean value, or not equal to 100.
Next we take several random samples of individuals from the adult population. We find for our first sample Ȳ1 = 105 (i.e., denoting the mean for sample 1). Eyeballing the information for sample 1, the sample mean is one-third of a standard deviation above the hypothesized value [i.e., by computing a z score of (105 − 100)/15 = .3333], so our conclusion would probably be to fail to reject H0. In other words, if the population mean actually is 100, then we believe that one is quite likely to observe a sample mean of 105. Thus, our decision for sample 1 is to fail to reject H0; however, there is some likelihood or probability that our decision is incorrect.

We take a second sample and find Ȳ2 = 115 (i.e., denoting the mean for sample 2). Eyeballing the information for sample 2, the sample mean is one standard deviation above the hypothesized value [i.e., z = (115 − 100)/15 = 1.0000], so our conclusion would probably be to fail to reject H0. In other words, if the population mean actually is 100, then we believe that it is somewhat likely to observe a sample mean of 115. Thus, our decision for sample 2 is to fail to reject H0. However, there is an even greater likelihood or probability that our decision is incorrect than was the case for sample 1; this is because the sample mean is further away from the hypothesized value.

We take a third sample and find Ȳ3 = 190 (i.e., denoting the mean for sample 3). Eyeballing the information for sample 3, the sample mean is six standard deviations above the hypothesized value [i.e., z = (190 − 100)/15 = 6.0000], so our conclusion would probably be to reject H0. In other words, if the population mean actually is 100, then we believe that it is quite unlikely to observe a sample mean of 190. Thus, our decision for sample 3 is to reject H0; however, there is some small likelihood or probability that our decision is incorrect.
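The three eyeballed z values can be reproduced directly; a small sketch of our own, assuming the hypothesized mean of 100 and σ = 15 from the example:

```python
sigma, mu0 = 15.0, 100.0

# z score of each sample mean relative to the hypothesized value.
for label, ybar in (("sample 1", 105.0), ("sample 2", 115.0), ("sample 3", 190.0)):
    z = (ybar - mu0) / sigma
    print(label, round(z, 4))
# → sample 1 0.3333, sample 2 1.0, sample 3 6.0
```

Note that these are distances in population standard deviation units, which is all the eyeballing requires; the formal z test statistic introduced in Section 6.5 divides by the standard error σ/√n instead.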
6.2.2 Decision-Making Table

Let us consider Table 6.1 as a mechanism for sorting out the possible outcomes in the statistical decision-making process. The table consists of the general case and a specific case. First, in part (a) of the table, we have the possible outcomes for the general case. For the state of nature or reality (i.e., how things really are in the population), there are two distinct possibilities as depicted by the rows of the table. Either H0 is indeed true or H0 is indeed false. In other words, according to the real-world conditions in the population, either H0 is actually true or H0 is actually false. Admittedly, we usually do not know what the state of nature truly is; however, it does exist in the population data. It is the state of nature that we are trying to best approximate when making a statistical decision based on sample data.

For our statistical decision, there are two distinct possibilities as depicted by the columns of the table. Either we fail to reject H0 or we reject H0. In other words, based on our sample data, we either fail to reject H0 or reject H0. As our goal is usually to reject H0 in favor of our research hypothesis, we prefer the term fail to reject rather than accept. Accept implies you are willing to throw out your research hypothesis and admit defeat based on one sample. Fail to reject implies you still have some hope for your research hypothesis, despite evidence from a single sample to the contrary.

If we look inside of the table, we see four different outcomes based on a combination of our statistical decision and the state of nature. Consider the first row of the table where H0 is in actuality true. First, if H0 is true and we fail to reject H0, then we have made a correct decision; that is, we have correctly failed to reject a true H0. The probability of this first outcome is known as 1 − α (where α represents alpha). Second, if H0 is true and we reject H0, then we have made a decision error known as a Type I error. That is, we have incorrectly rejected a true H0. Our sample data have led us to a different conclusion than the population data would have. The probability of this second outcome is known as α. Therefore, if H0 is actually true, then our sample data lead us to one of two conclusions: either we correctly fail to reject H0, or we incorrectly reject H0. The sum of the probabilities for these two outcomes when H0 is true is equal to 1 [i.e., (1 − α) + α = 1].

Consider now the second row of the table where H0 is in actuality false. First, if H0 is really false and we fail to reject H0, then we have made a decision error known as a Type II error. That is, we have incorrectly failed to reject a false H0. Our sample data have led us to a different conclusion than the population data would have. The probability of this outcome is known as β (beta). Second, if H0 is really false and we reject H0, then we have made a correct decision; that is, we have correctly rejected a false H0. The probability of this second outcome is known as 1 − β, or power (to be more fully discussed later in this chapter). Therefore, if H0 is actually false, then our sample data lead us to one of two conclusions: either we incorrectly fail to reject H0, or we correctly reject H0. The sum of the probabilities for these two outcomes when H0 is false is equal to 1 [i.e., β + (1 − β) = 1].

Table 6.1
Statistical Decision Table

(a) General case
 H0 is true:
  Fail to reject H0: correct decision (1 − α)
  Reject H0: Type I error (α)
 H0 is false:
  Fail to reject H0: Type II error (β)
  Reject H0: correct decision (1 − β) = power

(b) Example rain case
 H0 is true (no rain):
  Fail to reject H0: correct decision (do not take umbrella and no umbrella needed) (1 − α)
  Reject H0: Type I error (take umbrella and look silly) (α)
 H0 is false (rains):
  Fail to reject H0: Type II error (do not take umbrella and get wet) (β)
  Reject H0: correct decision (take umbrella and stay dry) (1 − β) = power

As an application of this table, consider the following specific case, as shown in part (b) of Table 6.1. We wish to test the following hypotheses about whether or not it will rain tomorrow:

H0: no rain tomorrow
H1: rains tomorrow

We collect some sample data from prior years for the same month and day, and go to make our statistical decision. Our two possible statistical decisions are (a) we do not believe it will rain tomorrow and therefore do not bring an umbrella with us, or (b) we do believe it will rain tomorrow and therefore do bring an umbrella.

Again there are four potential outcomes. First, if H0 is really true (no rain) and we do not carry an umbrella, then we have made a correct decision, as no umbrella is necessary (probability = 1 − α). Second, if H0 is really true (no rain) and we carry an umbrella, then we have made a Type I error, as we look silly carrying that umbrella around all day (probability = α). Third, if H0 is really false (rains) and we do not carry an umbrella, then we have made a Type II error and we get wet (probability = β). Fourth, if H0 is really false (rains) and we carry an umbrella, then we have made the correct decision, as the umbrella keeps us dry (probability = 1 − β).

Let us make two concluding statements about the decision table. First, one can never prove the truth or falsity of H0 in a single study. One only gathers evidence in favor of or in opposition to the null hypothesis. Something is proven in research when an entire collection of studies or evidence reaches the same conclusion time and time again. Scientific proof is difficult to achieve in the social and behavioral sciences, and we should not use the term prove or proof loosely. As researchers, we gather multiple pieces of evidence that eventually lead to the development of one or more theories. When a theory is shown to be unequivocally true (i.e., in all cases), then proof has been established.

Second, let us consider the decision errors in a different light. One can totally eliminate the possibility of a Type I error by deciding to never reject H0. That is, if we always fail to reject H0 (do not ever carry an umbrella), then we can never make a Type I error (look silly with an unnecessary umbrella). Although this strategy sounds fine, it totally takes the decision-making power out of our hands. With this strategy, we do not even need to collect any sample data, as we have already decided to never reject H0.

One can totally eliminate the possibility of a Type II error by deciding to always reject H0. That is, if we always reject H0 (always carry an umbrella), then we can never make a Type II error (get wet without an umbrella). Although this strategy also sounds fine, it totally takes the decision-making power out of our hands. With this strategy, we do not even need to collect any sample data, as we have already decided to always reject H0. Taken together, one can never totally eliminate the possibility of both a Type I and a Type II error. No matter what decision we make, there is always some possibility of making a Type I and/or Type II error. Therefore, as researchers, our job is to make conscious decisions in designing and conducting our study and in analyzing the data so that the possibility of decision error is minimized.
6.3 Level of Significance (α)

We have already stated that a Type I error occurs when the decision is to reject H0 when in fact H0 is actually true. We defined the probability of a Type I error as α, which is also known as the level of significance or significance level. We now examine α as a basis for helping us make statistical decisions. Recall from a previous example that the null and alternative hypotheses, respectively, are as follows:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

We need a mechanism for deciding how far away a sample mean needs to be from the hypothesized mean value of μ0 = 100 in order to reject H0. In other words, at a certain point or distance away from 100, we will decide to reject H0. We use α to determine that point for us, where in this context, α is known as the level of significance. Figure 6.1a shows a sampling distribution of the mean where the hypothesized value μ0 is depicted at the center of the distribution. Toward both tails of the distribution, we see two shaded regions known as the critical regions or regions of rejection. The combined area of the two shaded regions is equal to α, and, thus, the area of either the upper or the lower tail critical region is equal to α/2 (i.e., we split α in half by dividing by two). If the sample mean is far enough away from the hypothesized mean value, μ0, that it falls into either critical region, then our statistical decision is to reject H0. In this case, our decision is to reject H0 at the α level of significance. If, however, the sample mean is close enough to μ0 that it falls into the unshaded region (i.e., not into either critical region), then our statistical decision is to fail to reject H0. The precise points on the X axis at which the critical regions are divided from the unshaded region are known as the critical values. Determining critical values is discussed later in this chapter.

FIGURE 6.1
Alternative hypotheses and critical regions: (a) two-tailed test, with critical regions of area α/2 in each tail around the hypothesized value μ0; (b) one-tailed, right-tailed test, with a critical region of area α in the upper tail; (c) one-tailed, left-tailed test, with a critical region of area α in the lower tail.
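Although determining critical values is taken up later in the chapter, a quick sketch of where they come from for a two-tailed test may help; this assumes a normal sampling distribution and uses Python's statistics.NormalDist as the illustration:

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: place alpha/2 in each tail of the standard normal,
# so the critical values are the quantiles cutting off those tails.
lower = NormalDist().inv_cdf(alpha / 2)
upper = NormalDist().inv_cdf(1 - alpha / 2)
print(round(lower, 2), round(upper, 2))  # → -1.96 1.96
```

For a one-tailed test, all of α goes into a single tail, giving a single critical value such as 1.645 for α = .05 in the right tail.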
Note that under the alternative hypothesis H1, we are willing to reject H0 when the sample mean is either significantly greater than or significantly less than the hypothesized mean value μ0. This particular alternative hypothesis is known as a nondirectional alternative hypothesis, as no direction is implied with respect to the hypothesized value. That is, we will reject the null hypothesis in favor of the alternative hypothesis in either direction, either above or below the hypothesized mean value. This also results in what is known as a two-tailed test of significance, in that we are willing to reject the null hypothesis in either tail or critical region.
Two� other� alternative� hypotheses� are� also� possible,� depending� on� the� researcher’s� sci-
entific�hypothesis,�which�are�known�as�a�directional alternative hypothesis��One�direc-
tional�alternative�is�that�the�population�mean�is�greater�than�the�hypothesized�mean�value,�
also�known�as�a�right-tailed�test,�as�denoted�by
� H H1 1: 1 or : 1µ µ> − >00 00 0
Mathematically,�both�of�these�equations�say�the�same�thing��With�a�right-tailed�alternative�
hypothesis,�the�entire�region�of�rejection�is�contained�in�the�upper�tail,�with�an�area�of�α,�
known� as� a� one-tailed test of significance� (and� specifically� the� right� tail)�� If� the� sample�
mean�is�significantly�greater�than�the�hypothesized�mean�value�of�100,�then�our�statistical�
decision�is�to�reject�H0��If,�however,�the�sample�mean�falls�into�the�unshaded�region,�then�
our�statistical�decision�is�to�fail�to�reject�H0��This�situation�is�depicted�in�Figure�6�1b�
A second directional alternative is that the population mean is less than the hypothesized mean value, also known as a left-tailed test, as denoted by

H1: μ < 100 or H1: μ − 100 < 0

Mathematically, both of these equations say the same thing. With a left-tailed alternative hypothesis, the entire region of rejection is contained in the lower tail, with an area of α, also known as a one-tailed test of significance (and specifically the left tail). If the sample mean is significantly less than the hypothesized mean value of 100, then our statistical decision is to reject H0. If, however, the sample mean falls into the unshaded region, then our statistical decision is to fail to reject H0. This situation is depicted in Figure 6.1c.
There is some potential for misuse of the different alternatives, which we consider to be an ethical matter. For example, a researcher conducts a one-tailed test with an upper tail critical region and fails to reject H0. However, the researcher notices that the sample mean is considerably below the hypothesized mean value and then decides to change the alternative hypothesis to either a nondirectional test or a one-tailed test in the other tail. This is unethical, as the researcher has examined the data and changed the alternative hypothesis. The moral of the story is this: If there is previous and consistent empirical evidence to use a specific directional alternative hypothesis, then you should do so. If, however, there is minimal or inconsistent empirical evidence to use a specific directional alternative, then you should not. Instead, you should use a nondirectional alternative.

Introduction to Hypothesis Testing: Inferences About a Single Mean

Once you have decided which alternative hypothesis to go with, then you need to stick with it for the duration of the statistical decision. If you find contrary evidence, then report it as it may be an important finding, but do not change the alternative hypothesis in midstream.
6.4 Overview of Steps in Decision-Making Process
Before we get into the specific details of conducting the test of a single mean, we want to discuss the basic steps for hypothesis testing of any inferential test:

1. State the null and alternative hypotheses.
2. Select the level of significance (i.e., alpha, α).
3. Calculate the test statistic value.
4. Make a statistical decision (reject or fail to reject H0).
Step 1: The first step in the decision-making process is to state the null and alternative hypotheses. Recall from our previous example that the null and nondirectional alternative hypotheses, respectively, for a two-tailed test are as follows:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

One could also choose one of the other directional alternative hypotheses described previously.

If we choose to write our null hypothesis as H0: μ = 100, we would want to write our research hypothesis in a consistent manner, H1: μ ≠ 100 (rather than H1: μ − 100 ≠ 0). In publication, many researchers opt to present the hypotheses in narrative form (e.g., "the null hypothesis states that the population mean will equal 100, and the alternative hypothesis states that the population mean will not equal 100"). How you present your hypotheses (in narrative form or using statistical notation) is up to you.
Step 2: The second step in the decision-making process is to select a level of significance. There are two considerations to make in terms of selecting a level of significance. One consideration is the cost associated with making a Type I error, which is what α really is. Recall that alpha is the probability of rejecting the null hypothesis if in reality the null hypothesis is true. When a Type I error is made, that means evidence is building in favor of the research hypothesis (which is actually false). Let us take an example of a new drug. To test the efficacy of the drug, an experiment is conducted where some individuals take the new drug while others receive a placebo. The null hypothesis, stated nondirectionally, would essentially indicate that the effects of the drug and placebo are the same. Rejecting that null hypothesis would mean that the effects are not equal, suggesting that perhaps this new drug, which in reality is not any better than a placebo, is being touted as effective medication. That is obviously problematic and potentially very hazardous.

Thus, if there is a relatively high cost associated with a Type I error (for example, such that lives are lost, as in the medical profession), then one would want to select a relatively small level of significance (e.g., .01 or smaller). A small alpha would translate to a very small probability of rejecting the null if it were really true (i.e., a small probability of making an incorrect decision). If there is a relatively low cost associated with a Type I error (for example, such that children have to eat the second-rated candy rather than the first), then selecting a larger level of significance may be appropriate (e.g., .05 or larger). Costs are not always known, however. A second consideration is the level of significance commonly used in your field of study. In many disciplines, the .05 level of significance has become the standard (although no one seems to have a really good rationale). This is true in many of the social and behavioral sciences. Thus, you would do well to consult the published literature in your field to see if some standard is commonly used and to consider it for your own research.
Step 3: The third step in the decision-making process is to calculate the test statistic. For the one-sample mean test, we will compute the sample mean Ȳ and compare it to the hypothesized value μ0. This allows us to determine the size of the difference between Ȳ and μ0 and, subsequently, the probability associated with the difference. The larger the difference, the more likely it is that the sample mean really differs from the hypothesized mean value and the larger the probability associated with the difference.
Step 4: The fourth and final step in the decision-making process is to make a statistical decision regarding the null hypothesis H0. That is, a decision is made whether to reject H0 or to fail to reject H0. If the difference between the sample mean and the hypothesized value is large enough relative to the critical value (we will talk about critical values in more detail later), then our decision is to reject H0. If the difference between the sample mean and the hypothesized value is not large enough relative to the critical value, then our decision is to fail to reject H0. This is the basic four-step process for hypothesis testing of any inferential test. The specific details for the test of a single mean are given in the following section.
6.5 Inferences About μ When σ Is Known
In this section, we examine how hypotheses about a single mean are tested when the population standard deviation is known. Specifically, we consider the z test, an example illustrating the use of the z test, and how to construct a CI around the mean.
6.5.1 z Test
Recall from Chapter 4 the definition of a z score as

z = (Yi − μ) / σY

where
Yi is the score on variable Y for individual i
μ is the population mean for variable Y
σY is the population standard deviation for variable Y

The z score is used to tell us how many standard deviation units an individual's score is from the mean.
In the context of this chapter, however, we are concerned with the extent to which a sample mean differs from some hypothesized mean value. We can construct a variation of the z score for testing hypotheses about a single mean. In this situation, we are concerned with the sampling distribution of the mean (introduced in Chapter 5), so the equation must reflect means rather than raw scores. Our z score equation for testing hypotheses about a single mean becomes

z = (Ȳ − μ0) / σȲ
where
Ȳ is the sample mean for variable Y
μ0 is the hypothesized mean value for variable Y
σȲ is the population standard error of the mean for variable Y

From Chapter 5, recall that the population standard error of the mean σȲ is computed by

σȲ = σY / √n

where
σY is the population standard deviation for variable Y
n is the sample size
Thus, the numerator of the z score equation is the difference between the sample mean and the hypothesized value of the mean, and the denominator is the standard error of the mean. What we are really determining here is how many standard deviation (or standard error) units the sample mean is from the hypothesized mean. Henceforth, we call this variation of the z score the test statistic for the test of a single mean, also known as the z test. This is the first of several test statistics we describe in this text; every inferential test requires some test statistic for purposes of testing hypotheses.

We need to make a statistical assumption regarding this hypothesis testing situation. We assume that z is normally distributed with a mean of 0 and a standard deviation of 1. This is written statistically as z ∼ N(0, 1) following the notation we developed in Chapter 4. Thus, the assumption is that z follows the unit normal distribution (in other words, the shape of the distribution is approximately normal). An examination of our test statistic z reveals that only the sample mean can vary from sample to sample. The hypothesized value and the standard error of the mean are constant for every sample of size n from the same population.

In order to make a statistical decision, the critical regions need to be defined. As the test statistic is z and we have assumed normality, the relevant theoretical distribution we compare the test statistic to is the unit normal distribution. We previously discussed this distribution in Chapter 4, and the table of values is given in Table A.1. If the alternative hypothesis is nondirectional, then there would be two critical regions, one in the upper tail and one in the lower tail. Here we would split the area of the critical region, known as α, in two. If the alternative hypothesis is directional, then there would only be one critical region, either in the upper tail or in the lower tail, depending on the direction in which one is willing to reject H0.
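The tabled critical values used throughout this chapter (Table A.1) can be reproduced with the inverse CDF of the unit normal distribution. The sketch below is only a quick check on the table, not part of the chapter's SPSS workflow:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # unit normal: mean 0, standard deviation 1

# Nondirectional (two-tailed) test: split alpha between the two tails.
two_tailed = (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2))  # about (-1.96, +1.96)

# Directional (one-tailed) tests: all of alpha in a single tail.
right_tailed = z.inv_cdf(1 - alpha)  # about +1.645
left_tailed = z.inv_cdf(alpha)       # about -1.645
```

Note how the one-tailed critical value (about 1.645) is closer to the center than the two-tailed value (about 1.96), which is exactly why a one-tailed test has a larger rejection region on its side.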
6.5.2 Example
Let us illustrate the use of this inferential test through an example. We are interested in testing whether the population of undergraduate students from Awesome State University (ASU) has a mean intelligence test score different from the hypothesized mean value of μ0 = 100 (remember that the hypothesized mean value does not come from our sample but from another source; in this example, let us say that this value of 100 is the national norm as presented in the technical manual of this particular intelligence test).
Recall that our first step in hypothesis testing is to state the hypothesis. A nondirectional alternative hypothesis is of interest as we simply want to know if this population has a mean intelligence different from the hypothesized value, either greater than or less than. Thus, the null and alternative hypotheses can be written respectively as follows:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0

A sample mean of Ȳ = 103 is observed for a sample of n = 100 ASU undergraduate students. From the development of this intelligence test, we know that the theoretical population standard deviation is σY = 15 (again, for purposes of illustration, let us say that the population standard deviation of 15 was noted in the technical manual for this test).
Our second step is to select a level of significance. The standard level of significance in this field is the .05 level; thus, we perform our significance test at α = .05.

The third step is to compute the test statistic value. To compute our test statistic value, first we compute the standard error of the mean (the denominator of our test statistic formula) as follows:

σȲ = σY / √n = 15 / √100 = 1.5000
Then we compute the test statistic z, where the numerator is the difference between the mean of our sample (Ȳ = 103) and the hypothesized mean value (μ0 = 100), and the denominator is the standard error of the mean:

z = (Ȳ − μ0) / σȲ = (103 − 100) / 1.5000 = 2.0000
Finally, in the last step, we make our statistical decision by comparing the test statistic z to the critical values. To determine the critical values for the z test, we use the unit normal distribution in Table A.1. Since α = .05 and we are conducting a nondirectional test, we need to find critical values for the upper and lower tails, where the area of each of the two critical regions is equal to .025 (i.e., splitting alpha in half: α/2 or .05/2 = .025). From the unit normal table, we find these critical values to be +1.96 (the point on the X axis where the area above that point is equal to .025) and −1.96 (the point on the X axis where the area below that point is equal to .025). As shown in Figure 6.2, the test statistic z = 2.00 falls into the upper tail critical region, just slightly larger than the upper tail critical value of +1.96. Our decision is to reject H0 and conclude that the ASU population from which the sample was selected has a mean intelligence score that is statistically significantly different from the hypothesized mean of 100 at the .05 level of significance.
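The four steps of this worked example can be mirrored in a few lines of code. This is only an illustrative sketch using the values from the example (the text itself works by hand and with SPSS):

```python
from math import sqrt

mu_0, y_bar, sigma_y, n = 100, 103, 15, 100  # values from the ASU example

# Step 3: standard error of the mean and the z test statistic.
se = sigma_y / sqrt(n)   # 15 / 10 = 1.5
z = (y_bar - mu_0) / se  # (103 - 100) / 1.5 = 2.0

# Step 4: compare to the two-tailed critical values for alpha = .05.
z_cv = 1.96
decision = "reject H0" if abs(z) > z_cv else "fail to reject H0"
print(z, decision)  # 2.0 reject H0
```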
A more precise way of thinking about this process is to determine the exact probability of observing a sample mean that differs from the hypothesized mean value. From the unit normal table, the area above z = 2.00 is equal to .0228. Therefore, the area below z = −2.00 is also equal to .0228. Thus, the probability p of observing, by chance, a sample mean of 2.00 or more standard errors (i.e., z = 2.00) from the hypothesized mean value of 100, in either direction, is two times the observed probability level, or p = (2)(.0228) = .0456. To put this in the context of the values in this example, there is a relatively small probability (less than 5%) of observing a sample mean of 103 just by chance if the true population mean is really 100. As this exact probability (p = .0456) is smaller than our level of significance α = .05, we reject H0. Thus, there are two approaches to dealing with probability. One approach is a decision based solely on the critical values. We reject or fail to reject H0 at a given α level, but no other information is provided. The other approach is a decision based on comparing the exact probability to the given α level. We reject or fail to reject H0 at a given α level, but we also have information available about the closeness or confidence in that decision.
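The exact two-tailed probability can also be computed directly from the unit normal CDF rather than read from the table. A brief sketch (the small difference from the text's .0456 comes from the table's per-tail value being rounded to .0228):

```python
from statistics import NormalDist

z = 2.00
# Area beyond |z| in each tail, doubled for a nondirectional test.
p = 2 * (1 - NormalDist().cdf(abs(z)))
alpha = 0.05
print(round(p, 4), p < alpha)  # 0.0455 True -> reject H0
```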
For this example, the findings in a manuscript would be reported based on comparing the p value to alpha and reported either as z = 2 (p < .05) or as z = 2 (p = .0456). (You may want to refer to the style manual relevant to your discipline, such as the Publication Manual of the American Psychological Association (2010), for information on which is the recommended reporting style.) Obviously the conclusion is the same with either approach; it is just a matter of how the results are reported. Most statistical computer programs, including SPSS, report the exact probability so that the reader can make a decision based on their own selected level of significance. These programs do not provide the critical value(s), which are only found in the appendices of statistics textbooks.
6.5.3 Constructing Confidence Intervals around the Mean
Recall our discussion from Chapter 5 on CIs. CIs are often quite useful in inferential statistics for providing the researcher with an interval estimate of a population parameter. Although the sample mean gives us a point estimate (i.e., just one value) of a population mean, a CI gives us an interval estimate of a population mean and allows us to determine the accuracy or precision of the sample mean. For the inferential test of a single mean, a CI around the sample mean Ȳ is formed from

Ȳ ± zcv(σȲ)
where
zcv is the critical value from the unit normal distribution
σȲ is the population standard error of the mean

[Figure 6.2. Critical regions for the example: the two α/2 critical regions lie beyond the z critical values of −1.96 and +1.96, with the hypothesized value μ0 at the center; the z test statistic value of +2.00 falls in the upper tail critical region.]
CIs are typically formed for nondirectional or two-tailed tests as shown in the equation. A CI will generate a lower and an upper limit. If the hypothesized mean value falls within the lower and upper limits, then we would fail to reject H0. In other words, if the hypothesized mean is contained in (or falls within) the CI around the sample mean, then we conclude that the sample mean and the hypothesized mean are not significantly different and that the sample mean could have come from a population with the hypothesized mean. If the hypothesized mean value falls outside the limits of the interval, then we would reject H0. Here we conclude that it is unlikely that the sample mean could have come from a population with the hypothesized mean.

One way to think about CIs is as follows. Imagine we take 100 random samples of the same sample size n, compute each sample mean, and then construct each 95% CI. Then we can say that 95% of these CIs will contain the population parameter and 5% will not. In short, 95% of similarly constructed CIs will contain the population parameter. It should also be mentioned that at a particular level of significance, one will always obtain the same statistical decision with both the hypothesis test and the CI. The two procedures use precisely the same information. The hypothesis test is based on a point estimate; the CI is based on an interval estimate, providing the researcher with a little more information.
For the ASU example situation, the 95% CI would be computed by

Ȳ ± zcv(σȲ) = 103 ± 1.96(1.5) = 103 ± 2.94 = (100.06, 105.94)

Thus, the 95% CI ranges from 100.06 to 105.94. Because the interval does not contain the hypothesized mean value of 100, we reject H0 (the same decision we arrived at by walking through the steps for hypothesis testing). Thus, it is quite unlikely that our sample mean could have come from a population distribution with a mean of 100.
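The same interval can be checked numerically; a minimal sketch with the example's values:

```python
from math import sqrt

y_bar, sigma_y, n, z_cv = 103, 15, 100, 1.96  # values from the ASU example
se = sigma_y / sqrt(n)                        # 1.5
lower, upper = y_bar - z_cv * se, y_bar + z_cv * se
print(round(lower, 2), round(upper, 2))       # 100.06 105.94

# The hypothesized mean of 100 falls outside the interval, so reject H0.
contains_mu0 = lower <= 100 <= upper          # False
```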
6.6 Type II Error (β) and Power (1 − β)
In this section, we complete our discussion of Type II error (β) and power (1 − β). First we return to our rain example and discuss the entire decision-making context. Then we describe the factors that determine power.
6.6.1 Full Decision-Making Context
Previously, we defined Type II error as the probability of failing to reject H0 when H0 is really false. In other words, in reality, H0 is false, yet we made a decision error and did not reject H0. The probability associated with a Type II error is denoted by β. Power is a related concept and is defined as the probability of rejecting H0 when H0 is really false. In other words, in reality, H0 is false, and we made the correct decision to reject H0. The probability associated with power is denoted by 1 − β. Let us return to our "rain" example to describe Type I and Type II errors and power more completely.
The full decision-making context for the "rain" example is given in Figure 6.3. The distribution on the left-hand side of the figure is the sampling distribution when H0 is true, meaning in reality it does not rain. The vertical line represents the critical value for deciding whether to carry an umbrella or not. To the left of the vertical line, we do not carry an umbrella, and to the right side of the vertical line, we do carry an umbrella. For the no-rain sampling distribution on the left, there are two possibilities. First, we do not carry an umbrella and it does not rain. This is the unshaded portion under the no-rain sampling distribution to the left of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − α. Second, we do carry an umbrella and it does not rain. This is the shaded portion under the no-rain sampling distribution to the right of the vertical line. This is an incorrect decision, a Type I error, and the probability associated with this decision is α/2 in either the upper or lower tail, and α collectively.
The distribution on the right-hand side of the figure is the sampling distribution when H0 is false, meaning in reality, it does rain. For the rain sampling distribution, there are two possibilities. First, we do carry an umbrella and it does rain. This is the unshaded portion under the rain sampling distribution to the right of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − β, or power. Second, we do not carry an umbrella and it does rain. This is the shaded portion under the rain sampling distribution to the left of the vertical line. This is an incorrect decision, a Type II error, and the probability associated with this decision is β.
As a second illustration, consider again the example intelligence test situation. This situation is depicted in Figure 6.4. The distribution on the left-hand side of the figure is the sampling distribution of Ȳ when H0 is true, meaning in reality, μ = 100. The distribution on the right-hand side of the figure is the sampling distribution of Ȳ when H1 is true, meaning in reality, μ = 115 (and in this example, while there are two critical values, only the right tail matters as that relates to the H1 sampling distribution). The vertical line represents the critical value for deciding whether to reject the null hypothesis or not. To the left of the vertical line, we do not reject H0, and to the right of the vertical line, we reject H0. For the H0 is true sampling distribution on the left, there are two possibilities. First, we do not reject H0 and H0 is really true. This is the unshaded portion under the H0 is true sampling distribution to the left of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − α. Second, we reject H0 and H0 is true. This is the shaded portion under the H0 is true sampling distribution to the right of the vertical line. This is an incorrect decision, a Type I error, and the probability associated with this decision is α/2 in either the upper or lower tail, and α collectively.

[Figure 6.3. Sampling distributions for the rain case: the left distribution is the sampling distribution when H0 ("no rain") is true, the right when it is false. Not carrying an umbrella when it does not rain, and carrying one when it does rain, are correct decisions; carrying an umbrella when it does not rain is a Type I error (did not need the umbrella), and not carrying one when it does rain is a Type II error (got wet).]
The distribution on the right-hand side of the figure is the sampling distribution when H0 is false, and in particular, when H1: μ = 115 is true. This is a specific sampling distribution when H0 is false, and other possible sampling distributions can also be examined (e.g., μ = 85, 110). For the H1: μ = 115 is true sampling distribution, there are two possibilities. First, we do reject H0, as H0 is really false, and H1: μ = 115 is really true. This is the unshaded portion under the H1: μ = 115 is true sampling distribution to the right of the vertical line. This is a correct decision, and the probability associated with this decision is 1 − β, or power. Second, we do not reject H0, H0 is really false, and H1: μ = 115 is really true. This is the shaded portion under the H1: μ = 115 is true sampling distribution to the left of the vertical line. This is an incorrect decision, a Type II error, and the probability associated with this decision is β.
6.6.2 Power Determinants
Power is determined by five different factors: (1) level of significance, (2) sample size, (3) population standard deviation, (4) difference between the true population mean μ and the hypothesized mean value μ0, and (5) directionality of the test (i.e., one- or two-tailed test). Let us talk about each of these factors in more detail.

First, power is determined by the level of significance α. As α increases, power increases. Thus, if α increases from .05 to .10, then power will increase. This would occur in Figure 6.4 if the vertical line were shifted to the left (thus creating a larger critical region and thereby making it easier to reject the null hypothesis). This would increase the α level and also increase power. This factor is under the control of the researcher.
Second, power is determined by sample size. As sample size n increases, power increases. Thus, if sample size increases, meaning we have a sample that consists of a larger proportion of the population, this will cause the standard error of the mean to decrease, as there is less sampling error with larger samples. This would also result in the vertical line being moved to the left (again creating a larger critical region and thereby making it easier to reject the null hypothesis). This factor is also under the control of the researcher. In addition, because a larger sample yields a smaller standard error, it will be easier to reject H0 (all else being equal), and the CIs generated will also be narrower.

[Figure 6.4. Sampling distributions for the intelligence test case: the left distribution is the sampling distribution when H0: μ = 100 is true, and the right is the sampling distribution when H1: μ = 115 is true (i.e., H0: μ = 100 is false). The unshaded regions mark the correct decisions, with probabilities 1 − α (do not reject H0) and 1 − β (reject H0); the shaded regions mark the Type I error (α/2 in each tail beyond the critical values) and the Type II error (β).]
Third, power is determined by the size of the population standard deviation σ. Although not under the researcher's control, as σ increases, power decreases. Thus, if σ increases, meaning the variability in the population is larger, this will cause the standard error of the mean to increase, as there is more sampling error with larger variability. This would result in the vertical line being moved to the right. If σ decreases, meaning the variability in the population is smaller, this will cause the standard error of the mean to decrease, as there is less sampling error with smaller variability. This would result in the vertical line being moved to the left. Considering, for example, the one-sample mean test, the standard error of the mean is the denominator of the test statistic formula. When the standard error term decreases, the denominator is smaller, and thus the test statistic value becomes larger (and thereby easier to reject the null hypothesis).
Fourth, power is determined by the difference between the true population mean μ and the hypothesized mean value μ0. Although not always under the researcher's control (only in true experiments as described in Chapter 14), as the difference between the true population mean and the hypothesized mean value increases, power increases. Thus, if the difference between the true population mean and the hypothesized mean value is large, it will be easier to correctly reject H0. This would result in greater separation between the two sampling distributions. In other words, the entire H1 is true sampling distribution would be shifted to the right. Consider, for example, the one-sample mean test. The numerator is the difference between the means. The larger the numerator (holding the denominator constant), the more likely it will be to reject the null hypothesis.
Finally, power is determined by directionality and type of statistical procedure: whether we conduct a one- or a two-tailed test as well as the type of test of inference. There is greater power in a one-tailed test, such as when μ > 100, than in a two-tailed test. In a one-tailed test, the vertical line will be shifted to the left, creating a larger rejection region. This factor is under the researcher's control. There is also often greater power in conducting parametric as compared to nonparametric tests of inference (we will talk more about parametric versus nonparametric tests in later chapters). This factor is under the researcher's control to some extent, depending on the scale of measurement of the variables and the extent to which the assumptions of parametric tests are met.
Power has become of much greater interest and concern to the applied researcher in recent years. We begin by distinguishing between a priori power, when power is determined as a study is being planned or designed (i.e., prior to the study), and post hoc power, when power is determined after the study has been conducted and the data analyzed.

For a priori power, if you want to ensure a certain amount of power in a study, then you can determine what sample size would be needed to achieve such a level of power. This requires the input of characteristics such as α, σ, the difference between μ and μ0, and a one- versus two-tailed test. Alternatively, one could determine power given each of those characteristics. This can be done either by using statistical software [such as Power and Precision, Ex-Sample, G*Power (freeware), or a CD provided with the Murphy, Myors, and Wolach (2008) text] or by using tables [the most definitive collection of tables being in Cohen (1988)].
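As a rough illustration of the a priori logic (not a replacement for G*Power or Cohen's tables), the required sample size for a two-tailed one-sample z test follows directly from the normal quantiles. The target difference of 3 IQ points below is a hypothetical value chosen purely for illustration:

```python
from math import ceil
from statistics import NormalDist

alpha, power = 0.05, 0.80  # two-tailed significance level and desired power
sigma, diff = 15, 3        # population SD; hypothetical difference mu - mu_0

z = NormalDist()
# Standard a priori formula: n = ((z_{1-alpha/2} + z_{power}) * sigma / diff)^2
n = ceil(((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sigma / diff) ** 2)
print(n)  # 197
```

Note how the required n grows with the square of σ/diff: halving the detectable difference quadruples the sample size needed for the same power.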
For post hoc power (also called observed power), most statistical software packages (e.g., SPSS, SAS, STATGRAPHICS) will compute this as part of the analysis for many types of inferential statistics (e.g., analysis of variance). However, even though post hoc power is routinely reported in some journals, it has been found to have some flaws. For example, Hoenig and Heisey (2001) concluded that it should not be used to aid in interpreting nonsignificant results. They found that low power may indicate a small effect (e.g., a small mean difference) rather than an underpowered study. Thus, increasing sample size may not make much of a difference. Yuan and Maxwell (2005) found that observed power is almost always biased (too high or too low), except when true power is .50. Thus, we do not recommend the sole use of post hoc power to determine sample size in the next study; rather, it is recommended that CIs be used in addition to post hoc power. (An example presented later in this chapter will use G*Power to illustrate both a priori sample size requirements given desired power and post hoc power analysis.)
6.7 Statistical Versus Practical Significance
We have discussed the inferential test of a single mean in terms of statistical significance. However, are statistically significant results always practically significant? In other words, if a result is statistically significant, should we make a big deal out of this result in a practical sense? Consider again the simple example where the null and alternative hypotheses are as follows:

H0: μ = 100 or H0: μ − 100 = 0

H1: μ ≠ 100 or H1: μ − 100 ≠ 0
A sample mean intelligence test score of Ȳ = 101 is observed for a sample size of n = 2000 and a known population standard deviation of σY = 15. If we perform the test at the .01 level of significance, we find we are able to reject H0 even though the observed mean is only 1 unit away from the hypothesized mean value. The reason is that, because the sample size is rather large, a rather small standard error of the mean is computed (σȲ = 0.3354), and we thus reject H0 as the test statistic (z = 2.9815) exceeds the critical value (z = 2.5758). Holding the mean and standard deviation constant, if we had a sample size of 200 instead of 2000, the standard error becomes much larger (σȲ = 1.0607), and we thus fail to reject H0 as the test statistic (z = 0.9428) does not exceed the critical value (z = 2.5758). From this example, we can see how the sample size can drive the results of the hypothesis test, and how statistical significance can arise simply as an artifact of sample size.
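The arithmetic of this example is easy to verify outside SPSS. The following Python sketch (the helper function is ours, not part of any package) reproduces both z statistics and shows how the decision flips with sample size alone:

```python
from math import sqrt

def one_sample_z(mean, mu0, sigma, n):
    """z test statistic for a single mean when sigma is known."""
    se = sigma / sqrt(n)        # standard error of the mean
    return (mean - mu0) / se

crit = 2.5758  # two-tailed critical value at the .01 level
z_large = one_sample_z(101, 100, 15, 2000)
z_small = one_sample_z(101, 100, 15, 200)
print(round(z_large, 4), abs(z_large) > crit)  # about 2.9814, True -> reject H0
print(round(z_small, 4), abs(z_small) > crit)  # about 0.9428, False -> fail to reject
```

The only quantity that changed between the two calls is n, yet the conclusion reverses.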
Should we make a big deal out of an intelligence test sample mean that is 1 unit away from the hypothesized mean intelligence? The answer is "maybe not." If we gather enough sample data, any difference, no matter how small, can wind up being statistically significant. Thus, larger samples are more likely to yield statistically significant results. Practical significance is not entirely a statistical matter. It is also a matter for the substantive field under investigation. Thus, the meaningfulness of a small difference is for the substantive area to determine. All that inferential statistics can really determine is statistical significance. However, we should always keep practical significance in mind when interpreting our findings.
139 Introduction to Hypothesis Testing: Inferences About a Single Mean

In recent years, a major debate has been ongoing in the statistical community about the role of significance testing. The debate centers around whether null hypothesis significance testing (NHST) best suits the needs of researchers. At one extreme, some argue that NHST is fine as is. At the other extreme, others argue that NHST should be totally abandoned. In the middle, yet others argue that NHST should be supplemented with measures of effect size. In this text, we have taken the middle road, believing that more information is a better choice.
Let us formally introduce the notion of effect size. While there are a number of different measures of effect size, the most commonly used measure is Cohen's δ (delta) or d (Cohen, 1988). For the population case of the one-sample mean test, Cohen's delta is computed as follows:
δ = (μ − μ0) / σ
For the corresponding sample case, Cohen's d is computed as follows:
d = (Ȳ − μ0) / s
For the one-sample mean test, d indicates how many standard deviations the sample mean is from the hypothesized mean. Thus, if d = 1.0, the sample mean is one standard deviation away from the hypothesized mean. Cohen has proposed the following subjective standards for the social and behavioral sciences as a convention for interpreting d: small effect size, d = .2; medium effect size, d = .5; large effect size, d = .8. Interpretation of effect size should always be made first based on a comparison to similar studies; what is considered a "small" effect using Cohen's rule of thumb may actually be quite large in comparison to other related studies that have been conducted. In lieu of a comparison to other studies, such as in those cases where there are no or minimal related studies, Cohen's subjective standards may be appropriate.
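As a quick illustration, Cohen's effect size for the earlier intelligence example (Ȳ = 101, μ0 = 100, known σ = 15 in the denominator) can be computed and labeled against Cohen's conventions in a short Python sketch (both helper functions are ours):

```python
def cohens_d(mean, mu0, sd):
    """Standardized difference between a mean and a hypothesized value."""
    return (mean - mu0) / sd

def label(d):
    """Cohen's conventional benchmarks for the social and behavioral sciences."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

d = cohens_d(101, 100, 15)    # the statistically significant result from earlier
print(round(d, 4), label(d))  # 0.0667 -> below even a "small" effect
```

This makes the statistical-versus-practical point concrete: a result that was significant at the .01 level with n = 2000 corresponds to an effect smaller than Cohen's "small" benchmark.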
Computing CIs for effect sizes is also valuable. The benefit in creating CIs for effect size values is similar to that of creating CIs for parameter estimates: CIs for the effect size provide an added measure of precision that is not obtained from knowledge of the effect size alone. Computing CIs for effect size indices, however, is not as straightforward as simply plugging known values into a formula. This is because d is a function of both the population mean and population standard deviation (Finch & Cumming, 2009). Thus, specialized software must be used to compute the CIs for effect sizes, and interested readers are referred to appropriate sources (e.g., Algina & Keselman, 2003; Algina, Keselman, & Penfield, 2005; Cumming & Finch, 2001).
While a complete discussion of these issues is beyond this text, further information on effect sizes can be seen in special sections of Educational and Psychological Measurement (2001a, 2001b) and Grissom and Kim (2005), while additional material on NHST can be viewed in Harlow, Mulaik, and Steiger (1997) and a special section of Educational and Psychological Measurement (2000, October). Additionally, style manuals (e.g., American Psychological Association, 2010) often provide useful guidelines on reporting effect size.
6.8 Inferences About μ When σ Is Unknown
We have already considered the inferential test involving a single mean when the population standard deviation σ is known. However, rarely is σ known to the applied researcher. When σ is unknown, the z test previously discussed is no longer appropriate. In this section, we consider the following: the test statistic for inferences about the mean when the population standard deviation is unknown, the t distribution, the t test, and an example using the t test.
6.8.1 New Test Statistic t
What is the applied researcher to do then when σ is unknown? The answer is to estimate σ by the sample standard deviation s. This changes the standard error of the mean to be
sȲ = sY / √n
Now we are estimating two population parameters: (1) the population mean, μY, is being estimated by the sample mean, Ȳ; and (2) the population standard deviation, σY, is being estimated by the sample standard deviation, sY. Both Ȳ and sY can vary from sample to sample. Thus, although the sampling error of the mean is taken into account explicitly in the z test, we also need to take into account the sampling error of the standard deviation, which the z test does not consider at all. We now develop a new inferential test for the situation where σ is unknown. The test statistic is known as the t test and is computed as follows:
t = (Ȳ − μ0) / sȲ
The t test was developed by William Sealy Gosset, also known by the pseudonym Student, previously mentioned in Chapter 1. The unit normal distribution cannot be used here for the unknown-σ situation. A different theoretical distribution, known as the t distribution, must be used for determining critical values for the t test.
6.8.2 t Distribution
The t distribution is the theoretical distribution used for determining the critical values of the t test. Like the normal distribution, the t distribution is actually a family of distributions. There is a different t distribution for each value of degrees of freedom. However, before we look more closely at the t distribution, some discussion of the degrees of freedom concept is necessary.
As an example, say we know a sample mean Ȳ = 6 for a sample size of n = 5. How many of those five observed scores are free to vary? The answer is that four scores are free to vary. If the four known scores are 2, 4, 6, and 8 and the mean is 6, then the remaining score must be 10. The remaining score is not free to vary but is already totally determined. We see this in the following equation where, to arrive at a solution of 6, the sum in the numerator must equal 30, and Y5 must be 10:
Ȳ = (ΣYi)/n = (2 + 4 + 6 + 8 + Y5)/5 = 6
Therefore, the number of degrees of freedom is equal to 4 in this particular case and n − 1 in general. For the t test being considered here, we specify the degrees of freedom as ν = n − 1 (ν is the Greek letter "nu"). We use ν often in statistics to denote some type of degrees of freedom.
Another way to think about degrees of freedom is that we know the sum of the deviations from the mean must equal 0 (recall the unsquared numerator of the conceptual formula for the variance). For example, if n = 10, there are 10 deviations from the mean. Once the mean is known, only nine of the deviations are free to vary. A final way to think about this is that, in general, df = (n − number of restrictions). For the one-sample t test, because the population variance is unknown, we have to estimate it, resulting in one restriction. Thus, df = (n − 1) for this particular inferential test.
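The degrees-of-freedom idea can be demonstrated numerically. In this Python sketch (ours), fixing the mean of five scores forces the value of the fifth score, and the deviations from the mean necessarily sum to 0:

```python
known = [2, 4, 6, 8]   # four scores free to vary
n, mean = 5, 6         # fixed sample size and sample mean
last = n * mean - sum(known)  # the fifth score is fully determined
print(last)            # 10, as in the example above
scores = known + [last]
print(sum(y - mean for y in scores))  # deviations from the mean sum to 0
```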
Several members of the family of t distributions are shown in Figure 6.5. The distribution for ν = 1 has thicker tails than the unit normal distribution and a shorter peak. This indicates that there is considerable sampling error of the sample standard deviation with only two observations (as ν = 2 − 1 = 1). For ν = 5, the tails are thinner and the peak is taller than for ν = 1. As the degrees of freedom increase, the t distribution becomes more nearly normal. For ν = ∞ (i.e., infinity), the t distribution is precisely the unit normal distribution.
A few important characteristics of the t distribution are worth mentioning. First, like the unit normal distribution, the mean of any t distribution is 0, and the t distribution is symmetric around the mean and unimodal. Second, unlike the unit normal distribution, which has a variance of 1, the variance of a t distribution is as follows:
σ² = ν / (ν − 2) for ν > 2
Thus, the variance of a t distribution is somewhat greater than 1 but approaches 1 as ν increases.
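This variance formula is easy to tabulate. The following Python sketch (ours) shows the variance shrinking toward 1, the unit normal variance, as ν grows:

```python
def t_variance(nu):
    """Variance of the t distribution, defined only for nu > 2."""
    if nu <= 2:
        raise ValueError("variance is undefined for nu <= 2")
    return nu / (nu - 2)

for nu in (3, 5, 30, 120):
    print(nu, round(t_variance(nu), 4))
# 3 -> 3.0, 5 -> 1.6667, 30 -> 1.0714, 120 -> 1.0169:
# always above 1, but approaching 1 as nu increases
```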
The table for the t distribution is given in Table A.2, and a snapshot of the table is presented in Figure 6.6 for illustration purposes. In looking at the table, each column header has two values. The top value is the significance level for a one-tailed test, denoted by α1. Thus, if you were doing a one-tailed test at the .05 level of significance, you would look in the second column of numbers. The bottom value is the significance level for a two-tailed test, denoted by α2. Thus, if you were doing a two-tailed test at the .05 level of significance, you would look in the third column of numbers. The rows of the table denote the various degrees of freedom ν.
FIGURE 6.5
Several members of the family of t distributions (curves for ν = 1, ν = 5, and the normal distribution; relative frequency is plotted against t from −4 to 4).
Thus, if ν = 3, meaning n = 4, you want to look in the third row of numbers. If ν = 3 for α1 = .05, the tabled value is 2.353. This value represents the 95th percentile point in a t distribution with three degrees of freedom. This is because the table only presents the upper tail percentiles. As the t distribution is symmetric around 0, the lower tail percentiles are the same values except for a change in sign. The fifth percentile for three degrees of freedom then is −2.353. Thus, for a right-tailed directional hypothesis, the critical value will be +2.353, and for a left-tailed directional hypothesis, the critical value will be −2.353.
If ν = 120 for α1 = .05, then the tabled value is 1.658. Thus, as sample size and degrees of freedom increase, the critical value of t decreases. This makes it easier to reject the null hypothesis when sample size is large.
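If statistical software is at hand, the tabled critical values can be reproduced directly. Assuming the scipy library is available, the percentile function of its t distribution returns the same values as Table A.2:

```python
from scipy import stats

# 95th percentile (upper-tail alpha1 = .05) for nu = 3 and nu = 120
print(round(stats.t.ppf(0.95, df=3), 3))    # 2.353, as in Table A.2
print(round(stats.t.ppf(0.95, df=120), 3))  # 1.658
# a two-tailed test at alpha2 = .05 uses the .025 upper-tail point
print(round(stats.t.ppf(0.975, df=3), 3))   # 3.182
```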
6.8.3 t Test
Now that we have covered the theoretical distribution underlying the test of a single mean for an unknown σ, we can go ahead and look at the inferential test. First, the null and alternative hypotheses for the t test are written in the same fashion as for the z test presented earlier. Thus, for a two-tailed test, we have the same notation as previously presented:
H0: μ = 100 or H0: μ − 100 = 0
H1: μ ≠ 100 or H1: μ − 100 ≠ 0
The test statistic t is written as follows:
t = (Ȳ − μ0) / sȲ
In order to use the theoretical t distribution to determine critical values, we must assume that Yi ∼ N(μ, σ²) and that the observations are independent of each other (also referred to as "independent and identically distributed" or IID). In terms of the distribution of scores on Y, in other words, we assume that the population of scores on Y is normally distributed with some population mean μ and some population variance σ². The most important assumption for the t test is normality of the population. Conventional research has shown that the t test is very robust to nonnormality for a two-tailed test except for very small samples (e.g., n < 5). The t test is not as robust to nonnormality for a one-tailed test, even for samples as large as 40 or more (e.g., Noreen, 1989; Wilcox, 1993). Recall from Chapter 5 on the central limit theorem that when sample size increases, the sampling distribution of the mean becomes more nearly normal. As the shape of a population distribution may be unknown, conservatively one would do better to conduct a two-tailed test when sample size is small, unless some normality evidence is available.
 ν   α1 = .10   .05     .025     .01      .005     .0025    .001     .0005
     α2 = .20   .10     .050     .02      .010     .0050    .002     .0010
 1   3.078      6.314   12.706   31.821   63.657   127.32   318.31   636.62
 2   1.886      2.920   4.303    6.965    9.925    14.089   22.327   31.598
 3   1.638      2.353   3.182    4.541    5.841    7.453    10.214   12.924
 …   …          …       …        …        …        …        …        …

FIGURE 6.6
Snapshot of t distribution table.
However, recent research (e.g., Basu & DasGupta, 1995; Wilcox, 1997, 2003) suggests that small departures from normality can inflate the standard error of the mean (as the standard deviation is larger). This can reduce power and also affect control over Type I error. Thus, a cavalier attitude about ignoring nonnormality may not be the best approach, and if nonnormality is an issue, other procedures, such as the nonparametric Kolmogorov–Smirnov one-sample test, may be considered. In terms of the assumption of independence, this assumption is met when the cases or units in your sample have been randomly selected from the population. Thus, the extent to which this assumption is met depends on your sampling design. In reality, random selection is often difficult in education and the social sciences and may or may not be feasible given your study.
The critical values for the t distribution are obtained from the t table in Table A.2, where you take into account the α level, whether the test is one- or two-tailed, and the degrees of freedom ν = n − 1. If the test statistic falls into a critical region, as defined by the critical values, then our conclusion is to reject H0. If the test statistic does not fall into a critical region, then our conclusion is to fail to reject H0. For the t test, the critical values depend on sample size, whereas for the z test, the critical values do not.
As was the case for the z test, a CI for μ can be developed for the t test. The 100(1 − α)% CI is formed from
Ȳ ± tcv(sȲ)
where tcv is the critical value from the t table. If the hypothesized mean value μ0 is not contained in the interval, then our conclusion is to reject H0. If the hypothesized mean value μ0 is contained in the interval, then our conclusion is to fail to reject H0. The CI procedure for the t test then is comparable to that for the z test.
6.8.4 Example
Let us consider an example of the entire t test process. A hockey coach wanted to determine whether the mean skating speed of his team differed from the hypothesized league mean speed of 12 seconds. The hypotheses are developed as a two-tailed test and written as follows:

H0: μ = 12 or H0: μ − 12 = 0
H1: μ ≠ 12 or H1: μ − 12 ≠ 0
Skating speed around the rink was timed for each of 16 players (data are given in Table 6.2 and on the website as chap6data). The mean speed of the team was Ȳ = 10 seconds with a standard deviation of sY = 1.7889 seconds. The standard error of the mean is then computed as follows:

sȲ = sY / √n = 1.7889 / √16 = 0.4472
We wish to conduct a t test at α = .05, where we compute the test statistic t as

t = (Ȳ − μ0) / sȲ = (10 − 12) / 0.4472 = −4.4722
Table 6.2
SPSS Output for Skating Example

Raw data: 8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5, 10.5, 9.5, 11.5, 12.5, 9.5, 10.5

One-Sample Statistics
        N     Mean     Std. Deviation   Std. Error Mean
Time    16    10.000   1.7889           .4472

One-Sample Test (Test Value = 12)
                                                   95% Confidence Interval
                                                   of the Difference
        t        df   Sig. (2-Tailed)   Mean Difference   Lower    Upper
Time    -4.472   15   .000              -2.0000           -2.953   -1.047

Notes on the output:
- The table labeled "One-Sample Statistics" provides basic descriptive statistics for the sample. The standard error of the mean is sȲ = sY/√n = 1.7889/√16 = .4472.
- "t" is the t test statistic value, computed as t = (Ȳ − μ0)/sȲ = (10 − 12)/.4472 = −4.472.
- df are the degrees of freedom. For the one-sample t test, they are calculated as n − 1.
- "Sig." is the observed p value. It is interpreted as: there is less than a 1% probability of a sample mean of 10.00 occurring by chance if the null hypothesis is really true (i.e., if the population mean is really 12).
- The mean difference is simply the difference between the sample mean value (in this case, 10) and the hypothesized mean value (in this example, 12). In other words, 10 − 12 = −2.00.
- SPSS reports the 95% confidence interval of the difference, which means that in 95% of sample CIs, the true population mean difference will fall between −2.953 and −1.047. It is computed as Ȳdifference ± tcv(sȲ), or −2.00 ± (2.131)(.4472).
- The 95% confidence interval of the mean (although not provided by SPSS), Ȳ ± tcv(sȲ), could also be calculated as 10 ± 2.131(0.4472) = 10 ± (.9530) = [9.047, 10.953].
We turn to the t table in Table A.2 and determine the critical values based on α2 = .05 and ν = 15 degrees of freedom. The critical values are +2.131, which defines the upper tail critical region, and −2.131, which defines the lower tail critical region. As the test statistic t (i.e., −4.4722) falls into the lower tail critical region (i.e., the test statistic is less than the lower tail critical value), our decision is to reject H0 and conclude that the mean skating speed of this team is significantly different from the hypothesized league mean speed at the .05 level of significance. A 95% CI can be computed as follows:
Ȳ ± tcv(sȲ) = 10 ± 2.131(0.4472) = 10 ± (.9530) = (9.0470, 10.9530)
As the CI does not contain the hypothesized mean value of 12, our conclusion is again to reject H0. Thus, there is evidence to suggest that the mean skating speed of the team differs from the hypothesized league mean speed of 12 seconds.
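Assuming the scipy library is available, the entire hockey example can be reproduced in a few lines of Python; the test statistic, p value, and 95% CI below should match the hand computations:

```python
from math import sqrt
from statistics import stdev
from scipy import stats

times = [8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5,
         10.5, 9.5, 11.5, 12.5, 9.5, 10.5]
mu0 = 12  # hypothesized league mean speed

result = stats.ttest_1samp(times, popmean=mu0)
print(round(result.statistic, 3))       # -4.472
print(result.pvalue < 0.05)             # True -> reject H0

# 95% CI for the mean: Ybar +/- tcv * s_Ybar
n = len(times)
mean = sum(times) / n                   # 10.0
se = stdev(times) / sqrt(n)             # about 0.4472
tcv = stats.t.ppf(0.975, df=n - 1)      # about 2.131
ci = (mean - tcv * se, mean + tcv * se)
print(tuple(round(v, 3) for v in ci))   # (9.047, 10.953); 12 lies outside
```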
6.9 SPSS
Here we consider what SPSS has to offer in the way of testing hypotheses about a single mean. As with most statistical software, the t test is included as an option in SPSS, but the z test is not. Instructions for determining the one-sample t test using SPSS are presented first. This is followed by additional steps for examining the normality assumption.
One-Sample t Test
Step 1: To conduct the one-sample t test, go to "Analyze" in the top pulldown menu, then select "Compare Means," and then select "One-Sample T Test." Following the screenshot (step 1) produces the "One-Sample T Test" dialog box.
Step 2: Next, from the main "One-Sample T Test" dialog box, click the variable of interest from the list on the left (e.g., time), and move it into the "Test Variable" box by clicking on the arrow button. At the bottom right of the screen is a box for "Test Value," where you indicate the hypothesized value (e.g., 12).
[Screenshot: step 2. Select the variable of interest from the list on the left and use the arrow to move it to the "Test Variable" box on the right. Clicking on "Options" will allow you to define a confidence interval percentage; the default is 95% (corresponding to an alpha of .05).]
Step 3 (Optional): The default alpha level in SPSS is .05, and, thus, the default corresponding CI is 95%. If you wish to test your hypothesis at an alpha level other than .05 (and thus obtain CIs other than 95%), then click on the "Options" button located in the top right corner of the main dialog box. From here, the CI percentage can be adjusted to correspond to the alpha level at which your hypothesis is being tested. (For purposes of this example, the test has been generated using an alpha level of .05.)
The one-sample t test output for the skating example is provided in Table 6.2.
Using Explore to Examine Normality of Sample Distribution
Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape of your variable, specifically the extent to which normality is a reasonable assumption, is important. In earlier chapters, we saw how we could use the "Explore" tool in SPSS to generate a number of useful descriptive statistics. In conducting our one-sample t test, we can again use "Explore" to examine the extent to which the assumption of normality is met for our sample distribution. As the general steps for accessing "Explore" have been presented in previous chapters (e.g., Chapter 4), they will not be reiterated here. After the variable of interest has been selected and moved to the "Dependent List" box on the main "Explore" dialog box, click on "Plots" in the upper right corner. Place a checkmark in the boxes for "Normality plots with tests" and also for "Histogram."
[Screenshot: generating normality evidence. Select the variable of interest from the list on the left and use the arrow to move it to the "Dependent List" box on the right; then click on "Plots."]
Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots. Using our hockey data, the skewness statistic is .299 and kurtosis is −.483, both within the range of an absolute value of 2.0, suggesting some evidence of normality. The histogram also suggests relative normality.
[Histogram of time (mean = 10.0, std. dev. = 1.789, N = 16), showing frequencies across times from roughly 8.0 to 14.0 seconds.]
There are a few other statistics that can be used to gauge normality as well. Using SPSS, we can obtain two statistical tests of normality. The Kolmogorov–Smirnov (K–S) test (Chakravarti, Laha, & Roy, 1967) with Lilliefors significance correction (Lilliefors, 1967) and the Shapiro–Wilk (S–W) test (Shapiro & Wilk, 1965) provide evidence of the extent to which our sample distribution is statistically different from a normal distribution. The K–S test tends to be conservative, whereas the S–W test is usually considered the more powerful of the two for testing normality and is recommended for use with small sample sizes (n < 50). Both of these statistics are generated from the selection of "Normality plots with tests."
The output for the K–S and S–W tests is presented as follows. As we have learned in this chapter, when the observed probability (i.e., the p value, which is reported in SPSS as "Sig.") is less than our stated alpha level, then we reject the null hypothesis. We follow those same rules of interpretation here. Regardless of which test (K–S or S–W) we examine, both provide the same evidence: our sample distribution is not statistically significantly different from what would be expected from a normal distribution.
Tests of Normality
        Kolmogorov–Smirnov(a)           Shapiro–Wilk
        Statistic   df    Sig.          Statistic   df    Sig.
Time    .110        16    .200*         .982        16    .978
a. Lilliefors significance correction.
* This is a lower bound of the true significance.
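Assuming scipy is available, the Shapiro–Wilk test can be reproduced outside SPSS. (Note that scipy's basic Kolmogorov–Smirnov test does not apply the Lilliefors correction SPSS uses, so only the S–W results are compared here.)

```python
from scipy import stats

times = [8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5,
         10.5, 9.5, 11.5, 12.5, 9.5, 10.5]

sw_stat, sw_p = stats.shapiro(times)
print(round(sw_stat, 3))  # about .982, as in the SPSS output
print(sw_p > 0.05)        # True -> fail to reject; normality is reasonable
```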
Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality. Q–Q plots are graphs that plot quantiles of the sample distribution against quantiles of the theoretical normal distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of our hockey skating time provides another form of evidence of normality.
[Normal Q–Q plot of time: observed values (roughly 6 to 14) on the horizontal axis against expected normal values on the vertical axis.]
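The coordinates behind a Q–Q plot can also be generated directly. Assuming scipy is available, probplot returns the theoretical and ordered sample quantiles (which a plotting library could draw) along with the correlation r of the fitted line, where r near 1 suggests normality:

```python
from scipy import stats

times = [8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5,
         10.5, 9.5, 11.5, 12.5, 9.5, 10.5]

# theoretical normal quantiles (osm) paired with ordered sample values (osr)
(osm, osr), (slope, intercept, r) = stats.probplot(times, dist="norm")
print(len(osm))   # 16 plotted points
print(r > 0.95)   # correlation near 1 -> points hug the diagonal line
```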
The detrended normal Q–Q plot shows deviations of the observed values from the theoretical normal distribution. Evidence of normality is suggested when the points exhibit little or no pattern around 0 (the horizontal line); however, due to subjectivity in determining the extent of a pattern, this graph can often be difficult to interpret. Thus, in many cases, you may wish to rely more heavily on the other forms of evidence of normality.
[Detrended normal Q–Q plot of time: observed values on the horizontal axis against deviation from normal on the vertical axis.]
6.10 G*Power
In our discussion of power presented earlier in this chapter, we indicated that the sample size to achieve a desired level of power can be determined a priori (before the study is conducted), and observed power can also be determined post hoc (after the study is conducted) using statistical software or power tables. One freeware program for calculating power is G*Power (http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/), which can be used to compute both a priori sample size and post hoc power analyses (among other things). Using the results of the one-sample t test just conducted, let us utilize G*Power to first determine the required sample size given various estimated parameters and then compute the post hoc power of our test.
A Priori Sample Size Using G*Power
Step 1 (A priori sample size): As seen in step 1, there are several decisions that need to be made from the initial G*Power screen. First, the correct test family needs to be selected. In our case, we conducted a one-sample t test; therefore, the default selection of "t tests" is the correct test family. Next, we need to select the appropriate statistical test. The default is "Correlation: Point biserial model." This is not the correct option for us, and so we use the arrow to toggle to "Means: Difference from constant (one sample case)."
[Screenshot: step 1. The default selection for "Test Family" is "t tests." The default selection for "Statistical Test" is "Correlation: Point biserial model." Use the arrow to toggle to "Means: Difference from constant (one sample case)," the test needed for a one-sample t test.]
Step 2 (A priori sample size): The "Type of Power Analysis" desired then needs to be selected. The default is "A priori: Compute required sample size—given α, power, and effect size." For this illustration, we will first conduct an example of computing the a priori sample size (i.e., the default option), and then we will compute post hoc power. Although we do not illustrate their use here, there are also three additional forms of power analysis that can be conducted using G*Power: (1) compromise, (2) criterion, and (3) sensitivity.
[Screenshot: step 2. The default selection for "Type of Power Analysis" is "A priori: Compute required sample size—given α, power, and effect size."]
Step 3 (A priori sample size): The "Input Parameters" must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test, so we use the arrow to toggle "Tails" to "Two." For a priori power, we have to indicate the anticipated effect size. Your best estimate of the effect size you can anticipate achieving usually relies on previous studies that have been conducted that are similar to yours. In G*Power, the default effect size is d = .50. For purposes of this illustration, let us use the default. The alpha level must also be defined. The default significance level in G*Power is .05, which is the alpha level we will be using for our example. The desired level of power must also be defined. The G*Power default for power is .95. Many researchers in education and the behavioral sciences indicate that a power of .80 or above is usually sufficient. Thus, .95 may be higher than what many would consider sufficient power. For purposes of this example, however, we will use the default power of .95. Once the parameters are specified, simply click on "Calculate" to generate the a priori power statistics.
[Screenshot: step 3. The "Input Parameters" to determine a priori sample size must be specified, including: (1) one- versus two-tailed test; (2) anticipated effect size; (3) alpha level; and (4) desired power. Once the parameters are specified, click on "Calculate."]
Step 4 (A priori sample size): The "Output Parameters" provide the relevant statistics given the input specified. In this example, we were interested in determining the a priori sample size given a two-tailed test, with an anticipated effect size of .50, an alpha level of .05, and desired power of .95. Based on those criteria, the required sample size for our one-sample t test is 54. In other words, if we have a sample size of 54 individuals or cases in our study, testing at an alpha level of .05, with a two-tailed test, and achieving a moderate effect size of .50, then the power of our test will be .95: the probability of rejecting the null hypothesis when it is really false will be 95%.
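The same a priori calculation can be reproduced outside G*Power. Assuming the statsmodels library is available, its TTestPower class solves for the required sample size:

```python
from math import ceil
from statsmodels.stats.power import TTestPower

analysis = TTestPower()  # power for the one-sample (or paired) t test
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.95,
                         alternative="two-sided")
print(ceil(n))           # 54, matching the G*Power result

# a smaller anticipated effect demands a much larger sample
n_small = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.95,
                               alternative="two-sided")
print(ceil(n_small))     # roughly 327
```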
[Screenshot: step 4. The "Output Parameters" provide the relevant statistics given the input specified. Based on the parameters specified, we need a sample size of 54 for our one-sample t test.]
If we had anticipated a smaller effect size, say .20, but left all of the other input parameters the same, the required sample size needed to achieve a power of .95 increases greatly, to 327.
Post Hoc Power Using G*Power
Now, let us use G*Power to compute post hoc power. Step 1, as presented earlier for a priori power, remains the same; thus, we will start from step 2.
Step 2 (Post hoc power): The "Type of Power Analysis" desired then needs to be selected. The default is "A priori: Compute required sample size—given α, power, and effect size." To compute post hoc power, we need to select "Post hoc: Compute achieved power—given α, sample size, and effect size."
Step 3 (Post hoc power): The "Input Parameters" must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test, so we use the arrow to toggle "Tails" to "Two." The achieved or observed effect size was −1.117. The alpha level we tested at was .05, and the actual sample size was 16. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.
Step 4 (Post hoc power): The “Output Parameters” provide the relevant statistics given the input specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of −1.117, an alpha level of .05, and a sample size of 16. Based on those criteria, the post hoc power was .96. In other words, with a sample size of 16 skaters in our study, testing at an alpha level of .05, with a two-tailed test, and observing a large effect size of −1.117, the power of our test was .96: the probability of rejecting the null hypothesis when it is really false is 96%, an excellent level of power. Keep in mind that conducting power analysis a priori is highly recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level).
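For readers who prefer code to a GUI, the achieved-power computation can be sketched directly from the noncentral t distribution using scipy. This mirrors the kind of calculation G*Power performs internally, though parameterization and rounding details mean it may not reproduce the printed output digit for digit:

```python
from scipy import stats

n, alpha = 16, 0.05
d = abs(-1.117)            # observed effect size from the skating example
df = n - 1
ncp = d * n ** 0.5         # noncentrality parameter for a one-sample t test

# Two-tailed achieved power: probability that |t| exceeds the critical value
# under the noncentral t distribution
t_crit = stats.t.ppf(1 - alpha / 2, df)
power = (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)
print(round(power, 3))
```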
The “Input Parameters” must be specified, including:
1. One- versus two-tailed test;
2. Actual effect size (for post hoc power);
3. Alpha level; and
4. Total sample size.
Once the parameters are specified (steps 2–4), click on “Calculate.”
155 Introduction to Hypothesis Testing: Inferences About a Single Mean
6.11 Template and APA-Style Write-Up
Let us revisit our graduate research assistant, Marie, who was working with Oscar, a local hockey coach, to assist in analyzing his team’s data. As a reminder, her task was to assist Oscar in generating the test of inference to answer his research question, “Is the mean skating speed of our hockey team different from the league mean speed of 12 seconds?” Marie suggested a one-sample test of means as the test of inference. A template for writing a research question for a one-sample test of inference (i.e., one-sample t test) is presented as follows:
Is the mean of [sample variable] different from [hypothesized mean value]?
It may be helpful to preface the results of the one-sample t test with the information we gathered to examine the extent to which the assumption of normality was met. This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
The distributional shape of skating speed was examined to determine
the extent to which the assumption of normality was met. Skewness
(.299, SE = .564), kurtosis (−.483, SE = 1.091), and the Shapiro-Wilk test
of normality (S-W = .982, df = 16, p = .978) suggest that normality is a
reasonable assumption. Visually, a relatively bell-shaped distribution
displayed in the histogram (reflected similarly in the boxplot) as
well as a Q–Q plot with points adhering closely to the diagonal line
also suggest evidence of normality. Additionally, the boxplot did not
suggest the presence of any potential outliers. These indices suggest
evidence that the assumption of normality was met.
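A screening like the one reported above can be generated with standard software. Here is a sketch using scipy, with hypothetical skating times standing in for the chapter’s dataset (the values below are illustrative only; the actual raw data are not reproduced in this section):

```python
import numpy as np
from scipy import stats

# Hypothetical skating times (seconds) for 16 skaters -- illustrative only
times = np.array([10.8, 9.5, 11.2, 8.7, 10.1, 9.9, 12.3, 10.5,
                  8.9, 9.2, 11.7, 10.4, 7.8, 9.6, 10.9, 8.5])

skewness = stats.skew(times, bias=False)         # sample skewness
excess_kurt = stats.kurtosis(times, bias=False)  # excess kurtosis
sw_stat, sw_p = stats.shapiro(times)             # Shapiro-Wilk normality test

# A nonsignificant Shapiro-Wilk p value (> .05) is consistent with normality
print(round(skewness, 3), round(excess_kurt, 3),
      round(sw_stat, 3), round(sw_p, 3))
```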
An additional assumption of the one-sample t test is the assumption of independence. This assumption is met when the cases in our sample have been randomly selected from the population. This is an often overlooked, but important, assumption for researchers when presenting the results of their test. One or two sentences are usually sufficient to indicate if this assumption was met.
Because the skaters in this sample represented a random sample, the assumption of independence was met.
It is also desirable to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our skating example, we find an effect size of −1.117, interpreted according to Cohen’s (1988) guidelines as a large effect:
d = \frac{\bar{Y} - \mu_0}{s} = \frac{10 - 12}{1.7889} = -1.117
Remember that for the one-sample mean test, d indicates how many standard deviations the sample mean is from the hypothesized mean. Thus, with an effect size of −1.117, there is more than one standard deviation unit between our sample mean and the hypothesized mean. The negative sign simply indicates that our sample mean was the smaller mean (as it is the first value in the numerator of the formula). In this particular example, the negative effect is desired as it suggests the team’s average skating time is quicker than the league mean.
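The arithmetic for d is simple enough to verify directly; a quick check using the summary statistics from the text:

```python
y_bar, mu0, s = 10.0, 12.0, 1.7889   # sample mean, hypothesized mean, sample SD

d = (y_bar - mu0) / s                # Cohen's d for a one-sample test
print(round(d, 3))                   # -1.118; the chapter reports -1.117
```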
Here is an example APA-style paragraph of results for the skating data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).
A one-sample t test was conducted at an alpha level of .05 to answer
the research question: Is the mean skating speed of a hockey team dif-
ferent from the league mean speed of 12 seconds? The null hypothesis
stated that the team mean speed would not differ from the league mean
speed of 12. The alternative hypothesis stated that the team average
speed would differ from the league mean. As depicted in Table 6.2,
based on a random sample of 16 skaters, there was a mean time of 10
seconds, and a standard deviation of 1.7889 seconds. When compared
against the hypothesized mean of 12 seconds, the one-sample t test was
shown to be statistically significant (t = −4.472, df = 15, p < .001).
Therefore, the null hypothesis that the team average time would be
12 seconds was rejected. This provides evidence to suggest that the
sample mean skating time for this particular team was statistically
different from the hypothesized mean skating time of the league.
Additionally, the effect size d was −1.117, generally interpreted as a
large effect (Cohen, 1988), and indicating that there is more than a
one standard deviation difference between the team and league mean
skating times. The post hoc power of the test, given the sample size,
two-tailed test, alpha level, and observed effect size, was .96.
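The test statistic and p value in the paragraph above can be recomputed from the summary statistics alone; a sketch using scipy’s t distribution:

```python
from scipy import stats

n, y_bar, s, mu0 = 16, 10.0, 1.7889, 12.0   # summary statistics from the text

se = s / n ** 0.5                           # standard error of the mean
t_obs = (y_bar - mu0) / se                  # one-sample t statistic
p_two = 2 * stats.t.sf(abs(t_obs), n - 1)   # two-tailed p value

print(round(t_obs, 3))   # -4.472
print(p_two < 0.001)     # True
```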
6.12 Summary
In this chapter, we considered our first inferential testing situation, testing hypotheses about a single mean. A number of topics and new concepts were discussed. First, we introduced the types of hypotheses utilized in inferential statistics, that is, the null or statistical hypothesis versus the scientific or alternative or research hypothesis. Second, we moved on to the types of decision errors (i.e., Type I and Type II errors) as depicted by the decision table and illustrated by the rain example. Third, the level of significance was introduced as well as the types of alternative hypotheses (i.e., nondirectional vs. directional alternative hypotheses). Fourth, an overview of the steps in the decision-making process of inferential statistics was given. Fifth, we examined the z test, which is the inferential test about a single mean when the population standard deviation is known. This was followed by a more formal description of Type II error and power. We then discussed the notion of statistical significance versus practical significance. Finally, we considered the t test, which is the inferential test about a single mean when the population standard deviation is unknown, and then completed the chapter with an example, SPSS information, a G*Power illustration, and an APA-style write-up of results. At this point, you should have met the following objectives: (a) be able to understand the basic concepts of hypothesis testing, (b) be able to utilize the normal and t tables, and (c) be able to understand, determine, and interpret the results from the z test, t test, and CI procedures. Many of the concepts in this chapter carry over into other inferential tests. In the next chapter, we discuss inferential tests involving the difference between two means. Other inferential tests will be considered in subsequent chapters.
Problems
Conceptual problems
6.1 In hypothesis testing, the probability of failing to reject H0 when H0 is false is denoted by
 a. α
 b. 1 − α
 c. β
 d. 1 − β
6.2 The probability of observing the sample mean (or some value greater than the sample mean) by chance if the null hypothesis is really true is which one of the following?
 a. α
 b. Level of significance
 c. p value
 d. Test statistic value
6.3 When testing the hypotheses presented in the following, at a .05 level of significance with the t test, where is the rejection region?

H_0: \mu = 100
H_1: \mu < 100

 a. The upper tail
 b. The lower tail
 c. Both the upper and lower tails
 d. Cannot be determined
6.4 A research question asks, “Is the mean age of children who enter preschool different from 48 months?” Which one of the following is implied?
 a. Left-tailed test
 b. Right-tailed test
 c. Two-tailed test
 d. Cannot be determined based on this information
6.5 The probability of making a Type II error when rejecting H0 at the .05 level of significance is which one of the following?
 a. 0
 b. .05
 c. Between .05 and .95
 d. .95
6.6 If the 90% CI does not include the value for the parameter being estimated in H0, then which one of the following is a correct statement?
 a. H0 cannot be rejected at the .10 level.
 b. H0 can be rejected at the .10 level.
 c. A Type I error has been made.
 d. A Type II error has been made.
6.7 Other things being equal, which of the values of t given next is least likely to result when H0 is true, for a two-tailed test?
 a. 2.67
 b. 1.00
 c. 0.00
 d. −1.96
 e. −2.70
6.8 The fundamental difference between the z test and the t test for testing hypotheses about a population mean is which one of the following?
 a. Only z assumes the population distribution be normal.
 b. z is a two-tailed test, whereas t is one-tailed.
 c. Only t becomes more powerful as sample size increases.
 d. Only z requires the population variance be known.
6.9 If one fails to reject a true H0, one is making a Type I error. True or false?
6.10 Which one of the following is a correct interpretation of d?
 a. Alpha level
 b. CI
 c. Effect size
 d. Observed probability
 e. Power
6.11 A one-sample t test is conducted at an alpha level of .10. The researcher finds a p value of .08 and concludes that the test is statistically significant. Is the researcher correct?
6.12 When testing the following hypotheses at the .01 level of significance with the t test, a sample mean of 301 is observed. I assert that if I calculate the test statistic and compare it to the t distribution with n − 1 degrees of freedom, it is possible to reject H0. Am I correct?

H_0: \mu = 295
H_1: \mu < 295
6.13 If the sample mean exceeds the hypothesized mean by 200 points, I assert that H0 can be rejected. Am I correct?
6.14 I assert that H0 can be rejected with 100% confidence if the sample consists of the entire population. Am I correct?
6.15 I assert that the 95% CI has a larger width than the 99% CI for a population mean using the same data. Am I correct?
6.16 I assert that the critical value of z, for a test of a single mean, will increase as sample size increases. Am I correct?
6.17 The mean of the t distribution increases as degrees of freedom increase. True or false?
6.18 It is possible that the results of a one-sample t test and the corresponding CI will differ for the same dataset and level of significance. True or false?
6.19 The width of the 95% CI does not depend on the sample mean. True or false?
6.20 The null hypothesis is a numerical statement about which one of the following?
 a. An unknown parameter
 b. A known parameter
 c. An unknown statistic
 d. A known statistic
Computational problems
6.1 Using the same data and the same method of analysis, the following hypotheses are tested about whether mean height is 72 inches. Researcher A uses the .05 level of significance, and Researcher B uses the .01 level of significance:

H_0: \mu = 72
H_1: \mu \neq 72

 a. If Researcher A rejects H0, what is the conclusion of Researcher B?
 b. If Researcher B rejects H0, what is the conclusion of Researcher A?
 c. If Researcher A fails to reject H0, what is the conclusion of Researcher B?
 d. If Researcher B fails to reject H0, what is the conclusion of Researcher A?
6.2 Give a numerical value for each of the following descriptions by referring to the t table.
 a. The percentile rank of t₅ = 1.476
 b. The percentile rank of t₁₀ = 3.169
 c. The percentile rank of t₂₁ = 2.518
 d. The mean of the distribution of t₂₃
 e. The median of the distribution of t₂₃
 f. The variance of the distribution of t₂₃
 g. The 90th percentile of the distribution of t₂₇
6.3 Give a numerical value for each of the following descriptions by referring to the t table.
 a. The percentile rank of t₅ = 2.015
 b. The percentile rank of t₂₀ = 1.325
 c. The percentile rank of t₃₀ = 2.042
 d. The mean of the distribution of t₁₀
 e. The median of the distribution of t₁₀
 f. The variance of the distribution of t₁₀
 g. The 95th percentile of the distribution of t₁₄
6.4 The following random sample of weekly student expenses is obtained from a normally distributed population of undergraduate students with unknown parameters:

68 56 76 75 62 81 72 69 91 84
49 75 69 59 70 53 65 78 71 87
71 74 69 65 64

 a. Test the following hypotheses at the .05 level of significance:

H_0: \mu = 74
H_1: \mu \neq 74

 b. Construct a 95% CI.
6.5 The following random sample of hours spent per day answering e-mail is obtained from a normally distributed population of community college faculty with unknown parameters:

2 3.5 4 1.25 2.5 3.25 4.5 4.25 2.75 3.25
1.75 1.5 2.75 3.5 3.25 3.75 2.25 1.5 1.25 3.25

 a. Test the following hypotheses at the .05 level of significance:

H_0: \mu = 3.0
H_1: \mu \neq 3.0

 b. Construct a 95% CI.
6.6 In the population, it is hypothesized that flags have a mean usable life of 100 days. Twenty-five flags are flown in the city of Tuscaloosa and are found to have a sample mean usable life of 200 days with a standard deviation of 216 days. Does the sample mean in Tuscaloosa differ from that of the population mean?
 a. Conduct a two-tailed t test at the .01 level of significance.
 b. Construct a 99% CI.
Interpretive problems
6.1 Using item 7 from the survey 1 dataset accessible from the website, use SPSS to conduct a one-sample t test to determine whether the mean number of compact disks owned is significantly different from 25, at the .05 level of significance. Test for the extent to which the assumption of normality has been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph reporting your results.
6.2 Using item 14 from the survey 1 dataset accessible from the website, use SPSS to conduct a one-sample t test to determine whether the mean number of hours slept is significantly different from 8, at the .05 level of significance. Test for the extent to which the assumption of normality has been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph reporting your results.
7
Inferences About the Difference Between Two Means
Chapter Outline
7.1 New Concepts
7.1.1 Independent Versus Dependent Samples
7.1.2 Hypotheses
7.2 Inferences About Two Independent Means
7.2.1 Independent t Test
7.2.2 Welch t′ Test
7.2.3 Recommendations
7.3 Inferences About Two Dependent Means
7.3.1 Dependent t Test
7.3.2 Recommendations
7.4 SPSS
7.5 G*Power
7.6 Template and APA-Style Write-Up
Key Concepts
1. Independent versus dependent samples
2. Sampling distribution of the difference between two means
3. Standard error of the difference between two means
4. Parametric versus nonparametric tests
In Chapter 6, we introduced hypothesis testing and ultimately considered our first inferential statistic, the one-sample t test. There we examined the following general topics: types of hypotheses, types of decision errors, level of significance, steps in the decision-making process, inferences about a single mean when the population standard deviation is known (the z test), power, statistical versus practical significance, and inferences about a single mean when the population standard deviation is unknown (the t test).
In this chapter, we consider inferential tests involving the difference between two means. In other words, our research question is the extent to which two sample means are statistically different and, by inference, the extent to which their respective population means are different. Several inferential tests are covered in this chapter, depending on whether the two samples are selected in an independent or dependent manner, and on whether the statistical assumptions are met. More specifically, the topics described include the following inferential tests: for two independent samples—the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test; and for two dependent samples—the dependent t test and briefly the Wilcoxon signed ranks test. We use many of the foundational concepts previously covered in Chapter 6. New concepts to be discussed include the following: independent versus dependent samples, the sampling distribution of the difference between two means, and the standard error of the difference between two means. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying the inferential tests of two means, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.
7.1 New Concepts
Remember Marie, our very capable educational research graduate student? Let us see what Marie has in store for her now…
Marie’s first attempts at consulting went so well that her faculty advisor has assigned Marie two additional consulting responsibilities with individuals from their community. Marie has been asked to consult with a local nurse practitioner, JoAnn, who is studying cholesterol levels of adults and how they differ based on gender. Marie suggests the following research question: Is there a mean difference in cholesterol level between males and females? Marie suggests an independent samples t test as the test of inference. Her task is then to assist JoAnn in generating the test of inference to answer her research question.
Marie has also been asked to consult with the swimming coach, Mark, who works with swimming programs that are offered through their local Parks and Recreation Department. Mark has just conducted an intensive 2-month training program for a group of 10 swimmers. He wants to determine if, on average, their time in the 50-meter freestyle event is different after the training. The following research question is suggested by Marie: Is there a mean difference in swim time for the 50-meter freestyle event before participation in an intensive training program as compared to swim time for the 50-meter freestyle event after participation in an intensive training program? Marie suggests a dependent samples t test as the test of inference. Her task is then to assist Mark in generating the test of inference to answer his research question.
Before we proceed to inferential tests of the difference between two means, a few new concepts need to be introduced. The new concepts are the difference between the selection of independent samples and dependent samples, the hypotheses to be tested, and the sampling distribution of the difference between two means.
7.1.1 Independent Versus Dependent Samples
The first new concept to address is to make a distinction between the selection of independent samples and dependent samples. Two samples are independent when the method of sample selection is such that those individuals selected for sample 1 do not have any relationship to those individuals selected for sample 2. In other words, the selections of individuals to be included in the two samples are unrelated or uncorrelated such that they have absolutely nothing to do with one another. You might think of the samples as being selected totally separate from one another. Because the individuals in the two samples are independent of one another, their scores on the dependent variable, Y, should also be independent of one another. The independence condition leads us to consider, for example, the independent samples t test. (This should not, however, be confused with the assumption of independence, which was introduced in the previous chapter. The assumption of independence still holds for the independent samples t test, and we will talk later about how this assumption can be met with this particular procedure.)
Two samples are dependent when the method of sample selection is such that those individuals selected for sample 1 do have a relationship to those individuals selected for sample 2. In other words, the selections of individuals to be included in the two samples are related or correlated. You might think of the samples as being selected simultaneously such that there are actually pairs of individuals. Consider the following two typical examples. First, if the same individuals are measured at two points in time, such as during a pretest and a posttest, then we have two dependent samples. The scores on Y at time 1 will be correlated with the scores on Y at time 2 because the same individuals are assessed at both time points. Second, if husband-and-wife pairs are selected, then we have two dependent samples. That is, if a particular wife is selected for the study, then her corresponding husband is also automatically selected—this is an example where individuals are paired or matched in some way such that they share characteristics that make the score of one person related to (i.e., dependent on) the score of the other person. In both examples, we have natural pairs of individuals or scores. The dependence condition leads us to consider the dependent samples t test, alternatively known as the correlated samples t test or the paired samples t test. As we show in this chapter, whether the samples are independent or dependent determines the appropriate inferential test.
7.1.2 Hypotheses
The hypotheses to be evaluated for detecting a difference between two means are as follows. The null hypothesis H0 is that there is no difference between the two population means, which we denote as the following:

H_0: \mu_1 - \mu_2 = 0 \quad \text{or} \quad H_0: \mu_1 = \mu_2

where μ1 is the population mean for sample 1 and μ2 is the population mean for sample 2. Mathematically, both equations say the same thing. The version on the left makes it clear to the reader why the term “null” is appropriate. That is, there is no difference or a “null” difference between the two population means. The version on the right indicates that the population mean of sample 1 is the same as the population mean of sample 2—another way of saying that there is no difference between the means (i.e., they are the same). The nondirectional scientific or alternative hypothesis H1 is that there is a difference between the two population means, which we denote as follows:
H_1: \mu_1 - \mu_2 \neq 0 \quad \text{or} \quad H_1: \mu_1 \neq \mu_2
The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population means are different. As we have not specified a direction on H1, we are willing to reject either if μ1 is greater than μ2 or if μ1 is less than μ2. This alternative hypothesis results in a two-tailed test.
Directional alternative hypotheses can also be tested if we believe μ1 is greater than μ2, denoted as follows:

H_1: \mu_1 - \mu_2 > 0 \quad \text{or} \quad H_1: \mu_1 > \mu_2

In this case, the equation on the left tells us that when μ2 is subtracted from μ1, a positive value will result (i.e., some value greater than 0). The equation on the right makes it somewhat clearer what we hypothesize.
Or if we believe μ1 is less than μ2, the directional alternative hypotheses will be denoted as we see here:

H_1: \mu_1 - \mu_2 < 0 \quad \text{or} \quad H_1: \mu_1 < \mu_2

In this case, the equation on the left tells us that when μ2 is subtracted from μ1, a negative value will result (i.e., some value less than 0). The equation on the right makes it somewhat clearer what we hypothesize. Regardless of how they are denoted, directional alternative hypotheses result in a one-tailed test.
The underlying sampling distribution for these tests is known as the sampling distribution of the difference between two means. This makes sense, as the hypotheses examine the extent to which two sample means differ. The mean of this sampling distribution is 0, as that is the hypothesized difference between the two population means μ1 − μ2. The more the two sample means differ, the more likely we are to reject the null hypothesis. As we show later, the test statistics in this chapter all deal in some way with the difference between the two means and with the standard error (or standard deviation) of the difference between two means.
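This sampling distribution can also be made concrete by simulation. The sketch below draws repeated pairs of samples from a single population, so H0 is true (the population values μ = 50 and σ = 10 are hypothetical), and examines the resulting mean differences:

```python
import numpy as np

rng = np.random.default_rng(42)

# Repeatedly draw two samples from one population (H0 true: mu1 = mu2 = 50)
mu, sigma, n1, n2, reps = 50.0, 10.0, 30, 30, 20000
diffs = np.array([rng.normal(mu, sigma, n1).mean() - rng.normal(mu, sigma, n2).mean()
                  for _ in range(reps)])

# The mean of the simulated differences sits near 0, the hypothesized value;
# their spread approximates the standard error of the difference between means
print(round(diffs.mean(), 2))
print(round(diffs.std(), 2), round((sigma**2 / n1 + sigma**2 / n2) ** 0.5, 2))
```

The empirical standard deviation of the differences closely tracks the theoretical standard error formula, previewing the quantity developed in the next section.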
7.2 Inferences About Two Independent Means
In this section, three inferential tests of the difference between two independent means are described: the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test. The section concludes with a list of recommendations.
7.2.1 Independent t Test
First, we need to determine the conditions under which the independent t test is appropriate. In part, this has to do with the statistical assumptions associated with the test itself. The assumptions of the independent t test are that the scores on the dependent variable Y (a) are normally distributed within each of the two populations, (b) have equal population variances (known as homogeneity of variance or homoscedasticity), and (c) are independent. (The assumptions of normality and independence should sound familiar as they were introduced as we learned about the one-sample t test.) Later in the chapter, we more fully discuss the assumptions for this particular procedure. When these assumptions are not met, other procedures may be more appropriate, as we also show later.
The measurement scales of the variables must also be appropriate. Because this is a test of means, the dependent variable must be measured on an interval or ratio scale. The independent variable, however, must be nominal or ordinal, and only two categories or groups of the independent variable can be used with the independent t test. (In later chapters, we will learn about analysis of variance (ANOVA), which can accommodate an independent variable with more than two categories.) It is not a condition of the independent t test that the sample sizes of the two groups be the same. An unbalanced design (i.e., unequal sample sizes) is perfectly acceptable.
The test statistic for the independent t test is known as t and is denoted by the following formula:

t = \frac{\bar{Y}_1 - \bar{Y}_2}{s_{\bar{Y}_1 - \bar{Y}_2}}

where \bar{Y}_1 and \bar{Y}_2 are the means for sample 1 and sample 2, respectively, and s_{\bar{Y}_1 - \bar{Y}_2} is the standard error of the difference between two means. This standard error is the standard deviation of the sampling distribution of the difference between two means and is computed as follows:

s_{\bar{Y}_1 - \bar{Y}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}

where s_p is the pooled standard deviation computed as

s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}

and where s_1^2 and s_2^2 are the sample variances for groups 1 and 2, respectively, and n_1 and n_2 are the sample sizes for groups 1 and 2, respectively.
Conceptually, the standard error s_{\bar{Y}_1 - \bar{Y}_2} is a pooled standard deviation weighted by the two sample sizes; more specifically, the two sample variances are weighted by their respective sample sizes and then pooled. This is conceptually similar to the standard error for the one-sample t test, which you will recall from Chapter 6 as

s_{\bar{Y}} = \frac{s_Y}{\sqrt{n}}

where we also have a standard deviation weighted by sample size. If the sample variances are not equal, as the test assumes, then you can see why we might not want to take a pooled or weighted average (i.e., as it would not represent well the individual sample variances).
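The pooling formulas above can be checked numerically. In the sketch below, n1 and s1² echo the female sample introduced later in the chapter, while the second variance is a made-up value for illustration:

```python
# Hypothetical two-group summary statistics (s2_sq is made up for illustration;
# n1 and s1_sq echo the female sample from the chapter's upcoming example)
n1, n2 = 8, 12
s1_sq, s2_sq = 364.2857, 289.0

# Pooled standard deviation: variances weighted by their degrees of freedom
sp = (((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)) ** 0.5

# Standard error of the difference between the two sample means
se_diff = sp * (1 / n1 + 1 / n2) ** 0.5
print(round(sp, 2), round(se_diff, 2))
```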
The test statistic t is then compared to a critical value(s) from the t distribution. For a two-tailed test, from Table A.2, we would use the appropriate α2 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom for this test are n_1 + n_2 - 2. Conceptually, we lose one degree of freedom from each sample for estimating the population variances (i.e., there are two restrictions along the lines of what was discussed in Chapter 6). The critical values are denoted as \pm\,{}_{\alpha_2}t_{n_1+n_2-2}. The subscript α2 of the critical values reflects the fact that this is a two-tailed test, and the subscript n_1 + n_2 - 2 indicates these particular degrees of freedom. (Remember that the critical value can be found based on the knowledge of the degrees of freedom and whether it is a one- or two-tailed test.) If the test statistic falls into either critical region, then we reject H0; otherwise, we fail to reject H0.
For a one-tailed test, from Table A.2, we would use the appropriate α1 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom are again n_1 + n_2 - 2. The critical value is denoted as +{}_{\alpha_1}t_{n_1+n_2-2} for the alternative hypothesis H1: μ1 − μ2 > 0 (i.e., a right-tailed test, so the critical value will be positive), and as -{}_{\alpha_1}t_{n_1+n_2-2} for the alternative hypothesis H1: μ1 − μ2 < 0 (i.e., a left-tailed test and thus a negative critical value). If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.
7.2.1.1 Confidence Interval
For the two-tailed test, a (1 − α)% confidence interval (CI) can also be examined. The CI is formed as follows:

(\bar{Y}_1 - \bar{Y}_2) \pm {}_{\alpha_2}t_{n_1+n_2-2} \, s_{\bar{Y}_1 - \bar{Y}_2}

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. The interpretation and use of CIs is similar to that of the one-sample test described in Chapter 6. Imagine we take 100 random samples from each of two populations and construct 95% CIs. Then 95% of the CIs will contain the true population mean difference μ1 − μ2 and 5% will not. In short, 95% of similarly constructed CIs will contain the true population mean difference.
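A sketch of the CI computation with hypothetical summary values (the means and standard error below are illustrative, not the chapter’s worked example):

```python
from scipy import stats

# Hypothetical summary values -- not the chapter's worked example
y1, y2 = 185.0, 215.0     # group means
se_diff = 8.14            # standard error of the difference between means
n1, n2, alpha = 8, 12, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2)   # two-tailed critical t
lo = (y1 - y2) - t_crit * se_diff
hi = (y1 - y2) + t_crit * se_diff

# If 0 falls outside (lo, hi), H0: mu1 - mu2 = 0 is rejected
print(round(lo, 2), round(hi, 2))
```

With these illustrative numbers the interval excludes 0, so the CI and the two-tailed t test lead to the same decision, as the text describes.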
7.2.1.2 Effect Size
Next we extend Cohen’s (1988) sample measure of effect size d from Chapter 6 to the two independent samples situation. Here we compute d as follows:

d = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p}

The numerator of the formula is the difference between the two sample means. The denominator is the pooled standard deviation, for which the formula was presented previously. Thus, the effect size d is measured in standard deviation units, and again we use Cohen’s proposed subjective standards for interpreting d: small effect size, d = .2; medium effect size, d = .5; large effect size, d = .8. Conceptually, this is similar to d in the one-sample case from Chapter 6. The effect size d is considered a standardized group difference type of effect size (Huberty, 2002). There are other types of effect sizes, however. Another is eta squared (η²), also a standardized effect size, and it is considered a relationship type of effect size (Huberty, 2002). For the independent t test, eta squared can be calculated as follows:
�
η2
2
2
2
2
1 2 2
=
+
=
+ + −
t
t df
t
t n n( )
Here� the� numerator� is� the� squared� t� test� statistic� value,� and� the� denominator� is� sum� of�
the�squared�t�test�statistic�value�and�the�degrees�of�freedom��Values�for�eta�squared�range�
from�0�to�+1�00,�where�values�closer�to�one�indicate�a�stronger�association��In�terms�of�what�
this�effect�size�tells�us,�eta�squared�is�interpreted�as�the�proportion�of�variance�accounted�
for�in�the�dependent�variable�by�the�independent�variable�and�indicates�the�degree�of�the�
relationship�between�the�independent�and�dependent�variables��If�we�use�Cohen’s�(1988)�
metric�for�interpreting�eta�squared:�small�effect�size,�η2�=��01;�moderate�effect�size,�η2�=��06;�
large�effect�size,�η2�=��14�
7.2.1.3 Example of the Independent t Test
Let us now consider an example where the independent t test is implemented. Recall from Chapter 6 the basic steps for hypothesis testing for any inferential test: (1) state the null and alternative hypotheses, (2) select the level of significance (i.e., alpha, α), (3) calculate the test statistic value, and (4) make a statistical decision (reject or fail to reject H0). We will follow these steps again in conducting our independent t test. In our example, samples of 8 female and 12 male middle-aged adults are randomly and independently sampled from the populations of female and male middle-aged adults, respectively. Each individual is given a cholesterol test through a standard blood sample. The null hypothesis to be tested is that males and females have equal cholesterol levels. The alternative hypothesis is that males and females will not have equal cholesterol levels, thus necessitating a nondirectional or two-tailed test. We will conduct our test using an alpha level of .05. The raw data and summary statistics are presented in Table 7.1. For the female sample (sample 1), the mean and variance are 185.0000 and 364.2857, respectively, and for the male sample (sample 2), the mean and variance are 215.0000 and 913.6363, respectively.
In order to compute the test statistic t, we first need to determine the standard error of the difference between the two means. The pooled standard deviation is computed as

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(8 - 1)364.2857 + (12 - 1)913.6363}{8 + 12 - 2}} = 26.4575$$
and the standard error of the difference between two means is computed as

$$s_{\bar{Y}_1-\bar{Y}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 26.4575\sqrt{\frac{1}{8} + \frac{1}{12}} = 12.0752$$
The test statistic t can then be computed as

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{s_{\bar{Y}_1-\bar{Y}_2}} = \frac{185 - 215}{12.0752} = -2.4844$$
170 An Introduction to Statistical Concepts
The next step is to use Table A.2 to determine the critical values. As there are 18 degrees of freedom (n1 + n2 − 2 = 8 + 12 − 2 = 18), using α = .05 and a two-tailed or nondirectional test, we find the critical values using the appropriate α2 column to be +2.101 and −2.101. Since the test statistic falls beyond the critical values, as shown in Figure 7.1, we reject the null hypothesis that the means are equal in favor of the nondirectional alternative that the means are not equal. Thus, we conclude that the mean cholesterol levels for males and females are not equal at the .05 level of significance (denoted by p < .05).
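The hand calculations above can be checked with a short script. The following is an illustrative Python sketch, not part of the SPSS procedure described later in this chapter; it works from the raw cholesterol values in Table 7.1, so small differences in the later decimal places are expected relative to the text, which rounds intermediate values.

```python
import math

# Cholesterol data from Table 7.1
female = [205, 160, 170, 180, 190, 200, 210, 165]                     # sample 1
male = [245, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 185]   # sample 2

def mean(x):
    return sum(x) / len(x)

def var(x):
    # unbiased sample variance (denominator n - 1)
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

n1, n2 = len(female), len(male)
s1_sq, s2_sq = var(female), var(male)

# pooled standard deviation
sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))

# standard error of the difference between the two means
se = sp * math.sqrt(1 / n1 + 1 / n2)

# independent t test statistic
t = (mean(female) - mean(male)) / se

print(round(sp, 4), round(se, 4), round(t, 4))  # 26.4575 12.0761 -2.4842
```

The full-precision standard error (12.0761) and t statistic (−2.4842) differ from the text's 12.0752 and −2.4844 only because the text multiplies previously rounded intermediate values; the statistical conclusion is unchanged.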
The 95% CI can also be examined. For the cholesterol example, the CI is formed as follows:

$$(\bar{Y}_1 - \bar{Y}_2) \pm {}_{\alpha_2}t_{n_1+n_2-2}\,(s_{\bar{Y}_1-\bar{Y}_2}) = (185 - 215) \pm 2.101(12.0752) = -30 \pm 25.3700 = (-55.3700, -4.6300)$$
Table 7.1
Cholesterol Data for Independent Samples

Female (Sample 1)    Male (Sample 2)
205                  245
160                  170
170                  180
180                  190
190                  200
200                  210
210                  220
165                  230
                     240
                     250
                     260
                     185
Ȳ1 = 185.0000        Ȳ2 = 215.0000
s1² = 364.2857       s2² = 913.6363
FIGURE 7.1
Critical regions and test statistics for the cholesterol example. [Figure: t distribution with α = .025 in each tail; critical values −2.101 and +2.101; the independent t test statistic value of −2.4844 and the Welch t′ test statistic value of −2.7197 both fall in the lower critical region.]
As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean gender difference in cholesterol levels was not equal to 0 at the .05 level of significance (p < .05). In other words, there is evidence to suggest that males and females differ, on average, on cholesterol level. More specifically, the mean cholesterol level for males is greater than the mean cholesterol level for females.
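The CI itself is a one-line computation. A minimal sketch using the values from the text (the critical value 2.101 comes from Table A.2 with 18 degrees of freedom, and 12.0752 is the standard error computed earlier):

```python
# 95% CI for the mean difference in the cholesterol example
mean_diff = 185 - 215          # female mean minus male mean
t_crit = 2.101                 # two-tailed critical value, df = 18, alpha = .05
se = 12.0752                   # standard error of the difference between means

half_width = t_crit * se
ci = (mean_diff - half_width, mean_diff + half_width)
print(ci)  # approximately (-55.37, -4.63); 0 is not inside, so reject H0
```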
The effect size for this example is computed as follows:

$$d = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p} = \frac{185 - 215}{26.4575} = -1.1339$$
According to Cohen's recommended subjective standards, this would certainly be a rather large effect size, as the difference between the two sample means is larger than one standard deviation. Rather than d, had we wanted to compute eta squared, we would have also found a large effect:
$$\eta^2 = \frac{t^2}{t^2 + df} = \frac{(-2.4844)^2}{(-2.4844)^2 + (18)} = .2553$$
An eta squared value of .26 indicates a large relationship between the independent and dependent variables, with 26% of the variance in the dependent variable (i.e., cholesterol level) accounted for by the independent variable (i.e., gender).
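Both effect sizes follow directly from quantities already computed. A short sketch using the rounded values from the text:

```python
# Cohen's d and eta squared for the cholesterol example
mean_diff = 185 - 215
sp = 26.4575        # pooled standard deviation
t = -2.4844         # independent t test statistic
df = 18             # n1 + n2 - 2

d = mean_diff / sp                   # standardized group difference
eta_sq = t ** 2 / (t ** 2 + df)      # proportion of variance accounted for
print(round(d, 4), round(eta_sq, 4))  # -1.1339 0.2553
```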
7.2.1.4 Assumptions
Let us return to the assumptions of normality, independence, and homogeneity of variance. For the independent t test, the assumption of normality is met when the dependent variable is normally distributed for each sample (i.e., each category or group) of the independent variable. The normality assumption is made because we are dealing with a parametric inferential test. Parametric tests assume a particular underlying theoretical population distribution, in this case, the normal distribution. Nonparametric tests do not assume a particular underlying theoretical population distribution.

Conventional wisdom tells us the following about nonnormality. When the normality assumption is violated with the independent t test, the effects on Type I and Type II errors are minimal when using a two-tailed test (e.g., Glass, Peckham, & Sanders, 1972; Sawilowsky & Blair, 1992). When using a one-tailed test, the effect of violating the normality assumption is minimal for samples larger than 10 and disappears for samples of at least 20 (Sawilowsky & Blair, 1992; Tiku & Singh, 1981). The simplest methods for detecting violation of the normality assumption are graphical methods, such as stem-and-leaf plots, box plots, histograms, or Q–Q plots; statistical procedures such as the Shapiro–Wilk (S–W) test (1965); and/or skewness and kurtosis statistics. However, more recent research by Wilcox (2003) indicates that power for both the independent t and Welch t′ can be reduced even for slight departures from normality, with outliers also contributing to the problem. Wilcox recommends several procedures not readily available and beyond the scope of this text (such as bootstrap methods, trimmed means, medians). Keep in mind, though, that the independent t test is fairly robust to nonnormality in most situations.
The independence assumption is also necessary for this particular test. For the independent t test, the assumption of independence is met when there is random assignment of individuals to the two groups or categories of the independent variable. Random assignment to the two samples being studied provides for greater internal validity, the ability to state with some degree of confidence that the independent variable caused the outcome (i.e., the dependent variable). If the independence assumption is not met, then probability statements about the Type I and Type II errors will not be accurate; in other words, the probability of a Type I or Type II error may be increased as a result of the assumption not being met. Zimmerman (1997) found that Type I error was affected even for relatively small relations or correlations between the samples (i.e., even as small as .10 or .20). In general, the assumption can be met by (a) keeping the assignment of individuals to groups separate through the design of the experiment (specifically random assignment, not to be confused with random selection), and (b) keeping the individuals separate from one another through experimental control so that the scores on the dependent variable Y for sample 1 do not influence the scores for sample 2. Zimmerman also stated that independence can be violated for supposedly independent samples due to some type of matching in the design of the experiment (e.g., matched pairs based on gender, age, and weight). If the observations are not independent, then the dependent t test, discussed later in this chapter, may be appropriate.
Of potentially more serious concern is violation of the homogeneity of variance assumption. Homogeneity of variance is met when the variances of the dependent variable for the two samples (i.e., the two groups or categories of the independent variable) are the same. Research has shown that the effect of heterogeneity (i.e., unequal variances) is minimal when the sizes of the two samples, n1 and n2, are equal; this is not the case when the sample sizes are not equal. When the larger variance is associated with the smaller sample size (e.g., group 1 has the larger variance and the smaller n), then the actual α level is larger than the nominal α level. In other words, if you set α at .05, then you are not really conducting the test at the .05 level, but at some larger value. When the larger variance is associated with the larger sample size (e.g., group 1 has the larger variance and the larger n), then the actual α level is smaller than the nominal α level. In other words, if you set α at .05, then you are not really conducting the test at the .05 level, but at some smaller value.
One can use statistical tests to detect violation of the homogeneity of variance assumption, although the most commonly used tests are somewhat problematic, particularly in the unequal n's situation that we are concerned with anyway. These tests include Hartley's Fmax test (for equal n's, but sensitive to nonnormality), Cochran's test (for equal n's, but even more sensitive to nonnormality than Hartley's test), Levene's test (available in SPSS; for equal n's, but sensitive to nonnormality), the Bartlett test (for unequal n's, but very sensitive to nonnormality), the Box–Scheffé–Anderson test (for unequal n's, fairly robust to nonnormality), and the Brown–Forsythe test (for unequal n's, more robust to nonnormality than the Box–Scheffé–Anderson test, and therefore recommended). When the variances are unequal and the sample sizes are unequal, the usual method to use as an alternative to the independent t test is the Welch t′ test described in the next section. Inferential tests for evaluating homogeneity of variance are more fully considered in Chapter 9.
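As a quick informal screen before running any formal test, one can simply examine the ratio of the larger to the smaller sample variance, the quantity underlying Hartley's Fmax test. The sketch below applies this to the cholesterol example's variances; the 1:4 rule of thumb used here is the rough guideline given later in this chapter, and it is only a screen, not a substitute for the formal tests.

```python
# Informal homogeneity-of-variance screen: ratio of larger to smaller variance
s1_sq = 364.2857   # female sample variance
s2_sq = 913.6363   # male sample variance

f_max = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
print(round(f_max, 2))  # 2.51 -- within the rough 1:4 rule of thumb
```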
7.2.2 Welch t′ Test
The Welch t′ test is usually appropriate when the population variances are unequal and the sample sizes are unequal. The Welch t′ test assumes that the scores on the dependent variable Y (a) are normally distributed in each of the two populations and (b) are independent.
The test statistic is known as t′ and is denoted by

$$t' = \frac{\bar{Y}_1 - \bar{Y}_2}{s_{\bar{Y}_1-\bar{Y}_2}} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_{\bar{Y}_1}^2 + s_{\bar{Y}_2}^2}} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where Ȳ1 and Ȳ2 are the means for samples 1 and 2, respectively, and $s_{\bar{Y}_1}^2$ and $s_{\bar{Y}_2}^2$ are the variance errors of the means for samples 1 and 2, respectively.
Here we see that the denominator of this test statistic is conceptually similar to the one-sample t and the independent t test statistics. The variance errors of the mean are computed for each group by

$$s_{\bar{Y}_1}^2 = \frac{s_1^2}{n_1} \qquad s_{\bar{Y}_2}^2 = \frac{s_2^2}{n_2}$$
where s1² and s2² are the sample variances for groups 1 and 2, respectively. The square root of the variance error of the mean is the standard error of the mean (i.e., $s_{\bar{Y}_1}$ and $s_{\bar{Y}_2}$). Thus, we see that rather than taking a pooled or weighted average of the two sample variances, as we did with the independent t test, the two sample variances are treated separately with the Welch t′ test.
The test statistic t′ is then compared to a critical value(s) from the t distribution in Table A.2. We again use the appropriate α column depending on the desired level of significance and whether the test is one- or two-tailed (i.e., α1 and α2), and the appropriate row for the degrees of freedom. The degrees of freedom for this test are a bit more complicated than for the independent t test. The degrees of freedom are adjusted from n1 + n2 − 2 for the independent t test to the following value for the Welch t′ test:
$$\nu = \frac{\left(s_{\bar{Y}_1}^2 + s_{\bar{Y}_2}^2\right)^2}{\dfrac{\left(s_{\bar{Y}_1}^2\right)^2}{n_1 - 1} + \dfrac{\left(s_{\bar{Y}_2}^2\right)^2}{n_2 - 1}}$$
The degrees of freedom ν are approximated by rounding to the nearest whole number prior to using the table. If the test statistic falls into a critical region, then we reject H0; otherwise, we fail to reject H0.

For the two-tailed test, a 100(1 − α)% CI can also be examined. The CI is formed as follows:

$$(\bar{Y}_1 - \bar{Y}_2) \pm {}_{\alpha_2}t_{\nu}\,(s_{\bar{Y}_1-\bar{Y}_2})$$

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. Thus, interpretation of this CI is the same as with the independent t test.
Consider again the example cholesterol data, where the sample variances were somewhat different and the sample sizes were different. The variance errors of the mean are computed for each sample as follows:
$$s_{\bar{Y}_1}^2 = \frac{s_1^2}{n_1} = \frac{364.2857}{8} = 45.5357 \qquad s_{\bar{Y}_2}^2 = \frac{s_2^2}{n_2} = \frac{913.6363}{12} = 76.1364$$
The t′ test statistic is computed as

$$t' = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_{\bar{Y}_1}^2 + s_{\bar{Y}_2}^2}} = \frac{185 - 215}{\sqrt{45.5357 + 76.1364}} = \frac{-30}{11.0305} = -2.7197$$
Finally, the degrees of freedom ν are determined to be

$$\nu = \frac{\left(s_{\bar{Y}_1}^2 + s_{\bar{Y}_2}^2\right)^2}{\dfrac{\left(s_{\bar{Y}_1}^2\right)^2}{n_1 - 1} + \dfrac{\left(s_{\bar{Y}_2}^2\right)^2}{n_2 - 1}} = \frac{(45.5357 + 76.1364)^2}{\dfrac{(45.5357)^2}{8 - 1} + \dfrac{(76.1364)^2}{12 - 1}} = 17.9838$$
which is rounded to 18, the nearest whole number. The degrees of freedom remain 18, as they were for the independent t test, and thus the critical values are still +2.101 and −2.101. As the test statistic falls beyond the critical values, as shown in Figure 7.1, we reject the null hypothesis that the means are equal in favor of the alternative that the means are not equal. Thus, as with the independent t test, with the Welch t′ test we conclude that the mean cholesterol levels for males and females are not equal at the .05 level of significance. In this particular example, then, we see that the unequal sample variances and unequal sample sizes did not alter the outcome when comparing the independent t test result with the Welch t′ test result. However, note that the results for these two tests may differ with other data.
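The Welch computations can also be scripted from the raw data in Table 7.1. As before, this is an illustrative sketch, and the later decimal places may differ slightly from the text's hand calculations with rounded intermediate values.

```python
import math

# Cholesterol data from Table 7.1
female = [205, 160, 170, 180, 190, 200, 210, 165]
male = [245, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 185]

def mean(x):
    return sum(x) / len(x)

def var(x):
    # unbiased sample variance (denominator n - 1)
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

n1, n2 = len(female), len(male)
v1, v2 = var(female) / n1, var(male) / n2   # variance errors of the means

# Welch t' statistic: sample variances kept separate, not pooled
t_prime = (mean(female) - mean(male)) / math.sqrt(v1 + v2)

# Welch degrees of freedom (rounded to the nearest whole number for the table)
nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(round(t_prime, 4), round(nu, 4))  # -2.7197 17.9838
```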
Finally, the 95% CI can be examined. For the example, the CI is formed as follows:

$$(\bar{Y}_1 - \bar{Y}_2) \pm {}_{\alpha_2}t_{\nu}\,(s_{\bar{Y}_1-\bar{Y}_2}) = (185 - 215) \pm 2.101(11.0305) = -30 \pm 23.1751 = (-53.1751, -6.8249)$$
As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean gender difference was not equal to 0 at the .05 level of significance (p < .05).
7.2.3 Recommendations
The following four recommendations are made regarding the two independent samples case. Although there is no total consensus in the field, our recommendations take into account, as much as possible, the available research and statistical software. First, if the normality assumption is satisfied, the following recommendations are made: (a) the independent t test is recommended when the homogeneity of variance assumption is met; (b) the independent t test is recommended when the homogeneity of variance assumption is not met and when there are an equal number of observations in the samples; and (c) the Welch t′ test is recommended when the homogeneity of variance assumption is not met and when there are an unequal number of observations in the samples.
Second, if the normality assumption is not satisfied, the following recommendations are made: (a) if the homogeneity of variance assumption is met, then the independent t test using ranked scores (Conover & Iman, 1981), rather than raw scores, is recommended; and (b) if the homogeneity of variance assumption is not met, then the Welch t′ test using ranked scores is recommended, regardless of whether there are an equal number of observations in the samples. Using ranked scores means you rank order the observations from highest to lowest regardless of group membership, and then conduct the appropriate t test with ranked scores rather than raw scores.
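The rank transformation is easy to sketch. Below, the pooled cholesterol scores are converted to ranks (average ranks for ties), and the ordinary pooled-variance t test is then run on the ranks. This is an illustrative sketch of the Conover and Iman (1981) idea, not a reproduction of their procedure, and the helper functions are written out for clarity.

```python
import math

female = [205, 160, 170, 180, 190, 200, 210, 165]
male = [245, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 185]

def ranks(values):
    # average ranks (1 = smallest), with tied values sharing the mean of
    # the positions they occupy
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def t_pooled(a, b):
    # ordinary independent t test with pooled variance
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# rank the pooled scores, then split the ranks back into the two groups
r = ranks(female + male)
r1, r2 = r[:len(female)], r[len(female):]

t_rank = t_pooled(r1, r2)
print(t_rank)  # t computed on ranks rather than raw scores
```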
Third, the dependent t test is recommended when there is some dependence between the groups (e.g., matched pairs or the same individuals measured on two occasions), as described later in this chapter. Fourth, the nonparametric Mann–Whitney–Wilcoxon test is not recommended. Among the disadvantages of this test are that (a) the critical values are not extensively tabled, (b) tied ranks can affect the results and no optimal procedure has yet been developed (Wilcox, 1996), and (c) Type I error appears to be inflated regardless of the status of the assumptions (Zimmerman, 2003). For these reasons, the Mann–Whitney–Wilcoxon test is not further described here. Note that most major statistical packages, including SPSS, have options for conducting the independent t test, the Welch t′ test, and the Mann–Whitney–Wilcoxon test. Alternatively, one could conduct the Kruskal–Wallis nonparametric one-factor ANOVA, which is also based on ranked data and which is appropriate for comparing the means of two or more independent groups. This test is considered more fully in Chapter 11. These recommendations are summarized in Box 7.1.
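For readers working in Python rather than SPSS, SciPy exposes the same pair of tests; its `equal_var` flag switches between the pooled-variance independent t test and the Welch t′ test. This sketch assumes SciPy is installed and uses the cholesterol data; the pooled t statistic differs from the text's −2.4844 only in the fourth decimal place because the text rounds intermediate values.

```python
from scipy import stats

female = [205, 160, 170, 180, 190, 200, 210, 165]
male = [245, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 185]

# pooled-variance independent t test
t_ind, p_ind = stats.ttest_ind(female, male, equal_var=True)

# Welch t' test (separate variances)
t_welch, p_welch = stats.ttest_ind(female, male, equal_var=False)

print(round(float(t_ind), 4), round(float(t_welch), 4))  # -2.4842 -2.7197
```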
STOP AND THINK BOX 7.1
Recommendations for the Independent and Dependent Samples Tests Based on Meeting or Violating the Assumption of Normality

Normality is met:
• Independent samples: Use the independent t test when homogeneity of variances is met.
• Independent samples: Use the independent t test when homogeneity of variances is not met, but there are equal sample sizes in the groups.
• Independent samples: Use the Welch t′ test when homogeneity of variances is not met and there are unequal sample sizes in the groups.
• Dependent samples: Use the dependent t test.

Normality is not met:
• Independent samples: Use the independent t test with ranked scores when homogeneity of variances is met.
• Independent samples: Use the Welch t′ test with ranked scores when homogeneity of variances is not met, regardless of equal or unequal sample sizes in the groups.
• Independent samples: Use the Kruskal–Wallis nonparametric procedure.
• Dependent samples: Use the dependent t test with ranked scores, or alternative procedures including bootstrap methods, trimmed means, medians, or Stein's method.
• Dependent samples: Use the Wilcoxon signed ranks test when data are both nonnormal and have extreme outliers.
• Dependent samples: Use the Friedman nonparametric procedure.
7.3 Inferences About Two Dependent Means
In this section, two inferential tests of the difference between two dependent means are described: the dependent t test and, briefly, the Wilcoxon signed ranks test. The section concludes with a list of recommendations.
7.3.1 Dependent t Test
As you may recall, the dependent t test is appropriate to use when there are two samples that are dependent: the individuals in sample 1 have some relationship to the individuals in sample 2. First, we need to determine the conditions under which the dependent t test is appropriate. In part, this has to do with the statistical assumptions associated with the test itself, that is, (a) normality of the distribution of the differences on the dependent variable Y, (b) homogeneity of variance of the two populations, and (c) independence of the observations within each sample. Like the independent t test, the dependent t test is reasonably robust to violation of the normality assumption, as we show later. Because this is a test of means, the dependent variable must be measured on an interval or ratio scale. For example, the same individuals may be measured at two points in time on the same interval-scaled pretest and posttest, or some matched pairs (e.g., twins or husbands–wives) may be assessed with the same ratio-scaled measure (e.g., weight measured in pounds).
Although there are several methods for computing the test statistic t, the most direct method, and the one most closely aligned conceptually with the one-sample t test, is as follows:

$$t = \frac{\bar{d}}{s_{\bar{d}}}$$

where d̄ is the mean difference and $s_{\bar{d}}$ is the standard error of the mean difference. Conceptually, this test statistic looks just like the one-sample t test statistic, except that the notation has been changed to denote that we are dealing with difference scores rather than raw scores.
The standard error of the mean difference is computed by

$$s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$

where s_d is the standard deviation of the difference scores (i.e., like any other standard deviation, only this one is computed from the difference scores rather than raw scores) and n is the total number of pairs.
Conceptually, this standard error looks just like the standard error for the one-sample t test. If we were doing hand computations, we would compute a difference score for each pair of scores (i.e., Y1 − Y2). For example, if sample 1 were wives and sample 2 were their husbands, then we would calculate a difference score for each couple. From this set of difference scores, we then compute the mean of the difference scores d̄ and the standard deviation of the difference scores s_d. This leads us directly into the computation of the t test statistic. Note that although there are n scores in sample 1, n scores in sample 2, and thus 2n total scores, there are only n difference scores, and the analysis is actually based on these.
The test statistic t is then compared with a critical value(s) from the t distribution. For a two-tailed test, from Table A.2, we would use the appropriate α2 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom for this test are n − 1. Conceptually, we lose one degree of freedom from the number of differences (or pairs) because we are estimating the population variance (or standard deviation) of the differences. Thus, there is one restriction along the lines of our discussion of degrees of freedom in Chapter 6. The critical values are denoted as $\pm{}_{\alpha_2}t_{n-1}$. The subscript α2 of the critical values reflects the fact that this is a two-tailed test, and the subscript n − 1 indicates the degrees of freedom. If the test statistic falls into either critical region, then we reject H0; otherwise, we fail to reject H0.
For a one-tailed test, from Table A.2, we would use the appropriate α1 column depending on the desired level of significance and the appropriate row depending on the degrees of freedom. The degrees of freedom are again n − 1. The critical value is denoted as $+{}_{\alpha_1}t_{n-1}$ for the alternative hypothesis H1: μ1 − μ2 > 0 and as $-{}_{\alpha_1}t_{n-1}$ for the alternative hypothesis H1: μ1 − μ2 < 0. If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.
7.3.1.1 Confidence Interval
For the two-tailed test, a 100(1 − α)% CI can also be examined. The CI is formed as follows:

$$\bar{d} \pm {}_{\alpha_2}t_{n-1}\,(s_{\bar{d}})$$

If the CI contains the hypothesized mean difference of 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. The interpretation of these CIs is the same as those previously discussed for the one-sample t and the independent t.
7.3.1.2 Effect Size
The effect size can be measured using Cohen's (1988) d, computed as follows:

$$\text{Cohen's } d = \frac{\bar{d}}{s_d}$$

where the label Cohen's d is simply used to distinguish among the various uses and slight differences in the computation of d. Interpretation of the value of d is the same as for the one-sample t and the independent t previously discussed, specifically, the number of standard deviation units by which the mean(s) differ(s).
7.3.1.3 Example of the Dependent t Test
Let us consider an example to illustrate the dependent t test. Ten young swimmers participated in an intensive 2 month training program. Prior to the program, each swimmer was timed during a 50 meter freestyle event. Following the program, the same swimmers were timed in the 50 meter freestyle event again. This is a classic pretest–posttest design. For illustrative purposes, we will conduct a two-tailed test. A case might also be made for a one-tailed test, in that the coach might want to see improvement only; however, conducting a two-tailed test allows us to examine the CI for purposes of illustration. The raw scores, the difference scores, and the mean and standard deviation of the difference scores are shown in Table 7.2. The pretest mean time was 64 seconds, and the posttest mean time was 59 seconds.
To determine our test statistic value t, we first compute the standard error of the mean difference as follows:

$$s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{2.1602}{\sqrt{10}} = 0.6831$$
Next, using this value for the denominator, the test statistic t is computed as follows:

$$t = \frac{\bar{d}}{s_{\bar{d}}} = \frac{5}{0.6831} = 7.3196$$
We then use Table A.2 to determine the critical values. As there are nine degrees of freedom (n − 1 = 10 − 1 = 9), using α = .05 and a two-tailed or nondirectional test, we find the critical values using the appropriate α2 column to be +2.262 and −2.262. Since the test statistic falls beyond the critical values, as shown in Figure 7.2, we reject the null hypothesis that the means are equal in favor of the nondirectional alternative that the means are not equal. Thus, we conclude that the mean swimming performance changed from pretest to posttest at the .05 level of significance (p < .05).
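These hand calculations can be checked with a short script over the data in Table 7.2. This is an illustrative sketch; slight differences in the last decimal place relative to the text's rounded intermediate values are expected.

```python
import math

# Swimming data from Table 7.2
pretest = [58, 62, 60, 61, 63, 65, 66, 69, 64, 72]
posttest = [54, 57, 54, 56, 61, 59, 64, 62, 60, 63]

# one difference score per pair; the analysis is based on these n values
diffs = [pre - post for pre, post in zip(pretest, posttest)]
n = len(diffs)

d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
se = s_d / math.sqrt(n)     # standard error of the mean difference
t = d_bar / se              # dependent t test statistic

print(round(d_bar, 4), round(s_d, 4), round(t, 4))  # 5.0 2.1602 7.3193
```

The full-precision t of 7.3193 matches the text's 7.3196 to three decimal places; the small discrepancy comes from the text dividing by the rounded standard error 0.6831.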
The 95% CI is computed to be the following:

$$\bar{d} \pm {}_{\alpha_2}t_{n-1}(s_{\bar{d}}) = 5 \pm 2.262(0.6831) = 5 \pm 1.5452 = (3.4548, 6.5452)$$
Table 7.2
Swimming Data for Dependent Samples

Swimmer   Pretest Time (in Seconds)   Posttest Time (in Seconds)   Difference (d)
1         58                          54                           (i.e., 58 − 54) = 4
2         62                          57                           5
3         60                          54                           6
4         61                          56                           5
5         63                          61                           2
6         65                          59                           6
7         66                          64                           2
8         69                          62                           7
9         64                          60                           4
10        72                          63                           9
d̄ = 5.0000
s_d = 2.1602
As the CI does not contain the hypothesized mean difference value of 0, we would again reject the null hypothesis and conclude that the mean pretest–posttest difference was not equal to 0 at the .05 level of significance (p < .05).
The effect size is computed to be the following:

$$\text{Cohen's } d = \frac{\bar{d}}{s_d} = \frac{5}{2.1602} = 2.3146$$

which is interpreted as follows: there is approximately a two and one-third standard deviation difference between the pretest and posttest mean swimming times, a very large effect size according to Cohen's subjective standards.
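The CI and effect size follow from the same quantities. A sketch using the text's values (the critical value 2.262 is taken from Table A.2 with 9 degrees of freedom):

```python
# 95% CI and Cohen's d for the swimming example
d_bar = 5.0       # mean difference
s_d = 2.1602      # standard deviation of the difference scores
se = 0.6831       # standard error of the mean difference
t_crit = 2.262    # two-tailed critical value, df = 9, alpha = .05

half_width = t_crit * se
ci = (d_bar - half_width, d_bar + half_width)
cohens_d = d_bar / s_d

print(ci, round(cohens_d, 4))  # approximately (3.4548, 6.5452) and 2.3146
```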
7.3.1.4 Assumptions
Let us return to the assumptions of normality, independence, and homogeneity of variance. For the dependent t test, the assumption of normality is met when the difference scores are normally distributed. Normality of the difference scores can be examined as discussed previously: graphical methods (such as stem-and-leaf plots, box plots, histograms, and/or Q–Q plots), statistical procedures such as the S–W test (1965), and/or skewness and kurtosis statistics. The assumption of independence is met when the cases in our sample have been randomly selected from the population. If the independence assumption is not met, then probability statements about the Type I and Type II errors will not be accurate; in other words, the probability of a Type I or Type II error may be increased as a result of the assumption not being met. Homogeneity of variance refers to equal variances of the two populations. In later chapters, we will examine procedures for formally testing for equal variances. For the moment, if the ratio of the smallest to largest sample variance is within 1:4, then we have evidence to suggest the assumption of homogeneity of variances is met. Research has shown that the effect of heterogeneity (i.e., unequal variances) is minimal when the sizes of the two samples, n1 and n2, are equal, as is the case with the dependent t test by definition (unless there are missing data).
FIGURE 7.2
Critical regions and test statistic for the swimming example. [Figure: t distribution with α = .025 in each tail; critical values −2.262 and +2.262; the t test statistic value of +7.3196 falls in the upper critical region.]
7.3.2 Recommendations
The following three recommendations are made regarding the two dependent samples case. First, the dependent t test is recommended when the normality assumption is met. Second, the dependent t test using ranks (Conover & Iman, 1981) is recommended when the normality assumption is not met. Here you rank order the difference scores from highest to lowest and then conduct the test on the ranked difference scores rather than on the raw difference scores. However, more recent research by Wilcox (2003) indicates that power for the dependent t can be reduced even for slight departures from normality. Wilcox recommends several procedures not readily available and beyond the scope of this text (bootstrap methods, trimmed means, medians, Stein's method). Keep in mind, though, that the dependent t test is fairly robust to nonnormality in most situations.

Third, the nonparametric Wilcoxon signed ranks test is recommended when the data are nonnormal with extreme outliers (one or a few observations that behave quite differently from the rest). However, among the disadvantages of this test are that (a) the critical values are not extensively tabled, and two different tables exist depending on sample size, and (b) tied ranks can affect the results and no optimal procedure has yet been developed (Wilcox, 1996). For these reasons, the details of the Wilcoxon signed ranks test are not described here. Note that most major statistical packages, including SPSS, include options for conducting the dependent t test and the Wilcoxon signed ranks test. Alternatively, one could conduct the Friedman nonparametric one-factor ANOVA, which is also based on ranked data and which is appropriate for comparing two or more dependent sample means. This test is considered more fully in Chapter 15. These recommendations are summarized in Box 7.1.
7.4 SPSS
Instructions for determining the independent samples t test using SPSS are presented first. This is followed by additional steps for examining the assumption of normality for the independent t test. Next, instructions for determining the dependent samples t test using SPSS are presented, followed by additional steps for examining the assumptions of normality and homogeneity.
Independent t Test
Step 1: In order to conduct an independent t test, your dataset needs to include a dependent variable Y that is measured on an interval or ratio scale (e.g., cholesterol), as well as a grouping variable X that is measured on a nominal or ordinal scale (e.g., gender). For the grouping variable, if there are more than two categories available, only two categories can be selected when running the independent t test (the ANOVA is required for examining more than two categories). To conduct the independent t test, go to "Analyze" in the top pulldown menu, then select "Compare Means," and then select "Independent-Samples T Test." Following the screenshot (step 1) below produces the "Independent-Samples T Test" dialog box.
Inferences About the Difference Between Two Means

[Screenshot: Independent t test, Step 1. Menu path: (A) "Analyze", (B) "Compare Means", (C) "Independent-Samples T Test"]
Step 2: Next, from the main "Independent-Samples T Test" dialog box, click the dependent variable (e.g., cholesterol) and move it into the "Test Variable" box by clicking on the arrow button. Next, click the grouping variable (e.g., gender) and move it into the "Grouping Variable" box by clicking on the arrow button. You will notice that there are two question marks next to the name of your grouping variable. This is SPSS letting you know that you need to define (numerically) which two categories of the grouping variable you want to include in your analysis. To do that, click on "Define Groups."
[Screenshot: Independent t test, Step 2. Select the variable of interest from the list on the left and use the arrow to move it to the "Test Variable" box on the right. Clicking on "Define Groups" will allow you to define the two numeric values of the categories for the independent variable. Clicking on "Options" will allow you to define a confidence interval percentage; the default is 95% (corresponding to an alpha of .05).]
182 An Introduction to Statistical Concepts
Step 3: From the "Define Groups" dialog box, enter the numeric value designated for each of the two categories or groups of your independent variable. Where it says "Group 1," type in the value designated for your first group (e.g., 1, which in our case indicated that the individual was a female), and where it says "Group 2," type in the value designated for your second group (e.g., 2, in our example, a male) (see step 3 screenshot).
[Screenshot: Independent t test, Step 3. The "Define Groups" dialog box.]
Click on "Continue" to return to the original dialog box (see step 2 screenshot) and then click on "OK" to run the analysis. The output is shown in Table 7.3.
Changing the alpha level (optional): The default alpha level in SPSS is .05, and thus, the default corresponding CI is 95%. If you wish to test your hypothesis at an alpha level other than .05 (and thus obtain CIs other than 95%), click on the "Options" button located in the top right corner of the main dialog box (see step 2 screenshot). From here, the CI percentage can be adjusted to correspond to the alpha level at which you wish your hypothesis to be tested (see Chapter 6 screenshot step 3). (For purposes of this example, the test has been generated using an alpha level of .05.)
Interpreting the output: The top table provides various descriptive statistics for each group, while the bottom box gives the results of the requested procedure. There you see the following three different inferential tests that are automatically provided: (1) Levene's test of the homogeneity of variance assumption (the first two columns of results), (2) the independent t test (which SPSS calls "Equal variances assumed") (the top row of the remaining columns of results), and (3) the Welch t′ test (which SPSS calls "Equal variances not assumed") (the bottom row of the remaining columns of results).
The first interpretation that must be made is for Levene's test of equal variances. The assumption of equal variances is met when Levene's test is not statistically significant. We can determine statistical significance by reviewing the p value for the F test. In this example, the p value is .090, greater than our alpha level of .05 and thus not statistically significant. Levene's test tells us that the variance for cholesterol level for males is not statistically significantly different from the variance for cholesterol level for females. Having met the assumption of equal variances, the values in the rest of the table will be drawn from the row labeled "Equal Variances Assumed." Had we not met the assumption of equal variances (p < α), we would report the Welch t′, for which the statistics are presented in the row labeled "Equal Variances Not Assumed."
After determining that the variances are equal, the next step is to examine the results of the independent t test. The t test statistic value is −2.4842, and the associated p value is .023. Since p is less than α, we reject the null hypothesis. There is evidence to suggest that the mean cholesterol level for males is different from the mean cholesterol level for females.
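For readers who want to verify the arithmetic behind this output, the pooled-variance t statistic can be reproduced from the summary statistics in Table 7.3. The following Python sketch is illustrative only (it is not SPSS syntax, and the variable names are our own):

```python
import math

# Summary statistics from the SPSS output (Table 7.3)
n_f, mean_f, sd_f = 8, 185.0, 19.08627    # females
n_m, mean_m, sd_m = 12, 215.0, 30.22642   # males

# Pooled standard deviation: weights each group's variance by its df
sp = math.sqrt(((n_f - 1) * sd_f**2 + (n_m - 1) * sd_m**2) / (n_f + n_m - 2))

# Standard error of the mean difference and the t statistic
se_diff = sp * math.sqrt(1 / n_f + 1 / n_m)
t = (mean_f - mean_m) / se_diff
df = n_f + n_m - 2

print(round(t, 4), df)  # -2.4842 18
```

The printed values match the "Equal variances assumed" row of the SPSS output.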
Table 7.3
SPSS Results for Independent t Test

Group Statistics

                    Gender    N    Mean       Std. Deviation   Std. Error Mean
Cholesterol level   Female    8    185.0000   19.08627         6.74802
                    Male     12    215.0000   30.22642         8.72562

Independent Samples Test (Cholesterol level)

                              Levene's Test for
                              Equality of Variances   t-Test for Equality of Means
                              F        Sig.           t        df       Sig. (2-Tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower, Upper)
Equal variances assumed       3.201    .090           −2.484   18       .023              −30.00000         12.07615                (−55.37104, −4.62896)
Equal variances not assumed                           −2.720   17.984   .014              −30.00000         11.03051                (−53.17573, −6.82427)

Notes on the output:
The table labeled "Group Statistics" provides basic descriptive statistics for the dependent variable by group. The F test (and p value) of Levene's Test for Equality of Variances is reviewed to determine if the equal variances assumption has been met; the result of this test determines which row of statistics to utilize. In this case, we meet the assumption and use the statistics reported in the top row.
"t" is the t test statistic value. The t value in the top row is used when the assumption of equal variances has been met and is calculated as t = (Ȳ1 − Ȳ2)/s(Ȳ1−Ȳ2) = (185 − 215)/12.075 = −2.484. The t value in the bottom row is the Welch t′ and is used when the assumption of equal variances has not been met.
df are the degrees of freedom. For the independent samples t test, they are calculated as n1 + n2 − 2.
"Sig." is the observed p value for the independent t test. It is interpreted as follows: there is less than a 3% probability of a sample mean difference as extreme as −30 occurring by chance if the null hypothesis is really true (i.e., if the population mean difference is 0).
The mean difference is simply the difference between the sample mean cholesterol values; in other words, 185 − 215 = −30. The standard error of the mean difference is calculated as s(Ȳ1−Ȳ2) = sp √(1/n1 + 1/n2).
SPSS reports the 95% confidence interval of the difference. This is interpreted to mean that 95% of the CIs generated across samples will contain the true population mean difference.
Using “Explore” to Examine Normality of Distribution of
Dependent Variable by Categories of Independent Variable
Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important. For the independent t test, the distributional shape for the dependent variable should be normally distributed for each category/group of the independent variable. As with our one-sample t test, we can again use "Explore" to examine the extent to which the assumption of normality is met.
The general steps for accessing "Explore" have been presented in previous chapters (e.g., Chapter 4), and they will not be reiterated here. Normality of the dependent variable must be examined for each category of the independent variable, so we must tell SPSS to split the examination of normality by group. Click the dependent variable (e.g., cholesterol) and move it into the "Dependent List" box by clicking on the arrow button. Next, click the grouping variable (e.g., gender) and move it into the "Factor List" box by clicking on the arrow button. The procedures for selecting normality statistics were presented in Chapter 6, and they remain the same here: click on "Plots" in the upper right corner. Place a checkmark in the boxes for "Normality plots with tests" and also for "Histogram." Then click "Continue" to return to the main "Explore" dialog screen. From there, click "OK" to generate the output.
[Screenshot: Generating normality evidence by group. Select the dependent variable from the list on the left and use the arrow to move it to the "Dependent List" box on the right, and the independent variable from the list on the left to the "Factor List" box on the right. Then click on "Plots."]
Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots. As we examine the "Descriptives" table, we see the output for the cholesterol statistics is separated for male (top portion) and female (bottom portion).
Descriptives: Cholesterol Level by Gender

                                   Male                      Female
                            Statistic   Std. Error    Statistic   Std. Error
Mean                        215.0000     8.72562      185.0000     6.74802
95% CI for mean:
  lower bound               195.7951                  169.0435
  upper bound               234.2049                  200.9565
5% trimmed mean             215.0000                  185.0000
Median                      215.0000                  185.0000
Variance                    913.636                   364.286
Std. deviation               30.22642                  19.08627
Minimum                     170.00                    160.00
Maximum                     260.00                    210.00
Range                        90.00                     50.00
Interquartile range          57.50                     37.50
Skewness                      .000        .637          .000        .752
Kurtosis                    −1.446       1.232        −1.790       1.481
The skewness statistic of cholesterol level for the males is .000 and kurtosis is −1.446, both within the range of an absolute value of 2.0, suggesting some evidence of normality of the dependent variable for males. Evidence of normality for the distributional shape of cholesterol level for females is also present: skewness = .000 and kurtosis = −1.790.
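SPSS reports the sample-size-adjusted (Fisher) skewness and excess kurtosis statistics. Because the raw cholesterol data are not listed here, the sketch below applies those formulas to a small hypothetical symmetric dataset instead; the function names and the toy data are our own, not part of SPSS:

```python
import math

def skewness(x):
    """Sample-adjusted (Fisher) skewness, the form SPSS reports."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)

def kurtosis(x):
    """Sample-adjusted excess kurtosis, the form SPSS reports."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    g = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(((v - m) / s) ** 4 for v in x)
    return g - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

# A perfectly symmetric, flat toy dataset: skewness is 0 and,
# like the cholesterol distributions here, kurtosis is negative
data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(skewness(data), 3), round(kurtosis(data), 3))  # 0.0 -1.2
```

Values of these statistics within about ±2.0 are the benchmark used in this chapter for evidence of normality.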
The histogram of cholesterol level for males is not exactly what most researchers would consider a classic normally shaped distribution. Although the histogram of cholesterol level for females is not presented here, it follows a similar distributional shape.
[Histogram for group = Male: frequency of cholesterol level (160.00 to 260.00); Mean = 215.00, Std. dev. = 30.226, N = 12]
There are a few other statistics that can be used to gauge normality as well. Our formal test of normality, the Shapiro–Wilk (S–W) test (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented in the following and suggests that our sample distribution for cholesterol level is not statistically significantly different from what would be expected from a normal distribution, and this is true for both males (S–W = .949, df = 12, p = .617) and females (S–W = .931, df = 8, p = .525).
Tests of Normality: Cholesterol Level

           Kolmogorov–Smirnov(a)         Shapiro–Wilk
Gender     Statistic   df   Sig.         Statistic   df   Sig.
Male         .129      12   .200*          .949      12   .617
Female       .159       8   .200*          .931       8   .525

a Lilliefors significance correction.
* This is a lower bound of the true significance.
Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality. Q–Q plots are graphs that plot quantiles of the theoretical normal distribution against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. Similar to what we saw with the histogram, the Q–Q plot of cholesterol level for both males and females (although not shown here) suggests some nonnormality. Keep in mind that we have a relatively small sample size. Thus, interpreting the visual graphs (e.g., histograms and Q–Q plots) can be challenging, although we have plenty of other evidence for normality.
[Normal Q–Q plot of cholesterol level for group = male: observed value (175 to 275) against expected normal (−2 to 2)]
Examination of the boxplots suggests a relatively normal distributional shape of cholesterol level for both males and females and no outliers.
[Boxplots of cholesterol level (160.00 to 260.00) by gender (male, female)]
Considering the forms of evidence we have examined (skewness and kurtosis statistics, the S–W test, and the boxplots), normality appears to be a reasonable assumption. Although the histograms and Q–Q plots suggest some nonnormality, this is somewhat expected given the small sample size. Generally, we can be reasonably assured we have met the assumption of normality of the dependent variable for each group of the independent variable. Additionally, recall that when the assumption of normality is violated with the independent t test, the effects on Type I and Type II errors are minimal when using a two-tailed test, as we are conducting here (e.g., Glass, Peckham, & Sanders, 1972; Sawilowsky & Blair, 1992).
Dependent t Test
Step 1: To conduct a dependent t test, your dataset needs to include the two variables (i.e., for the paired samples) whose means you wish to compare (e.g., pretest and posttest). To conduct the dependent t test, go to "Analyze" in the top pulldown menu, then select "Compare Means," and then select "Paired-Samples T Test." Following the screenshot (step 1) below produces the "Paired-Samples T Test" dialog box.
[Screenshot: Dependent t test, Step 1. Menu path: (A) "Analyze", (B) "Compare Means", (C) "Paired-Samples T Test"]
Step 2: Click both variables (e.g., pretest and posttest as variable 1 and variable 2, respectively) and move them into the "Paired Variables" box by clicking the arrow button. Both variables should now appear in the box as shown in screenshot step 2. Then click on "OK" to run the analysis and generate the output.
[Screenshot: Dependent t test, Step 2. Select the paired samples from the list on the left and use the arrow to move them to the "Paired Variables" box on the right. Then click on "OK."]
The output appears in Table 7.4, where again the top box provides descriptive statistics, the middle box provides a bivariate correlation coefficient, and the bottom box gives the results of the dependent t test procedure.
Table 7.4
SPSS Results for Dependent t Test

Paired Samples Statistics

                    Mean      N    Std. Deviation   Std. Error Mean
Pair 1   Pretest    64.0000   10   4.21637          1.33333
         Posttest   59.0000   10   3.62093          1.14504

Paired Samples Correlations

                                  N    Correlation   Sig.
Pair 1   Pretest and posttest     10   .859          .001

Paired Samples Test (Pair 1: pretest − posttest)

Paired Differences
Mean      Std. Deviation   Std. Error Mean   95% CI of the Difference (Lower, Upper)   t       df   Sig. (2-Tailed)
5.00000   2.16025          .68313            (3.45465, 6.54535)                        7.319   9    .000

Notes on the output:
The table labeled "Paired Samples Statistics" provides basic descriptive statistics for the paired samples. The table labeled "Paired Samples Correlations" provides the Pearson product-moment correlation coefficient, a bivariate correlation coefficient, between the pretest and posttest values. In this example, there is a strong correlation (r = .859), and it is statistically significant (p = .001).
The values in the "Paired Differences" section of the table are calculated based on the paired differences (i.e., the difference values between pretest and posttest scores).
"t" is the t test statistic value, calculated as t = d̄/s(d̄) = 5/.6831 = 7.319.
df are the degrees of freedom. For the dependent samples t test, they are calculated as n − 1.
"Sig." is the observed p value for the dependent t test. It is interpreted as follows: there is less than a 1% probability of a sample mean difference of 5 or greater occurring by chance if the null hypothesis is really true (i.e., if the population mean difference is 0).
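The dependent t statistic can likewise be checked from the summary statistics for the paired differences. A minimal Python sketch (illustrative only; the variable names are our own):

```python
import math

# Summary statistics for the paired differences (Table 7.4)
n = 10
mean_diff = 5.00000   # mean of the (pretest - posttest) differences
sd_diff = 2.16025     # standard deviation of the differences

# Standard error of the mean difference, then the t statistic
se_diff = sd_diff / math.sqrt(n)
t = mean_diff / se_diff
df = n - 1

print(round(se_diff, 5), round(t, 3), df)  # 0.68313 7.319 9
```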
Using “Explore” to Examine Normality of Distribution
of Difference Scores
Generating normality evidence: As with the other t tests we have studied, understanding the distributional shape and the extent to which normality is a reasonable assumption is important. For the dependent t test, the distributional shape for the difference scores should be normally distributed. Thus, we first need to create a new variable in our dataset to reflect the difference scores (in this case, the difference between the pre- and posttest values). To do this, go to "Transform" in the top pulldown menu, then select "Compute Variable." Following the screenshot (step 1) below produces the "Compute Variable" dialog box.
[Screenshot: Computing the difference score, Step 1. Menu path: (A) "Transform", (B) "Compute Variable"]
From the "Compute Variable" dialog screen, we can define the column header for our variable by typing in a name in the "Target Variable" box (no spaces, no special characters, and it cannot begin with a numeric value). The formula for computing our difference score is inserted in the "Numeric Expression" box. To create this formula, (1) click on "pretest" in the left list of variables and use the arrow key to move it into the "Numeric Expression" box; (2) use your keyboard or the keyboard within the dialog box to insert a minus sign (i.e., dash) after "pretest" in the "Numeric Expression" box; (3) click on "posttest" in the left list of variables and use the arrow key to move it into the "Numeric Expression" box; and (4) click on "OK" to create the new difference score variable in your dataset.
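Outside of SPSS, the same difference-score computation is a simple element-by-element subtraction. The pretest and posttest values below are hypothetical and serve only to illustrate the operation:

```python
# Hypothetical paired scores (not the textbook's actual swim data)
pretest = [68, 62, 60, 66]
posttest = [63, 58, 54, 60]

# difference = pretest - posttest, element by element
difference = [pre - post for pre, post in zip(pretest, posttest)]
print(difference)  # [5, 4, 6, 6]
```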
[Screenshot: Computing the difference score, Step 2. The "Compute Variable" dialog box.]
We can again use "Explore" to examine the extent to which the assumption of normality is met for the distributional shape of our newly created difference score. The general steps for accessing "Explore" (see, e.g., Chapter 4) and for generating normality evidence for one variable (see Chapter 6) have been presented in previous chapters, and they will not be reiterated here.
Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots. The skewness statistic for the difference score is .248 and kurtosis is .050, both within the range of an absolute value of 2.0, suggesting one form of evidence of normality of the differences.
The histogram for the difference scores (not presented here) is not necessarily what most researchers would consider a normally shaped distribution. Our formal test of normality, the S–W test (Shapiro & Wilk, 1965), suggests that our sample distribution for differences is not statistically significantly different from what would be expected from a normal distribution (S–W = .956, df = 10, p = .734). Similar to what we saw with the histogram, the Q–Q plot of differences suggests some nonnormality in the tails (as the farthest points are not falling on the diagonal line). Keep in mind that we have a small sample size. Thus, interpreting the visual graphs (e.g., histograms and Q–Q plots) can be difficult. Examination of the boxplot suggests a relatively normal distributional shape. Considering the forms of evidence we have examined (skewness and kurtosis, the S–W test of normality, and the boxplot), normality appears to be a reasonable assumption. Although the histogram and Q–Q plot suggested some nonnormality, this is somewhat expected given the small sample size. Generally, we can be reasonably assured we have met the assumption of normality of the difference scores.
Generating evidence of homogeneity of variance of difference scores: Without conducting a formal test of equality of variances (as we do in Chapter 9), a rough benchmark for having met the assumption of homogeneity of variances when conducting the dependent t test is that the ratio of the smallest to largest variance of the paired samples is no greater than 1:4. The variance can be computed easily by any number of procedures in SPSS (e.g., refer back to Chapter 3), and these steps will not be repeated here. For our paired samples, the variance of the pretest score is 17.778 and the variance of the posttest score is 13.111, well within the range of 1:4, suggesting that homogeneity of variances is reasonable.
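This rough benchmark is easy to check by hand. A short sketch using the pretest and posttest variances reported above:

```python
# Variances of the paired samples (from the SPSS example)
var_pretest = 17.778
var_posttest = 13.111

# Rough homogeneity benchmark: largest-to-smallest variance ratio <= 4
ratio = max(var_pretest, var_posttest) / min(var_pretest, var_posttest)
print(round(ratio, 3), ratio <= 4)  # 1.356 True
```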
7.5 G*Power
Using the results of the independent samples t test just conducted, let us use G*Power to compute the post hoc power of our test.
Post Hoc Power for the Independent t Test Using G*Power
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted an independent samples t test; therefore, the default selection of "t tests" is the correct test family. Next, we need to select the appropriate statistical test. We use the arrow to toggle to "Means: Difference between two independent means (two groups)." The "Type of Power Analysis" desired then needs to be selected. To compute post hoc power, we need to select "Post hoc: Compute achieved power–given α, sample size, and effect size."
The "Input Parameters" must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional). In this example, we have a two-tailed test, so we use the arrow to toggle to "Two." The achieved or observed effect size was −1.1339. The alpha level we tested at was .05, and the sample size for females was 8 and for males, 12. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.
The "Output Parameters" provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of −1.1339, an alpha level of .05, and sample sizes of 8 (females) and 12 (males). Based on those criteria, the post hoc power was .65. In other words, with a sample size of 8 females and 12 males in our study, testing at an alpha level of .05 and observing a large effect of −1.1339, the power of our test was .65: the probability of rejecting the null hypothesis when it is really false will be 65%, which is only moderate power. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level). We were fortunate in this example in that we were still able to detect a statistically significant difference in cholesterol levels between males and females; however, we will likely not always be that lucky.
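G*Power evaluates the noncentral t distribution exactly; the .65 figure can be approximated with a simple normal approximation. In the sketch below, the critical value 2.1009 (the two-tailed .05 critical t for 18 degrees of freedom) is supplied by hand, and the result is an approximation rather than G*Power's algorithm:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

d = 1.1339                                  # observed effect size (absolute value)
n1, n2 = 8, 12
ncp = d * math.sqrt(n1 * n2 / (n1 + n2))    # noncentrality parameter
t_crit = 2.1009                             # two-tailed .05 critical t, df = 18

# Normal approximation to the noncentral t power (both rejection regions)
power = (1 - norm_cdf(t_crit - ncp)) + norm_cdf(-t_crit - ncp)
print(round(power, 2))  # 0.65
```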
[Screenshot: G*Power, independent t test. The "Input Parameters" for computing post hoc power must be specified, including: 1. one- versus two-tailed test; 2. observed effect size d; 3. alpha level; and 4. sample size for each group of the independent variable. Once the parameters are specified, click on "Calculate."]
Post Hoc Power for the Dependent t Test Using G*Power
Now, let us use G*Power to compute post hoc power for the dependent t test. First, the correct test family needs to be selected. In our case, we conducted a dependent samples t test; therefore, the default selection of "t tests" is the correct test family. Next, we need to select the appropriate statistical test. We use the arrow to toggle to "Means: Difference between two dependent means (matched pairs)." The "Type of Power Analysis" desired then needs to be selected. To compute post hoc power, we need to select "Post hoc: Compute achieved power–given α, sample size, and effect size."
The "Input Parameters" must then be specified. The first parameter is the selection of whether your test is one-tailed (i.e., directional) or two-tailed (i.e., nondirectional).
In this example, we have a two-tailed test, so we use the arrow to toggle to "Two." The achieved or observed effect size was 2.3146. The alpha level we tested at was .05, and the total sample size was 10. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.
The "Output Parameters" provide the relevant statistics given the input specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of 2.3146, an alpha level of .05, and a total sample size of 10. Based on those criteria, the post hoc power was .99. In other words, with a total sample size of 10, testing at an alpha level of .05 and observing a large effect of 2.3146, the power of our test was over .99: the probability of rejecting the null hypothesis when it is really false will be greater than 99%, about the strongest power that can be achieved. Again, conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired power (given the observed effect size and alpha level).
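The same normal approximation used above for the independent t test reproduces the dependent-samples result; here the critical value 2.2622 (the two-tailed .05 critical t for 9 degrees of freedom) is supplied by hand, and the sketch is an approximation rather than G*Power's exact noncentral t computation:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

dz = 2.3146                 # observed effect size for the paired differences
n = 10
ncp = dz * math.sqrt(n)     # noncentrality parameter
t_crit = 2.2622             # two-tailed .05 critical t, df = 9

# Normal approximation to the noncentral t power (both rejection regions)
power = (1 - norm_cdf(t_crit - ncp)) + norm_cdf(-t_crit - ncp)
print(power > 0.99)  # True
```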
[Screenshot: G*Power, dependent t test. The "Input Parameters" for computing post hoc power must be specified, including: 1. one- versus two-tailed test; 2. observed effect size; 3. alpha level; and 4. total sample size. Once the parameters are specified, click on "Calculate."]
7.6 Template and APA-Style Write-Up
Next we develop APA-style paragraphs describing the results for both examples. First is a paragraph describing the results of the independent t test for the cholesterol example, and this is followed by the dependent t test for the swimming example.
Independent t Test
Recall that our graduate research assistant, Marie, was working with JoAnn, a local nurse practitioner, to assist in analyzing cholesterol levels. Her task was to assist JoAnn with writing her research question (Is there a mean difference in cholesterol level between males and females?) and generating the test of inference to answer her question. Marie suggested an independent samples t test as the test of inference. A template for writing a research question for an independent t test is presented as follows:
Is there a mean difference in [dependent variable] between [group 1 of
the independent variable] and [group 2 of the independent variable]?
It may be helpful to preface the results of the independent samples t test with information on an examination of the extent to which the assumptions were met (recall there are three assumptions: normality, homogeneity of variances, and independence). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
An independent samples t test was conducted to determine if the mean cholesterol level of males differed from females. The assumption of normality was tested and met for the distributional shape of the dependent variable (cholesterol level) for females. Review of the S-W test for normality (S-W = .931, df = 8, p = .525) and skewness (.000) and kurtosis (−1.790) statistics suggested that normality of cholesterol levels for females was a reasonable assumption. Similar results were found for male cholesterol levels. Review of the S-W test for normality (S-W = .949, df = 12, p = .617) and skewness (.000) and kurtosis (−1.446) statistics suggested that normality of male cholesterol levels was a reasonable assumption. The boxplots suggested a relatively normal distributional shape (with no outliers) of cholesterol levels for both males and females. The Q–Q plots and histograms suggested some minor nonnormality for both male and female cholesterol levels. Due to the small sample, this was anticipated. Although normality indices generally suggest the assumption is met, even if there are slight departures from normality, the effects on Type I and Type II errors will be minimal given the use of a two-tailed test (e.g., Glass, Peckham, & Sanders, 1972; Sawilowsky & Blair, 1992). According to Levene's test, the homogeneity of variance assumption was satisfied (F = 3.2007, p = .090). Because there was no random assignment of the individuals to gender, the assumption of independence was not met, creating a potential for an increased probability of a Type I or Type II error.
196 An Introduction to Statistical Concepts
It is also desirable to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our cholesterol example, we find an effect size d of −1.1339, which is interpreted according to Cohen's (1988) guidelines as a large effect:
d = (Ȳ1 − Ȳ2)/sp = (185 − 215)/26.4575 = −1.1339
Remember that for the two-sample mean test, d indicates how many standard deviations the mean of sample 1 is from the mean of sample 2. Thus, with an effect size of −1.1339, there is slightly more than one standard deviation unit between the mean cholesterol levels of males as compared to females. The negative sign simply indicates that group 1 has the smaller mean (as it is the first value in the numerator of the formula; in our case, the mean cholesterol level of females).
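The pooled standard deviation of 26.4575 follows directly from the group standard deviations in Table 7.3, so the effect size can be verified in a few lines (an illustrative sketch, not SPSS or G*Power output):

```python
import math

# Group summary statistics from Table 7.3
n_f, n_m = 8, 12
sd_f, sd_m = 19.08627, 30.22642

# Pooled standard deviation, then Cohen's d for the mean difference
sp = math.sqrt(((n_f - 1) * sd_f**2 + (n_m - 1) * sd_m**2) / (n_f + n_m - 2))
d = (185 - 215) / sp

print(round(sp, 4), round(d, 4))  # 26.4575 -1.1339
```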
Here is an APA-style example paragraph of results for the cholesterol level data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).
As shown in Table 7.3, cholesterol data were gathered from samples of 12 males and 8 females, with a female sample mean of 185 (SD = 19.09) and a male sample mean of 215 (SD = 30.22). The independent t test indicated that the cholesterol means were statistically significantly different for males and females (t = −2.4842, df = 18, p = .023). Thus, the null hypothesis that the cholesterol means were the same by gender was rejected at the .05 level of significance. The effect size d (calculated using the pooled standard deviation) was −1.1339. Using Cohen's (1988) guidelines, this is interpreted as a large effect. The results provide evidence to support the conclusion that males and females differ in cholesterol levels, on average. More specifically, males were observed to have higher cholesterol levels, on average, than females.
Parenthetically, notice that the results of the Welch t′ test led to the same conclusion as the independent t test (Welch t′ = −2.7197, rounded df = 18, p = .014). Thus, any deviation from homogeneity of variance did not affect the results.
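The Welch t′ and its adjusted degrees of freedom can also be recomputed from the summary statistics in Table 7.3, using the Welch–Satterthwaite formula (an illustrative sketch; the variable names are our own):

```python
import math

# Summary statistics from the cholesterol example (Table 7.3)
n_f, mean_f, sd_f = 8, 185.0, 19.08627
n_m, mean_m, sd_m = 12, 215.0, 30.22642

# Per-group variance of the mean
vf, vm = sd_f**2 / n_f, sd_m**2 / n_m

# Welch t' does not pool the variances
t_welch = (mean_f - mean_m) / math.sqrt(vf + vm)

# Welch-Satterthwaite adjusted degrees of freedom
df = (vf + vm) ** 2 / (vf**2 / (n_f - 1) + vm**2 / (n_m - 1))

print(round(t_welch, 4), round(df, 3))  # -2.7197 17.984
```

These values match the "Equal variances not assumed" row of the SPSS output.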
Dependent t Test
Marie, as you recall, was also working with Mark, a local swimming coach, to assist in analyzing freestyle swimming time before and after swimmers participated in an intensive training program. Marie suggested a research question (Is there a mean difference in swim time for the 50-meter freestyle event before participation in an intensive training program as compared to swim time for the 50-meter freestyle event after participation in an intensive training program?) and assisted in generating the test of inference (specifically the dependent t test) to answer her question. A template for writing a research question for a dependent t test is presented as follows.
Is there a mean difference in [paired sample 1] as compared to
[paired sample 2]?
197Inferences About the Difference Between Two Means
It may be helpful to preface the results of the dependent samples t test with information on the extent to which the assumptions were met (recall there are three assumptions: normality, homogeneity of variance, and independence). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
A dependent samples t test was conducted to determine if there was
a difference in the mean swim time for the 50 meter freestyle before
participation in an intensive training program as compared to the
mean swim time for the 50 meter freestyle after participation in an
intensive training program. The assumption of normality was tested
and met for the distributional shape of the paired differences. Review
of the S-W test for normality (SW = .956, df = 10, p = .734) and skew-
ness (.248) and kurtosis (.050) statistics suggested that normality of
the paired differences was reasonable. The boxplot suggested a rela-
tively normal distributional shape, and there were no outliers pres-
ent. The Q–Q plot and histogram suggested minor nonnormality. Due to
the small sample, this was anticipated. Homogeneity of variance was
tested by reviewing the ratio of the raw score variances. The ratio of
the smallest (posttest = 13.111) to largest (pretest = 17.778) variance
was less than 1:4; therefore, there is evidence of the equal variance
assumption. The individuals were not randomly selected; therefore,
the assumption of independence was not met, creating a potential for
an increased probability of a Type I or Type II error.
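The screening steps just described (normality of the paired differences via skewness and kurtosis, plus the variance-ratio check for homogeneity) can be sketched in a few lines of Python. The pre/post values below are hypothetical stand-ins, not the chapter's swimming dataset:

```python
from math import sqrt

# Hypothetical pre/post swim times; the chapter's actual data come with its dataset
pre  = [62, 68, 60, 59, 70, 66, 64, 63, 67, 61]
post = [58, 62, 55, 56, 64, 61, 59, 57, 62, 56]
diff = [a - b for a, b in zip(pre, post)]

def var(x):  # sample variance (ddof = 1)
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

# Skewness and excess kurtosis of the paired differences (population-moment formulas)
n = len(diff)
m = sum(diff) / n
s = sqrt(sum((v - m) ** 2 for v in diff) / n)
skewness = sum(((v - m) / s) ** 3 for v in diff) / n
kurtosis = sum(((v - m) / s) ** 4 for v in diff) / n - 3.0

# Homogeneity check: ratio of largest to smallest raw-score variance (< 4 is acceptable)
ratio = max(var(pre), var(post)) / min(var(pre), var(post))
print(round(skewness, 2), round(kurtosis, 2), ratio < 4)
```

Values near zero for skewness and kurtosis, and a variance ratio under 4, are consistent with the assumptions being reasonably met.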
It is also important to include a measure of effect size. Recall our formula for computing the effect size, d, presented earlier in the chapter. Plugging in the values for our swimming example, we find an effect size d of 2.3146, which is interpreted according to Cohen's (1988) guidelines as a large effect:

Cohen d = d̄ / s_d = 5 / 2.1602 = 2.3146

With an effect size of 2.3146, there are about two and a third standard deviation units between the pretraining mean swim time and the posttraining mean swim time.
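Both the dependent t and this effect size follow directly from the mean and standard deviation of the paired differences. A quick Python check using the values above (an illustrative sketch; the text uses SPSS):

```python
from math import sqrt

def dependent_t_and_d(mean_diff, sd_diff, n):
    """Dependent t (mean difference over its standard error) and Cohen's d."""
    t = mean_diff / (sd_diff / sqrt(n))
    d = mean_diff / sd_diff
    return t, d

# Swimming example: mean difference = 5 seconds, sd of differences = 2.1602, n = 10 pairs
t, d = dependent_t_and_d(5, 2.1602, 10)
print(round(t, 3), round(d, 4))  # 7.319 and 2.3146
```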
Here is an APA-style example paragraph of results for the swimming data (remember that this will be prefaced by the paragraph reporting the extent to which the assumptions of the test were met).
From Table 7.4, we see that pretest and posttest data were collected
from a sample of 10 swimmers, with a pretest mean of 64 seconds (SD =
4.22) and a posttest mean of 59 seconds (SD = 3.62). Thus, swimming times
decreased from pretest to posttest. The dependent t test was conducted
to determine if this difference was statistically significantly dif-
ferent from 0, and the results indicate that the pretest and posttest
means were statistically different (t = 7.319, df = 9, p < .001). Thus,
the null hypothesis that the freestyle swimming means were the same at
both points in time was rejected at the .05 level of significance. The
effect size d (calculated as the mean difference divided by the standard
deviation of the difference) was 2.3146. Using Cohen’s (1988) guidelines,
this is interpreted as a large effect. The results provide evidence to
support the conclusion that the mean 50 meter freestyle swimming time
prior to intensive training is different from the mean 50 meter
freestyle swimming time after intensive training.
7.7 Summary
In this chapter, we considered a second inferential testing situation, testing hypotheses about the difference between two means. Several inferential tests and new concepts were discussed. New concepts introduced were independent versus dependent samples, the sampling distribution of the difference between two means, the standard error of the difference between two means, and parametric versus nonparametric tests. We then moved on to describe the following three inferential tests for determining the difference between two independent means: the independent t test, the Welch t′ test, and briefly the Mann–Whitney–Wilcoxon test. The following two tests for determining the difference between two dependent means were considered: the dependent t test and briefly the Wilcoxon signed ranks test. In addition, examples were presented for each of the t tests, and recommendations were made as to when each test is most appropriate. The chapter concluded with a look at SPSS and G*Power (for post hoc power) as well as developing an APA-style write-up of results. At this point, you should have met the following objectives: (a) be able to understand the basic concepts underlying the inferential tests of two means, (b) be able to select the appropriate test, and (c) be able to determine and interpret the results from the appropriate test. In the next chapter, we discuss inferential tests involving proportions. Other inferential tests are covered in subsequent chapters.
Problems
Conceptual problems
7.1 We test the following hypotheses:

   H0: μ1 − μ2 = 0
   H1: μ1 − μ2 ≠ 0

   The level of significance is .05 and H0 is rejected. Assuming all assumptions are met and H0 is true, the probability of committing a Type I error is which one of the following?
   a. 0
   b. 0.05
   c. Between .05 and .95
   d. 0.95
   e. 1.00
7.2 When H0 is true, the difference between two independent sample means is a function of which one of the following?
   a. Degrees of freedom
   b. The standard error
   c. The sampling distribution
   d. Sampling error
7.3 The denominator of the independent t test is known as the standard error of the difference between two means, and may be defined as which one of the following?
   a. The difference between the two group means
   b. The amount by which the difference between the two group means differs from the population mean
   c. The standard deviation of the sampling distribution of the difference between two means
   d. All of the above
   e. None of the above
7.4 In the independent t test, the homoscedasticity assumption states what?
   a. The two population means are equal.
   b. The two population variances are equal.
   c. The two sample means are equal.
   d. The two sample variances are equal.
7.5 Sampling error increases with larger samples. True or false?
7.6 At a given level of significance, it is possible that the significance test and the CI results will differ for the same dataset. True or false?
7.7 I assert that the critical value of t required for statistical significance is smaller (in absolute value or ignoring the sign) when using a directional rather than a nondirectional test. Am I correct?
7.8 If a 95% CI from an independent t test ranges from −.13 to +1.67, I assert that the null hypothesis would not be rejected at the .05 level of significance. Am I correct?
7.9 A group of 15 females was compared to a group of 25 males with respect to intelligence. To test if the sample sizes are significantly different, which of the following tests would you use?
   a. Independent t test
   b. Dependent t test
   c. z test
   d. None of the above
7.10 The mathematics ability of 10 preschool children was measured when they entered their first year of preschool and then again in the spring of their kindergarten year. To test for pre- to post-mean differences, which of the following tests would be used?
   a. Independent t test
   b. Dependent t test
   c. z test
   d. None of the above
7.11 A researcher collected data to answer the following research question: Are there mean differences in science test scores for middle school students who participate in school-sponsored athletics as compared to students who do not participate? Which of the following tests would be used to answer this question?
   a. Independent t test
   b. Dependent t test
   c. z test
   d. None of the above
7.12 The number of degrees of freedom for an independent t test with 15 females and 25 males is 40. True or false?
7.13 I assert that the critical value of t, for a test of two dependent means, will increase as the samples become larger. Am I correct?
7.14 Which of the following is NOT an assumption of the independent t test?
   a. Normality
   b. Independence
   c. Equal sample sizes
   d. Homogeneity of variance
7.15 For which of the following assumptions of the independent t test is evidence provided in the SPSS output by default?
   a. Normality
   b. Independence
   c. Equal sample sizes
   d. Homogeneity of variance
Computational problems
7.1 The following two independent samples of older and younger adults were measured on an attitude toward violence test:

   Sample 1 (Older Adult) Data    Sample 2 (Younger Adult) Data
   42 36 47                       45 50 57
   35 46 37                       58 43 52
   52 44 47                       43 60 41
   51 56 54                       49 44 51
   55 50 40                       49 55 56
   40 46 41

   a. Test the following hypotheses at the .05 level of significance:

      H0: μ1 − μ2 = 0
      H1: μ1 − μ2 ≠ 0

   b. Construct a 95% CI.
7.2 The following two independent samples of male and female undergraduate students were measured on an English literature quiz:

   Sample 1 (Male) Data    Sample 2 (Female) Data
   5 7 8                   9 9 11
   10 11 11                13 15 18
   13 15                   19 20

   a. Test the following hypotheses at the .05 level of significance:

      H0: μ1 − μ2 = 0
      H1: μ1 − μ2 ≠ 0

   b. Construct a 95% CI.
7.3 The following two independent samples of preschool children (who were demographically similar but differed in Head Start participation) were measured on teacher-reported social skills during the spring of kindergarten:

   Sample 1 (Head Start) Data    Sample 2 (Non-Head Start) Data
   18 14 12                      15 12 9
   16 10 17                      10 18 12
   20 16 19                      11 8 11
   15 13 22                      13 10 14

   a. Test the following hypotheses at the .05 level of significance:

      H0: μ1 − μ2 = 0
      H1: μ1 − μ2 ≠ 0

   b. Construct a 95% CI.
7.4 The following is a random sample of paired values of weight measured before (time 1) and after (time 2) a weight-reduction program:

   Pair   1    2
   1    127  130
   2    126  124
   3    129  135
   4    123  127
   5    124  127
   6    129  128
   7    132  136
   8    125  130
   9    135  131
   10   126  128

   a. Test the following hypotheses at the .05 level of significance:

      H0: μ1 − μ2 = 0
      H1: μ1 − μ2 ≠ 0

   b. Construct a 95% CI.
7.5 Individuals were measured on the number of words spoken during the 1 minute prior to exposure to a confrontational situation. During the 1 minute after exposure, the individuals were again measured on the number of words spoken. The data are as follows:

   Person  Pre  Post
   1        60   50
   2        80   70
   3       120   80
   4       100   90
   5        90  100
   6        85   70
   7        70   40
   8        90   70
   9       100   60
   10      110  100
   11       80  100
   12      100   70
   13      130   90
   14      120   80
   15       90   50

   a. Test the following hypotheses at the .05 level of significance:

      H0: μ1 − μ2 = 0
      H1: μ1 − μ2 ≠ 0

   b. Construct a 95% CI.
7.6 The following is a random sample of scores on an attitude toward abortion scale for husband (sample 1) and wife (sample 2) pairs:

   Pair  1   2
   1     1   3
   2     2   3
   3     4   6
   4     4   5
   5     5   7
   6     7   8
   7     7   9
   8     8  10

   a. Test the following hypotheses at the .05 level of significance:

      H0: μ1 − μ2 = 0
      H1: μ1 − μ2 ≠ 0

   b. Construct a 95% CI.
7.7 For two dependent samples, test the following hypotheses at the .05 level of significance:

   Sample statistics: n = 121; d̄ = 10; s_d = 45.

      H0: μ1 − μ2 = 0
      H1: μ1 − μ2 > 0

7.8 For two dependent samples, test the following hypotheses at the .05 level of significance:

   Sample statistics: n = 25; d̄ = 25; s_d = 14.

      H0: μ1 − μ2 = 0
      H1: μ1 − μ2 > 0
Interpretive problems
7.1 Using the survey 1 dataset from the website, use SPSS to conduct an independent t test, where gender is the grouping variable and the dependent variable is a variable of interest to you. Test for the extent to which the assumptions have been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph describing the results.
7.2 Using the survey 1 dataset from the website, use SPSS to conduct an independent t test, where the grouping variable is whether or not the person could tell the difference between Pepsi and Coke and the dependent variable is a variable of interest to you. Test for the extent to which the assumptions have been met. Calculate an effect size as well as post hoc power. Then write an APA-style paragraph describing the results.
8
Inferences About Proportions
Chapter Outline
8.1 Inferences About Proportions Involving the Normal Distribution
   8.1.1 Introduction
   8.1.2 Inferences About a Single Proportion
   8.1.3 Inferences About Two Independent Proportions
   8.1.4 Inferences About Two Dependent Proportions
8.2 Inferences About Proportions Involving the Chi-Square Distribution
   8.2.1 Introduction
   8.2.2 Chi-Square Goodness-of-Fit Test
   8.2.3 Chi-Square Test of Association
8.3 SPSS
8.4 G*Power
8.5 Template and APA-Style Write-Up
Key Concepts
1. Proportion
2. Sampling distribution and standard error of a proportion
3. Contingency table
4. Chi-square distribution
5. Observed versus expected proportions
In Chapters 6 and 7, we considered testing inferences about means, first for a single mean (Chapter 6) and then for two means (Chapter 7). The major concepts discussed in those two chapters included the following: types of hypotheses, types of decision errors, level of significance, power, confidence intervals (CIs), effect sizes, sampling distributions involving the mean, standard errors involving the mean, inferences about a single mean, inferences about the difference between two independent means, and inferences about the difference between two dependent means. In this chapter, we consider inferential tests involving proportions. We define a proportion as the percentage of scores falling into particular categories. Thus, the tests described in this chapter deal with variables that are categorical in nature and thus are nominal or ordinal variables (see Chapter 1), or have been collapsed from higher-level variables into nominal or ordinal variables (e.g., high and low scorers on an achievement test).
The tests that we cover in this chapter are considered nonparametric procedures, also sometimes referred to as distribution-free procedures, as there is no requirement that the data adhere to a particular distribution (e.g., normal distribution). Nonparametric procedures are often less preferred than parametric procedures (e.g., t tests, which assume normality of the distribution) for the following reasons: (1) parametric procedures are often robust to assumption violations; in other words, the results are often still interpretable even if there may be assumption violations; (2) nonparametric procedures have lower power relative to sample size; in other words, rejecting the null hypothesis when it is false requires a larger sample size with nonparametric procedures; and (3) the types of research questions that can be addressed by nonparametric procedures are often quite simple (e.g., while complex interactions of many different variables can be tested with parametric procedures such as factorial analysis of variance, this cannot be done with nonparametric procedures). Nonparametric procedures can still be valuable to use given the measurement scale(s) of the variable(s) and the research question; however, at the same time, it is important that researchers recognize the limitations of these types of procedures.
Research questions to be asked of proportions include the following examples:
1. Is the quarter in my hand a fair or biased coin; in other words, over repeated samples, is the proportion of heads equal to .50 or not?
2. Is there a difference between the proportions of Republicans and Democrats who support the local school bond issue?
3. Is there a relationship between ethnicity (e.g., African-American, Caucasian) and type of criminal offense (e.g., petty theft, rape, murder); in other words, is the proportion of one ethnic group different from another in terms of the types of crimes committed?
Several inferential tests are covered in this chapter, depending on (a) whether there are one or two samples, (b) whether the two samples are selected in an independent or dependent manner, and (c) whether there are one or more categorical variables. More specifically, the topics described include the following inferential tests: testing whether a single proportion is different from a hypothesized value, testing whether two independent proportions are different, testing whether two dependent proportions are different, and the chi-square goodness-of-fit test and chi-square test of association. We use many of the foundational concepts previously covered in Chapters 6 and 7. New concepts to be discussed include the following: proportion, sampling distribution and standard error of a proportion, contingency table, chi-square distribution, and observed versus expected frequencies. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying tests of proportions, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.
8.1 Inferences About Proportions Involving the Normal Distribution
We have been following Marie, an educational research graduate student, as she completes tasks assigned to her by her faculty advisor.
Marie's advisor has received two additional calls from individuals in other states who are interested in assistance with statistical analysis. Knowing the success Marie has had with the previous consultations, Marie's advisor requests that Marie work with Tami, a staff member in the Undergraduate Services Office at Ivy-Covered University (ICU), and Matthew, a lobbyist from a state that is considering legalizing gambling.

In conversation with Marie, Tami shares that she recently read a report that provided national statistics on the proportion of students that major in various disciplines. Tami wants to know if there are similar proportions at their institution. Marie suggests the following research question: Are the sample proportions of undergraduate student college majors at Ivy-Covered University the same as the national proportions? Marie suggests a chi-square goodness-of-fit test as the test of inference. Her task is then to assist Tami in generating the test of inference to answer her research question.

Marie then speaks with Matthew, a lobbyist who is lobbying against legalizing gambling in his state. Matthew wants to determine if there is a relationship between level of education and stance on a proposed gambling amendment. Matthew suspects that the proportions supporting gambling vary as a function of education level. The following research question is suggested by Marie: Is there an association between level of education and stance on gambling? Marie suggests a chi-square test of association as the test of inference. Her task is then to assist Matthew in generating the test of inference to answer his research question.
This section deals with concepts and procedures for testing inferences about proportions that involve the normal distribution. Following a discussion of the concepts related to tests of proportions, inferential tests are presented for situations when there is a single proportion, two independent proportions, and two dependent proportions.
8.1.1 Introduction
Let us examine in greater detail the concepts related to tests of proportions. First, a proportion represents the percentage of individuals or objects that fall into a particular category. For instance, the proportion of individuals who support a particular political candidate might be of interest. Thus, the variable here is a dichotomous, categorical, nominal variable, as there are only two categories represented: support or do not support the candidate.
For notational purposes, we define the population proportion π (pi) as

π = f/N

where
f is the number of frequencies in the population who fall into the category of interest (e.g., the number of individuals in the population who support the candidate)
N is the total number of individuals in the population

For example, if the population consists of 100 individuals and 58 support the candidate, then π = .58 (i.e., 58/100). If the proportion is multiplied by 100%, this yields the percentage of individuals in the population who support the candidate, which in the example would be 58%. At the same time, 1 − π represents the population proportion of individuals who do not support the candidate, which for this example would be 1 − .58 = .42. If this is multiplied by 100%, this yields the percentage of individuals in the population who do not support the candidate, which in the example would be 42%.
In a fashion, the population proportion is conceptually similar to the population mean if the category of interest (support of candidate) is coded as 1 and the other category (no support) is coded as 0. In the case of the example with 100 individuals, there are 58 individuals coded 1 and 42 individuals coded 0, and therefore, the mean would be .58. To this point then, we have π representing the population proportion of individuals supporting the candidate and 1 − π representing the population proportion of individuals not supporting the candidate.
The population variance of a proportion can also be determined by σ² = π(1 − π), and thus, the population standard deviation of a proportion is σ = √[π(1 − π)]. These provide us with measures of variability that represent the extent to which the individuals in the population vary in their support of the candidate. For the example population then, the variance is computed to be σ² = π(1 − π) = .58(1 − .58) = .58(.42) = .2436, and the standard deviation is σ = √[π(1 − π)] = √[.58(1 − .58)] = √[.58(.42)] = .4936.
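As a quick check on these hand computations, the candidate example can be worked in a few lines of Python (an illustrative sketch; the text itself computes these by hand):

```python
from math import sqrt

def population_proportion_stats(f, N):
    """Population proportion pi, variance pi(1 - pi), and standard deviation."""
    pi = f / N
    variance = pi * (1 - pi)
    return pi, variance, sqrt(variance)

# Candidate example: 58 of 100 individuals in the population are supporters
pi, variance, sd = population_proportion_stats(58, 100)
print(pi, round(variance, 4), round(sd, 4))  # 0.58, 0.2436, 0.4936
```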
For the population parameters, we now have the population proportion (or mean), the population variance, and the population standard deviation. The next step is to discuss the corresponding sample statistics for the proportion. The sample proportion p is defined as

p = f/n

where
f is the number of frequencies in the sample that fall into the category of interest (e.g., the number of individuals who support the candidate)
n is the total number of individuals in the sample

The sample proportion p is thus a sample estimate of the population proportion π. One way we can estimate the population variance is by the sample variance s² = p(1 − p), and the population standard deviation of a proportion can be estimated by the sample standard deviation s = √[p(1 − p)].
The next concept to discuss is the sampling distribution of the proportion. This is comparable to the sampling distribution of the mean discussed in Chapter 5. If one were to take many samples, and for each sample compute the sample proportion p, then we could generate a distribution of p. This is known as the sampling distribution of the proportion. For example, imagine that we take 50 samples of size 100 and determine the proportion for each sample. That is, we would have 50 different sample proportions, each based on 100 observations. If we construct a frequency distribution of these 50 proportions, then this is actually the sampling distribution of the proportion.

In theory, the sample proportions for this example could range from .00 (p = 0/100) to 1.00 (p = 100/100) given that there are 100 observations in each sample. One could also examine the variability of these 50 sample proportions. That is, we might be interested in the extent to which the sample proportions vary. We might have, for one example, most of the sample proportions falling near the mean proportion of .60. This would indicate for the candidate data that (a) the samples generally support the candidate, as the average proportion is .60, and (b) the support for the candidate is fairly consistent across samples, as the sample proportions tend to fall close to .60. Alternatively, in a second example, we might find the sample proportions varying quite a bit around the mean of .60, say ranging from .20 to .80. This would indicate that (a) the samples generally support the candidate again, as the average proportion is .60, and (b) the support for the candidate is not very consistent across samples, leading one to believe that some groups support the candidate and others do not.
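The thought experiment above, 50 samples of size 100 with one proportion per sample, is easy to simulate. This Python sketch (an illustration added here, not part of the text) draws from a population where π = .60:

```python
import random

random.seed(42)
pi_true, sample_size, num_samples = 0.60, 100, 50

# Each sample proportion is one observation from the sampling
# distribution of the proportion
props = [
    sum(1 for _ in range(sample_size) if random.random() < pi_true) / sample_size
    for _ in range(num_samples)
]

mean_p = sum(props) / num_samples
spread = max(props) - min(props)
# The mean of the 50 proportions falls near .60; the spread shows the
# sample-to-sample variability discussed in the text
print(round(mean_p, 2), round(spread, 2))
```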
The variability of the sampling distribution of the proportion can be determined as follows. The population variance of the sampling distribution of the proportion is known as the variance error of the proportion, denoted by σp². The variance error is computed as

σp² = π(1 − π)/n

where
π is again the population proportion
n is sample size (i.e., the number of observations in a single sample)
The population standard deviation of the sampling distribution of the proportion is known as the standard error of the proportion, denoted by σp. The standard error is an index of how variable a sample statistic (in this case, the sample proportion) is when multiple samples of the same size are drawn, and is computed as follows:

σp = √[π(1 − π)/n]

This situation is quite comparable to the sampling distribution of the mean discussed in Chapter 5. There we had the variance error and standard error of the mean as measures of the variability of the sample means.
Technically speaking, the binomial distribution is the exact sampling distribution for the proportion; binomial here refers to a categorical variable with two possible categories, which is certainly the situation here. However, except for rather small samples, the normal distribution is a reasonable approximation to the binomial distribution and is therefore typically used. The reason we can rely on the normal distribution is the central limit theorem, previously discussed in Chapter 5. For proportions, the central limit theorem states that as sample size n increases, the sampling distribution of the proportion from a random sample of size n more closely approximates a normal distribution. If the population distribution is normal in shape, then the sampling distribution of the proportion is also normal in shape. If the population distribution is not normal in shape, then the sampling distribution of the proportion becomes more nearly normal as sample size increases. As previously shown in Figure 5.1 in the context of the mean, the bottom line is that if the population is nonnormal, this will have a minimal effect on the sampling distribution of the proportion except for rather small samples.
Because the applied researcher nearly always has access to only a single sample, the population variance error and standard error of the proportion must be estimated. The sample variance error of the proportion is denoted by sp² and computed as

sp² = p(1 − p)/n

where
p is again the sample proportion
n is sample size

The sample standard error of the proportion is denoted by sp and computed as

sp = √[p(1 − p)/n]
8.1.2 Inferences About a Single Proportion
In the first inferential testing situation for proportions, the researcher would like to know whether the population proportion is equal to some hypothesized proportion or not. This is comparable to the one-sample t test described in Chapter 6, where a population mean was compared against some hypothesized mean. First, the hypotheses to be evaluated for detecting whether a population proportion differs from a hypothesized proportion are as follows. The null hypothesis H0 is that there is no difference between the population proportion π and the hypothesized proportion π0, which we denote as

H0: π = π0

Here there is no difference, or a "null" difference, between the population proportion and the hypothesized proportion. For example, if we are seeking to determine whether the quarter you are flipping is a biased coin or not, then a reasonable hypothesized value would be .50, as an unbiased coin should yield "heads" about 50% of the time.

The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportion π and the hypothesized proportion π0, which we denote as

H1: π ≠ π0

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportion is different from the hypothesized proportion. As we have not specified a direction on H1, we are willing to reject H0 either if π is greater than π0 or if π is less than π0. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π is greater than π0 or that π is less than π0. In either case, the more the resulting sample proportion differs from the hypothesized proportion, the more likely we are to reject the null hypothesis.
It is assumed that the sample is randomly drawn from the population (i.e., the assumption of independence) and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as

z = (p − π0)/sp̂ = (p − π0)/√[π0(1 − π0)/n]

where sp̂ is estimated based on the hypothesized proportion π0.
The test statistic z is then compared to a critical value(s) from the unit normal distribution. For a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1. If the test statistic z falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +αz for the alternative hypothesis H1: π > π0 (i.e., a right-tailed test) and as −αz for the alternative hypothesis
H1: π < π0 (i.e., a left-tailed test). If the test statistic z falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0.
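Putting the test statistic and decision rule together, here is a hedged Python sketch with made-up poll numbers (55 supporters out of 100, testing H0: π = .50; this example is not from the text):

```python
from math import sqrt

def z_single_proportion(p, pi0, n):
    """z test of a single proportion; the standard error uses the hypothesized pi0."""
    se = sqrt(pi0 * (1 - pi0) / n)
    return (p - pi0) / se

# Hypothetical poll: p = .55 from n = 100; two-tailed test at alpha = .05
z = z_single_proportion(0.55, 0.50, 100)
reject = abs(z) > 1.96  # +/- 1.96 are the .05-level two-tailed critical values
print(round(z, 2), reject)  # 1.0 False: fail to reject H0
```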
For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:

p ± α/2z(sp̂)

where
p is the observed sample proportion
±α/2z is the tabled critical value
sp̂ is the sample standard error of the proportion

If the CI contains the hypothesized proportion π0, then the conclusion is to fail to reject H0; otherwise, we reject H0. Simulation research has shown that this CI procedure works fine for small samples when the sample proportion is near .50; that is, the normal distribution is a reasonable approximation in this situation. However, as the sample proportion moves closer to 0 or 1, larger samples are required for the normal distribution to be a reasonable approximation. Alternative approaches have been developed that appear to be more widely applicable. The interested reader is referred to Ghosh (1979) and Wilcox (1996).
Several points should be noted about each of the z tests for proportions developed in this chapter. First, the interpretation of CIs described in this chapter is the same as those in Chapter 7. Second, Cohen's (1988) measure of effect size for proportion tests using z is known as h. Unfortunately, h involves the use of arcsine transformations of the proportions, which is beyond the scope of this text. In addition, standard statistical software, such as SPSS, does not provide measures of effect size for any of these tests.
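For readers curious about the arcsine-based measure, Cohen's h can be computed directly from two proportions. A minimal sketch in Python (the function name and the example proportions are illustrative, not from the text):

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: the difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# For example, comparing proportions of .60 and .50:
h = cohens_h(.60, .50)   # roughly .20, a "small" effect by Cohen's benchmarks
```

The transformation stabilizes the variance of proportions, so equal values of h represent comparable effects anywhere along the 0-to-1 scale.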
Let us consider an example to illustrate the use of the test of a single proportion. We follow the basic steps for hypothesis testing that we applied in previous chapters. These steps include the following:
 1. State the null and alternative hypotheses.
 2. Select the level of significance (i.e., alpha, α).
 3. Calculate the test statistic value.
 4. Make a statistical decision (reject or fail to reject H0).
Suppose a researcher conducts a survey in a city that is voting on whether or not to have an elected school board. Based on informal conversations with a small number of influential citizens, the researcher is led to hypothesize that 50% of the voters are in favor of an elected school board. Through the use of a scientific poll, the researcher would like to know whether the population proportion is different from this hypothesized value; thus, a nondirectional, two-tailed alternative hypothesis is utilized. The null and alternative hypotheses are denoted as follows:
$$H_0: \pi = \pi_0$$
$$H_1: \pi \neq \pi_0$$
212 An Introduction to Statistical Concepts
If the null hypothesis is rejected, this would indicate that scientific polls of larger samples yield different results and are important in this situation. If the null hypothesis is not rejected, this would indicate that informal conversations with a small sample are just as accurate as a scientific larger-sized sample.

A random sample of 100 voters is taken, and 60 indicate their support of an elected school board (i.e., p = .60). In an effort to minimize the Type I error rate, the significance level is set at α = .01. The test statistic z is computed as
$$z = \frac{p - \pi_0}{\sqrt{\dfrac{\pi_0(1 - \pi_0)}{n}}} = \frac{.60 - .50}{\sqrt{\dfrac{.50(1 - .50)}{100}}} = \frac{.10}{\sqrt{\dfrac{(.50)(.50)}{100}}} = \frac{.10}{.0500} = 2.0000$$
Note that the final value for the denominator is the standard error of the proportion (i.e., $s_{\hat p}$ = .0500), which we will need for computing the CI. From Table A.1, we determine the critical values to be ±α/2z = ±.005z = ±2.58; in other words, the z value that corresponds to the P(z) value closest to .995 is when z is equal to 2.58. As the test statistic (i.e., z = 2.0000) does not exceed the critical values and thus fails to fall into a critical region, our decision is to fail to reject H0. Our conclusion then is that the accuracy of the scientific poll is not any different from the hypothesized value of .50 as determined informally.
The 99% CI for the example would be computed as follows:
$$p \pm \left({}_{\alpha/2}z\right)(s_{\hat p}) = .60 \pm 2.58(.0500) = .60 \pm .129 = (.471, .729)$$
Because the CI contains the hypothesized value of .50, our conclusion is to fail to reject H0 (the same result found when we conducted the statistical test). The conclusion derived from the test statistic is always consistent with the conclusion derived from the CI. We can interpret the CI as follows: 99% of similarly constructed CIs will contain the hypothesized value of .50.
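The computations above can be reproduced in a few lines of Python (a hand-rolled illustration of the formulas, not a substitute for a statistics package):

```python
import math

def z_single_proportion(p, pi0, n):
    """z test of a single proportion; the standard error uses pi0."""
    se = math.sqrt(pi0 * (1 - pi0) / n)
    return (p - pi0) / se, se

z, se = z_single_proportion(p=.60, pi0=.50, n=100)   # z = 2.0, se = .05
lower, upper = .60 - 2.58 * se, .60 + 2.58 * se      # 99% CI: (.471, .729)
```

Note that the standard error is built from the hypothesized π0, not from the observed p, exactly as in the test statistic above.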
8.1.3 Inferences About Two Independent Proportions
In our second inferential testing situation for proportions, the researcher would like to know whether the population proportion for one group is different from the population proportion for a second independent group. This is comparable to the independent t test described in Chapter 7, where one population mean was compared to a second independent population mean. Once again, we have two independently drawn samples, as discussed in Chapter 7.
First, the hypotheses to be evaluated for detecting whether two independent population proportions differ are as follows. The null hypothesis H0 is that there is no difference between the two population proportions π1 and π2, which we denote as
$$H_0: \pi_1 - \pi_2 = 0$$
Here there is no difference, or a "null" difference, between the two population proportions. For example, we may be seeking to determine whether the proportion of Democratic senators who support gun control is equal to the proportion of Republican senators who support gun control.
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportions π1 and π2, which we denote as
$$H_1: \pi_1 - \pi_2 \neq 0$$
The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportions are different. As we have not specified a direction on H1, we are willing to reject either if π1 is greater than π2 or if π1 is less than π2. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π1 is greater than π2 or that π1 is less than π2. In either case, the more the resulting sample proportions differ from one another, the more likely we are to reject the null hypothesis.

It is assumed that the two samples are independently and randomly drawn from their respective populations (i.e., the assumption of independence) and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as
$$z = \frac{p_1 - p_2}{s_{p_1 - p_2}} = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where n1 and n2 are the sample sizes for samples 1 and 2, respectively, and
$$p = \frac{f_1 + f_2}{n_1 + n_2}$$
where f1 and f2 are the number of observed frequencies for samples 1 and 2, respectively. The denominator of the z test statistic, $s_{p_1 - p_2}$, is known as the standard error of the difference between two proportions and provides an index of how variable the sample statistic (in this case, the sample proportion) is when multiple samples of the same size are drawn. This test statistic is conceptually similar to the test statistic for the independent t test.
The test statistic z is then compared to a critical value(s) from the unit normal distribution. For a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1. If the test statistic z falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +αz for the alternative hypothesis H1: π1 − π2 > 0 (i.e., a right-tailed test) and as −αz for the alternative hypothesis H1: π1 − π2 < 0 (i.e., a left-tailed test). If the test statistic z falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It should be noted that other alternatives to this test have been proposed (e.g., Storer & Kim, 1990).
For the two-tailed test, a (1 − α)% CI can also be examined. The CI is formed as follows:
$$(p_1 - p_2) \pm \left({}_{\alpha/2}z\right)(s_{p_1 - p_2})$$
If the CI contains 0, then the conclusion is to fail to reject H0; otherwise, we reject H0. Alternative methods are described by Beal (1987) and Coe and Tamhane (1993).
Let us consider an example to illustrate the use of the test of two independent proportions. Suppose a researcher is taste-testing a new chocolate candy ("chocolate yummies") and wants to know the extent to which individuals would likely purchase the product.
As taste in candy may be different for adults versus children, a study is conducted where independent samples of adults and children are given "chocolate yummies" to eat and asked whether they would buy them or not. The researcher would like to know whether the population proportion of individuals who would purchase "chocolate yummies" is different for adults and children. Thus, a nondirectional, two-tailed alternative hypothesis is utilized. The null and alternative hypotheses are denoted as follows:
$$H_0: \pi_1 - \pi_2 = 0$$
$$H_1: \pi_1 - \pi_2 \neq 0$$
If the null hypothesis is rejected, this would indicate that interest in purchasing the product is different in the two groups, and this might result in different marketing and packaging strategies for each group. If the null hypothesis is not rejected, then this would indicate the product is equally of interest to both adults and children, and different marketing and packaging strategies are not necessary.

A random sample of 100 children (sample 1) and a random sample of 100 adults (sample 2) are independently selected. Each individual consumes the product and indicates whether or not he or she would purchase it. Sixty-eight of the children and 54 of the adults state they would purchase "chocolate yummies" if they were available. The level of significance is set at α = .05. The test statistic z is computed as follows. We know that n1 = 100, n2 = 100, f1 = 68, f2 = 54, p1 = .68, and p2 = .54. We compute p to be
$$p = \frac{f_1 + f_2}{n_1 + n_2} = \frac{68 + 54}{100 + 100} = \frac{122}{200} = .6100$$
This allows us to compute the test statistic z as
$$z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.68 - .54}{\sqrt{.61(1 - .61)\left(\dfrac{1}{100} + \dfrac{1}{100}\right)}} = \frac{.14}{\sqrt{(.61)(.39)(.02)}} = \frac{.14}{.0690} = 2.0290$$
The denominator of the z test statistic, $s_{p_1 - p_2}$ = .0690, is the standard error of the difference between two proportions, which we will need for computing the CI.
The test statistic z is then compared to the critical values from the unit normal distribution. As this is a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1 to be ±α/2z = ±.025z = ±1.9600. In other words, this is the z value that is closest to a P(z) of .975. As the test statistic z falls into the upper tail critical region, we reject H0 and conclude that the adults and children are not equally interested in the product.
Finally, we can compute the 95% CI as follows:
$$(p_1 - p_2) \pm \left({}_{\alpha/2}z\right)(s_{p_1 - p_2}) = (.68 - .54) \pm 1.96(.0690) = .14 \pm .1352 = (.0048, .2752)$$
Because the CI does not include 0, we would again reject H0 and conclude that the adults and children are not equally interested in the product. As previously stated, the conclusion
derived from the test statistic is always consistent with the conclusion derived from the CI at the same level of significance. We can interpret the CI as follows: 95% of similarly constructed CIs will not include 0.
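As a check on the arithmetic, the pooled proportion, test statistic, and CI for this example can be reproduced in Python (an illustrative sketch of the formulas above):

```python
import math

def z_two_independent(f1, n1, f2, n2):
    """z test of two independent proportions with a pooled p."""
    p1, p2 = f1 / n1, f2 / n2
    p = (f1 + f2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2)) # standard error of p1 - p2
    return (p1 - p2) / se, se

z, se = z_two_independent(68, 100, 54, 100)   # z about 2.03, se about .069
lower = (.68 - .54) - 1.96 * se               # 95% CI lower bound
upper = (.68 - .54) + 1.96 * se               # 95% CI upper bound
```

Carrying the unrounded standard error through gives z ≈ 2.0296; the text's 2.0290 reflects rounding se to .0690 before dividing.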
8.1.4 Inferences About Two Dependent Proportions
In our third inferential testing situation for proportions, the researcher would like to know whether the population proportion for one group is different from the population proportion for a second dependent group. This is comparable to the dependent t test described in Chapter 7, where one population mean was compared to a second dependent population mean. Once again, we have two dependently drawn samples, as discussed in Chapter 7. For example, we may have a pretest-posttest situation where a comparison of proportions over time for the same individuals is conducted. Alternatively, we may have pairs of matched individuals (e.g., spouses, twins, brother-sister) for which a comparison of proportions is of interest.
First, the hypotheses to be evaluated for detecting whether two dependent population proportions differ are as follows. The null hypothesis H0 is that there is no difference between the two population proportions π1 and π2, which we denote as
$$H_0: \pi_1 - \pi_2 = 0$$
Here there is no difference, or a "null" difference, between the two population proportions. For example, a political analyst may be interested in determining whether the approval rating of the president is the same just prior to and immediately following his annual State of the Union address (i.e., a pretest-posttest situation). As a second example, a marriage counselor wants to know whether husbands and wives equally favor a particular training program designed to enhance their relationship (i.e., a couple situation).
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population proportions π1 and π2, which we denote as follows:
$$H_1: \pi_1 - \pi_2 \neq 0$$
The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population proportions are different. As we have not specified a direction on H1, we are willing to reject either if π1 is greater than π2 or if π1 is less than π2. This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that π1 is greater than π2 or that π1 is less than π2. The more the resulting sample proportions differ from one another, the more likely we are to reject the null hypothesis.
Before we examine the test statistic, let us consider a table in which the proportions are often presented. As shown in Table 8.1, the contingency table lists proportions for each of
Table 8.1
Contingency Table for Two Samples

                                  Sample 1
Sample 2                 "Unfavorable"   "Favorable"   Marginal Proportions
"Favorable"                   a               b               p2
"Unfavorable"                 c               d             1 − p2
Marginal proportions        1 − p1           p1
the different possible outcomes. The columns indicate the proportions for sample 1. The left column contains those proportions related to the "unfavorable" condition (or disagree or no, depending on the situation), and the right column, those proportions related to the "favorable" condition (or agree or yes, depending on the situation). At the bottom of the columns are the marginal proportions shown for the "unfavorable" condition, denoted by 1 − p1, and for the "favorable" condition, denoted by p1. The rows indicate the proportions for sample 2. The top row contains those proportions for the "favorable" condition, and the bottom row contains those proportions for the "unfavorable" condition. To the right of the rows are the marginal proportions shown for the "favorable" condition, denoted by p2, and for the "unfavorable" condition, denoted by 1 − p2.
Within the box of the table are the proportions for the different combinations of conditions across the two samples. The upper left-hand cell is the proportion of observations that are "unfavorable" in sample 1 and "favorable" in sample 2 (i.e., dissimilar across samples), denoted by a. The upper right-hand cell is the proportion of observations that are "favorable" in sample 1 and "favorable" in sample 2 (i.e., similar across samples), denoted by b. The lower left-hand cell is the proportion of observations that are "unfavorable" in sample 1 and "unfavorable" in sample 2 (i.e., similar across samples), denoted by c. The lower right-hand cell is the proportion of observations that are "favorable" in sample 1 and "unfavorable" in sample 2 (i.e., dissimilar across samples), denoted by d.
It is assumed that the two samples are randomly drawn from their respective populations and that the normal distribution is the appropriate sampling distribution. The next step is to compute the test statistic z as
$$z = \frac{p_1 - p_2}{s_{p_1 - p_2}} = \frac{p_1 - p_2}{\sqrt{\dfrac{d + a}{n}}}$$
where n is the total number of pairs. The denominator of the z test statistic, $s_{p_1 - p_2}$, is again known as the standard error of the difference between two proportions and provides an index of how variable the sample statistic (i.e., the difference between two sample proportions) is when multiple samples of the same size are drawn. This test statistic is conceptually similar to the test statistic for the dependent t test.
The test statistic z is then compared to a critical value(s) from the unit normal distribution. For a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1. If the test statistic z falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +αz for the alternative hypothesis H1: π1 − π2 > 0 (i.e., right-tailed test) and as −αz for the alternative hypothesis H1: π1 − π2 < 0 (i.e., left-tailed test). If the test statistic z falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It should be noted that other alternatives to this test have been proposed (e.g., the chi-square test as described in the following section). Unfortunately, the z test does not yield an acceptable CI procedure.
Let us consider an example to illustrate the use of the test of two dependent proportions. Suppose a medical researcher is interested in whether husbands and wives agree on the effectiveness of a new headache medication, "No-Head." A random sample of 100 husband-wife couples was selected and asked to try "No-Head" for 2 months. At the end of 2 months, each individual was asked whether the medication was effective or not at reducing headache pain. The researcher wants to know whether the medication is differentially effective for husbands and wives. Thus, a nondirectional, two-tailed alternative hypothesis is utilized.
The resulting proportions are presented as a contingency table in Table 8.2. The level of significance is set at α = .05. The test statistic z is computed as follows:
$$z = \frac{p_1 - p_2}{s_{p_1 - p_2}} = \frac{p_1 - p_2}{\sqrt{\dfrac{d + a}{n}}} = \frac{.40 - .65}{\sqrt{\dfrac{.15 + .40}{100}}} = \frac{-.25}{.0742} = -3.3693$$
The test statistic z is then compared to the critical values from the unit normal distribution. As this is a two-tailed test, the critical values are denoted as ±α/2z and are found in Table A.1 to be ±α/2z = ±.025z = ±1.96. In other words, this is the z value that is closest to a P(z) of .975. As the test statistic z falls into the lower tail critical region, we reject H0 and conclude that the husbands and wives do not believe equally in the effectiveness of "No-Head."
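The dependent-proportions computation can be sketched in Python (illustrative code; a and d are the two "dissimilar" cell proportions from Table 8.2):

```python
import math

def z_two_dependent(p1, p2, a, d, n):
    """z test of two dependent proportions; the standard error is
    sqrt((d + a)/n), where a and d are the dissimilar-cell proportions."""
    se = math.sqrt((d + a) / n)
    return (p1 - p2) / se, se

# Headache example: p1 = .40, p2 = .65, a = .40, d = .15, n = 100 pairs
z, se = z_two_dependent(.40, .65, a=.40, d=.15, n=100)   # z about -3.37
```

Only the disagreeing pairs (cells a and d) enter the standard error, which is what makes this a dependent-samples procedure rather than the pooled formula used for independent groups.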
8.2 Inferences About Proportions Involving Chi-Square Distribution
This section deals with concepts and procedures for testing inferences about proportions that involve the chi-square distribution. Following a discussion of the chi-square distribution relevant to tests of proportions, inferential tests are presented for the chi-square goodness-of-fit test and the chi-square test of association.
8.2.1 Introduction
The previous tests of proportions in this chapter were based on the unit normal distribution, whereas the tests of proportions in the remainder of the chapter are based on the chi-square distribution. Thus, we need to become familiar with this new distribution. Like the normal and t distributions, the chi-square distribution is really a family of distributions. Also, like the t distribution, the chi-square distribution family members depend on the number of degrees of freedom represented. As we shall see, the degrees of freedom for the chi-square goodness-of-fit test are calculated as the number of categories (denoted as J) minus 1. For example, the chi-square distribution for one degree of freedom (i.e., for a variable which has two categories) is denoted by $\chi^2_1$ as shown in Figure 8.1. This particular chi-square distribution is especially positively skewed and leptokurtic (sharp peak).
Table 8.2
Contingency Table for Headache Example

                                 Husband Sample
Wife Sample              "Ineffective"   "Effective"   Marginal Proportions
"Effective"                a = .40         b = .25          p2 = .65
"Ineffective"              c = .20         d = .15        1 − p2 = .35
Marginal proportions     1 − p1 = .60     p1 = .40
The figure also describes graphically the distributions for $\chi^2_5$ and $\chi^2_{10}$. As you can see in the figure, as the degrees of freedom increase, the distribution becomes less skewed and less leptokurtic; in fact, the distribution becomes more nearly normal in shape as the number of degrees of freedom increases. For extremely large degrees of freedom, the chi-square distribution is approximately normal. In general, we denote a particular chi-square distribution with ν degrees of freedom as $\chi^2_\nu$. The mean of any chi-square distribution is ν, the mode is ν − 2 when ν is at least 2, and the variance is 2ν. The value of chi-square can range from 0 to positive infinity. A table of different percentile values for many chi-square distributions is given in Table A.3. This table is utilized in the following two chi-square tests.
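These moment facts (mean ν, variance 2ν) can be checked empirically by simulating a chi-square variate as a sum of squared standard normals; a quick stdlib-only sketch (function name and rep count are illustrative):

```python
import random

def chi_square_moments(df, reps=100_000, seed=42):
    """Empirical mean and variance of a chi-square variate with df
    degrees of freedom, built as a sum of df squared standard normals."""
    rng = random.Random(seed)
    draws = [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((x - mean) ** 2 for x in draws) / reps
    return mean, var

mean5, var5 = chi_square_moments(df=5)   # expect mean near 5, variance near 10
```

The construction itself mirrors the definition of the distribution: a chi-square with ν degrees of freedom is the sum of ν independent squared standard normal variables.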
One additional point that should be noted about each of the chi-square tests of proportions developed in this chapter is that there are no CI procedures for either the chi-square goodness-of-fit test or the chi-square test of association.
8.2.2 Chi-Square Goodness-of-Fit Test
The first test to consider is the chi-square goodness-of-fit test. This test is used to determine whether the observed proportions in two or more categories of a categorical variable differ from what we would expect a priori. For example, a researcher is interested in whether the current undergraduate student body at ICU is majoring in disciplines according to an a priori or expected set of proportions. Based on research at the national level, the expected proportions of undergraduate college majors are as follows: .20 education, .40 arts and sciences, .10 communications, and .30 business. In a random sample of 100 undergraduates at ICU, the observed proportions are as follows: .25 education, .50 arts and sciences, .10 communications, and .15 business. Thus, the researcher would like to know whether the sample proportions observed at ICU fit the expected national proportions. In essence, the chi-square goodness-of-fit test is used to test proportions for a single categorical variable (i.e., nominal or ordinal measurement scale).
The observed proportions are denoted by pj, where p represents a sample proportion and j represents a particular category (e.g., education majors), where j = 1, …, J categories. The expected proportions are denoted by πj, where π represents an expected proportion
Figure 8.1. Several members of the family of the chi-square distribution (relative frequency plotted against chi-square values for ν = 1, 5, and 10).
and j represents a particular category. The null and alternative hypotheses are denoted as follows, where the null hypothesis states that the difference between the observed and expected proportions is 0 for all categories:
$$H_0: (p_j - \pi_j) = 0 \text{ for all } j$$
$$H_1: (p_j - \pi_j) \neq 0 \text{ for at least one } j$$
The test statistic is a chi-square and is computed by
$$\chi^2 = n \sum_{j=1}^{J} \frac{(p_j - \pi_j)^2}{\pi_j}$$
where n is the size of the sample. The test statistic is compared to a critical value from the chi-square table (Table A.3), ${}_{\alpha}\chi^2_\nu$, where ν = J − 1. The degrees of freedom are 1 less than the total number of categories J, because the proportions must total to 1.00; thus, only J − 1 are free to vary.
If the test statistic is larger than the critical value, then the null hypothesis is rejected in favor of the alternative. This would indicate that the observed and expected proportions were not equal for all categories. The larger the differences are between one or more observed and expected proportions, the larger the value of the test statistic, and the more likely it is to reject the null hypothesis. Otherwise, we would fail to reject the null hypothesis, indicating that the observed and expected proportions were approximately equal for all categories.
If the null hypothesis is rejected, one may wish to determine which sample proportions are different from their respective expected proportions. Here we recommend you conduct tests of a single proportion as described in the preceding section. If you would like to control the experimentwise Type I error rate across a set of such tests, then the Bonferroni method is recommended, where the α level is divided up among the number of tests conducted. For example, with an overall α = .05 and five categories, one would conduct five tests of a single proportion, each at the .01 level of α.
Another way to determine which cells are statistically different in observed to expected proportions is to examine the standardized residuals, which can be computed as follows:
$$R = \frac{O - E}{\sqrt{E}}$$
Standardized residuals that are greater (in absolute value terms) than 1.96 (when α = .05) or 2.58 (when α = .01) have different observed to expected frequencies and are contributing to the statistically significant chi-square statistic. The sign of the residual provides information on whether the observed frequency is greater than the expected frequency (i.e., positive value) or less than the expected frequency (i.e., negative value).
Let us return to the example and conduct the chi-square goodness-of-fit test. The test statistic is computed as follows:
$$\chi^2 = n \sum_{j=1}^{J} \frac{(p_j - \pi_j)^2}{\pi_j} = 100\left[\frac{(.25 - .20)^2}{.20} + \frac{(.50 - .40)^2}{.40} + \frac{(.10 - .10)^2}{.10} + \frac{(.15 - .30)^2}{.30}\right]$$

$$= 100(.0125 + .0250 + .0000 + .0750) = 100(.1125) = 11.25$$
The test statistic is compared to the critical value, from Table A.3, of ${}_{.05}\chi^2_3$ = 7.8147. Because the test statistic is larger than the critical value, we reject the null hypothesis and conclude that the sample proportions from ICU are different from the expected proportions at the national level. Follow-up tests to determine which cells are statistically different in their observed to expected proportions involve examining the standardized residuals. In this example, the standardized residuals are computed as follows:
$$R_{Education} = \frac{O - E}{\sqrt{E}} = \frac{25 - 20}{\sqrt{20}} = 1.118 \qquad R_{Arts\ and\ sciences} = \frac{50 - 40}{\sqrt{40}} = 1.5811$$

$$R_{Communications} = \frac{10 - 10}{\sqrt{10}} = 0 \qquad R_{Business} = \frac{15 - 30}{\sqrt{30}} = -2.739$$
The standardized residual for business is greater (in absolute value terms) than 1.96 (as α = .05) and thus suggests that there are different observed to expected frequencies for students majoring in business at ICU compared to national estimates, and that this category is the one which is contributing most to the statistically significant chi-square statistic.
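The goodness-of-fit statistic and standardized residuals for this example can be verified with a short script (illustrative helper functions, not SPSS output):

```python
import math

def chi_square_gof(observed_p, expected_p, n):
    """Chi-square goodness-of-fit: n times the sum of (p_j - pi_j)^2 / pi_j."""
    return n * sum((p - pi) ** 2 / pi for p, pi in zip(observed_p, expected_p))

def standardized_residuals(observed_f, expected_f):
    """(O - E) / sqrt(E) for each category."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed_f, expected_f)]

observed = [.25, .50, .10, .15]   # ICU sample proportions
expected = [.20, .40, .10, .30]   # national (expected) proportions
chi2 = chi_square_gof(observed, expected, n=100)   # 11.25
resid = standardized_residuals([25, 50, 10, 15], [20, 40, 10, 30])
# resid is roughly [1.118, 1.581, 0.0, -2.739]; only business exceeds 1.96
```

Note that the test statistic works on proportions while the residuals work on frequencies; with n = 100 the frequencies are simply the proportions times 100.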
8.2.2.1 Effect Size
An effect size for the chi-square goodness-of-fit test can be computed by hand as follows, where N is the total sample size and J is the number of categories in the variable:
$$\text{Effect size} = \frac{\chi^2}{N(J - 1)}$$
This effect size statistic can range from 0 to +1, where 0 indicates no difference between the sample and hypothesized proportions (and thus no effect). Positive one indicates the
maximum difference between the sample and hypothesized proportions (and thus a large effect). Given the range of this value (0 to +1.0) and similarity to a correlation coefficient, it is reasonable to apply Cohen's interpretations for correlations as a rule of thumb. These include the following: small effect size = .10; medium effect size = .30; and large effect size = .50. For the previous example, the effect size would be calculated as follows and would be interpreted as a small effect:
$$\text{Effect size} = \frac{\chi^2}{N(J - 1)} = \frac{11.25}{100(4 - 1)} = \frac{11.25}{300} = .0375$$
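In code form, the effect size is a one-line follow-up to the chi-square statistic (illustrative):

```python
def gof_effect_size(chi2, n, j):
    """Effect size for the goodness-of-fit test: chi^2 / (N(J - 1))."""
    return chi2 / (n * (j - 1))

es = gof_effect_size(11.25, n=100, j=4)   # .0375
```

Because N(J − 1) is the maximum attainable chi-square for a given sample and number of categories, dividing by it rescales the statistic onto the 0-to-1 range described above.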
8.2.2.2 Assumptions
Two assumptions are made for the chi-square goodness-of-fit test: (1) observations are independent (which is met when a random sample of the population is selected), and (2) the expected frequency is at least 5 per cell (in the case of the chi-square goodness-of-fit test, this translates to an expected frequency of at least 5 per category, as there is only one variable included in the analysis). When the expected frequency is less than 5, that particular cell (i.e., category) has undue influence on the chi-square statistic. In other words, the chi-square goodness-of-fit test becomes too sensitive when the expected values are less than 5.
8.2.3 Chi-Square Test of Association
The second test to consider is the chi-square test of association. This test is equivalent to the chi-square test of independence and the chi-square test of homogeneity, which are not further discussed; the chi-square test of association incorporates both of these tests (e.g., Glass & Hopkins, 1996). The chi-square test of association is used to determine whether there is an association or relationship between two or more categorical (i.e., nominal or ordinal) variables. Our discussion is, for the most part, restricted to the two-variable situation where each variable has two or more categories. The chi-square test of association is the logical extension of the chi-square goodness-of-fit test, which is concerned with one categorical variable. Unlike the chi-square goodness-of-fit test where the expected proportions are known a priori, for the chi-square test of association, the expected proportions are not known a priori but must be estimated from the sample data.
For example, suppose a researcher is interested in whether there is an association between level of education and stance on a proposed amendment to legalize gambling. Thus, one categorical variable is level of education with the categories being as follows: (1) less than a high school education, (2) high school graduate, (3) undergraduate degree, and (4) graduate school degree. The other categorical variable is stance on the gambling amendment with the following categories: (1) in favor of the gambling bill and (2) opposed to the gambling bill. The null hypothesis is that there is no association between level of education and stance on gambling, whereas the alternative hypothesis is that there is some association between level of education and stance on gambling. The alternative would be supported if individuals at one level of education felt differently about the bill than individuals at another level of education.
The data are shown in Table 8.3, known as a contingency table (or crosstab table). As there are two categorical variables, we have a two-way or two-dimensional contingency
table. Each combination of the two variables is known as a cell. For example, the cell for row 1, favor bill, and column 2, high school graduate, is denoted as cell 12, the first value (i.e., 1) referring to the row and the second value (i.e., 2) to the column. Thus, the first subscript indicates the particular row r, and the second subscript indicates the particular column c. The row subscript ranges from r = 1, …, R, and the column subscript ranges from c = 1, …, C, where R is the last row and C is the last column. This example contains a total of eight cells, two rows times four columns, denoted by R × C = 2 × 4 = 8.
Each cell in the table contains two pieces of information, the number (or count or frequencies) of observations in that cell and the observed proportion in that cell. For cell 12, there are 13 observations denoted by n12 = 13 and an observed proportion of .65 denoted by p12 = .65. The observed proportion is computed by taking the number of observations in the cell and dividing by the number of observations in the column. Thus, for cell 12, 13 of the 20 high school graduates favor the bill, or 13/20 = .65. The column information is given at the bottom of each column, known as the column marginals. Here we are given the number of observations in a column, denoted by n.c, where the "." indicates we have summed across rows and c indicates the particular column. For column 2 (reflecting high school graduates), there are 20 observations denoted by n.2 = 20.
There is also row information contained at the end of each row, known as the row marginals. Two values are listed in the row marginals. First, the number of observations in a row is denoted by nr., where r indicates the particular row and the "." indicates we have summed across the columns. Second, the expected proportion for a specific row is denoted by πr., where again r indicates the particular row and the "." indicates we have summed across the columns. The expected proportion for a particular row is computed by taking the number of observations in that row nr. and dividing by the total number of observations n... Note that the total number of observations is given in the lower right-hand portion of the figure and denoted as n.. = 80. Thus, for the first row, the expected proportion is computed as π1. = n1./n.. = 44/80 = .55.
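As a brief sketch (not part of the original text), the marginal computations just described can be reproduced in Python with NumPy from the counts in Table 8.3:

```python
import numpy as np

# Observed counts from Table 8.3: rows = stance (favor, opposed),
# columns = education level (< HS, HS, undergraduate, graduate)
counts = np.array([[16, 13, 10, 5],
                   [4, 7, 10, 15]])

n_total = counts.sum()          # total sample size, n.. = 80
n_row = counts.sum(axis=1)      # row marginals: n1. = 44, n2. = 36
n_col = counts.sum(axis=0)      # column marginals: each n.c = 20

# Expected row proportions: pi_r. = n_r. / n..
pi_r = n_row / n_total
print(pi_r)                     # [0.55 0.45]

# Observed cell proportions within each column: p_rc = n_rc / n.c
p = counts / n_col              # e.g., p12 = 13/20 = .65
```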
The null and alternative hypotheses can be written as follows:

H0: (prc − πr.) = 0 for all cells

H1: (prc − πr.) ≠ 0 for all cells
Table 8.3
Contingency Table for Gambling Example

Stance on                          Level of Education
Gambling            Less than High School   High School   Undergraduate   Graduate    Row Marginals
"Favor"             n11 = 16                n12 = 13      n13 = 10        n14 = 5     n1. = 44
                    p11 = .80               p12 = .65     p13 = .50      p14 = .25   π1. = .55
"Opposed"           n21 = 4                 n22 = 7       n23 = 10        n24 = 15    n2. = 36
                    p21 = .20               p22 = .35     p23 = .50      p24 = .75   π2. = .45
Column marginals    n.1 = 20                n.2 = 20      n.3 = 20        n.4 = 20    n.. = 80
223 Inferences About Proportions
The test statistic is a chi-square and is computed by

χ² = Σ_{r=1}^{R} Σ_{c=1}^{C} n.c(prc − πr.)²/πr.
The test statistic is compared to a critical value from the chi-square table (Table A.3), αχ²ν, where ν = (R − 1)(C − 1). That is, the degrees of freedom are 1 less than the number of rows times 1 less than the number of columns.
If the test statistic is larger than the critical value, then the null hypothesis is rejected in favor of the alternative. This would indicate that the observed and expected proportions were not equal across cells such that the two categorical variables have some association. The larger the differences between the observed and expected proportions, the larger the value of the test statistic and the more likely it is to reject the null hypothesis. Otherwise, we would fail to reject the null hypothesis, indicating that the observed and expected proportions were approximately equal, such that the two categorical variables have no association.
If the null hypothesis is rejected, then one may wish to determine for which combination of categories the sample proportions are different from their respective expected proportions. Here we recommend you construct 2 × 2 contingency tables as subsets of the larger table and conduct chi-square tests of association. If you would like to control the experimentwise Type I error rate across the set of tests, then the Bonferroni method is recommended, where the α level is divided up among the number of tests conducted. For example, with α = .05 and five 2 × 2 tables, one would conduct five tests each at the .01 level of significance. As with the chi-square goodness-of-fit test, it is also possible to examine the standardized residuals (which can be requested in SPSS) to determine the cells that have significantly different observed to expected proportions. Cells where the standardized residuals are greater (in absolute value terms) than 1.96 (when α = .05) or 2.58 (when α = .01) are significantly different in observed to expected frequencies.
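As an illustration (a sketch, not a computation from the original text), a follow-up 2 × 2 test on the two most extreme education categories of the gambling example can be run with SciPy; the Bonferroni comparison of the resulting p value against an adjusted α is shown as well:

```python
from scipy.stats import chi2_contingency

# Follow-up 2x2 table: the "less than high school" and "graduate"
# columns from Table 8.3 (rows: favor, opposed)
subtable = [[16, 5],
            [4, 15]]

# correction=False gives the ordinary Pearson chi-square; by default
# SciPy applies the Yates continuity correction to 2x2 tables
stat, p, df, expected = chi2_contingency(subtable, correction=False)
print(round(stat, 2), df)   # 12.13 1

# With a Bonferroni adjustment across, say, five follow-up tests at an
# overall alpha of .05, each test is evaluated at .05/5 = .01
print(p < .05 / 5)          # True
```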
Finally, it should be noted that we have only considered two-way contingency tables here. Multiway contingency tables can also be constructed and the chi-square test of association utilized to determine whether there is an association among several categorical variables.
Let us complete the analysis of the example data. The test statistic is computed as

χ² = Σ_{r=1}^{R} Σ_{c=1}^{C} n.c(prc − πr.)²/πr.

= 20(.80 − .55)²/.55 + 20(.20 − .45)²/.45 + 20(.65 − .55)²/.55 + 20(.35 − .45)²/.45
+ 20(.50 − .55)²/.55 + 20(.50 − .45)²/.45 + 20(.25 − .55)²/.55 + 20(.75 − .45)²/.45

= 2.2727 + 2.7778 + .3636 + .4444 + .0909 + .1111 + 3.2727 + 4.0000 = 13.3332
The test statistic is compared to the critical value, from Table A.3, of .05χ²3 = 7.8147. Because the test statistic is larger than the critical value, we reject the null hypothesis and conclude that there is an association between level of education and stance on the gambling bill. In other words, stance on gambling is not the same for all levels of education. The cells with the largest contribution to the test statistic give some indication as to where the observed and expected proportions are most different. Here the first and fourth columns have the largest contributions to the test statistic and have the greatest differences between the observed and expected proportions; these would be of interest in a 2 × 2 follow-up test.
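As a check on the hand computation (a sketch, not part of the original example), the same Pearson chi-square can be reproduced with SciPy, which derives the expected counts from the marginals:

```python
from scipy.stats import chi2, chi2_contingency

# Contingency table from Table 8.3 (rows: favor, opposed;
# columns: four education levels)
observed = [[16, 13, 10, 5],
            [4, 7, 10, 15]]

stat, p, df, expected = chi2_contingency(observed)
print(round(stat, 3), df, round(p, 3))   # 13.333 3 0.004

# Critical value for alpha = .05 with (R-1)(C-1) = 3 degrees of freedom
critical = chi2.ppf(0.95, df)
print(round(critical, 4))                # 7.8147
print(stat > critical)                   # True -> reject H0
```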
8.2.3.1 Effect Size
Several measures of effect size, such as correlation coefficients and measures of association, can be requested in SPSS and are commonly reported effect size indices for results from the chi-square test of association. Which effect size value is selected depends in part on the measurement scale of the variable. For example, researchers working with nominal data can select a contingency coefficient, phi (for 2 × 2 tables), Cramer's V (for tables larger than 2 × 2), lambda, or an uncertainty coefficient. Correlation options available for ordinal data include gamma, Somer's d, Kendall's tau-b, and Kendall's tau-c. From the contingency coefficient, C, we can compute Cohen's w as follows:

w = √[C²/(1 − C²)]

Cohen's recommended subjective standard for interpreting w (as well as the other correlation coefficients presented) is as follows: small effect size, w = .10; medium effect size, w = .30; and large effect size, w = .50. See Cohen (1988) for further details.
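The effect size arithmetic can be sketched in a few lines of Python (the formulas C = √[χ²/(χ² + n)] and Cramer's V = √[χ²/(n·min(R − 1, C − 1))] are the standard definitions of those coefficients; the values below match the gambling example):

```python
import math

chi_sq = 13.333   # chi-square test statistic from the gambling example
n = 80            # total sample size
r, c = 2, 4       # table dimensions

# Contingency coefficient: C = sqrt(chi2 / (chi2 + n))
C = math.sqrt(chi_sq / (chi_sq + n))

# Cohen's w from the contingency coefficient: w = sqrt(C^2 / (1 - C^2))
w = math.sqrt(C**2 / (1 - C**2))

# Cramer's V: sqrt(chi2 / (n * min(r - 1, c - 1)))
V = math.sqrt(chi_sq / (n * min(r - 1, c - 1)))

print(round(C, 3), round(w, 3), round(V, 3))   # 0.378 0.408 0.408
```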
8.2.3.2 Assumptions
The same two assumptions that apply to the chi-square goodness-of-fit test also apply to the chi-square test of association, as follows: (1) observations are independent (which is met when a random sample of the population is selected), and (2) expected frequency is at least 5 per cell. When the expected frequency is less than 5, that particular cell has undue influence on the chi-square statistic. In other words, the chi-square test of association becomes too sensitive when the expected values are less than 5.
8.3 SPSS
Once again, we consider the use of SPSS for the example datasets. While SPSS does not have any of the z procedures described in the first part of this chapter, it is capable of conducting both of the chi-square procedures covered in the second part of this chapter, as described in the following.
Chi-Square Goodness-of-Fit Test
Step 1: To conduct the chi-square goodness-of-fit test, you need one variable that is either nominal or ordinal in scale (e.g., major). To conduct the chi-square goodness-of-fit test, go to "Analyze" in the top pulldown menu, then select "Nonparametric Tests," followed by "Legacy Dialogs," and then "Chi-Square." Following the screenshot (step 1) as follows produces the "Chi-Square Goodness-of-Fit" dialog box.
[Screenshot: Chi-square goodness-of-fit, step 1]
Step 2: Next, from the main "Chi-Square Goodness-of-Fit" dialog box, click the variable (e.g., major) and move it into the "Test Variable List" box by clicking on the arrow button. In the lower right-hand portion of the screen is a section labeled "Expected Values." The default is to conduct the analysis with the expected values equal for each category (you will see that the radio button for "All categories equal" is preselected). Much of the time, you will want to use different expected values. To define different expected values, click on the "Values" radio button. Enter each expected value in the box below "Values," in the same order as the categories (e.g., first enter the expected value for category 1 and then the expected value for category 2), and then click "Add" to define the value in the box. This sets up an expected value for each category. Repeat this process for every category of your variable.
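The same goodness-of-fit test can be sketched outside SPSS with SciPy (an illustration, not SPSS itself), using the observed major counts and the expected counts implied by the national proportions of .20, .40, .10, and .30 for n = 100:

```python
from scipy.stats import chisquare

# Observed counts of 100 undergraduate majors at ICU
# (education, arts and sciences, communications, business)
observed = [25, 50, 10, 15]

# Expected counts under the national proportions: .20, .40, .10, .30
expected = [20, 40, 10, 30]

stat, p = chisquare(observed, f_exp=expected)
print(round(stat, 2), round(p, 3))   # 11.25 0.01
```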
[Screenshot: Chi-square goodness-of-fit, step 2a. Enter the expected value for the category that corresponds to the first numeric value (e.g., 1), and click on "Add" to define the value expected in each group. Repeat this for each category.]

[Screenshot: Chi-square goodness-of-fit, step 2b. The expected values will appear in rank order from the first category to the last category.]
Then click on "OK" to run the analysis. The output is shown in Table 8.4.
Table 8.4
SPSS Results for Undergraduate Majors Example

College Major
                    Observed N    Expected N    Residual
Education           25            20.0          5.0
Arts and sciences   50            40.0          10.0
Communications      10            10.0          .0
Business            15            30.0          -15.0
Total               100

Test Statistics (College Major)
Chi-square     11.250a
df             3
Asymp. sig.    .010
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 10.0.

Notes on the output: "Observed N" reflects the observed frequencies from your sample, and "Expected N" reflects the expected values that were input by the researcher. The residual is simply the difference between the observed and expected frequencies. The degrees of freedom for the chi-square goodness-of-fit test are calculated as J − 1 (i.e., one less than the number of categories). "Chi-square" is the test statistic value and is calculated as:

χ² = n Σ_{j=1}^{J} (pj − πj)²/πj = 100[(.25 − .20)²/.20 + (.50 − .40)²/.40 + (.10 − .10)²/.10 + (.15 − .30)²/.30] = 11.25

"Asymp. sig." is the observed p value for the chi-square goodness-of-fit test. It is interpreted as: there is about a 1% probability of the sample proportions occurring by chance if the null hypothesis is really true (i.e., if the population proportions are .20, .40, .10, and .30).
Interpreting the output: The top table provides the frequencies observed in the sample ("Observed N") and the expected frequencies given the values defined by the researcher ("Expected N"). The "Residual" is simply the difference between the two Ns. The chi-square test statistic value is 11.25, and the associated p value is .01. Since p is less than α, we reject the null hypothesis. Let us translate this back to the purpose of our null hypothesis statistical test. There is evidence to suggest that the sample proportions observed differ from the proportions of college majors nationally. Follow-up tests to determine which cells are statistically different in the observed to expected proportions can be conducted by examining the standardized residuals. In this example, the standardized residuals were computed as follows:
R = (O − E)/√E

R(Education) = (25 − 20)/√20 = 1.118

R(Arts and sciences) = (50 − 40)/√40 = 1.581

R(Communications) = (10 − 10)/√10 = 0

R(Business) = (15 − 30)/√30 = −2.739
The standardized residual for business is greater (in absolute value terms) than 1.96 (given α = .05) and thus suggests that there are different observed to expected frequencies for students majoring in business at ICU compared to national estimates. This category is the one contributing most to the statistically significant chi-square statistic.
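A brief sketch of the standardized residual computation (mirroring the hand calculations above, not SPSS output):

```python
import math

observed = {"education": 25, "arts and sciences": 50,
            "communications": 10, "business": 15}
expected = {"education": 20, "arts and sciences": 40,
            "communications": 10, "business": 30}

# Standardized residual: R = (O - E) / sqrt(E); flag |R| > 1.96 (alpha = .05)
for major in observed:
    r = (observed[major] - expected[major]) / math.sqrt(expected[major])
    flag = "significant" if abs(r) > 1.96 else ""
    print(f"{major}: {r:.3f} {flag}")
# business (-2.739) is the only residual exceeding |1.96|
```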
The effect size can be calculated as follows:

Effect size = χ²/[N(J − 1)] = 11.25/[100(4 − 1)] = 11.25/300 = .0375
Chi-Square Test of Association
Step 1: To conduct a chi-square test of association, you need two categorical variables (nominal and/or ordinal) whose frequencies you wish to associate (e.g., education level and gambling stance). To compute the chi-square test of association, go to "Analyze" in the top pulldown, then select "Descriptive Statistics," and then select the "Crosstabs" procedure.
[Screenshot: Chi-square test of association, step 1]
Step 2: Select the dependent variable and move it into the "Row(s)" box by clicking on the arrow key [e.g., here we used stance on gambling as the dependent variable (1 = support; 0 = not support)]. Then select the independent variable and move it into the "Column(s)" box [in this example, level of education is the independent variable (1 = less than high school; 2 = high school; 3 = undergraduate; 4 = graduate)].
[Screenshot: Chi-square test of association, step 2. Select the variable of interest from the list on the left and use the arrow to move it to the boxes on the right; the dependent variable should be displayed in the row(s) and the independent variable in the column(s). Clicking on "Statistics" will allow you to select various statistics to generate (including the chi-square test statistic value and various correlation coefficients); see step 3. Clicking on "Cells" provides options for what statistics to display in the cells; see step 4. Clicking on "Format" allows the option of displaying the categories in ascending or descending order; see step 5.]
Step 3: In the top right corner of the "Crosstabs" dialog box (see screenshot step 2), click on the button labeled "Statistics." From here, placing a checkmark in the box for "Chi-square" will produce the chi-square test statistic value and resulting null hypothesis statistical test results (including degrees of freedom and p value). Also from "Statistics," you can select various measures of association that can serve as an effect size (i.e., correlation coefficient values). Which correlation is selected should depend on the measurement scales of your variables. We are working with two nominal variables; thus, for purposes of this illustration, we will select both "Phi and Cramer's V" and "Contingency coefficient" just to illustrate two different effect size indices (although it is standard practice to use and report only one effect size). We will use the contingency coefficient to compute Cohen's w. Click on "Continue" to return to the main "Crosstabs" dialog box.
[Screenshot: Chi-square test of association, step 3]
Step 4: In the top right corner of the "Crosstabs" dialog box (see screenshot step 2), click on the button labeled "Cells." From the "Cells" dialog box, options are available for selecting counts and percentages. We have requested "Observed" and "Expected" counts, "Column" percentages, and "Standardized" residuals. We will review the expected counts to determine if the assumption of five expected frequencies per cell is met. We will use the standardized residuals post hoc, if the results of the test are statistically significant, to determine which cell(s) is most influencing the chi-square value. Click "Continue" to return to the main "Crosstabs" dialog box.
[Screenshot: Chi-square test of association, step 4]
Step 5: In the top right corner of the "Crosstabs" dialog box (see screenshot step 2), click on the button labeled "Format." From the "Format" dialog box, options are available for determining which order, "Ascending" or "Descending," you want the row values presented in the contingency table (we asked for descending in this example, such that row 1 was gambling = 1 and row 2 was gambling = 0). Click "Continue" to return to the main "Crosstabs" dialog box. Then click on "OK" to run the analysis.
[Screenshot: Chi-square test of association, step 5]
Interpreting the output: The output appears in Table 8.5, where the top box ("Case Processing Summary") provides information on the sample size and frequency of missing data (if any). The "Crosstabulation" table is next and provides the contingency table (i.e., counts, percentages, and standardized residuals). The "Chi-Square Tests" box gives the results of the procedure (including the chi-square test statistic value labeled "Pearson Chi-Square," degrees of freedom, and p value labeled as "Asymp. Sig."). The likelihood ratio chi-square uses a different mathematical formula than the Pearson chi-square; however, for large sample sizes, the values for the likelihood ratio chi-square and the Pearson chi-square should be similar (and rarely should the two statistics suggest different conclusions in terms of rejecting or failing to reject the null hypothesis). The linear-by-linear association statistic, also known as the Mantel-Haenszel chi-square, is based on the Pearson correlation and tests whether there is a linear association between the two variables (and thus should not be used for nominal variables).
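The expected counts and standardized residuals that SPSS reports in the crosstab can be sketched in NumPy (an illustration of the arithmetic, not SPSS itself); expected cell counts come from (row total × column total)/n:

```python
import numpy as np

observed = np.array([[16, 13, 10, 5],     # support
                     [4, 7, 10, 15]])     # do not support

n = observed.sum()
# Expected count for each cell: (row total * column total) / n
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Standardized residuals, as reported in the SPSS crosstab output
std_resid = (observed - expected) / np.sqrt(expected)
print(np.round(std_resid, 1))
# Roughly: [[ 1.5  0.6 -0.3 -1.8]
#           [-1.7 -0.7  0.3  2. ]]
```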
For the contingency coefficient, C, of .378, we compute Cohen's w effect size as follows:

w = √[C²/(1 − C²)] = √[.378²/(1 − .378²)] = √[.143/(1 − .143)] = √.167 = .408
Cohen's w of .408 would be interpreted as a moderate to large effect. Cramer's V, as seen in the output, is .408 and would be interpreted similarly as a moderate to large effect.
8.4 G*Power
A priori power can be determined using specialized software (e.g., Power and Precision, Ex-Sample, G*Power) or power tables (e.g., Cohen, 1988), as previously described. However, since SPSS does not provide power information for the results of the chi-square test of association just conducted, let us use G*Power to compute the post hoc power of our test.
Table 8.5
SPSS Results for Gambling Example

Case Processing Summary
                                         Valid          Missing       Total
                                         N    Percent   N   Percent   N    Percent
Stance on gambling * Level of education  80   100.0     0   .0        80   100.0

Stance on Gambling * Level of Education Crosstabulation
                                           Less Than
                                           High School  High School  Undergraduate  Graduate  Total
Support         Count                      16           13           10             5         44
                Expected count             11.0         11.0         11.0           11.0      44.0
                % within level of ed.      80.0%        65.0%        50.0%          25.0%     55.0%
                Std. residual              1.5          .6           -.3            -1.8
Do not support  Count                      4            7            10             15        36
                Expected count             9.0          9.0          9.0            9.0       36.0
                % within level of ed.      20.0%        35.0%        50.0%          75.0%     45.0%
                Std. residual              -1.7         -.7          .3             2.0
Total           Count                      20           20           20             20        80
                Expected count             20.0         20.0         20.0           20.0      80.0
                % within level of ed.      100.0%       100.0%       100.0%         100.0%    100.0%

Chi-Square Tests
                              Value     df   Asymp. Sig. (2-Sided)
Pearson chi-square            13.333a   3    .004
Likelihood ratio              13.969    3    .003
Linear-by-linear association  12.927    1    .000
N of valid cases              80
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.00.

Symmetric Measures
                                             Value   Approx. Sig.
Nominal by nominal  Phi                      .408    .004
                    Cramer's V               .408    .004
                    Contingency coefficient  .378    .004
N of valid cases                             80

Notes on the output: When analyzing the percentages in the crosstab table, compare the categories of the dependent variable (rows) across the columns of the independent variable (columns); for example, of respondents with a high school diploma, 65% support gambling. Zero cells have expected counts less than five; thus, we have met this assumption of the chi-square test of association. Review the standardized residuals to determine which cell(s) are contributing to the statistically significant chi-square value: standardized residuals greater than an absolute value of 1.96 (the critical value when alpha = .05) indicate that the cell is contributing to the association between the variables. In this case, only one cell, graduate/do not support, has a standardized residual of 2.0 and thus is contributing to the relationship. Degrees of freedom are computed as (Rows − 1)(Columns − 1) = (2 − 1)(4 − 1) = 3. "Pearson chi-square" is the test statistic value and is calculated as (see Section 8.2.3 for the full computation) χ² = Σ_{r=1}^{R} Σ_{c=1}^{C} n.c(prc − πr.)²/πr. The probability is less than 1% (see "Asymp. sig.") that we would see these proportions by random chance if the proportions were all equal (i.e., if the null hypothesis were really true). We have a 2 × 4 table; thus Cramer's V is appropriate. It is statistically significant and, using Cohen's interpretations, reflects a moderate to large effect size. The contingency coefficient can be used to compute Cohen's w, a measure of effect size, as w = √[C²/(1 − C²)] = √[.378²/(1 − .378²)] = .408.
Post Hoc Power for the Chi-Square Test of Association Using G*Power
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a chi-square test of association; therefore, the toggle button must be used to change the test family to χ². Next, we need to select the appropriate statistical test. We toggle to "Goodness-of-fit tests: Contingency tables." The "Type of power analysis" desired then needs to be selected. To compute post hoc power, we need to select "Post hoc: Compute achieved power—given α, sample size, and effect size."

The "Input Parameters" must then be specified. The first parameter is specification of the effect size w (this was computed by hand from the contingency coefficient and w = .408). The alpha level we tested at was .05, the sample size was 80, and the degrees of freedom were 3. Once the parameters are specified, simply click on "Calculate" to generate the achieved power statistics.

The "Output Parameters" provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power given a two-tailed test, with an observed effect size of .408, an alpha level of .05, and sample size of 80. Based on those criteria, the post hoc power was .88. In other words, with a sample size of 80, testing at an alpha level of .05 and observing a moderate to large effect of .408, the power of our test was .88: the probability of rejecting the null hypothesis when it is really false was 88%, which is very high power. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed effect size and alpha level).
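The power computation G*Power performs here can be sketched with SciPy's noncentral chi-square distribution (an illustration under the standard noncentrality assumption λ = w²·n, not G*Power output):

```python
from scipy.stats import chi2, ncx2

w = 0.408        # observed effect size (Cohen's w)
n = 80           # total sample size
df = 3           # degrees of freedom
alpha = 0.05

# Noncentrality parameter: lambda = w^2 * n
nc = w**2 * n

# Power = probability that the chi-square statistic exceeds its
# critical value under the noncentral distribution implied by w
critical = chi2.ppf(1 - alpha, df)
power = ncx2.sf(critical, df, nc)
print(round(power, 2))   # approximately .88, matching G*Power
```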
[Screenshot: G*Power, chi-square test of association. The "Input Parameters" for computing post hoc power must be specified, including: (1) observed effect size w, (2) alpha level, (3) total sample size, and (4) degrees of freedom. Once the parameters are specified, click on "Calculate."]
8.5 Template and APA-Style Write-Up
We finish the chapter by presenting templates and APA-style write-ups for our examples. First we present an example paragraph detailing the results of the chi-square goodness-of-fit test and then follow this by the chi-square test of association.
Chi-Square Goodness-of-Fit Test
Recall that our graduate research assistant, Marie, was working with Tami, a staff member in the Undergraduate Services Office at ICU, to assist in analyzing the proportions of students enrolled in undergraduate majors. Her task was to assist Tami with writing her research question (Are the sample proportions of undergraduate student college majors at ICU in the same proportions as those nationally?) and generating the statistical test of inference to answer her question. Marie suggested a chi-square goodness-of-fit test as the test of inference. A template for writing a research question for a chi-square goodness-of-fit test is presented as follows:

Are the sample proportions of [units in categories] in the same proportions as those [identify the source to which the comparison is being made]?

It may be helpful to include in the results of the chi-square goodness-of-fit test information on an examination of the extent to which the assumptions were met (recall there are two assumptions: independence and expected frequency of at least 5 per cell). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
A chi-square goodness-of-fit test was conducted to determine if the sample proportions of undergraduate student college majors at ICU were in the same proportions as those reported nationally. The test was conducted using an alpha of .05. The null hypothesis was that the proportions would be as follows: .20 education, .40 arts and sciences, .10 communications, and .30 business. The assumption of an expected frequency of at least 5 per cell was met. The assumption of independence was met via random selection.

As shown in Table 8.4, there was a statistically significant difference between the proportion of undergraduate majors at ICU and those reported nationally (χ2 = 11.250, df = 3, p = .010). Thus, the null hypothesis that the proportions of undergraduate majors at ICU parallel those expected at the national level was rejected at the .05 level of significance. The effect size (χ2/[N(J − 1)]) was .0375, and interpreted using Cohen's guide (1988) as a very small effect. Follow-up tests were conducted by examining the standardized residuals. The standardized residual for business was −2.739 and thus suggests that there are different observed to expected frequencies for students majoring in business at ICU compared to national estimates. Therefore, business is the college major that is contributing most to the statistically significant chi-square statistic.
Chi-Square Test of Association
Marie, our graduate research assistant, was also working with Matthew, a lobbyist interested in examining the association between education level and stance on gambling. Marie was tasked with assisting Matthew in writing his research question (Is there an association between level of education and stance on gambling?) and generating the test of inference to answer his question. Marie suggested a chi-square test of association as the test of inference. A template for writing a research question for a chi-square test of association is presented as follows:

Is there an association between [independent variable] and [dependent variable]?

It may be helpful to include in the results of the chi-square test of association information on the extent to which the assumptions were met (recall there are two assumptions: independence and expected frequency of at least 5 per cell). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference. It is also desirable to include a measure of effect size. Given the contingency coefficient, C, of .378, we computed Cohen's w effect size to be .408, which would be interpreted as a moderate to large effect.
A chi-square test of association was conducted to determine if there was a relationship between level of education and stance on gambling. The test was conducted using an alpha of .01. It was hypothesized that there was an association between the two variables. The assumption of an expected frequency of at least 5 per cell was met. The assumption of independence was not met since the respondents were not randomly selected; thus, there is an increased probability of a Type I error.

From Table 8.5, we can see from the row marginals that 55% of the individuals overall support gambling. However, lower levels of education have a much higher percentage of support, while the highest level of education has a much lower percentage of support. Thus, there appears to be an association or relationship between gambling stance and level of education. This is subsequently supported statistically from the chi-square test (χ2 = 13.333, df = 3, p = .004). Thus, the null hypothesis that there is no association between stance on gambling and level of education was rejected at the .01 level of significance. Examination of the standardized residuals suggests that respondents who hold a graduate degree are significantly more likely not to support gambling (standardized residual = 2.0) as compared to all other respondents. The effect size, Cohen's w, was computed to be .408, which is interpreted to be a moderate to large effect (Cohen, 1988).
8.6 Summary
In this chapter, we described a third inferential testing situation: testing hypotheses about proportions. Several inferential tests and new concepts were discussed. The new concepts introduced were proportions, sampling distribution and standard error of a proportion, contingency table, chi-square distribution, and observed versus expected frequencies. The inferential tests described involving the unit normal distribution were tests of a single proportion, of two independent proportions, and of two dependent proportions. These tests are parallel to the tests of one or two means previously discussed in Chapters 6 and 7. The inferential tests described involving the chi-square distribution consisted of the chi-square goodness-of-fit test and the chi-square test of association. In addition, examples were presented for each of these tests. Box 8.1 summarizes the tests reviewed in this chapter and the key points related to each (including the distribution involved and recommendations for when to use the test).
Stop and Think Box 8.1
Characteristics and Recommendations for Inferences About Proportions

Inferences about a single proportion (akin to one-sample mean test)
  Distribution: Unit normal, z
  When to use: To determine if the sample proportion differs from a hypothesized proportion. One variable, nominal or ordinal in scale.

Inferences about two independent proportions (akin to the independent t test)
  Distribution: Unit normal, z
  When to use: To determine if the population proportion for one group differs from the population proportion for a second independent group. Two variables, both nominal and ordinal in scale.

Inferences about two dependent proportions (akin to the dependent t test)
  Distribution: Unit normal, z
  When to use: To determine if the population proportion for one group is different than the population proportion for a second dependent group. Two variables of the same measure, both nominal and ordinal in scale.

Chi-square goodness-of-fit test
  Distribution: Chi-square
  When to use: To determine if observed proportions differ from what would be expected a priori. One variable, nominal or ordinal in scale.

Chi-square test of association
  Distribution: Chi-square
  When to use: To determine association/relationship between two variables based on observed proportions. Two variables, both nominal and ordinal in scale.
At this point, you should have met the following objectives: (a) be able to understand the basic concepts underlying tests of proportions, (b) be able to select the appropriate test, and (c) be able to determine and interpret the results from the appropriate test. In Chapter 9, we discuss inferential tests involving variances.
Inferences About Proportions
Problems

Conceptual problems

8.1 How many degrees of freedom are there in a 5 × 7 contingency table when the chi-square test of association is used?
 a. 12
 b. 24
 c. 30
 d. 35
8.2 The more that two independent sample proportions differ, all else being equal, the smaller the z test statistic. True or false?
8.3 The null hypothesis is a numerical statement about an unknown parameter. True or false?
8.4 In testing the null hypothesis that the proportion is .50, the critical value of z increases as degrees of freedom increase. True or false?
8.5 A consultant found a sample proportion of individuals favoring the legalization of drugs to be −.50. I assert that a test of whether that sample proportion is different from 0 would be rejected. Am I correct?
8.6 Suppose we wish to test the following hypotheses at the .10 level of significance:

 H0: π = .60
 H1: π > .60

A sample proportion of .15 is observed. I assert if I conduct the z test that it is possible to reject the null hypothesis. Am I correct?
8.7 When the chi-square test statistic for a test of association is less than the corresponding critical value, I assert that I should reject the null hypothesis. Am I correct?
8.8 Other things being equal, the larger the sample size, the smaller the value of sp. True or false?
8.9 In the chi-square test of association, as the difference between the observed and expected proportions increases,
 a. The critical value for chi-square increases.
 b. The critical value for chi-square decreases.
 c. The likelihood of rejecting the null hypothesis decreases.
 d. The likelihood of rejecting the null hypothesis increases.
8.10 When the hypothesized value of the population proportion lies outside of the CI around a single sample proportion, I assert that the researcher should reject the null hypothesis. Am I correct?
8.11 Statisticians at a theme park want to know if the same proportions of visitors select the Jungle Safari as their favorite ride as compared to the Mountain Rollercoaster. They sample 150 visitors and collect data on one variable: favorite ride (two categories: Jungle Safari and Mountain Rollercoaster). Which statistical procedure is most appropriate to use to test the hypothesis?
 a. Chi-square goodness-of-fit test
 b. Chi-square test of association
8.12 Sophie is a reading teacher. She is researching the following question: is there a relationship between a child's favorite genre of book and their socioeconomic status category? She collects data from 35 children on two variables: (a) favorite genre of book (two categories: fiction, nonfiction) and (b) socioeconomic status category (three categories: low, middle, high). Which statistical procedure is most appropriate to use to test the hypothesis?
 a. Chi-square goodness-of-fit test
 b. Chi-square test of association
Computational problems

8.1 For a random sample of 40 widgets produced by the Acme Widget Company, 30 successes and 10 failures are observed. Test the following hypotheses at the .05 level of significance:

 H0: π = .60
 H1: π ≠ .60
8.2 The following data are calculated for two independent random samples of male and female teenagers, respectively, on whether they expect to attend graduate school: n1 = 48, p1 = 18/48, n2 = 52, p2 = 33/52. Test the following hypotheses at the .05 level of significance:

 H0: π1 − π2 = 0
 H1: π1 − π2 ≠ 0
8.3 The following frequencies of successes and failures are obtained for two dependent random samples measured at the pretest and posttest of a weight training program:

              Pretest
 Posttest   Success   Failure
 Failure      18        30
 Success      33        19

Test the following hypotheses at the .05 level of significance:

 H0: π1 − π2 = 0
 H1: π1 − π2 ≠ 0
8.4 A chi-square goodness-of-fit test is to be conducted with six categories of professions to determine whether the sample proportions of those supporting the current government differ from a priori national proportions. The chi-square test statistic is equal to 16.00. Determine the result of this test by looking up the critical value and making a statistical decision, using α = .01.
8.5 A chi-square goodness-of-fit test is to be conducted to determine whether the sample proportions of families in Florida who select various schooling options (five categories including home school, public school, public charter school, private school, and other) differ from the proportions reported nationally. The chi-square test statistic is equal to 9.00. Determine the result of this test by looking up the critical value and making a statistical decision, using α = .05.
8.6 A random sample of 30 voters was classified according to their general political beliefs (liberal vs. conservative) and also according to whether they voted for or against the incumbent representative in their town. The results were placed into the following contingency table:

        Liberal   Conservative
 Yes      10           5
 No        5          10

Use the chi-square test of association to determine whether political belief is independent of voting behavior at the .05 level of significance.
8.7 A random sample of 40 kindergarten children was classified according to whether they attended at least 1 year of preschool prior to entering kindergarten and also according to gender. The results were placed into the following contingency table:

                Boy   Girl
 Preschool       12    10
 No preschool     8    10

Use the chi-square test of association to determine whether enrollment in preschool is independent of gender at the .10 level of significance.
Interpretive problem

There are numerous ways to use the survey 1 dataset from the website, as there are several categorical variables. Here are some examples for the tests described in this chapter:

 a. Conduct a test of a single proportion: Is the sample proportion of females equal to .50?
 b. Conduct a test of two independent proportions: Is there a difference between the sample proportion of females who are right-handed and the sample proportion of males who are right-handed?
 c. Conduct a test of two dependent proportions: Is there a difference between the sample proportion of students' mothers who are right-handed and the sample proportion of students' fathers who are right-handed?
 d. Conduct a chi-square goodness-of-fit test: Do the sample proportions for the political view categories differ from their expected proportions (very liberal = .10, liberal = .15, middle of the road = .50, conservative = .15, very conservative = .10)? Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 e. Conduct a chi-square goodness-of-fit test to determine if there are similar proportions of respondents who can (vs. cannot) tell the difference between Pepsi and Coke. Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 f. Conduct a chi-square test of association: Is there an association between political view and gender? Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
 g. Compute a chi-square test of association to examine the relationship between whether a person smokes and their political view. Determine if the assumptions of the test are met. Determine and interpret the corresponding effect size.
9

Inferences About Variances

Chapter Outline
9.1 New Concepts
9.2 Inferences About a Single Variance
9.3 Inferences About Two Dependent Variances
9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests)
 9.4.1 Traditional Tests
 9.4.2 Brown–Forsythe Procedure
 9.4.3 O'Brien Procedure
9.5 SPSS
9.6 Template and APA-Style Write-Up

Key Concepts
 1. Sampling distributions of the variance
 2. The F distribution
 3. Homogeneity of variance tests
In Chapters 6 through 8, we looked at testing inferences about means (Chapters 6 and 7) and about proportions (Chapter 8). In this chapter, we examine inferential tests involving variances. Tests of variances are useful in two applications: (a) as an inferential test by itself and (b) as a test of the homogeneity of variance assumption for another procedure (e.g., t test, analysis of variance [ANOVA]). First, a researcher may want to perform inferential tests on variances for their own sake, in the same fashion that we described for the one- and two-sample t tests on means. For example, we may want to assess whether the variance of undergraduates at Ivy-Covered University (ICU) on an intelligence measure is the same as the theoretically derived variance of 225 (from when the test was developed and normed). In other words, is the variance at a particular university greater than or less than 225? As another example, we may want to determine whether the variances on an intelligence measure are consistent across two or more groups; for example, is the variance of the intelligence measure at ICU different from that at Podunk University?
Second, for some procedures such as the independent t test (Chapter 7) and the ANOVA (Chapter 11), it is assumed that the variances for two or more independent samples are equal (known as the homogeneity of variance assumption). Thus, we may want to use an inferential test of variances to assess whether this assumption has been violated or not. The following inferential tests of variances are covered in this chapter: (a) testing whether a single variance is different from a hypothesized value, (b) testing whether two dependent variances are different, and (c) testing whether two or more independent variances are different. We utilize many of the foundational concepts previously covered in Chapters 6 through 8. New concepts to be discussed include the following: the sampling distributions of the variance, the F distribution, and homogeneity of variance tests. Our objectives are that by the end of this chapter, you will be able to (a) understand the basic concepts underlying tests of variances, (b) select the appropriate test, and (c) determine and interpret the results from the appropriate test.
9.1 New Concepts

As you remember, Marie is a graduate student working on a degree in educational research. She has been building her statistical skills and is becoming quite adept at applying her skills as she completes tasks assigned to her by her faculty advisor. We revisit Marie again.

Another call has been fielded by Marie's advisor for assistance with statistical analysis. This time, it is Jessica, an elementary teacher within the community. Having built quite a reputation for success in statistical consultations, Marie's advisor requests that Marie work with Jessica.

Jessica shares with Marie that she is conducting a teacher research project related to achievement of first-grade students at her school. Jessica wants to determine if the variances of the achievement scores differ when children begin school in the fall as compared to when they end school in the spring. Marie suggests the following research question: Are the variances of achievement scores for first-grade children the same in the fall as compared to the spring? Marie suggests a test of variance as the test of inference. Her task is then to assist Jessica in generating the test of inference to answer her research question.
This section deals with concepts for testing inferences about variances, in particular, the sampling distributions underlying such tests. Subsequent sections deal with several inferential tests of variances. Although the sampling distribution of the mean is a normal distribution (Chapters 6 and 7), and the sampling distribution of a proportion is either a normal or chi-square distribution (Chapter 8), the sampling distribution of a variance is either a chi-square distribution for a single variance, a t distribution for two dependent variances, or an F distribution for two or more independent variances. Although we have already discussed the t distribution in Chapter 6 and the chi-square distribution in Chapter 8, we need to discuss the F distribution (named in honor of the famous statistician Sir Ronald A. Fisher) in some detail here.
Like the normal, t, and chi-square distributions, the F distribution is really a family of distributions. Also, like the t and chi-square distributions, the F distribution family members depend on the number of degrees of freedom represented. Unlike any previously discussed distribution, the F distribution family members actually depend on a combination of two different degrees of freedom, one for the numerator and one for the denominator. The reason is that the F distribution is a ratio of two chi-square variables. To be more precise, F with ν1 degrees of freedom for the numerator and ν2 degrees of freedom for the denominator is actually a ratio of the following chi-square variables:

 Fν1,ν2 = (χ²ν1 / ν1) / (χ²ν2 / ν2)
For example, the F distribution for 1 degree of freedom numerator and 10 degrees of freedom denominator is denoted by F1,10. The F distribution is generally positively skewed and leptokurtic in shape (like the chi-square distribution) and has a mean of ν2/(ν2 − 2) when ν2 > 2 (where ν2 represents the denominator degrees of freedom). A few examples of the F distribution are shown in Figure 9.1 for the following pairs of degrees of freedom (i.e., numerator, denominator): F10,10; F20,20; and F40,40.
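The ratio definition can be checked empirically. The sketch below (an illustrative simulation, not from the text) draws F values as ratios of scaled chi-square variables, each chi-square built as a sum of squared standard normals, and compares the simulated mean with ν2/(ν2 − 2):

```python
import random

rng = random.Random(1)

def f_draw(nu1, nu2):
    # A chi-square variable with nu df can be simulated as a sum
    # of nu squared standard normal deviates.
    chi1 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu1))
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu2))
    return (chi1 / nu1) / (chi2 / nu2)

draws = [f_draw(10, 10) for _ in range(50_000)]
mean = sum(draws) / len(draws)
print(round(mean, 2))  # theoretical mean for F(10,10): 10/(10 - 2) = 1.25
```

With 50,000 draws, the simulated mean should land very close to the theoretical value of 1.25.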
Critical values for several levels of α of the F distribution at various combinations of degrees of freedom are given in Table A.4. The numerator degrees of freedom are given in the columns of the table (ν1), and the denominator degrees of freedom are shown in the rows of the table (ν2). Only the upper-tail critical values are given in the table (e.g., percentiles of .90, .95, .99 for α = .10, .05, .01, respectively). The reason is that most inferential tests involving the F distribution are one-tailed tests using the upper-tail critical region. Thus, to find the upper-tail critical value for .05F1,10, we look on the second page of the table (α = .05), in the first column of values on that page for ν1 = 1, and where it intersects with the 10th row of values for ν2 = 10. There you should find .05F1,10 = 4.96.
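One way to sanity-check such a table lookup (an aside, not from the text): when the numerator has 1 degree of freedom, the upper-tail F critical value equals the square of the corresponding two-tailed t critical value with ν2 degrees of freedom:

```python
# Two-tailed .05 critical value of t with 10 df, a standard t-table value
t_crit = 2.228
f_crit = t_crit ** 2
print(round(f_crit, 2))  # 4.96, agreeing with .05 F(1,10) from Table A.4
```

This relationship (F with 1 numerator df is a squared t) follows directly from the ratio-of-chi-squares definition above.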
[Figure 9.1 Several members of the family of F distributions: F10,10, F20,20, and F40,40 (horizontal axis: F, 0 to 5; vertical axis: relative frequency, 0 to 1.5).]
9.2 Inferences About a Single Variance

In our initial inferential testing situation for variances, the researcher would like to know whether the population variance is equal to some hypothesized variance or not. First, the hypotheses to be evaluated for detecting whether a population variance differs from a hypothesized variance are as follows. The null hypothesis H0 is that there is no difference between the population variance σ² and the hypothesized variance σ0², which we denote as

 H0: σ² = σ0²

Here there is no difference, or a "null" difference, between the population variance and the hypothesized variance. For example, if we are seeking to determine whether the variance on an intelligence measure at ICU is different from the overall adult population, then a reasonable hypothesized value would be 225, as this is the theoretically derived variance for the adult population.
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population variance σ² and the hypothesized variance σ0², which we denote as

 H1: σ² ≠ σ0²

The null hypothesis H0 will be rejected here in favor of the alternative hypothesis H1 if the population variance is different from the hypothesized variance. As we have not specified a direction on H1, we are willing to reject either if σ² is greater than σ0² or if σ² is less than σ0². This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that σ² is greater than σ0² or that σ² is less than σ0². In either case, the more the resulting sample variance differs from the hypothesized variance, the more likely we are to reject the null hypothesis.

It is assumed that the sample is randomly drawn from the population (i.e., the assumption of independence) and that the population of scores is normally distributed. Because we are testing a variance, a condition of the test is that the variable must be interval or ratio in scale.
The next step is to compute the test statistic χ² as

 χ² = νs² / σ0²

where
 s² is the sample variance
 ν = n − 1
The test statistic χ² is then compared to a critical value(s) from the chi-square distribution. For a two-tailed test, the critical values are denoted as α/2χ²ν and 1−α/2χ²ν and are found in Table A.3 (recall that unlike z and t critical values, two unique χ² critical values must be found from the table, as the χ² distribution is not symmetric like z or t). If the test statistic χ² falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as αχ²ν for the alternative hypothesis H1: σ² < σ0² and as 1−αχ²ν for the alternative hypothesis H1: σ² > σ0². If the test statistic χ² falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It has been noted by statisticians such as Wilcox (1996) that the chi-square distribution does not perform adequately when sampling from a nonnormal distribution, as the actual Type I error rate can differ greatly from the nominal α level (the level set by the researcher). However, Wilcox stated "it appears that a completely satisfactory solution does not yet exist, although many attempts have been made to find one" (p. 85).
For the two-tailed test, a (1 − α)% confidence interval (CI) can also be examined and is formed as follows. The lower limit of the CI is

 νs² / 1−α/2χ²ν

whereas the upper limit of the CI is

 νs² / α/2χ²ν

If the CI contains the hypothesized value σ0², then the conclusion is to fail to reject H0; otherwise, we reject H0.
Now consider an example to illustrate use of the test of a single variance. We follow the basic steps for hypothesis testing that we applied in previous chapters. These steps include the following:

 1. State the null and alternative hypotheses.
 2. Select the level of significance (i.e., alpha, α).
 3. Calculate the test statistic value.
 4. Make a statistical decision (reject or fail to reject H0).

A researcher at the esteemed ICU is interested in determining whether the population variance in intelligence at the university is different from the norm-developed hypothesized variance of 225. Thus, a nondirectional, two-tailed alternative hypothesis is utilized. If the null hypothesis is rejected, this would indicate that the intelligence level at ICU is more or less diverse or variable than the norm. If the null hypothesis is not rejected, this would indicate that the intelligence level at ICU is as equally diverse or variable as the norm.
The researcher takes a random sample of 101 undergraduates from throughout the university and computes a sample variance of 149. The test statistic χ² is computed as follows:

 χ² = νs² / σ0² = 100(149) / 225 = 66.2222

From Table A.3, and using an α level of .05, we determine the critical values to be .025χ²100 = 74.2219 and .975χ²100 = 129.561. As the test statistic does exceed one of the critical values by falling into the lower-tail critical region (i.e., 66.2222 < 74.2219), our decision is to reject H0. Our conclusion then is that the variance of the undergraduates at ICU is different from the hypothesized value of 225.
The 95% CI for the example is computed as follows. The lower limit of the CI is

 νs² / 1−α/2χ²ν = 100(149) / 129.561 = 115.0037

and the upper limit of the CI is

 νs² / α/2χ²ν = 100(149) / 74.2219 = 200.7494

As the limits of the CI (i.e., 115.0037, 200.7494) do not contain the hypothesized variance of 225, the conclusion is to reject H0. As always, the CI procedure leads us to the same conclusion as the hypothesis testing procedure for the same α level.
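The arithmetic of this example is easy to reproduce. The sketch below (using the two critical values already read from Table A.3) recomputes the test statistic, the decision, and the CI limits:

```python
n, s2, sigma0_sq = 101, 149, 225
nu = n - 1
chi_sq = nu * s2 / sigma0_sq                 # 100(149)/225 = 66.2222
lo_crit, hi_crit = 74.2219, 129.561          # .025 and .975 chi-square, 100 df
reject = chi_sq < lo_crit or chi_sq > hi_crit
ci = (nu * s2 / hi_crit, nu * s2 / lo_crit)  # (115.0037, 200.7494)
print(round(chi_sq, 4), reject)              # 66.2222 True
```

Note that both the test statistic and the CI limits reuse the same quantity νs², which is why the two procedures must agree.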
9.3 Inferences About Two Dependent Variances

In our second inferential testing situation for variances, the researcher would like to know whether the population variance for one group is different from the population variance for a second dependent group. This is comparable to the dependent t test described in Chapter 7 where one population mean was compared to a second dependent population mean. Once again, we have two dependently drawn samples.

First, the hypotheses to be evaluated for detecting whether two dependent population variances differ are as follows. The null hypothesis H0 is that there is no difference between the two population variances σ1² and σ2², which we denote as

 H0: σ1² − σ2² = 0

Here there is no difference, or a "null" difference, between the two population variances. For example, we may be seeking to determine whether the variance of income of husbands is equal to the variance of their wives' incomes. Thus, the husband and wife samples are drawn as couples in pairs or dependently, rather than individually or independently.
The nondirectional, scientific, or alternative hypothesis H1 is that there is a difference between the population variances σ1² and σ2², which we denote as

 H1: σ1² − σ2² ≠ 0

The null hypothesis H0 is rejected here in favor of the alternative hypothesis H1 if the population variances are different. As we have not specified a direction on H1, we are willing to reject either if σ1² is greater than σ2² or if σ1² is less than σ2². This alternative hypothesis results in a two-tailed test. Directional alternative hypotheses can also be tested if we believe either that σ1² is greater than σ2² or that σ1² is less than σ2². In either case, the more the resulting sample variances differ from one another, the more likely we are to reject the null hypothesis.
It is assumed that the two samples are dependently and randomly drawn from their respective populations, that both populations are normal in shape, and that the t distribution is the appropriate sampling distribution. Since we are testing variances, a condition of the test is that the variable must be interval or ratio in scale.

The next step is to compute the test statistic t as follows:

 t = (s1² − s2²) / [2s1s2 √((1 − r12²)/ν)]

where
 s1² and s2² are the sample variances for samples 1 and 2, respectively
 s1 and s2 are the sample standard deviations for samples 1 and 2, respectively
 r12 is the correlation between the scores from sample 1 and sample 2 (which is then squared)
 ν is the number of degrees of freedom, ν = n − 2, with n being the number of paired observations (not the number of total observations)

Although correlations are not formally discussed until Chapter 10, conceptually the correlation is a measure of the relationship between two variables. This test statistic is conceptually somewhat similar to the test statistic for the dependent t test.
The test statistic t is then compared to a critical value(s) from the t distribution. For a two-tailed test, the critical values are denoted as ±α2tν and are found in Table A.2. If the test statistic t falls into either critical region, then we reject H0; otherwise, we fail to reject H0. For a one-tailed test, the critical value is denoted as +α1tν for the alternative hypothesis H1: σ1² − σ2² > 0 and as −α1tν for the alternative hypothesis H1: σ1² − σ2² < 0. If the test statistic t falls into the appropriate critical region, then we reject H0; otherwise, we fail to reject H0. It is thought that this test is not particularly robust to nonnormality (Wilcox, 1987). As a result, other procedures have been developed that are thought to be more robust. However, little in the way of empirical results is known at this time. Some of the new procedures can also be used for testing inferences involving the equality of two or more dependent variances. In addition, note that acceptable CI procedures are not currently available.
Let us consider an example to illustrate use of the test of two dependent variances. The same basic steps for hypothesis testing that we applied in previous chapters will be applied here as well. These steps include the following:

 1. State the null and alternative hypotheses.
 2. Select the level of significance (i.e., alpha, α).
 3. Calculate the test statistic value.
 4. Make a statistical decision (reject or fail to reject H0).

A researcher is interested in whether there is greater variation in achievement test scores at the end of the first grade as compared to the beginning of the first grade. Thus, a directional, one-tailed alternative hypothesis is utilized. If the null hypothesis is rejected, this would indicate that first graders' achievement test scores are more variable at the end of the year than at the beginning of the year. If the null hypothesis is not rejected, this would indicate that first graders' achievement test scores have approximately the same variance at both the end of the year and at the beginning of the year.
A random sample of 62 first-grade children is selected and given the same achievement test at the beginning of the school year (September) and at the end of the school year (April). Thus, the same students are tested twice with the same instrument, thereby resulting in dependent samples at time 1 and time 2. The level of significance is set at α = .01. The test statistic t is computed as follows. We determine that n = 62, ν = 60, s1² = 100, s1 = 10, s2² = 169, s2 = 13, and r12 = .80. We compute the test statistic t to be as follows:

 t = (s1² − s2²) / [2s1s2 √((1 − r12²)/ν)] = (100 − 169) / [2(10)(13) √((1 − .64)/60)] = −3.4261

The test statistic t is then compared to the critical value from the t distribution. As this is a one-tailed test, the critical value is denoted as −α1tν and is determined from Table A.2 to be −.01t60 = −2.390. The test statistic t falls into the lower-tail critical region, as it is less than the critical value (i.e., −3.4261 < −2.390), so we reject H0 and conclude that the variance in achievement test scores increases from September to April.
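The same computation can be scripted; the sketch below simply reuses the example's summary statistics:

```python
import math

s1_sq, s2_sq = 100, 169   # sample variances (so s1 = 10, s2 = 13)
r12 = 0.80                # correlation between the paired scores
nu = 62 - 2               # degrees of freedom: paired observations minus 2
t = (s1_sq - s2_sq) / (2 * math.sqrt(s1_sq * s2_sq)
                       * math.sqrt((1 - r12 ** 2) / nu))
print(round(t, 4))  # -3.4261, which falls below the critical value -2.390
```

Because the correlation enters the denominator through (1 − r12²), more strongly correlated pretest and posttest scores yield a larger (more extreme) t for the same pair of variances.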
9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests)

In our third and final inferential testing situation for variances, the researcher would like to know whether the population variance for one group is different from the population variance for one or more other independent groups. In this section, we first describe the somewhat cloudy situation that exists for the traditional tests. Then we provide details on two recommended tests, the Brown–Forsythe procedure and the O'Brien procedure.

9.4.1 Traditional Tests

One of the more heavily studied inferential testing situations in recent years has been for testing whether differences exist among two or more independent group variances. These tests are often referred to as homogeneity of variance tests. Here we briefly discuss the more traditional tests and their associated problems. In the sections that follow, we then recommend two of the "better" tests. As was noted in the previous procedures, the variable for which the variance(s) is computed must be interval or ratio in scale.
Several tests have traditionally been used to test for the equality of independent variances. An early simple test for two independent variances is to form a ratio of the two sample variances, which yields the following F test statistic:

 F = s1² / s2²

This F ratio test assumes that the two populations are normally distributed. However, it is known that the F ratio test is not very robust to violation of the normality assumption, except for when the sample sizes are equal (i.e., n1 = n2). In addition, the F ratio test can only be used for the two-group situation.
Subsequently, more general tests were developed to cover the multiple-group situation. One such popular test is Hartley's Fmax test (developed in 1950), which is simply a more general version of the F ratio test just described. The test statistic for Hartley's Fmax test is the following:

 Fmax = s²largest / s²smallest

where
 s²largest is the largest variance in the set of variances
 s²smallest is the smallest variance in the set

Hartley's Fmax test assumes normal population distributions and requires equal sample sizes. We also know that Hartley's Fmax test is not very robust to violation of the normality assumption. Cochran's C test (developed in 1941) is also an F test statistic and is computed by taking the ratio of the largest variance to the sum of all of the variances. Cochran's C test also assumes normality, requires equal sample sizes, and has been found to be even less robust to nonnormality than Hartley's Fmax test. As we see in Chapter 11 for the ANOVA, it is when we have unequal sample sizes that unequal variances are a problem; for these reasons, none of these tests can be recommended, which is the same situation we encountered with the independent t test.
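Both statistics are simple to form from a set of sample variances. A sketch with purely illustrative numbers (not from the text):

```python
# Hypothetical sample variances for three groups of equal size
variances = [12.5, 18.0, 30.0]

f_max = max(variances) / min(variances)  # Hartley's F-max: 30.0/12.5 = 2.4
c = max(variances) / sum(variances)      # Cochran's C: 30.0/60.5
print(f_max, round(c, 4))
```

Each statistic would then be compared against its own table of critical values, which is where the equal-sample-size requirement comes in.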
Bartlett's χ² test (developed in 1937) does not have the stringent requirement of equal sample sizes; however, it does still assume normality. Bartlett's test is very sensitive to nonnormality and is therefore not recommended either. Since 1950, the development of homogeneity tests has proliferated, with the goal being to find a test that is fairly robust to nonnormality. Seemingly as each new test was developed, later research would show that the test was not very robust. Today there are well over 60 such tests available for examining homogeneity of variance (e.g., a bootstrap method developed by Wilcox [2002]). Rather than engage in a protracted discussion of these tests and their associated limitations, we simply present two tests that have been shown to be most robust to nonnormality in several recent studies. These are the Brown–Forsythe procedure and the O'Brien procedure. Unfortunately, neither of these tests is available in the major statistical packages (e.g., SPSS), which only include some of the problematic tests previously described.
9.4.2 Brown–Forsythe Procedure

The Brown–Forsythe procedure is a variation of Levene's test developed in 1960. Levene's test is essentially an ANOVA on the transformed variable:

 Zij = |Yij − Ȳ.j|

where
 ij designates the ith observation in group j
 Zij is computed for each individual by taking their score Yij, subtracting from it the group mean Ȳ.j (the "." indicating we have averaged across all i observations in group j), and then taking the absolute value (i.e., by removing the sign)

Unfortunately, Levene's test is not very robust to nonnormality, except when sample sizes are equal.
250 An Introduction to Statistical Concepts
Developed in 1974, the Brown–Forsythe procedure has been shown to be quite robust to nonnormality in numerous studies (e.g., Olejnik & Algina, 1987; Ramsey, 1994). Based on this and other research, the Brown–Forsythe procedure is recommended for leptokurtic distributions (i.e., those with sharp peaks), as it is robust to nonnormality and provides adequate Type I error protection and excellent power. In the next section, we describe the O'Brien procedure, which is recommended for other distributions (i.e., mesokurtic and platykurtic distributions). In cases where you are unsure of which procedure to use, Algina, Blair, and Combs (1995) recommend using a maximum procedure, where both tests are conducted and the procedure with the maximum test statistic is selected.
Let us now examine in detail the Brown–Forsythe procedure. The null hypothesis is that H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_J^2, and the alternative hypothesis is that not all of the population group variances are the same. The Brown–Forsythe procedure is essentially an ANOVA on the transformed variable

    Z_{ij} = \left| Y_{ij} - Mdn_{.j} \right|

which is computed for each individual by taking their score Y_{ij}, subtracting from it the group median Mdn_{.j}, and then taking the absolute value (i.e., by removing the sign). The test statistic is an F and is computed by

    F = \frac{\sum_{j=1}^{J} n_j (\bar{Z}_{.j} - \bar{Z}_{..})^2 / (J - 1)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (Z_{ij} - \bar{Z}_{.j})^2 / (N - J)}
where
n_j designates the number of observations in group j
J is the number of groups (where j ranges from 1 to J)
\bar{Z}_{.j} is the mean for group j (computed by taking the sum of the observations in group j and dividing by the number of observations in group j, which is n_j)
\bar{Z}_{..} is the overall mean regardless of group membership (computed by taking the sum of all of the observations across all groups and dividing by the total number of observations N)
The test statistic F is compared against a critical value from the F table (Table A.4) with J − 1 degrees of freedom in the numerator and N − J degrees of freedom in the denominator, denoted by \alpha F_{J-1, N-J}. If the test statistic is greater than the critical value, we reject H_0; otherwise, we fail to reject H_0.

An example using the Brown–Forsythe procedure is certainly in order now. Three different groups of children, below-average, average, and above-average readers, play a computer game. The scores on the dependent variable Y are their total points from the game. We are interested in whether the variances for the three student groups are equal or not. The example data and computations are given in Table 9.1. First we compute the median for each group, and then compute the deviation from the median for each individual to obtain the transformed Z values. Then the transformed Z values are used to compute the F test statistic.

The test statistic F = 1.6388 is compared against the critical value for α = .05 of .05F_{2,9} = 4.26. As the test statistic is smaller than the critical value (i.e., 1.6388 < 4.26), we fail to reject the null hypothesis and conclude that the three student groups do not have different variances.
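To make the computation concrete, here is a minimal Python sketch of the Brown–Forsythe procedure applied to the Table 9.1 reading-group data (an illustration of the hand computation, not an SPSS workflow):

```python
from statistics import median, mean

def brown_forsythe(groups):
    """Brown-Forsythe test: a one-way ANOVA F on Z_ij = |Y_ij - Mdn_j|."""
    z = [[abs(y - median(g)) for y in g] for g in groups]
    J = len(z)                            # number of groups
    N = sum(len(g) for g in z)            # total number of observations
    grand = sum(sum(g) for g in z) / N    # overall mean of the Z values
    ms_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (J - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in z for v in g) / (N - J)
    return ms_between / ms_within

# Table 9.1 data: below-average, average, and above-average readers
groups = [[6, 8, 12, 13], [9, 12, 14, 17], [10, 16, 20, 30]]
F = brown_forsythe(groups)  # ≈ 1.6388, below the .05 critical value of 4.26
```

For reference, SciPy's `scipy.stats.levene` with `center='median'` computes this same median-centered statistic.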
251Inferences About Variances
9.4.3 O'Brien Procedure

The final test to consider in this chapter is the O'Brien procedure. While the Brown–Forsythe procedure is recommended for leptokurtic distributions, the O'Brien procedure is recommended for other distributions (i.e., mesokurtic and platykurtic distributions). Let us now examine in detail the O'Brien procedure. The null hypothesis is again that H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_J^2, and the alternative hypothesis is that not all of the population group variances are the same.
Table 9.1
Example Using the Brown–Forsythe and O'Brien Procedures

        Group 1                  Group 2                  Group 3
  Y   Z         r          Y   Z       r          Y    Z        r
  6   4   124.2499          9   4     143         10    8      704
  8   2    14.2499         12   1      -7         16    2      -16
 12   2    34.2499         14   1      -7         20    2      -96
 13   3    89.2499         17   4     143         30   12     1104

 Mdn = 10, Z̄ = 2.75, r̄ = 65.4999    Mdn = 13, Z̄ = 2.50, r̄ = 68    Mdn = 18, Z̄ = 6, r̄ = 424

 Overall Z̄ = 3.75    Overall r̄ = 185.8333
Computations for the Brown–Forsythe procedure:

    F = \frac{\sum_{j=1}^{J} n_j (\bar{Z}_{.j} - \bar{Z}_{..})^2 / (J - 1)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (Z_{ij} - \bar{Z}_{.j})^2 / (N - J)}
      = \frac{[4(2.75 - 3.75)^2 + 4(2.50 - 3.75)^2 + 4(6 - 3.75)^2]/2}{[(4 - 2.75)^2 + (2 - 2.75)^2 + \cdots + (12 - 6)^2]/9}
      = 1.6388
Computations for the O'Brien procedure:

Sample means: \bar{Y}_1 = 9.75, \bar{Y}_2 = 13.0, \bar{Y}_3 = 19.0
Sample variances: s_1^2 = 10.9167, s_2^2 = 11.3333, s_3^2 = 70.6667

Example computation for r_{ij}:

    r_{11} = \frac{(n_j - 1.5) n_j (Y_{ij} - \bar{Y}_{.j})^2 - .5 s_j^2 (n_j - 1)}{(n_j - 1)(n_j - 2)}
           = \frac{(4 - 1.5)(4)(6 - 9.75)^2 - .5(10.9167)(4 - 1)}{(4 - 1)(4 - 2)} = 124.2499
Test statistic:

    F = \frac{\sum_{j=1}^{J} n_j (\bar{r}_{.j} - \bar{r}_{..})^2 / (J - 1)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (r_{ij} - \bar{r}_{.j})^2 / (N - J)}
      = \frac{[4(65.4999 - 185.8333)^2 + 4(68 - 185.8333)^2 + 4(424 - 185.8333)^2]/2}{[(124.2499 - 65.4999)^2 + (14.2499 - 65.4999)^2 + \cdots + (1104 - 424)^2]/9}
      = 1.4799
The O'Brien procedure is an ANOVA on a different transformed variable:

    r_{ij} = \frac{(n_j - 1.5) n_j (Y_{ij} - \bar{Y}_{.j})^2 - .5 s_j^2 (n_j - 1)}{(n_j - 1)(n_j - 2)}

which is computed for each individual, where
n_j is the size of group j
\bar{Y}_{.j} is the mean for group j
s_j^2 is the sample variance for group j
The test statistic is an F and is computed by

    F = \frac{\sum_{j=1}^{J} n_j (\bar{r}_{.j} - \bar{r}_{..})^2 / (J - 1)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} (r_{ij} - \bar{r}_{.j})^2 / (N - J)}
where
n_j designates the number of observations in group j
J is the number of groups (where j ranges from 1 to J)
\bar{r}_{.j} is the mean for group j (computed by taking the sum of the observations in group j and dividing by the number of observations in group j, which is n_j)
\bar{r}_{..} is the overall mean regardless of group membership (computed by taking the sum of all of the observations across all groups and dividing by the total number of observations N)

The test statistic F is compared against a critical value from the F table (Table A.4) with J − 1 degrees of freedom in the numerator and N − J degrees of freedom in the denominator, denoted by \alpha F_{J-1, N-J}. If the test statistic is greater than the critical value, then we reject H_0; otherwise, we fail to reject H_0.
Let us return to the example in Table 9.1 and consider the results of the O'Brien procedure. From the computations shown in the table, the test statistic F = 1.4799 is compared against the critical value for α = .05 of .05F_{2,9} = 4.26. As the test statistic is smaller than the critical value (i.e., 1.4799 < 4.26), we fail to reject the null hypothesis and conclude that the three student groups do not have different variances.
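The O'Brien computation can likewise be sketched in a few lines of Python. One caveat: the r values printed in Table 9.1 are larger than those produced by the displayed r_{ij} formula by the constant factor (n_j − 1)(n_j − 2) = 6; because every group here has n_j = 4, that constant cancels out of the F ratio, so the resulting test statistic is identical either way:

```python
from statistics import mean, variance

def obrien_f(groups):
    """O'Brien test: a one-way ANOVA F on the r_ij transformation."""
    r = []
    for g in groups:
        n, ybar, s2 = len(g), mean(g), variance(g)  # variance() uses n - 1
        r.append([((n - 1.5) * n * (y - ybar) ** 2 - 0.5 * s2 * (n - 1))
                  / ((n - 1) * (n - 2)) for y in g])
    J = len(r)
    N = sum(len(g) for g in r)
    grand = sum(sum(g) for g in r) / N
    ms_between = sum(len(g) * (mean(g) - grand) ** 2 for g in r) / (J - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in r for v in g) / (N - J)
    return ms_between / ms_within

groups = [[6, 8, 12, 13], [9, 12, 14, 17], [10, 16, 20, 30]]
F = obrien_f(groups)  # ≈ 1.4799
```

A useful property of this transformation: the mean of the r values within a group equals that group's sample variance, which is why an ANOVA on r amounts to a test on the variances.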
9.5 SPSS

Unfortunately, there is not much to report on tests of variances for SPSS. There are no tests available for inferences about a single variance or for inferences about two dependent variances. For inferences about independent variances, SPSS does provide Levene's test as part of the "Independent t Test" procedure (previously discussed in Chapter 7), and as part of the
"One-Way ANOVA" and "Univariate ANOVA" procedures (to be discussed in Chapter 11). Given our previous concerns with Levene's test, use it with caution. There is also little information published in the literature on power and effect sizes for tests of variances.
9.6 Template and APA-Style Write-Up

Consider an example paragraph for one of the tests described in this chapter, more specifically, testing inferences about two dependent variances. As you may remember, our graduate research assistant, Marie, was working with Jessica, a classroom teacher, to assist in analyzing the variances of first-grade students. Her task was to assist Jessica with writing her research question (Are the variances of achievement scores for first-grade children the same in the fall as compared to the spring?) and generating the test of inference to answer her question. Marie suggested a dependent variances test as the test of inference. A template for writing a research question for the dependent variances test is presented as follows:

Are the Variances of [Variable] the Same in [Time 1] as Compared to [Time 2]?
An example write-up is presented as follows:

A dependent variances test was conducted to determine if variances of achievement scores for first-grade children were the same in the fall as compared to the spring. The test was conducted using an alpha of .05. The null hypothesis was that the variances would be the same. There was a statistically significant difference in variances of achievement scores of first-grade children in the fall as compared to the spring (t = −3.4261, df = 60, p < .05). Thus, the null hypothesis that the variances would be equal at the beginning and end of the first grade was rejected. The variances of achievement test scores significantly increased from September to April.
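The dependent variances t test underlying this write-up was presented earlier in the chapter (not reproduced here); a common form of it is t = (s_1^2 - s_2^2)\sqrt{n - 2} / \sqrt{4 s_1^2 s_2^2 (1 - r_{12}^2)} with n − 2 degrees of freedom. A minimal sketch under that assumption, using the summary statistics from computational problem 9.4 below purely as illustrative input:

```python
from math import sqrt

def dependent_variances_t(s1_sq, s2_sq, n, r12):
    """t test for two dependent (correlated) variances; df = n - 2.

    s1_sq, s2_sq: the two sample variances
    n: the number of paired observations
    r12: the correlation between the two measures
    """
    num = (s1_sq - s2_sq) * sqrt(n - 2)
    den = sqrt(4 * s1_sq * s2_sq * (1 - r12 ** 2))
    return num / den

# Illustrative input: the summary statistics from computational problem 9.4
t = dependent_variances_t(1.56, 4.42, 62, .72)  # ≈ -6.08 with df = 60
```

Note that identical sample variances yield t = 0, consistent with conceptual problem 9.3.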
9.7 Summary

In this chapter, we described testing hypotheses about variances. Several inferential tests and new concepts were discussed. The new concepts introduced were the sampling distribution of the variance, the F distribution, and homogeneity of variance tests. The first inferential test discussed was the test of a single variance, followed by a test of two dependent variances. Next we examined several tests of two or more independent variances. Here we considered the following traditional procedures: the F ratio test, Hartley's Fmax test, Cochran's C test, Bartlett's χ² test, and Levene's test. Unfortunately, these tests are not very robust to violation of the normality assumption. We then discussed two newer procedures that are relatively robust to nonnormality,
the Brown–Forsythe procedure and the O'Brien procedure. Examples were presented for each of the recommended tests. At this point, you should have met the following objectives: (a) be able to understand the basic concepts underlying tests of variances, (b) be able to select the appropriate test, and (c) be able to determine and interpret the results from the appropriate test. In Chapter 10, we discuss correlation coefficients, as well as inferential tests involving correlations.
Problems

Conceptual Problems

9.1 Which of the following tests of homogeneity of variance is most robust to assumption violations?
  a. F ratio test
  b. Bartlett's chi-square test
  c. The O'Brien procedure
  d. Hartley's Fmax test

9.2 Cochran's C test requires equal sample sizes. True or false?

9.3 I assert that if two dependent sample variances are identical, I would not be able to reject the null hypothesis. Am I correct?
9.4 Suppose that I wish to test the following hypotheses at the .01 level of significance:

    H_0: \sigma^2 = 250
    H_1: \sigma^2 > 250

A sample variance of 233 is observed. I assert that if I compute the χ² test statistic and compare it to the χ² table, it is possible that I could reject the null hypothesis. Am I correct?
9.5 Suppose that I wish to test the following hypotheses at the .05 level of significance:

    H_0: \sigma^2 = 16
    H_1: \sigma^2 > 16

A sample variance of 18 is observed. I assert that if I compute the χ² test statistic and compare it to the χ² table, it is possible that I could reject the null hypothesis. Am I correct?
9.6 If the 90% CI for a single variance extends from 25.7 to 33.6, I assert that the null hypothesis would definitely be rejected at the .10 level. Am I correct?

9.7 If the 95% CI for a single variance ranges from 82.0 to 93.5, I assert that the null hypothesis would definitely be rejected at the .05 level. Am I correct?

9.8 If the mean of the sampling distribution of the difference between two variances equals 0, I assert that both samples probably represent a single population. Am I correct?
9.9 Which of the following is an example of two dependent samples?
  a. Pretest scores of males in one course and posttest scores of females in another course
  b. Husbands and their wives in your neighborhood
  c. Softball players at your school and football players at your school
  d. Professors in education and professors in psychology

9.10 The mean of the F distribution increases as degrees of freedom denominator (ν2) increase. True or false?
Computational Problems

9.1 The following random sample of scores on a preschool ability test is obtained from a normally distributed population of 4-year-olds:

20 22 24 30 18 22 29 27
25 21 19 22 38 26 17 25

  a. Test the following hypotheses at the .10 level of significance:

     H_0: \sigma^2 = 75
     H_1: \sigma^2 \neq 75

  b. Construct a 90% CI.
9.2 The following two independent random samples of number of CDs owned are obtained from two populations of undergraduate (sample 1) and graduate students (sample 2), respectively:

Sample 1 Data       Sample 2 Data
42 36 47 35 46      45 50 57 58 43
37 52 44 47 51      52 43 60 41 49
56 54 55 50 40      44 51 49 55 56
40 46 41

Test the following hypotheses at the .05 level of significance using the Brown–Forsythe and O'Brien procedures:

    H_0: \sigma_1^2 - \sigma_2^2 = 0
    H_1: \sigma_1^2 - \sigma_2^2 \neq 0
9.3 The following summary statistics are available for two dependent random samples of brothers and sisters, respectively, on their allowance for the past month: s_1^2 = 49, s_2^2 = 25, n = 32, r_{12} = .60.

Test the following hypotheses at the .05 level of significance:

    H_0: \sigma_1^2 - \sigma_2^2 = 0
    H_1: \sigma_1^2 - \sigma_2^2 \neq 0
9.4 The following summary statistics are available for two dependent random samples of first-semester college students who were measured on their high school and first-semester college GPAs, respectively: s_1^2 = 1.56, s_2^2 = 4.42, n = 62, r_{12} = .72.

Test the following hypotheses at the .05 level of significance:

    H_0: \sigma_1^2 - \sigma_2^2 = 0
    H_1: \sigma_1^2 - \sigma_2^2 \neq 0
9.5 A random sample of 21 statistics exam scores is collected with a sample mean of 50 and a sample variance of 10. Test the following hypotheses at the .05 level of significance:

    H_0: \sigma^2 = 25
    H_1: \sigma^2 \neq 25
9.6 A random sample of 30 graduate entrance exam scores is collected with a sample mean of 525 and a sample variance of 16,900. Test the following hypotheses at the .05 level of significance:

    H_0: \sigma^2 = 10{,}000
    H_1: \sigma^2 \neq 10{,}000
9.7 A pretest was given at the beginning of a history course and a posttest at the end of the course. The pretest variance is 36, the posttest variance is 64, sample size is 31, and the pretest-posttest correlation is .80. Test the null hypothesis that the two dependent variances are equal against a nondirectional alternative at the .01 level of significance.
Interpretive Problems

9.1 Use the survey 1 dataset from the website to determine if there are gender differences among the variances for any items of interest that are at least interval or ratio in scale. Some example items might include the following:
  a. Item #1: height in inches
  b. Item #6: amount spent at last hair appointment
  c. Item #7: number of compact disks owned
  d. Item #9: current GPA
  e. Item #10: amount of exercise per week
  f. Item #15: number of alcoholic drinks per week
  g. Item #21: number of hours studied per week
9.2 Use the survey 1 dataset from the website to determine if there are differences between the variances for left- versus right-handed individuals on any items of interest that are at least interval or ratio in scale. Some example items might include the following:
  a. Item #1: height in inches
  b. Item #6: amount spent at last hair appointment
  c. Item #7: number of compact disks owned
  d. Item #9: current GPA
  e. Item #10: amount of exercise per week
  f. Item #15: number of alcoholic drinks per week
  g. Item #21: number of hours studied per week
10
Bivariate Measures of Association
Chapter Outline
10.1 Scatterplot
10.2 Covariance
10.3 Pearson Product–Moment Correlation Coefficient
10.4 Inferences About the Pearson Product–Moment Correlation Coefficient
  10.4.1 Inferences for a Single Sample
  10.4.2 Inferences for Two Independent Samples
10.5 Assumptions and Issues Regarding Correlations
  10.5.1 Assumptions
  10.5.2 Correlation and Causality
  10.5.3 Restriction of Range
10.6 Other Measures of Association
  10.6.1 Spearman's Rho
  10.6.2 Kendall's Tau
  10.6.3 Phi
  10.6.4 Cramer's Phi
  10.6.5 Other Correlations
10.7 SPSS
10.8 G*Power
10.9 Template and APA-Style Write-Up

Key Concepts
1. Scatterplot
2. Strength and direction
3. Covariance
4. Correlation coefficient
5. Fisher's Z transformation
6. Linearity assumption, causation, and restriction of range issues
We have considered various inferential tests in the last four chapters, specifically those that deal with tests of means, proportions, and variances. In this chapter, we examine measures of association as well as inferences involving measures of association. Methods for directly determining the relationship between two variables are known as bivariate analysis, in contrast to univariate analysis, which is concerned with only a single variable. The indices used to directly describe the relationship between two variables are known as correlation coefficients (in the old days, known as co-relation) or as measures of association.

These measures of association allow us to determine how two variables are related to one another and can be useful in two applications: (a) as a descriptive statistic by itself and (b) as an inferential test. First, a researcher may want to compute a correlation coefficient for its own sake, simply to tell the researcher precisely how two variables are related or associated. For example, we may want to determine whether there is a relationship between the GRE-Quantitative (GRE-Q) subtest and performance on a statistics exam. Do students who score relatively high on the GRE-Q perform higher on a statistics exam than do students who score relatively low on the GRE-Q? In other words, as scores increase on the GRE-Q, do they also correspondingly increase their performance on a statistics exam?

Second, we may want to use an inferential test to assess whether (a) a correlation is significantly different from 0 or (b) two correlations are significantly different from one another. For example, is the correlation between GRE-Q and statistics exam performance significantly different from 0? As a second example, is the correlation between GRE-Q and statistics exam performance the same for younger students as it is for older students?

The following topics are covered in this chapter: scatterplot, covariance, Pearson product–moment correlation coefficient, inferences about the Pearson product–moment correlation coefficient, some issues regarding correlations, other measures of association, SPSS, and power. We utilize some of the basic concepts previously covered in Chapters 6 through 9. New concepts to be discussed include the following: scatterplot; strength and direction; covariance; correlation coefficient; Fisher's Z transformation; and linearity assumption, causation, and restriction of range issues. Our objectives are that by the end of this chapter, you will be able to (a) understand the concepts underlying the correlation coefficient and correlation inferential tests, (b) select the appropriate type of correlation, and (c) determine and interpret the appropriate correlation and inferential test.
10.1 Scatterplot

Marie, the graduate student pursuing a degree in educational research, continues to work diligently on her coursework. Additionally, as we will once again see in this chapter, Marie continues to assist her faculty advisor with various research tasks.

Marie's faculty advisor received a telephone call from Matthew, the director of marketing for the local animal shelter. Based on the donor list, it appears that the donors who contribute the largest donations also have children and pets. In an effort to attract more donors to the animal shelter, Matthew is targeting select groups, one of which he believes may be families that have children at home and who also have pets. Matthew believes if there is a relationship between these variables, he can more easily reach the intended audience with his marketing materials, which will then translate into increased donations to the animal shelter. However, Matthew wants to base his
decision on solid evidence and not just a hunch. Having built a good knowledge base with previous consulting work, Marie's faculty advisor puts Matthew in touch with Marie. After consulting with Matthew, Marie suggests a Pearson correlation as the test of inference to test his research question: Is there a correlation between the number of children in a family and the number of pets? Marie's task is then to assist in generating the test of inference to answer Matthew's research question.

This section deals with an important concept underlying the relationship between two variables, the scatterplot. Later sections move us into ways of measuring the relationship between two variables. First, however, we need to set up the situation where we have data on two different variables for each of N individuals in the population. Table 10.1 displays such a situation. The first column is simply an index of the individuals in the population, from i = 1, …, N, where N is the total number of individuals in the population. The second column denotes the values obtained for the first variable X. Thus, X1 = 10 means that the first individual had a score of 10 on variable X. The third column provides the values for the second variable Y. Thus, Y1 = 20 indicates that the first individual had a score of 20 on variable Y. In an actual data table, only the scores would be shown, not the Xi and Yi notation. Thus, we have a tabular method for depicting the data of a two-variable situation in Table 10.1.

A graphical method for depicting the relationship between two variables is to plot the pair of scores on X and Y for each individual on a two-dimensional figure known as a scatterplot (or scattergram). Each individual has two scores in a two-dimensional coordinate system, denoted by (X, Y). For example, our individual 1 has the paired scores of (10, 20). An example scatterplot is shown in Figure 10.1. The X axis (the horizontal
Table 10.1
Layout for Correlational Data

Individual   X         Y
1            X1 = 10   Y1 = 20
2            X2 = 12   Y2 = 28
3            X3 = 20   Y3 = 33
⋮            ⋮         ⋮
N            XN = 44   YN = 65

FIGURE 10.1
Scatterplot.
axis or abscissa) represents the values for variable X, and the Y axis (the vertical axis or ordinate) represents the values for variable Y. Each point on the scatterplot represents a pair of scores (X, Y) for a particular individual. Thus, individual 1 has a point at X = 10 and Y = 20 (the circled point). Points for other individuals are also shown. In essence, the scatterplot is actually a bivariate frequency distribution. When there is a moderate degree of relationship, the points may take the shape of an ellipse (i.e., a football shape where the direction of the relationship, positive or negative, may make the football appear to point up to the right, as with the positive relation depicted in this figure), as in Figure 10.1.

The scatterplot allows the researcher to evaluate both the direction and the strength of the relationship between X and Y. The direction of the relationship has to do with whether the relationship is positive or negative. A positive relationship occurs when, as scores on variable X increase (from left to right), scores on variable Y also increase (from bottom to top). Thus, Figure 10.1 indicates a positive relationship between X and Y. Examples of different scatterplots are shown in Figure 10.2. Figure 10.2a and d displays positive relationships. A negative relationship, sometimes called an inverse relationship, occurs when, as scores on variable X increase (from left to right), scores on variable Y decrease (from top to bottom). Figure 10.2b and e shows examples of negative relationships. There is no relationship between X and Y when for a large value of X, a large or a small value of Y can occur, and for a small value of X, a large or a small value of Y can also occur. In other words, X and Y are not related, as shown in Figure 10.2c.

The strength of the relationship between X and Y is determined by the scatter of the points (hence the name scatterplot). First, we draw a straight line through the points which cuts the bivariate distribution in half, as shown in Figures 10.1 and 10.2. In Chapter 17, we note that this line is known as the regression line. If the scatter is such that the points tend to fall close to the line, then this is indicative of a strong relationship between X and Y. Figure 10.2a and b denotes strong relationships. If the scatter is such that the points are widely scattered around the line, then this is indicative of a weak relationship between
FIGURE 10.2
Examples of possible scatterplots.
X and Y. Figure 10.2d and e denotes weak relationships. To summarize Figure 10.2, part (a) represents a strong positive relationship, part (b) a strong negative relationship, part (c) no relationship, part (d) a weak positive relationship, and part (e) a weak negative relationship. Thus, the scatterplot is useful for providing a quick visual indication of the nature of the relationship between variables X and Y.
10.2 Covariance

The remainder of this chapter deals with statistical methods for measuring the relationship between variables X and Y. The first such method is known as the covariance. The covariance conceptually is the shared variance (or co-variance) between X and Y. The covariance and correlation share commonalities, as the correlation is simply the standardized covariance. The population covariance is denoted by σXY, and the conceptual formula is given as follows:
    \sigma_{XY} = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{N}

where
X_i and Y_i are the scores for individual i on variables X and Y, respectively
\mu_X and \mu_Y are the population means for variables X and Y, respectively
N is the population size
This equation looks similar to the conceptual formula for the variance presented in Chapter 3, where deviation scores from the mean are computed for each individual. The conceptual formula for the covariance is essentially an average of the paired deviation score products. If variables X and Y are positively related, then the deviation scores will tend to be of the same sign, their products will tend to be positive, and the covariance will be a positive value (i.e., σXY > 0). If variables X and Y are negatively related, then the deviation scores will tend to be of opposite signs, their products will tend to be negative, and the covariance will be a negative value (i.e., σXY < 0). Finally, if variables X and Y are not related, then the deviation scores will consist of both the same and opposite signs, their products will be both positive and negative and sum to 0, and the covariance will be a zero value (i.e., σXY = 0).
The sample covariance is denoted by sXY, and the conceptual formula becomes as follows:

    s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}

where
\bar{X} and \bar{Y} are the sample means for variables X and Y, respectively
n is the sample size
Note that the denominator becomes n − 1 so as to yield an unbiased sample estimate of the population covariance (i.e., similar to what we did in the sample variance situation).

The conceptual formula is unwieldy and error prone for anything other than small samples. Thus, a computational formula for the population covariance has been developed, as seen here:
    \sigma_{XY} = \frac{N \sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{N^2}

where the first summation involves the cross product of X multiplied by Y for each individual, summed across all N individuals, and the other terms should be familiar. The computational formula for the sample covariance is the following:
    s_{XY} = \frac{n \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n(n - 1)}

where the denominator is n(n − 1) so as to yield an unbiased sample estimate of the population covariance.
Table 10.2 gives an example of a population situation where a strong positive relationship is expected because as X (number of children in a family) increases, Y (number of pets in a family) also increases. Here σXY is computed as follows:

    \sigma_{XY} = \frac{N \sum_{i=1}^{N} X_i Y_i - \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{N^2} = \frac{5(108) - (15)(30)}{25} = 3.6000
The sign indicates that the relationship between X and Y is indeed positive. That is, the more children a family has, the more pets they tend to have. However, like the variance,
Table 10.2
Example Correlational Data (X = # Children, Y = # Pets)

Individual   X    Y    XY   X²   Y²   Rank X   Rank Y   (Rank X − Rank Y)²
1            1    2     2    1    4   1        1        0
2            2    6    12    4   36   2        3        1
3            3    4    12    9   16   3        2        1
4            4    8    32   16   64   4        4        0
5            5   10    50   25  100   5        5        0
Sums        15   30   108   55  220                     2
the value of the covariance depends on the scales of the variables involved. Thus, interpretation of the magnitude of a single covariance is difficult, as it can take on literally any value. We see shortly that the correlation coefficient takes care of this problem. For this reason, you are only likely to see the covariance utilized in the analysis of covariance (Chapter 14) and in advanced techniques such as structural equation modeling and multilevel modeling (beyond the scope of this text).
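The computational formula for the population covariance is easy to sketch in plain Python; the data below are the children-pets values from Table 10.2:

```python
def pop_covariance(x, y):
    """Population covariance via the computational formula:
    (N * sum(XY) - sum(X) * sum(Y)) / N**2."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return (n * sum_xy - sum(x) * sum(y)) / n ** 2

children = [1, 2, 3, 4, 5]   # X from Table 10.2
pets = [2, 6, 4, 8, 10]      # Y from Table 10.2
cov = pop_covariance(children, pets)  # 3.6
```

The positive value confirms the positive direction of the relationship, although, as noted above, its magnitude is not directly interpretable across different scales.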
10.3 Pearson Product–Moment Correlation Coefficient

Other methods for measuring the relationship between X and Y have been developed that are easier to interpret than the covariance. We refer to these measures as correlation coefficients. The first correlation coefficient we consider is the Pearson product–moment correlation coefficient, developed by the famous statistician Karl Pearson, and simply referred to as the Pearson here. The Pearson can be considered in several different forms, where the population value is denoted by ρXY (rho) and the sample value by rXY. One conceptual form of the Pearson is a product of standardized z scores (previously described in Chapter 4). This formula for the Pearson is given as follows:
    \rho_{XY} = \frac{\sum_{i=1}^{N} (z_X z_Y)}{N}

where z_X and z_Y are the z scores for variables X and Y, respectively, whose product is taken for each individual and then summed across all N individuals.
Because z scores are standardized versions of raw scores, the Pearson correlation is simply a standardized version of the covariance. The sign of the Pearson denotes the direction of the relationship (i.e., positive or negative), and the value of the Pearson denotes the strength of the relationship. The Pearson falls on a scale from −1.00 to +1.00, where −1.00 indicates a perfect negative relationship, 0 indicates no relationship, and +1.00 indicates a perfect positive relationship. Values near .50 or −.50 are considered moderate relationships, values near 0 weak relationships, and values near +1.00 or −1.00 strong relationships (although these are subjective terms). Cohen (1988) also offers rules of thumb, presented later in this chapter, for interpreting the value of the correlation. As you may see as you read more statistics and research methods textbooks, there are other guidelines offered for interpreting the value of the correlation.
There are other forms of the Pearson. A second conceptual form of the Pearson is in terms of the covariance and the standard deviations and is given as follows:
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
This form is useful when the covariance and standard deviations are already known. A final form of the Pearson is the computational formula, written as follows:
$$\rho_{XY} = \frac{N\sum_{i=1}^{N}X_iY_i - \left(\sum_{i=1}^{N}X_i\right)\left(\sum_{i=1}^{N}Y_i\right)}{\sqrt{\left[N\sum_{i=1}^{N}X_i^2 - \left(\sum_{i=1}^{N}X_i\right)^2\right]\left[N\sum_{i=1}^{N}Y_i^2 - \left(\sum_{i=1}^{N}Y_i\right)^2\right]}}$$
where all terms should be familiar from the computational formulas of the variance and covariance. This is the formula to use for hand computations, as it is less error-prone than the other previously given formulas.
For the example children-pet data given in Table 10.2, we see that the Pearson correlation is computed as follows:
$$\rho_{XY} = \frac{N\sum X_iY_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{\sqrt{\left[N\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[N\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} = \frac{5(108) - (15)(30)}{\sqrt{[5(55) - (15)^2][5(220) - (30)^2]}} = .9000$$
Thus, there is a very strong positive relationship between variable X (the number of children) and variable Y (the number of pets).
The sample correlation is denoted by rXY. The formulas are essentially the same for the sample correlation rXY and the population correlation ρXY, except that n is substituted for N. For example, the computational formula for the sample correlation is noted here:
$$r_{XY} = \frac{n\sum_{i=1}^{n}X_iY_i - \left(\sum_{i=1}^{n}X_i\right)\left(\sum_{i=1}^{n}Y_i\right)}{\sqrt{\left[n\sum_{i=1}^{n}X_i^2 - \left(\sum_{i=1}^{n}X_i\right)^2\right]\left[n\sum_{i=1}^{n}Y_i^2 - \left(\sum_{i=1}^{n}Y_i\right)^2\right]}}$$
Unlike the sample variance and covariance, the sample correlation has no correction for bias.
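As a check on the hand computation, the computational formula and the z-score form can both be carried out directly. A sketch in Python/NumPy (not part of the text) for the Table 10.2 data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])   # number of children
y = np.array([2, 6, 4, 8, 10])  # number of pets
n = len(x)

# Computational formula: [n*sum(XY) - sum(X)*sum(Y)] over the square-root term
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) *
              (n * np.sum(y**2) - np.sum(y)**2))
r = num / den  # .9000

# The z-score form gives the same value: the mean product of z scores
zx = (x - x.mean()) / x.std()  # population z scores (ddof=0)
zy = (y - y.mean()) / y.std()
r_z = np.mean(zx * zy)
```

Both routes return .9000, matching the hand computation above.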
10.4 Inferences About Pearson Product–Moment Correlation Coefficient
Once a researcher has determined one or more Pearson correlation coefficients, it is often useful to know whether the sample correlations are significantly different from 0. Thus, we need to visit the world of inferential statistics again. In this section, we consider two
different inferential tests: first for testing whether a single sample correlation is significantly different from 0 and second for testing whether two independent sample correlations are significantly different.
10.4.1 Inferences for a Single Sample
Our first inferential test is appropriate when you are interested in determining whether the correlation between variables X and Y for a single sample is significantly different from 0. For example, is the correlation between the number of years of education and current income significantly different from 0? The test of inference for the Pearson correlation will be conducted following the same steps as those in previous chapters. The null hypothesis is written as
$$H_0: \rho = 0$$
A nondirectional alternative hypothesis, where we are willing to reject the null if the sample correlation is either significantly greater than or less than 0, is nearly always utilized. Unfortunately, the sampling distribution of the sample Pearson r is too complex to be of much value to the applied researcher. For testing whether the correlation is different from 0 (i.e., where the alternative hypothesis is specified as H1: ρ ≠ 0), a transformation of r can be used to generate a t-distributed test statistic. The test statistic is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
which is distributed as t with ν = n − 2 degrees of freedom, assuming that both X and Y are normally distributed (although even if one variable is normal and the other is not, the t distribution may still apply; see Hogg & Craig, 1970).
There are two assumptions with the Pearson correlation. First, the Pearson correlation is appropriate only when a linear relationship is assumed between the variables (given that both variables are at least interval in scale). In other words, when a curvilinear or some other type of polynomial relationship is present, the Pearson correlation should not be computed. Testing for linearity can be done by simply graphing a bivariate scatterplot and reviewing it for a general linear display of points. Also, as we have seen with the other inferential procedures discussed in previous chapters, we need to again assume that the scores of the individuals are independent of one another. For the Pearson correlation, the assumption of independence is met when a random sample of units has been selected from the population.
It should be noted for inferential tests of correlations that sample size plays a role in determining statistical significance. For instance, this particular test is based on n − 2 degrees of freedom. If sample size is small (e.g., 10), then it is difficult to reject the null hypothesis except for very strong correlations. If sample size is large (e.g., 200), then it is easy to reject the null hypothesis for all but very weak correlations. Thus, the statistical significance of a correlation is definitely a function of sample size, both for tests of a single correlation and for tests of two correlations.
Effect size and power are always important, particularly here where sample size plays such a large role. Cohen (1988) proposed using r as a measure of effect size, using the subjective standard (ignoring the sign of the correlation) of r = .1 as a weak effect, r = .3
as a moderate effect, and r = .5 as a strong effect. These standards were developed for the behavioral sciences, but other standards may be used in other areas of inquiry. Cohen also has a nice series of power tables in his Chapter 3 for determining power and sample size when planning a correlational study. As for confidence intervals (CIs), Wilcox (1996) notes that “many methods have been proposed for computing CIs for ρ, but it seems that a satisfactory method for applied work has yet to be derived” (p. 303). Thus, a CI procedure is not recommended, even for large samples.
From the example children-pet data, we want to determine whether the sample Pearson correlation is significantly different from 0, with a nondirectional alternative hypothesis and at the .05 level of significance. The test statistic is computed as follows:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.9000\sqrt{5-2}}{\sqrt{1-.8100}} = 3.5762$$
The critical values from Table A.2 are ±α₂t₃ = ±3.182. Thus, we would reject the null hypothesis, as the test statistic exceeds the critical value, and conclude that the correlation between variables X and Y is significantly different from 0. In summary, there is a strong, positive, statistically significant correlation between the number of children and the number of pets.
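The same single-sample test can be sketched in Python with SciPy (an illustration, not the text's SPSS approach); `stats.t.ppf` recovers the Table A.2 critical value:

```python
import numpy as np
from scipy import stats

r, n = 0.9000, 5

# t statistic for H0: rho = 0
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)  # 3.5762
df = n - 2

# Two-tailed p-value and the alpha = .05 critical value (as in Table A.2)
p = 2 * stats.t.sf(abs(t), df)
crit = stats.t.ppf(0.975, df)               # 3.182

reject = abs(t) > crit  # True: the correlation differs from 0
```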
10.4.2 Inferences for Two Independent Samples
In a second situation, the researcher may have collected data from two different independent samples. It can be determined whether the correlations between variables X and Y are equal for these two independent samples of observations. For example, is the correlation between height and weight the same for children and adults? Here the null and alternative hypotheses are written as
$$H_0: \rho_1 - \rho_2 = 0$$
$$H_1: \rho_1 - \rho_2 \neq 0$$
where ρ₁ is the correlation between X and Y for sample 1 and ρ₂ is the correlation between X and Y for sample 2. However, because correlations are not normally distributed for every value of ρ, a transformation is necessary. This transformation is known as Fisher's Z transformation, named after the famous statistician Sir Ronald A. Fisher, which is approximately normally distributed regardless of the value of ρ. Table A.5 is used to convert a sample correlation r to a Fisher's Z transformed value. Note that Fisher's Z is a totally different statistic from any z score or z statistic previously covered.
The test statistic for this situation is
$$z = \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$
where
n₁ and n₂ are the sizes of the two samples
Z₁ and Z₂ are the Fisher's Z transformed values for the two samples
The test statistic is then compared to critical values from the z distribution in Table A.1. For a nondirectional alternative hypothesis where the two correlations may be different in either direction, the critical values are ±α₂z. Directional alternative hypotheses where the correlations are different in a particular direction can also be tested by looking in the appropriate tail of the z distribution (i.e., either +α₁z or −α₁z).
Cohen (1988) proposed a measure of effect size for the difference between two independent correlations as q = Z₁ − Z₂. The subjective standards proposed (ignoring the sign) are q = .1 as a weak effect, q = .3 as a moderate effect, and q = .5 as a strong effect (these are the standards for the behavioral sciences, although standards vary across disciplines). A nice set of power tables for planning purposes is contained in Chapter 4 of Cohen. Once again, while CI procedures have been developed, none of these have been viewed as acceptable (Marascuilo & Serlin, 1988; Wilcox, 2003).
Consider the following example. Two samples have been independently drawn of 28 children (sample 1) and 28 adults (sample 2). For each sample, the correlation between height and weight was computed to be rchildren = .8 and radults = .4. A nondirectional alternative hypothesis is utilized where the level of significance is set at .05. From Table A.5, we first determine the Fisher's Z transformed values to be Zchildren = 1.099 and Zadults = .4236. Then the test statistic z is computed as follows:
$$z = \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} = \frac{1.099 - .4236}{\sqrt{\dfrac{1}{25} + \dfrac{1}{25}}} = 2.3878$$
From Table A.1, the critical values are ±α₂z = ±1.96. Our decision then is to reject the null hypothesis and conclude that height and weight do not have the same correlation for children and adults. In other words, there is a statistically significant difference in the height-weight correlation between children and adults, with a strong effect size (q = .6754). This inferential test assumes both variables are normally distributed for each population and that scores are independent across individuals; however, the procedure is not very robust to nonnormality, as the Z transformation assumes normality (Duncan & Layard, 1973; Wilcox, 2003; Yu & Dunn, 1982). Thus, caution should be exercised in using the z test when data are nonnormal (e.g., Yu & Dunn recommend the use of Kendall's τ, as discussed later in this chapter).
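A sketch of this two-sample test in Python (an illustration, not from the text): `np.arctanh` computes Fisher's Z directly, replacing the Table A.5 lookup, so the result differs slightly from the book's 2.3878, which used rounded table values:

```python
import numpy as np
from scipy import stats

r1, n1 = 0.8, 28  # children
r2, n2 = 0.4, 28  # adults

# Fisher's Z transformation of each sample correlation
z1, z2 = np.arctanh(r1), np.arctanh(r2)

# Test statistic for H0: rho1 - rho2 = 0
z = (z1 - z2) / np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # about 2.39
p = 2 * stats.norm.sf(abs(z))                          # two-tailed p-value

q = z1 - z2  # Cohen's q effect size, about .68 (a strong effect)
```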
10.5 Assumptions and Issues Regarding Correlations
There are several issues about the Pearson and other types of correlations that you should be aware of. These issues are concerned with the assumption of linearity, correlation and causation, and restriction of range.
10.5.1 Assumptions
First, as mentioned previously, the Pearson correlation assumes that the relationship between X and Y is a linear relationship. In fact, the Pearson correlation, as a measure of relationship, is really a linear measure of relationship. Recall from earlier in the chapter
the scatterplots to which we fit a straight line. The linearity assumption means that a straight line provides a reasonable fit to the data. If the relationship is not a linear one, then the linearity assumption is violated. However, these correlational methods can still be computed, fitting a straight line to the data, albeit inappropriately. The result of such a violation is that the strength of the relationship will be reduced. In other words, the linear correlation will be much closer to 0 than the true nonlinear relationship.
For example, there is a perfect curvilinear relationship shown by the data in Figure 10.3, where all of the points fall precisely on the curved line. Something like this might occur if you correlate age with time in the mile run, as younger and older folks would take longer to run this distance than others. If these data are fit by a straight line, then the correlation will be severely reduced, in this case, to a value of 0 (i.e., the horizontal straight line that runs through the curved line). This is another good reason to always examine your data. The computer may determine that the Pearson correlation between variables X and Y is small or around 0. However, on examination of the data, you might find that the relationship is indeed nonlinear; thus, you should get to know your data. We return to the assessment of nonlinear relationships in Chapter 17.
Second, the assumption of independence applies to correlations. This assumption is met when units or cases are randomly sampled from the population.
10.5.2 Correlation and Causality
A second matter to consider is an often-made misinterpretation of a correlation. Many individuals (e.g., researchers, the public, and the media) often infer a causal relationship from a strong correlation. However, a correlation by itself should never be used to infer causation. In particular, a high correlation between variables X and Y does not imply that one variable is causing the other; it simply means that these two variables are related in some fashion. There are many reasons why variables X and Y can be highly correlated. A high correlation could be the result of (a) X causing Y, (b) Y causing X, (c) a third variable Z causing both X and Y, or (d) even many more variables being involved. The only methods that can strictly be used to infer cause are experimental methods that employ random assignment, where one variable is manipulated by the researcher (the cause), a second variable is subsequently observed (the effect), and all other variables are controlled. [There are, however, some excellent quasi-experimental methods, propensity score analysis and regression discontinuity, that can be used in some situations and that mimic random assignment and increase the likelihood of speaking to causal inference (Shadish, Cook, & Campbell, 2002).]
[Figure 10.3: Nonlinear relationship (Y plotted against X).]
10.5.3 Restriction of Range
A final issue to consider is the effect of restriction of the range of scores on one or both variables. For example, suppose that we are interested in the relationship between GRE scores and graduate grade point average (GGPA). In the entire population of students, the relationship might be depicted by the scatterplot shown in Figure 10.4. Say the Pearson correlation is found to be .60 as depicted by the entire sample in the full scatterplot. Now we take a more restricted population of students, those students at highly selective Ivy-Covered University (ICU). ICU only admits students whose GRE scores are above the cutoff score shown in Figure 10.4. Because of restriction of range in the scores of the GRE variable, the strength of the relationship between GRE and GGPA at ICU is reduced to a Pearson correlation of .20, where only the subsample portion of the plot to the right of the cutoff score is involved. Thus, when scores on one or both variables are restricted due to the nature of the sample or population, the magnitude of the correlation will usually be reduced (although see an exception in Figure 6.3 from Wilcox, 2003).
It is difficult for two variables to be highly related when one or both variables have little variability. This is due to the nature of the formula. Recall that one version of the Pearson formula consisted of standard deviations in the denominator. Remember that the standard deviation measures the distance of the sample scores from the mean. When there is restriction of range, the distance of the individual scores from the mean is minimized. In other words, there is less variation or variability around the mean. This translates to smaller correlations (and smaller covariances). If the size of the standard deviation for one variable is reduced, everything else being equal, then the size of correlations with other variables will also be reduced. In other words, we need sufficient variation for a relationship to be evidenced through the correlation coefficient value. Otherwise the correlation is likely to be reduced in magnitude, and you may miss an important correlation. If you must use a restrictive subsample, we suggest you choose measures of greater variability for correlational purposes.
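The effect of restriction of range can be demonstrated with a small simulation. This is an illustration only; the GRE-GGPA numbers here are simulated to mimic the Figure 10.4 scenario, not taken from real data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate standardized GRE and GGPA scores correlated about .60 in the full population
gre = rng.normal(0, 1, n)
ggpa = 0.6 * gre + rng.normal(0, np.sqrt(1 - 0.6**2), n)

r_full = np.corrcoef(gre, ggpa)[0, 1]  # about .60

# "Admit" only applicants with GRE more than 1 SD above the mean (the cutoff)
admitted = gre > 1.0
r_restricted = np.corrcoef(gre[admitted], ggpa[admitted])[0, 1]  # noticeably smaller
```

Truncating GRE at 1 standard deviation above the mean shrinks the observed correlation from roughly .60 to roughly .32 in this simulation, illustrating why selective samples understate the population relationship.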
Outliers, observations that are different from the bulk of the observations, also reduce the magnitude of correlations. If one observation is quite different from the rest such that it fell outside of the ellipse, then the correlation would be smaller in magnitude (e.g., closer to 0) than the correlation without the outlier. We discuss outliers in this context in Chapter 17.
[Figure 10.4: Restriction of range example (GGPA plotted against GRE, with a GRE cutoff score).]
10.6 Other Measures of Association
Thus far, we have considered one type of correlation, the Pearson product–moment correlation coefficient. The Pearson is most appropriate when both variables are at least interval level. That is, both variables X and Y are interval- and/or ratio-level variables. The Pearson is considered a parametric procedure given the distributional assumptions associated with it. If both variables are not at least interval level, then other measures of association, considered nonparametric procedures, should be considered, as they do not have distributional assumptions associated with them. In this section, we examine in detail the Spearman's rho and phi types of correlation coefficients and briefly mention several other types. While a distributional assumption for these correlations is not necessary, the assumption of independence still applies (and thus a random sample from the population is assumed).
10.6.1 Spearman’s Rho
Spearman's rho rank correlation coefficient is appropriate when both variables are ordinal level. This type of correlation was developed by Charles Spearman, the famous quantitative psychologist. Recall from Chapter 1 that ordinal data are where individuals have been rank-ordered, such as class rank. Thus, for both variables, either the data are already available in ranks, or the researcher (or computer) converts the raw data to ranks prior to the analysis.
The equation for computing Spearman's rho correlation is
$$\rho_S = 1 - \frac{6\sum_{i=1}^{N}(X_i - Y_i)^2}{N(N^2 - 1)}$$
where
ρS denotes the population Spearman's rho correlation
(Xi − Yi) represents the difference between the ranks on variables X and Y for individual i
The sample Spearman's rho correlation is denoted by rS, where n replaces N, but otherwise the equation remains the same. In case you were wondering where the “6” in the equation comes from, you will find an interesting article by Lamb (1984). Unfortunately, this particular computational formula is only appropriate when there are no ties among the ranks for either variable. An example of a tie in rank would be if two cases scored the same value on either X or Y. With ties, the formula given is only approximate, depending on the number of ties. In the case of ties, particularly when there are more than a few, many researchers recommend using Kendall's τ (tau) as an alternative correlation (e.g., Wilcox, 1996).
As with the Pearson correlation, Spearman's rho ranges from −1.0 to +1.0. The rules of thumb that we used for interpreting the Pearson correlation (e.g., Cohen, 1988) can be applied to Spearman's rho correlation values as well. The sign of the coefficient can be interpreted as with the Pearson. A negative sign indicates that as the values for one variable increase, the values for the other variable decrease. A positive sign indicates that as one variable increases in value, the value of the second variable also increases.
As an example, consider the children-pets data again in Table 10.2. To the right of the table, you see the last three columns labeled as rank X, rank Y, and (rank X − rank Y)². The raw scores were converted to ranks, where the lowest raw score received a rank of 1. The last column lists the squared rank differences. As there were no ties, the computations are as follows:
$$\rho_S = 1 - \frac{6\sum_{i=1}^{N}(X_i - Y_i)^2}{N(N^2 - 1)} = 1 - \frac{6(2)}{5(24)} = .9000$$
Thus, again there is a strong positive relationship between variables X and Y. It is a coincidence that ρ = ρS for this dataset, but this is not so for computational problem 1 at the end of this chapter.
To test whether a sample Spearman's rho correlation is significantly different from 0, we examine the following null hypothesis (the alternative hypothesis would be stated as H1: ρS ≠ 0):
$$H_0: \rho_S = 0$$
The test statistic is given as
$$t = \frac{r_S\sqrt{n-2}}{\sqrt{1-r_S^2}}$$
which is approximately distributed as a t distribution with ν = n − 2 degrees of freedom (Ramsey, 1989). The approximation works best when n is at least 10. A nondirectional alternative hypothesis, where we are willing to reject the null if the sample correlation is either significantly greater than or less than 0, is nearly always utilized. From the example, we want to determine whether the sample Spearman's rho correlation is significantly different from 0 at the .05 level of significance. For a nondirectional alternative hypothesis, the test statistic is computed as
$$t = \frac{r_S\sqrt{n-2}}{\sqrt{1-r_S^2}} = \frac{.9000\sqrt{5-2}}{\sqrt{1-.81}} = 3.5762$$
where the critical values from Table A.2 are ±α₂t₃ = ±3.182. Thus, we would reject the null hypothesis and conclude that the correlation is significantly different from 0, strong in magnitude (suggested by the value of the correlation coefficient; using Cohen's guidelines for interpretation as an effect size, this would be considered a large effect), and positive in direction (evidenced from the sign of the correlation coefficient). The exact sampling distribution for when 3 ≤ n ≤ 18 is given by Ramsey.
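The Spearman's rho computation and its significance test can be sketched in Python with SciPy (an illustration, not part of the text); with no ties, `stats.spearmanr` matches the rank-difference formula:

```python
from scipy import stats

children = [1, 2, 3, 4, 5]
pets = [2, 6, 4, 8, 10]

# scipy ranks each variable internally; with no ties this matches
# 1 - 6*sum(d^2) / (n*(n^2 - 1)) from the text
rho, p = stats.spearmanr(children, pets)  # rho = .90
```

SciPy's p-value here uses the same t approximation with n − 2 degrees of freedom described above.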
10.6.2 Kendall's Tau
Another correlation that can be computed with ordinal data is Kendall's tau, which also uses ranks of data to calculate the correlation coefficient (and has an adjustment for tied ranks). The ranking for Kendall's tau differs from Spearman's rho in the following way.
With Kendall's tau, the values for one variable are rank-ordered, and then the order of the second variable is examined to see how many pairs of values are out of order. A perfect positive correlation (+1.0) is achieved with Kendall's tau when no scores are out of order, and a perfect negative correlation (−1.0) is obtained when all scores are out of order. Values for Kendall's tau range from −1.0 to +1.0. The rules of thumb that we used for interpreting the Pearson correlation (e.g., Cohen, 1988) can be applied to Kendall's tau correlation values as well. The sign of the coefficient can be interpreted as with the Pearson: a negative sign indicates that as the values for one variable increase, the values for the second variable decrease; a positive sign indicates that as one variable increases in value, the value of the second variable also increases. While similar in some respects, Spearman's rho and Kendall's tau are based on different calculations, and, thus, finding different results is not uncommon. While both are appropriate when ordinal data are being correlated, it has been suggested that Kendall's tau provides a better estimation of the population correlation coefficient value given the sample data (Howell, 1997), especially with smaller sample sizes (e.g., n ≤ 10).
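Kendall's tau is also a one-liner in SciPy; a sketch (not part of the text) with the children-pets data, where only one pair of pet values is out of order:

```python
from scipy import stats

children = [1, 2, 3, 4, 5]
pets = [2, 6, 4, 8, 10]

# With X already in order, tau counts pairs of Y values that are in vs. out of order;
# here 9 of the 10 pairs are in order and 1 (6 before 4) is not
tau, p = stats.kendalltau(children, pets)  # tau = (9 - 1) / 10 = .80
```

Note that tau = .80 here while Spearman's rho = .90 for the same data, illustrating the point above that the two coefficients often differ.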
10.6.3 Phi
The phi coefficient ϕ is appropriate when both variables are dichotomous in nature (and is statistically equivalent to the Pearson). Recall from Chapter 1 that a dichotomous variable is one consisting of only two categories (i.e., binary), such as gender, pass/fail, or enrolled/dropped out. Thus, the variables being correlated would be either nominal or ordinal in scale. When correlating two dichotomous variables, one can think of a 2 × 2 contingency table as previously discussed in Chapter 8. For instance, to determine if there is a relationship between gender and whether students are still enrolled since their freshman year, a contingency table like Table 10.3 can be constructed. Here the columns correspond to the two levels of the enrollment status variable, enrolled (coded 1) or dropped out (0), and the rows correspond to the two levels of the gender variable, female (1) or male (0). The cells indicate the frequencies for the particular combinations of the levels of the two variables. If the frequencies in the cells are denoted by letters, then a represents females who dropped out, b represents females who are enrolled, c indicates males who dropped out, and d indicates males who are enrolled.
The equation for computing the phi coefficient is
$$\rho_\phi = \frac{bc - ad}{\sqrt{(a+c)(b+d)(a+b)(c+d)}}$$
where ρϕ denotes the population phi coefficient (for consistency's sake, although typically written as ϕ), and rϕ denotes the sample phi coefficient using the same equation. Note that
Table 10.3
Contingency Table for Phi Correlation

                        Enrollment Status
Student Gender    Dropped Out (0)    Enrolled (1)
Female (1)        a = 5              b = 20          a + b = 25
Male (0)          c = 15             d = 10          c + d = 25
                  a + c = 20         b + d = 30      a + b + c + d = 50
the bc product involves the consistent cells, where both values are the same, either both 0 or both 1, and the ad product involves the inconsistent cells, where the two values are different.
Using the example data from Table 10.3, we compute the phi coefficient to be the following:
$$\rho_\phi = \frac{bc - ad}{\sqrt{(a+c)(b+d)(a+b)(c+d)}} = \frac{300 - 50}{\sqrt{(20)(30)(25)(25)}} = .4082$$
Thus, there is a moderate, positive relationship between gender and enrollment status. We see from the table that a larger proportion of females than males are still enrolled.
To test whether a sample phi correlation is significantly different from 0, we test the following null hypothesis (the alternative hypothesis would be stated as H1: ρϕ ≠ 0):
$$H_0: \rho_\phi = 0$$
The test statistic is given as
$$\chi^2 = nr_\phi^2$$
which is distributed as a χ² distribution with one degree of freedom. From the example, we want to determine whether the sample phi correlation is significantly different from 0 at the .05 level of significance. The test statistic is computed as
$$\chi^2 = nr_\phi^2 = 50(.4082)^2 = 8.3314$$
and the critical value from Table A.3 is .05χ²₁ = 3.84. Thus, we would reject the null hypothesis and conclude that the correlation between gender and enrollment status is significantly different from 0.
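The phi computation and its chi-square test can be sketched directly from the cell counts in Table 10.3 (an illustration in Python, not the text's SPSS route):

```python
import numpy as np
from scipy import stats

# Cell frequencies from Table 10.3
a, b = 5, 20   # females: dropped out, enrolled
c, d = 15, 10  # males: dropped out, enrolled
n = a + b + c + d

phi = (b * c - a * d) / np.sqrt((a + c) * (b + d) * (a + b) * (c + d))  # about .408

chi2 = n * phi**2                   # about 8.33
crit = stats.chi2.ppf(0.95, df=1)   # 3.84, as in Table A.3
reject = chi2 > crit                # True
```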
10.6.4 Cramer's Phi
When the variables being correlated have more than two categories, Cramer's phi (Cramer's V in SPSS) can be computed. Thus, Cramer's phi is appropriate when both variables are nominal (and at least one variable has more than two categories) or when one variable is nominal and the other variable is ordinal (and at least one variable has more than two categories). Unlike the other correlation coefficients that we have discussed, Cramer's phi cannot be negative; values range from 0 (no association) to +1.0 (perfect association). Cohen's guidelines (1988) for interpreting the correlation in terms of effect size can be applied to Cramer's phi correlations, as they can with any other correlation examined.
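SPSS derives Cramer's V from the chi-square statistic; a sketch in Python of the usual formula, V = sqrt(chi2 / (n(k − 1))) with k the smaller of the number of rows and columns. The 3 × 2 table here is hypothetical, not from the text:

```python
import numpy as np
from scipy import stats

# Hypothetical 3 x 2 table: three colleges by enrollment status
table = np.array([[20, 30],
                  [25, 25],
                  [10, 40]])

chi2, p, df, expected = stats.chi2_contingency(table)

n = table.sum()
k = min(table.shape)               # smaller of rows, columns
v = np.sqrt(chi2 / (n * (k - 1)))  # Cramer's V
```

For a 2 × 2 table this formula reduces to the absolute value of phi, which is why the two measures are often discussed together.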
10.6.5 Other Correlations
Other types of correlations have been developed for different combinations of types of variables, but these are rarely used in practice and are unavailable in most statistical packages (e.g., rank biserial and point biserial). Table 10.4 provides suggestions for when different types of correlations are most appropriate. We mention briefly the two other types of correlations in the table: the rank biserial correlation is appropriate when one variable is dichotomous and the other variable is ordinal, whereas the point biserial correlation is appropriate when one variable is dichotomous and the other variable is interval or ratio (statistically equivalent to the Pearson; thus, the Pearson correlation can be computed in this situation).
10.7 SPSS
Next let us see what SPSS has to offer in terms of measures of association using the children-pets example dataset. There are two programs for obtaining measures of association in SPSS, depending on the measurement scale of your variables: the Bivariate Correlation program (for computing the Pearson, Spearman's rho, and Kendall's tau) and the Crosstabs program (for computing the Pearson, Spearman's rho, Kendall's tau, phi, Cramer's phi, and several other types of measures of association).
Bivariate Correlations
Step 1: To locate the Bivariate Correlations program, we go to “Analyze” in the top pulldown menu, then select “Correlate,” and then “Bivariate.” Following the screenshot (step 1), as follows, produces the “Bivariate” dialog box.
[Screenshot: Bivariate correlations, Step 1.]
Table 10.4
Different Types of Correlation Coefficients (by the measurement scales of variables X and Y)

Nominal with nominal: Phi (when both variables are dichotomous) or Cramer's V (when one or both variables have more than two categories)
Nominal with ordinal: Rank biserial or Cramer's V
Nominal with interval/ratio: Point biserial (Pearson in lieu of point biserial)
Ordinal with ordinal: Spearman's rho or Kendall's tau
Ordinal with interval/ratio: Spearman's rho, Kendall's tau, or Pearson
Interval/ratio with interval/ratio: Pearson
Step 2: Next, from the main “Bivariate Correlations” dialog box, click the variables to correlate (e.g., number of children and number of pets) and move them into the “Variables” box by clicking on the arrow button. In the bottom half of this dialog box, options are available for selecting the type of correlation, a one- or two-tailed test (i.e., directional or nondirectional test), and whether to flag statistically significant correlations. For illustrative purposes, we will place a checkmark to generate the “Pearson” and “Spearman's rho” correlation coefficients. We will also select the radio button for a “Two-tailed” test of significance, and at the very bottom, we will check “Flag significant correlations” (which simply means an asterisk will be placed next to significant correlations in the output).
[Screenshot: Bivariate correlations, Step 2. Callouts note that you select the variables of interest from the list on the left and use the arrow to move them to the “Variables” box; that clicking “Options” allows you to obtain the means, standard deviations, and/or covariances; that the type of correlation checked should be based on the measurement scale of your variables; that the “Test of significance” selected is based on a nondirectional (two-tailed) or directional (one-tailed) test; and that “Flag significant correlations” generates asterisks in the output for statistically significant correlations.]
Step 3 (optional): To obtain means, standard deviations, and/or covariances, as well as options for dealing with missing data (listwise or pairwise deletion), click on the “Options” button located in the top right corner of the main dialog box.

[Screenshot: Bivariate correlations, Step 3 (Options dialog box)]
278 An Introduction to Statistical Concepts
From the main dialog box, click on “Ok” to run the analysis and to generate the output.
Interpreting the output: The output for generation of the Pearson and Spearman’s rho bivariate correlations between number of children and number of pets appears in Table 10.5. For illustrative purposes, we asked for both the Pearson and Spearman’s rho correlations (although the Pearson is the appropriate correlation given the measurement scales of our variables, we have also generated the Spearman’s rho so that the output can be reviewed). Thus, the top Correlations box gives the Pearson results and the bottom Correlations box the Spearman’s rho results. In both cases, the output presents the correlation, sample size (N in SPSS language, although usually denoted as n by everyone else), observed level of significance, and asterisks denoting statistically significant correlations. In reviewing Table 10.5, we see that SPSS does not provide any output in terms of CIs, power, or effect size. Later in the chapter, we illustrate the use of G*Power for computing power. Effect size is easily interpreted from the correlation coefficient value utilizing Cohen’s (1988) subjective standards previously described, and we have not recommended any CI procedures for correlations.
Table 10.5
SPSS Results for Child–Pet Data

Correlations (Pearson)
                                    Children    Pets
Children   Pearson correlation     1           .900*
           Sig. (two-tailed)                   .037
           N                       5           5
Pets       Pearson correlation     .900*       1
           Sig. (two-tailed)       .037
           N                       5           5
* Correlation is significant at the 0.05 level (two-tailed).

Correlations (Spearman’s rho)
                                        Children    Pets
Children   Correlation coefficient     1.000       .900*
           Sig. (two-tailed)           .           .037
           N                           5           5
Pets       Correlation coefficient     .900*       1.000
           Sig. (two-tailed)           .037        .
           N                           5           5
* Correlation is significant at the 0.05 level (two-tailed).

Notes: The bivariate Pearson correlations are presented in the top table. The value of “1” indicates the Pearson correlation of the variable with itself. The correlation of interest (relationship of number of children to number of pets) is .900. The asterisk indicates the correlation is statistically significant at an alpha of .05. The probability is less than 4% (see “Sig. (two-tailed)”) that we would see this relationship by random chance if the relationship between variables was zero (i.e., if the null hypothesis was really true). N represents the total sample size. The bottom half of each table presents the same information as that presented in the top half. The results for the same data computed with Spearman’s rho are presented in the second table and interpreted similarly.
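Outside SPSS, the same two coefficients can be obtained with SciPy. The five-case child–pet values below are hypothetical, chosen only to make the sketch runnable; they are not the dataset behind Table 10.5:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: number of children (x) and number of pets (y) for n = 5 families.
children = [1, 2, 3, 4, 5]
pets = [2, 2, 3, 3, 5]

r, p_pearson = pearsonr(children, pets)      # Pearson product-moment correlation
rho, p_spearman = spearmanr(children, pets)  # Spearman's rank-order correlation (handles ties)

print(f"Pearson r = {r:.3f} (p = {p_pearson:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {p_spearman:.3f})")
```

As in the SPSS output, each call returns both the coefficient and its two-tailed observed level of significance.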
Using Scatterplots to Examine Linearity for Bivariate Correlations

Step 1: As alluded to earlier in the chapter, understanding the extent to which linearity is a reasonable assumption is an important first step prior to computing a Pearson correlation coefficient. To generate a scatterplot, go to “Graphs” in the top pulldown menu. From there, select “Legacy Dialogs,” then “Scatter/Dot” (see screenshot for “Scatterplots: Step 1”).
[Screenshot: Scatterplots, Step 1]
Step 2: This will bring up the “Scatter/Dot” dialog box (see screenshot for “Scatterplots: Step 2”). The default selection is “Simple Scatter,” and this is the option we will use. Then click “Define.”

[Screenshot: Scatterplots, Step 2]
Step 3: This will bring up the “Simple Scatterplot” dialog box (see screenshot for “Scatterplots: Step 3”). Click the dependent variable (e.g., number of pets) and move it into the “Y Axis” box by clicking on the arrow. Click the independent variable (e.g., number of children) and move it into the “X Axis” box by clicking on the arrow. Then click “Ok.”
[Screenshot: Scatterplots, Step 3]
Interpreting linearity evidence: Scatterplots are also often examined to determine visual evidence of linearity prior to computing Pearson correlations. Scatterplots are graphs that depict coordinate values of X and Y. Linearity is suggested by points that fall in a straight line. This line may suggest a positive relation (as scores on X increase, scores on Y increase, and vice versa), a negative relation (as scores on X increase, scores on Y decrease, and vice versa), little or no relation (relatively random display of points), or a polynomial relation (e.g., curvilinear). In this example, our scatterplot suggests evidence of linearity and, more specifically, a positive relationship between number of children and number of pets. Thus, proceeding to compute a bivariate Pearson correlation coefficient is reasonable.
[Scatterplot: number of children (x-axis, 1.00–5.00) versus number of pets (y-axis, 2.00–10.00)]
Using Crosstabs to Compute Correlations

The Crosstabs program has already been discussed in Chapter 8, but it can also be used for obtaining many measures of association (specifically Spearman’s rho, Kendall’s tau, Pearson, phi, and Cramer’s phi). We will illustrate the use of Crosstabs for two nominal variables, thus generating phi and Cramer’s phi.

Step 1: To compute phi or Cramer’s phi correlations, go to “Analyze” in the top pulldown menu, then select “Descriptive Statistics,” and then select the “Crosstabs” procedure.
[Screenshot: Phi and Cramer’s phi, Step 1]
Step 2: Select the dependent variable (if applicable; many times, there are not dependent and independent variables, per se, with bivariate correlations, and in those cases, determining which variable is X and which variable is Y is largely irrelevant) and move it into the “Row(s)” box by clicking on the arrow key [e.g., here we used enrollment status as the dependent variable (1 = enrolled; 0 = not enrolled)]. Then select the independent variable and move it into the “Column(s)” box [in this example, gender is the independent variable (0 = male; 1 = female)].
[Screenshot: Phi and Cramer’s phi, Step 2. Callouts note: select the variable of interest from the list on the left and use the arrow to move it to the boxes on the right; if applicable, the dependent variable should be displayed in the row(s) and the independent variable in the column(s); clicking on “Statistics” will allow you to select various statistics to generate (including various measures of association).]
Step 3: In the top right corner of the “Crosstabs” dialog box (see screenshot for step 2), click on the button labeled “Statistics.” From here, you can select various measures of association (i.e., types of correlation coefficients). Which correlation is selected should depend on the measurement scales of your variables. With two nominal variables, the appropriate correlation to select is “Phi and Cramer’s V.” Click on “Continue” to return to the main “Crosstabs” dialog box.
[Screenshot: Phi and Cramer’s phi, Step 3. Callout: clicking on “Correlations” will generate Pearson, Spearman’s rho, and Kendall’s tau correlations.]
From the main dialog box, click on “Ok” to run the analysis and generate the output.
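For readers working outside SPSS, phi and Cramer’s V can be computed from a contingency table with SciPy. The enrollment-by-gender counts below are hypothetical, invented only to make the sketch runnable:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 table: rows = enrollment (0 = not enrolled, 1 = enrolled),
# columns = gender (0 = male, 1 = female).
table = np.array([[20, 10],
                  [15, 25]])

chi2, p, dof, expected = chi2_contingency(table, correction=False)
n = table.sum()
phi = np.sqrt(chi2 / n)              # phi coefficient for a 2 x 2 table
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))  # Cramer's V; identical to phi when k = 1

print(f"phi = {phi:.3f}, Cramer's V = {cramers_v:.3f}, p = {p:.3f}")
```

With a 2 × 2 table the two coefficients coincide; Cramer’s V generalizes phi when one or both variables have more than two categories.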
10.8 G*Power

A priori and post hoc power could again be determined using the specialized software described previously in this text (e.g., G*Power), or you can consult a priori power tables (e.g., Cohen, 1988). As an illustration, we use G*Power to compute the post hoc power of our test.
Post Hoc Power for the Pearson Bivariate Correlation Using G*Power

The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a Pearson correlation. To find the Pearson, we will select “Tests” in the top pulldown menu, then “Correlations and regression,” and then “Correlations: Bivariate normal model.” Once that selection is made, the “Test family” automatically changes to “Exact.”
[Screenshot: G*Power, Step 1]
The “Type of power analysis” desired then needs to be selected. To compute post hoc power, select “Post hoc: Compute achieved power—given α, sample size, and effect size.”
[Screenshot: G*Power, Step 2. Callouts note: the default selection for “Test family” is “t tests,” and the default “Statistical test” is “Correlation: Point biserial model”; following the procedures presented in Step 1 will automatically change these to “Exact” and “Correlation: Bivariate normal model.”]
The “Input Parameters” must then be specified. The first parameter is specification of the number of tail(s). For a directional hypothesis, “One” is selected, and for a nondirectional hypothesis, “Two” is selected. In our example, we chose a nondirectional hypothesis and thus will select “Two” tails. We then input the observed correlation coefficient value in the box for “Correlation ρ H1.” In this example, our Pearson correlation coefficient value was .90. The alpha level we tested at was .05, the total sample size was 5, and the “Correlation ρ H0” will remain as the default 0 (this is the correlation value expected if the null hypothesis is true; in other words, there is zero correlation between variables given the null hypothesis). Once the parameters are specified, simply click on “Calculate” to generate the power results.
[Screenshot: G*Power input and output. The “Input Parameters” for computing post hoc power must be specified for: (1) one- or two-tailed test, (2) observed correlation coefficient value, (3) alpha level, (4) total sample size, and (5) hypothesized correlation coefficient value. Once the parameters are specified, click on “Calculate.”]
The “Output Parameters” provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power for a Pearson correlation given a two-tailed test, with a computed correlation value of .90, an alpha level of .05, total sample size of 5, and a null hypothesis correlation value of 0.

Based on those criteria, the post hoc power was .67. In other words, with a two-tailed test, an observed Pearson correlation of .90, an alpha level of .05, sample size of 5, and a null hypothesis correlation value of 0, the power of our test was .67—the probability of rejecting the null hypothesis when it is really false (in this case, the probability that there is not a zero correlation between our variables) was 67%, which is slightly less than what would be usually considered sufficient power (sufficient power is often .80 or above). Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters).
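G*Power’s exact bivariate-normal computation is not reproduced in a few lines, but a rough check is possible with the Fisher z approximation. Note this is our own substitution, not the method G*Power uses, so it only approximates the .67 figure, and the approximation is coarse at such a small n:

```python
import math
from scipy.stats import norm

r, n, alpha = 0.90, 5, 0.05

# Fisher z approximation: atanh(r) is roughly normal with SD 1/sqrt(n - 3).
z_effect = math.atanh(r) * math.sqrt(n - 3)
z_crit = norm.ppf(1 - alpha / 2)  # two-tailed critical value, about 1.96

# Approximate power: probability the test statistic exceeds the critical value.
power = norm.sf(z_crit - z_effect) + norm.cdf(-z_crit - z_effect)
print(f"approximate power = {power:.2f}")
```

Here the approximation gives roughly .55 versus G*Power’s exact .67; the two converge as n grows.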
10.9 Template and APA-Style Write-Up

Finally, we conclude the chapter with a template and an APA-style paragraph detailing the results from an example dataset.

Pearson Correlation Test

As you may recall, our graduate research assistant, Marie, was working with the marketing director of the local animal shelter, Matthew. Marie’s task was to assist Matthew in generating the test of inference to answer his research question, “Is there a relationship between the number of children in a family and the number of pets?” A Pearson correlation was the test of inference suggested by Marie. A template for writing a research question for a correlation (regardless of which type of correlation coefficient is computed) is presented in the following:
Is There a Correlation Between [Variable 1] and [Variable 2]?
It may be helpful to include in the results information on the extent to which the assumptions were met (recall there are two assumptions: independence and linearity). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference. Recall that the assumption of independence is met when the cases in our sample have been randomly selected from the population. One or two sentences are usually sufficient to indicate if the assumptions are met. It is also important to address effect size in the write-up. Correlations are unique in that they are already effect size measures, so computing an effect size in addition to the correlation value is not needed. However, it is desirable to interpret the correlation value as an effect size. Effect size is easily interpreted from the correlation coefficient value utilizing Cohen’s (1988) subjective standards previously described. Here is an APA-style example paragraph of results for the correlation between number of children and number of pets.
A Pearson correlation coefficient was computed to determine if there is a relationship between the number of children in a family and the number of pets in the family. The test was conducted using an alpha of .05. The null hypothesis was that the relationship would be 0. The assumption of independence was met via random selection. The assumption of linearity was reasonable given a review of a scatterplot of the variables.

The Pearson correlation between children and pets is .90, which is positive, is interpreted as a large effect size (Cohen, 1988), and is statistically different from 0 (r = .90, n = 5, p = .037). Thus, the null hypothesis that the correlation is 0 was rejected at the .05 level of significance. There is a strong, positive correlation between the number of children in a family and the number of pets in the family.
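Statistics like these can be pulled straight from code into a write-up. A minimal sketch with a helper name and hypothetical data of our own choosing (not the textbook’s dataset), using the APA convention of dropping the leading zero for r and p:

```python
from scipy.stats import pearsonr

def apa_correlation(x, y):
    """Format a Pearson correlation APA-style, dropping leading zeros for r and p."""
    r, p = pearsonr(x, y)
    # Values bounded by 1 (like r and p) are reported without a leading zero in APA style.
    fmt = lambda v: f"{v:.2f}".replace("0.", ".", 1) if abs(v) < 1 else f"{v:.2f}"
    return f"r = {fmt(r)}, n = {len(x)}, p = {p:.3f}".replace("p = 0.", "p = .")

# Hypothetical child-pet data for five families.
print(apa_correlation([1, 2, 3, 4, 5], [2, 2, 3, 3, 5]))
```

The returned string drops straight into a sentence such as the one in the paragraph above.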
10.10 Summary

In this chapter, we described various measures of the association or correlation between two variables. Several new concepts and descriptive and inferential statistics were discussed. The new concepts covered were as follows: scatterplot; strength and direction; covariance; correlation coefficient; Fisher’s Z transformation; and linearity assumption, causation, and restriction-of-range issues. We began by introducing the scatterplot as a graphical method for visually depicting the association between two variables. Next, we examined the covariance as an unstandardized measure of association. Then we considered the Pearson product–moment correlation coefficient, first as a descriptive statistic and then as a method for making inferences when there are either one or two samples of observations. Some important issues about the correlational measures were also discussed. Finally, a few other measures of association were introduced, in particular, the Spearman’s rho and Kendall’s tau rank-order correlation coefficients and the phi and Cramer’s phi coefficients. At this point, you should have met the following objectives: (a) be able to understand the concepts underlying the correlation coefficient and correlation inferential tests, (b) be able to select the appropriate type of correlation, and (c) be able to determine and interpret the appropriate correlation and correlation inferential test. In Chapter 11, we discuss the one-factor analysis of variance, the logical extension of the independent t test, for assessing mean differences among two or more groups.
Problems

Conceptual problems

10.1 The variance of X is 9, the variance of Y is 4, and the covariance between X and Y is 2. What is rXY?
 a. .039
 b. .056
 c. .233
 d. .333

10.2 The standard deviation of X is 20, the standard deviation of Y is 50, and the covariance between X and Y is 30. What is rXY?
 a. .030
 b. .080
 c. .150
 d. .200

10.3 Which of the following correlation coefficients, each obtained from a sample of 1000 children, indicates the weakest relationship?
 a. −.90
 b. −.30
 c. +.20
 d. +.80
10.4 Which of the following correlation coefficients, each obtained from a sample of 1000 children, indicates the strongest relationship?
 a. −.90
 b. −.30
 c. +.20
 d. +.80

10.5 If the relationship between two variables is linear, which of the following is necessarily true?
 a. The relation can be most accurately represented by a straight line.
 b. All the points will fall on a curved line.
 c. The relationship is best represented by a curved line.
 d. All the points must fall exactly on a straight line.

10.6 In testing the null hypothesis that a correlation is equal to 0, the critical value decreases as α decreases. True or false?

10.7 If the variances of X and Y are increased, but their covariance remains constant, the value of rXY will be unchanged. True or false?

10.8 We compute rXY = .50 for a sample of students on variables X and Y. I assert that if the low-scoring students on variable X are removed, then the new value of rXY would most likely be less than .50. Am I correct?

10.9 Two variables are linearly related such that there is a perfect relationship between X and Y. I assert that rXY must be equal to either +1.00 or −1.00. Am I correct?

10.10 If the number of credit cards owned and the number of cars owned are strongly positively correlated, then those with more credit cards tend to own more cars. True or false?

10.11 If the number of credit cards owned and the number of cars owned are strongly negatively correlated, then those with more credit cards tend to own more cars. True or false?

10.12 A statistical consultant at a rival university found the correlation between GRE-Q scores and statistics grades to be +2.0. I assert that the administration should be advised to congratulate the students and faculty on their great work in the classroom. Am I correct?

10.13 If X correlates significantly with Y, then X is necessarily a cause of Y. True or false?

10.14 A researcher wishes to correlate the grade students earned from a pass/fail course (i.e., pass or fail) with their cumulative GPA. Which is the most appropriate correlation coefficient to examine this relationship?
 a. Pearson
 b. Spearman’s rho or Kendall’s tau
 c. Phi
 d. None of the above

10.15 If both X and Y are ordinal variables, then the most appropriate measure of association is the Pearson. True or false?
Computational problems

10.1 You are given the following pairs of sample scores on X (number of credit cards in your possession) and Y (number of those credit cards with balances):

X Y
5 4
6 1
4 3
8 7
2 2

 a. Graph a scatterplot of the data.
 b. Compute the covariance.
 c. Determine the Pearson product–moment correlation coefficient.
 d. Determine the Spearman’s rho correlation coefficient.

10.2 If rXY = .17 for a random sample of size 84, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .05 level of significance).

10.3 If rXY = .60 for a random sample of size 30, test the hypothesis that the population Pearson is significantly different from 0 (conduct a two-tailed test at the .05 level of significance).

10.4 The correlation between vocabulary size and mother’s age is .50 for 12 rural children and .85 for 17 inner-city children. Does the correlation for rural children differ from that of the inner-city children at the .05 level of significance?

10.5 You are given the following pairs of sample scores on X (number of coins in possession) and Y (number of bills in possession):

X Y
2 1
3 3
4 5
5 5
6 3
7 1

 a. Graph a scatterplot of the data.
 b. Describe the relationship between X and Y.
 c. What do you think the Pearson correlation will be?
10.6 Six adults were assessed on the number of minutes it took to read a government report (X) and the number of items correct on a test of the content of that report (Y). Use the following data to determine the Pearson correlation and the effect size.

X Y
10 17
8 17
15 13
12 16
14 15
16 12

10.7 Ten kindergarten children were observed on the number of letters written in proper form (given 26 letters) (X) and the number of words that the child could read (given 50 words) (Y). Use the following data to determine the Pearson correlation and the effect size.

X Y
10 5
16 8
22 40
8 15
12 28
20 37
17 29
21 30
15 18
9 4

Interpretive problems

10.1 Select two interval/ratio variables from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.

10.2 Select two ordinal variables from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.

10.3 Select one ordinal variable and one interval/ratio variable from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.

10.4 Select one dichotomous variable and one interval/ratio variable from the survey 1 dataset on the website. Use SPSS to generate the appropriate correlation, determine statistical significance, interpret the correlation value (including interpretation as an effect size), and examine and interpret the scatterplot.
291
11
One-Factor Analysis of Variance:
Fixed-Effects Model
Chapter Outline

11.1 Characteristics of the One-Factor ANOVA Model
11.2 Layout of Data
11.3 ANOVA Theory
 11.3.1 General Theory and Logic
 11.3.2 Partitioning the Sums of Squares
 11.3.3 ANOVA Summary Table
11.4 ANOVA Model
 11.4.1 Model
 11.4.2 Estimation of the Parameters of the Model
 11.4.3 Effect Size Measures, Confidence Intervals, and Power
 11.4.4 Example
 11.4.5 Expected Mean Squares
11.5 Assumptions and Violation of Assumptions
 11.5.1 Independence
 11.5.2 Homogeneity of Variance
 11.5.3 Normality
11.6 Unequal n’s or Unbalanced Design
11.7 Alternative ANOVA Procedures
 11.7.1 Kruskal–Wallis Test
 11.7.2 Welch, Brown–Forsythe, and James Procedures
11.8 SPSS and G*Power
11.9 Template and APA-Style Write-Up

Key Concepts

 1. Between- and within-groups variability
 2. Sources of variation
 3. Partitioning the sums of squares
 4. The ANOVA model
 5. Expected mean squares
In the last five chapters, our discussion has dealt with various inferential statistics, including inferences about means. The next six chapters are concerned with different analysis of variance (ANOVA) models. In this chapter, we consider the most basic ANOVA model, known as the one-factor ANOVA model. Recall the independent t test from Chapter 7, where the means from two independent samples were compared. What if you wish to compare more than two means? The answer is to use the analysis of variance. At this point, you may be wondering why the procedure is called the analysis of variance rather than the analysis of means, because the intent is to study possible mean differences. One way of comparing a set of means is to think in terms of the variability among those means. If the sample means are all the same, then the variability of those means would be 0. If the sample means are not all the same, then the variability of those means would be somewhat greater than 0. In general, the greater the mean differences are, the greater is the variability of the means. Thus, mean differences are studied by looking at the variability of the means; hence, the term analysis of variance is appropriate rather than analysis of means (further discussed in this chapter).
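The idea that mean differences show up as variability among the group means can be checked numerically. A minimal sketch with made-up group means:

```python
from statistics import pvariance

# If all group means are equal, their variability is 0; the more the
# means differ, the larger that variability becomes.
equal_means = [50.0, 50.0, 50.0]
close_means = [49.0, 50.0, 51.0]
spread_means = [40.0, 50.0, 60.0]

print(pvariance(equal_means))   # 0.0
print(pvariance(close_means))
print(pvariance(spread_means))
```

The third set of means is the most spread out and accordingly shows the largest variance, which is exactly the quantity ANOVA exploits.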
We use X to denote our single independent variable, which we typically refer to as a factor, and Y to denote our dependent (or criterion) variable. Thus, the one-factor ANOVA is a bivariate, or two-variable, procedure. Our interest here is in determining whether mean differences exist on the dependent variable. Stated another way, the researcher is interested in the influence of the independent variable on the dependent variable. For example, a researcher may want to determine the influence that method of instruction has on statistics achievement. The independent variable, or factor, would be method of instruction, and the dependent variable would be statistics achievement. Three different methods of instruction that might be compared are large lecture hall instruction, small-group instruction, and computer-assisted instruction. Students would be randomly assigned to one of the three methods of instruction and at the end of the semester evaluated as to their level of achievement in statistics. These results would be of interest to a statistics instructor in determining the most effective method of instruction (where “effective” is measured by student performance in statistics). Thus, the instructor may opt for the method of instruction that yields the highest mean achievement.

There are a number of new concepts introduced in this chapter as well as a refresher of concepts that have been covered in previous chapters. The concepts addressed in this chapter include the following: independent and dependent variables; between- and within-groups variability; fixed and random effects; the linear model; partitioning of the sums of squares; degrees of freedom, mean square terms, and F ratios; the ANOVA summary table; expected mean squares; balanced and unbalanced models; and alternative ANOVA procedures. Our objectives are that by the end of this chapter, you will be able to (a) understand the characteristics and concepts underlying a one-factor ANOVA, (b) generate and interpret the results of a one-factor ANOVA, and (c) understand and evaluate the assumptions of the one-factor ANOVA.
11.1 Characteristics of the One-Factor ANOVA Model

We have been following Marie, our very capable educational research graduate student, as she develops her statistical skills. As we will see, Marie is embarking on a very exciting research adventure of her own.
Marie is enrolled in an independent study class. As part of the course requirement, she has to complete a research study. In collaboration with the statistics faculty in her program, Marie designs an experimental study to determine if there is a mean difference in student attendance in the statistics lab based on the attractiveness of the statistics lab instructor. Marie’s research question is: Is there a mean difference in the number of statistics labs attended by students based on the attractiveness of the lab instructor? Marie determined that a one-way ANOVA was the best statistical procedure to use to answer her question. Her next task is to collect and analyze the data to address her research question.
This section describes the distinguishing characteristics of the one-factor ANOVA model. Suppose you are interested in comparing the means of two independent samples. Here the independent t test would be the method of choice (or perhaps the Welch t′ test). What if your interest is in comparing the means of more than two independent samples? One possibility is to conduct multiple independent t tests on each pair of means. For example, if you wished to determine whether the means from five independent samples are the same, you could do all possible pairwise t tests. In this case, the following null hypotheses could be evaluated: μ1 = μ2, μ1 = μ3, μ1 = μ4, μ1 = μ5, μ2 = μ3, μ2 = μ4, μ2 = μ5, μ3 = μ4, μ3 = μ5, and μ4 = μ5. Thus, we would have to carry out 10 different independent t tests. The number of possible pairwise t tests that could be done for J means is equal to ½[J(J − 1)].
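The ½[J(J − 1)] count is easy to verify by enumerating the pairs; a quick sketch:

```python
from itertools import combinations

J = 5  # number of group means
# Enumerate every distinct pair of groups, matching the null hypotheses listed above.
pairs = list(combinations(range(1, J + 1), 2))

print(len(pairs))  # 10, matching (1/2) * J * (J - 1)
assert len(pairs) == J * (J - 1) // 2
```

For J = 5 this reproduces the 10 pairwise hypotheses enumerated in the paragraph above.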
Is there a problem in conducting so many t tests? Yes; the problem has to do with the probability of making a Type I error (i.e., α), where the researcher incorrectly rejects a true null hypothesis. Although the α level for each t test can be controlled at a specified nominal α level that is set by the researcher, say .05, what happens to the overall α level for the entire set of tests? The overall α level for the entire set of tests (i.e., αtotal), often called the experimentwise Type I error rate, is larger than the α level for each of the individual t tests.

In our example, we are interested in comparing the means for 10 pairs of groups (again, these would be μ1 = μ2, μ1 = μ3, μ1 = μ4, μ1 = μ5, μ2 = μ3, μ2 = μ4, μ2 = μ5, μ3 = μ4, μ3 = μ5, and μ4 = μ5). A t test is conducted for each of the 10 pairs of groups at α = .05. Although each test controls the α level at .05, the overall α level will be larger because the risk of a Type I error accumulates across the tests. For each test, we are taking a risk; the more tests we do, the more risks we are taking. This can be explained by considering the risk you take each day you drive your car to school or work. The risk of an accident is small for any 1 day; however, over the period of a year, the risk of an accident is much larger.
For C independent (or orthogonal) tests, the experimentwise error is as follows:

αtotal = 1 − (1 − α)^C

Assume for the moment that our 10 tests are independent (although they are not because within those 10 tests, each group is actually being compared to another group in four different instances). If we go ahead with our 10 t tests at α = .05, then the experimentwise error rate is

αtotal = 1 − (1 − .05)^10 = 1 − .60 = .40
Although we are seemingly controlling our α level at the .05 level, the probability of making a Type I error across all 10 tests is .40. In other words, in the long run, if we conduct 10 independent t tests, 4 times out of 10, we will make a Type I error. For this reason, we do not want to do all possible t tests. Before we move on, the experimentwise error rate for C dependent tests (which would be the case when doing all possible pairwise t tests, as in our example) is more difficult to determine, so let us just say that

α ≤ αtotal ≤ Cα
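The experimentwise rate for independent tests, and its Cα upper bound, are easy to evaluate numerically:

```python
C, alpha = 10, 0.05

# Experimentwise Type I error rate for C independent tests at level alpha.
alpha_total = 1 - (1 - alpha) ** C

print(f"{alpha_total:.2f}")  # 0.40
assert alpha <= alpha_total <= C * alpha  # consistent with the dependent-test bounds
```

Ten tests at .05 thus inflate the overall Type I error risk to about .40, as computed above.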
Are there other options available to us where we can maintain better control over our experimentwise error rate? The optimal solution, in terms of maintaining control over our overall α level as well as maximizing power, is to conduct one overall test, often called an omnibus test. Recall that power has to do with the probability of correctly rejecting a false null hypothesis. The omnibus test could assess the equality of all of the means simultaneously and is the one used in ANOVA. The one-factor ANOVA then represents an extension of the independent t test for two or more independent sample means, where the experimentwise error rate is controlled.
In addition, the one-factor ANOVA has only one independent variable or factor with two or more levels. The independent variable is a discrete or grouping variable, where each subject responds to only one level. The levels represent the different samples or groups or treatments whose means are to be compared. In our example, method of instruction is the independent variable with three levels: large lecture hall, small-group, and computer-assisted. There are two ways of conceptually thinking about the selection of levels. In the fixed-effects model, all levels that the researcher is interested in are included in the design and analysis for the study. As a result, generalizations can only be made about those particular levels of the independent variable that are actually selected. For instance, if a researcher is only interested in these three methods of instruction—large lecture hall, small-group, and computer-assisted—then only those levels are incorporated into the study. Generalizations about other methods of instruction cannot be made because no other methods were considered for selection. Other examples of fixed-effects independent variables might be SES, gender, specific types of drug treatment, age group, weight, or marital status.
In the random-effects model, the researcher randomly samples some levels of the independent variable from the population of levels. As a result, generalizations can be made about all of the levels in the population, even those not actually sampled. For instance, a researcher interested in teacher effectiveness may have randomly sampled history teachers (i.e., the independent variable) from the population of history teachers in a particular school district. Generalizations can then be made about other history teachers in that school district not actually sampled. The random selection of levels is much the same as the random selection of individuals or objects in the random sampling process. This is the nature of inferential statistics, where inferences are made about a population (of individuals, objects, or levels) from a sample. Other examples of random-effects independent variables might be randomly selected classrooms, types of medication, animals, or time (e.g., hours, days). The remainder of this chapter is concerned with the fixed-effects model. Chapter 15 discusses the random-effects model in more detail.
In the fixed-effects model, once the levels of the independent variable are selected, subjects (i.e., persons or objects) are randomly assigned to the levels of the independent variable. In certain situations, the researcher does not have control over which level a subject is assigned to. The groups may already be in place when the researcher arrives on the scene. For instance, students may be assigned to their classes at the beginning of the year by the school administration. Researchers typically have little input regarding class assignments. In another situation, it may be theoretically impossible to assign subjects to groups. For example, as much as we might like, researchers cannot randomly assign individuals to an age level. Thus, a distinction needs to be made about whether or not the researcher can control the assignment of subjects to groups. Although the analysis will not be altered, the interpretation of the results will be. When researchers have control over group assignments, the extent to which they can generalize their findings is greater than for those researchers who do not have such control. For further information on the differences between true experimental designs (i.e., with random assignment) and quasi-experimental designs (i.e., without random assignment), take a look at Campbell and Stanley (1966), Cook and Campbell (1979), and Shadish, Cook, and Campbell (2002).
Moreover, in the model being considered here, each subject is exposed to only one level of the independent variable. Chapter 15 deals with models where a subject is exposed to multiple levels of an independent variable; these are known as repeated-measures models. For example, a researcher may be interested in observing a group of young children repeatedly over a period of several years. Thus, each child might be observed every 6 months from birth to 5 years of age. This would require a repeated-measures design because the observations of a particular child over time are obviously not independent observations.
One final characteristic is the measurement scale of the independent and dependent variables. In ANOVA, because this is a test of means, a condition of the test is that the scale of measurement on the dependent variable is at the interval or ratio level. If the dependent variable is measured at the ordinal level, then the nonparametric equivalent, the Kruskal–Wallis test, should be considered (discussed later in this chapter). If the dependent variable shares properties of both the ordinal and interval levels (e.g., grade point average [GPA]), then both the ANOVA and Kruskal–Wallis procedures could be considered to cross-reference any potential effects of the measurement scale on the results. As previously mentioned, the independent variable is a grouping or discrete variable, so it can be measured on any scale.
However, there is one caveat to the measurement scale of the independent variable. Technically the condition is that the independent variable be a grouping or discrete variable. Most often, ANOVAs are conducted with independent variables which are categorical—nominal or ordinal in scale. ANOVAs can also be used in the case of interval or ratio values that are discrete. Recall that discrete variables are variables that can only take on certain values and that arise from the counting process. An example of a discrete variable that could be a good candidate for being an independent variable in an ANOVA model is number of children. What would make this a good candidate? The responses to this variable would likely be relatively limited (in the general population, it may be anticipated that the range would be from zero children to five or six—although outliers may be a possibility), and each discrete value would likely have multiple cases (with fewer cases having larger numbers of children). Applying this is obviously at the researcher's discretion; at some point, the number of discrete values can become so numerous as to be unwieldy in an ANOVA model. Thus, while at first glance we may not consider it appropriate to use interval or ratio variables as independent variables in ANOVA models, there are situations where it is feasible and appropriate.
In summary, the characteristics of the one-factor ANOVA fixed-effects model are as follows: (a) control of the experimentwise error rate through an omnibus test; (b) one independent variable with two or more levels; (c) the levels of the independent variable are fixed by the researcher; (d) subjects are randomly assigned to these levels; (e) subjects are exposed to only one level of the independent variable; and (f) the dependent variable is measured at least at the interval level, although the Kruskal–Wallis one-factor ANOVA can be considered for an ordinal level dependent variable. In the context of experimental design, the one-factor ANOVA is often referred to as the completely randomized design.
11.2 Layout of Data
Before we get into the theory and analysis of the data, let us examine one tabular form of the data, known as the layout of the data. We designate each observation as Yij, where the j subscript tells us what group or level the observation belongs to and the i subscript tells us the observation or identification number within that group. For instance, Y34 would mean this is the third observation in the fourth group, or level, of the independent variable. The first subscript ranges over i = 1,…, n, and the second subscript ranges over j = 1,…, J. Thus, there are J levels (or categories or groups) of the independent variable and n subjects in each group, for a total of Jn = N total observations. For now, presume there are n subjects (or cases or units) in each group in order to simplify matters; this is referred to as the equal n's or balanced case. Later on in this chapter, we consider the unequal n's or unbalanced case.
The layout of the data is shown in Table 11.1. Here we see that each column represents the observations for a particular group or level of the independent variable. At the bottom of each column are the sample group means (Ȳ.j), with the overall sample mean (Ȳ..) to the far right. In conclusion, the layout of the data is one form in which the researcher can think about the data.
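As a concrete sketch of this layout, the scores below fill a balanced n × J array; the values and variable names are hypothetical, invented for illustration:

```python
import numpy as np

# A balanced layout: rows index i = 1..n within each group, columns index
# j = 1..J, the levels of the independent variable.  Scores are hypothetical.
Y = np.array([[70., 62., 81.],
              [75., 68., 84.],
              [80., 65., 87.]])   # shape (n, J) with n = 3, J = 3, N = nJ = 9

group_means = Y.mean(axis=0)      # Ybar.j, one mean per column (group)
grand_mean = Y.mean()             # Ybar.., the overall mean of all N scores
print(group_means, grand_mean)
```

The column means play the role of the Ȳ.j row at the bottom of Table 11.1, and the grand mean plays the role of Ȳ.. at the far right.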
11.3 ANOVA Theory
This section examines the underlying theory and logic of ANOVA, the sums of squares, and the ANOVA summary table. As noted previously, in ANOVA, mean differences are tested by looking at the variability of the means. Here we show precisely how this is done.
Table 11.1
Layout for the One-Factor ANOVA Model

          Level of the Independent Variable
          1     2     3     …     J
          Y11   Y12   Y13   …     Y1J
          Y21   Y22   Y23   …     Y2J
          Y31   Y32   Y33   …     Y3J
          ⋮     ⋮     ⋮           ⋮
          Yn1   Yn2   Yn3   …     YnJ
Means     Ȳ.1   Ȳ.2   Ȳ.3   …     Ȳ.J   Ȳ..
11.3.1 General Theory and Logic
We begin with the hypotheses to be tested in ANOVA. In the two-group situation of the independent t test, the null and alternative hypotheses for a two-tailed (i.e., nondirectional) test are as follows:

H0: μ1 = μ2
H1: μ1 ≠ μ2
In the multiple-group situation (i.e., more than two groups), we have already seen the problem that occurs when multiple independent t tests are conducted for all pairs of population means (i.e., increased likelihood of a Type I error). We concluded that the solution was to use an omnibus test where the equality of all of the means could be assessed simultaneously. The hypotheses for the omnibus ANOVA test are as follows:

H0: μ1 = μ2 = μ3 = … = μJ
H1: not all the μj are equal
Here H1 is purposely written in a general form to cover the multitude of possible mean differences that could arise. These range from only two of the means being different to all of the means being different from one another. Thus, because of the way H1 has been written, only a nondirectional alternative is appropriate. If H0 were to be rejected, then the researcher might want to consider a multiple comparison procedure (MCP) so as to determine which means or combination of means are significantly different (we cover this in greater detail in Chapter 12).
As was mentioned in the introduction to this chapter, the analysis of mean differences is actually carried out by looking at variability of the means. At first, this seems strange. If one wants to test for mean differences, then do a test of means. If one wants to test for variance differences, then do a test of variances. These statements should make sense because logic pervades the field of statistics. And they do for the two-group situation. For the multiple-group situation, we already know things get a bit more complicated.
Say a researcher is interested in the influence of amount of daily study time on statistics achievement. Three groups were formed based on the amount of daily study time in statistics: half an hour, 1 hour, and 2 hours. Is there a differential influence of amount of time studied on subsequent mean statistics achievement (e.g., statistics final exam)? We would expect that the more one studied statistics, the higher the statistics mean achievement would be. One possible situation in the population is where the amount of study time does not influence statistics achievement; here the population means will be equal. That is, the null hypothesis of equal group means is actually true. Thus, the three groups are really three samples from the same population of students, with mean μ. The means are equal; thus, there is no variability among the three group means. A second possible situation in the population is where the amount of study time does influence statistics achievement; here the population means will not be equal. That is, the null hypothesis is actually false. Thus, the three groups are not really three samples from the same population of students, but rather, each group represents a sample from a distinct population of students receiving that particular amount of study time, with mean μj. The means are not equal, so there is variability among the three group means. In summary, the statistical question becomes whether the difference between the sample means is due to the usual sampling variability expected from a single population, or the result of a true difference between the sample means from different populations.
We conceptually define within-groups variability as the variability of the observations within a group combined across groups (e.g., variability on test scores within children in the same proficiency level, such as low, moderate, and high, and then combined across all proficiency levels), and between-groups variability as the variability between the groups (e.g., variability among the test scores from one proficiency level to another proficiency level). In Figure 11.1, the columns represent low and high variability within the groups. The rows represent low and high variability between the groups. In the upper left-hand plot, there is low variability both within and between the groups. That is, performance is very consistent, both within each group as well as across groups. We see that there is little variability within the groups since the individual distributions are not very spread out and little variability between the groups because the distributions are not very distinct, as they are nearly lying on top of one another. Here within- and between-group variability are both low, and it is quite unlikely that one would reject H0. In the upper right-hand plot, there is high variability within the groups and low variability between the groups. That is, performance is very consistent across groups (i.e., the distributions largely overlap) but quite variable within each group. We see high variability within the groups because the spread of each individual distribution is quite large and low variability between the groups because the distributions are lying so closely together. Here within-groups variability exceeds between-group variability, and again it is quite unlikely that one would reject H0. In the lower left-hand plot, there is low variability within the groups and high variability between the groups. That is, performance is very consistent within each group but quite variable across groups. We see low variability within the groups because each distribution is very compact with little spread to the data and high variability between the groups because each distribution is nearly isolated from one another with very little overlap. Here between-group variability exceeds within-groups variability, and it is quite likely that one would reject H0. In the lower right-hand plot, there is high variability both within and between the groups. That is, performance is quite variable within each group, as well as across the groups. We see high variability within groups because the spread of each individual distribution is quite large and high variability between groups because of the minimal overlap from one distribution to another. Here within- and between-group variability are both high, and depending on the relative amounts of between- and within-groups variability, one may or may not reject H0. In summary, the optimal situation when seeking to reject H0 is the one represented by high variability between the groups and low variability within the groups.

Figure 11.1. Conceptual look at between- and within-groups variability (a 2 × 2 arrangement of plots: columns are low and high within-groups variability; rows are low and high between-groups variability).
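The contrast between the upper right-hand and lower left-hand plots can be sketched numerically. The two three-group datasets below are hypothetical (invented for illustration), as is the helper function:

```python
import numpy as np

# Two hypothetical three-group datasets: in the first the group distributions
# overlap heavily (low between, high within); in the second they are well
# separated (high between, low within).
overlapping = [np.array([9., 10., 11., 14., 6.]),
               np.array([8., 12., 10., 13., 7.]),
               np.array([11., 9., 10., 12., 8.])]
separated = [np.array([4., 5., 6., 5., 5.]),
             np.array([10., 11., 9., 10., 10.]),
             np.array([16., 15., 17., 16., 16.])]

def variability(groups):
    means = np.array([g.mean() for g in groups])
    between = means.var(ddof=1)                        # spread of the group means
    within = np.mean([g.var(ddof=1) for g in groups])  # average spread inside each group
    return between, within

for label, data in (("overlapping", overlapping), ("separated", separated)):
    between, within = variability(data)
    print(label, round(between, 2), round(within, 2))
```

For the overlapping data, the between-groups spread is essentially zero while the within-groups spread is large; for the separated data, the relationship reverses, which is the situation favorable to rejecting H0.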
11.3.2 Partitioning the Sums of Squares
The partitioning of the sums of squares in ANOVA is a new concept in this chapter, which is also an important concept in regression analysis (from Chapters 17 and 18). In part, this is because ANOVA and regression are both forms of the same general linear model (GLM) (to be further discussed). Let us begin with the total sum of squares in Y, denoted as SStotal. The term SStotal represents the amount of total variation in Y. The next step is to partition the total variation into variation between the groups (i.e., the categories or levels of the independent variable), denoted by SSbetw, and variation within the groups (i.e., units or cases within each category or level of the independent variable), denoted by SSwith. In the one-factor ANOVA, we therefore partition SStotal as follows:
SStotal = SSbetw + SSwith

or

Σj Σi (Yij − Ȳ..)² = Σj Σi (Ȳ.j − Ȳ..)² + Σj Σi (Yij − Ȳ.j)²

where the inner sum runs over the i = 1,…, n observations within each group and the outer sum over the j = 1,…, J groups, and
SStotal is the total sum of squares due to variation among all of the observations without regard to group membership
SSbetw is the between-groups sum of squares due to the variation between the groups
SSwith is the within-groups sum of squares due to the variation within the groups combined across groups

We refer to this particular formulation of the partitioned sums of squares as the definitional (or conceptual) formula because each term literally defines a form of variation.
Due to computational complexity and the likelihood of a computational error, the definitional formula is rarely used with real data. Instead, a computational formula for the partitioned sums of squares is used for hand computations. However, since nearly all data analysis at this level utilizes computer software, we defer to the software to actually perform an ANOVA (SPSS details are provided toward the end of this chapter). A complete example of the one-factor ANOVA is also considered later in this chapter.
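The definitional formula itself is easy to verify by direct computation. A minimal sketch with hypothetical balanced data (columns are the J groups, n scores per column):

```python
import numpy as np

# Partition of the total sum of squares for hypothetical balanced data.
Y = np.array([[70., 62., 81.],
              [75., 68., 84.],
              [80., 65., 87.]])
n, J = Y.shape
grand_mean = Y.mean()
group_means = Y.mean(axis=0)

ss_total = ((Y - grand_mean) ** 2).sum()               # every score vs. the grand mean
ss_betw = n * ((group_means - grand_mean) ** 2).sum()  # group means vs. the grand mean
ss_with = ((Y - group_means) ** 2).sum()               # every score vs. its own group mean

print(ss_betw, ss_with, ss_total)                      # ss_betw + ss_with equals ss_total
```

For these scores the partition gives SSbetw = 542, SSwith = 86, and SStotal = 628, and the two pieces sum exactly to the total.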
11.3.3 ANOVA Summary Table
An important result of the analysis is the ANOVA summary table. The purpose of the summary table is to literally summarize the ANOVA. A general form of the summary table is shown in Table 11.2. The first column lists the sources of variation in the model. As we already know, in the one-factor model, the total variation is partitioned into between-groups variation and within-groups variation. The second column notes the sums of squares terms computed for each source (i.e., SSbetw, SSwith, and SStotal).
The third column gives the degrees of freedom for each source. Recall that, in general, the degrees of freedom have to do with the number of observations that are free to vary. For example, if a sample mean and all of the sample observations except for one are known, then the final observation is not free to vary. That is, the final observation is predetermined to be a particular value. For instance, say the mean is 10 and there are three observations: 7, 11, and an unknown observation. Based on that information, first, the sum of the three observations must be 30 for the mean to be 10. Second, the sum of the known observations is 18. Therefore, the unknown observation must be 12. Otherwise the sample mean would not be exactly equal to 10.
For the between-groups source, the definitional formula is concerned with the deviation of each group mean from the overall mean. There are J group means (where J represents the number of groups or categories or levels of the independent variable), so the dfbetw (also known as the degrees of freedom numerator) must be J − 1. Why? If we have J group means and we know the overall mean, then only J − 1 of the group means are free to vary. In other words, if we know the overall mean and all but one of the group means, then the final unknown group mean is predetermined. For the within-groups source, the definitional formula is concerned with the deviation of each observation from its respective group mean. There are n observations (i.e., cases or units) in each group; consequently, there are n − 1 degrees of freedom in each group and J groups. Why are there n − 1 degrees of freedom in each group? If there are n observations in each group, then only n − 1 of the observations are free to vary. In other words, if we know one group mean and all but one of the observations for that group, then the final unknown observation for that group is predetermined. There are J groups, so the dfwith (also known as the degrees of freedom denominator) is J(n − 1), or more simply N − J. Thus, we lose one degree of freedom for each group. For the total source, the definitional formula is concerned with the deviation of each observation from the overall mean. There are N total observations; therefore, the dftotal must be N − 1. Why? If there are N total observations and we know the overall mean, then only N − 1 of the observations are free to vary. In other words, if we know the overall mean and all but one of the N observations, then the final unknown observation is predetermined.
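Both the "free to vary" idea and the df bookkeeping can be sketched in a few lines; the group counts below are hypothetical:

```python
# "Free to vary": with a known mean of 10 and two known observations 7 and 11,
# the third observation is forced to a single value.
mean, known = 10, [7, 11]
n_obs = len(known) + 1
last = mean * n_obs - sum(known)   # 30 - 18 = 12
print(last)                        # 12

# Degrees of freedom for a one-factor ANOVA; J and n per group are hypothetical.
J, n_per_group = 3, 10
N = J * n_per_group
df_betw, df_with, df_total = J - 1, N - J, N - 1
print(df_betw, df_with, df_total)  # 2 27 29; note df_betw + df_with == df_total
```

The degrees of freedom partition exactly as the sums of squares do: (J − 1) + (N − J) = N − 1.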
Why is the number of degrees of freedom important in the ANOVA? Suppose two researchers have conducted similar studies, except Researcher A uses 20 observations per group and Researcher B uses 10 observations per group. Each researcher obtains a SSwith value of 15. Would it be fair to say that this particular result for the two studies is the same?

Table 11.2
ANOVA Summary Table

Source            SS        df      MS       F
Between groups    SSbetw    J − 1   MSbetw   MSbetw/MSwith
Within groups     SSwith    N − J   MSwith
Total             SStotal   N − 1
Such a comparison would be unfair because SSwith is influenced by the number of observations per group. A fair comparison would be to weight the SSwith terms by their respective number of degrees of freedom. Similarly, it would not be fair to compare the SSbetw terms from two similar studies based on different numbers of groups. A fair comparison would be to weight the SSbetw terms by their respective number of degrees of freedom. The method of weighting a sum of squares term by the respective number of degrees of freedom on which it is based yields what is called a mean squares term. Thus, MSbetw = SSbetw/dfbetw and MSwith = SSwith/dfwith, as shown in the fourth column of Table 11.2. They are referred to as mean squares because they represent a summed quantity that is weighted by the number of observations used in the sum itself, like the mean. The mean squares terms are also variance estimates because they represent the sum of the squared deviations from a mean divided by their degrees of freedom, like the sample variance s².
The last column in the ANOVA summary table, the F value, is the summary test statistic of the summary table. The F value is computed by taking the ratio of the two mean squares or variance terms. Thus, for the one-factor ANOVA fixed-effects model, the F value is computed as F = MSbetw/MSwith. When developed by Sir Ronald A. Fisher in the 1920s, this test statistic was originally known as the variance ratio because it represents the ratio of two variance estimates. Later, the variance ratio was renamed the F ratio by George W. Snedecor (who worked out the table of F values, discussed momentarily) in honor of Fisher (F for Fisher).
The F ratio tells us whether there is more variation between groups than there is within groups, which is required if we are to reject H0. Thus, if there is more variation between groups than there is within groups, then MSbetw will be larger than MSwith. As a result of this, the F ratio of MSbetw/MSwith will be greater than 1. If, on the other hand, the amount of variation between groups is about the same as there is within groups, then MSbetw and MSwith will be about the same, and the F ratio will be approximately 1. Thus, we want to find large F values in order to reject the null hypothesis. The F test statistic is then compared with the F critical value so as to make a decision about the null hypothesis. The critical value is found in the F table of Table A.4 as αF(J−1, N−J). Thus, the degrees of freedom are dfbetw = J − 1 for the numerator of the F ratio and dfwith = N − J for the denominator of the F ratio. The significance test is a one-tailed test in order to be consistent with the alternative hypothesis. The null hypothesis is rejected if the F test statistic exceeds the F critical value. This is the omnibus F test which, again, simply provides evidence of the extent to which there is at least one statistically significant mean difference between the groups.
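In place of a printed F table, the critical value can be obtained from software. A sketch using SciPy's F distribution; the α, J, and N values are hypothetical:

```python
from scipy.stats import f

# Critical value for the omnibus F test.  Hypothetical design: alpha = .05,
# J = 3 groups, N = 30 total observations, so the degrees of freedom are
# df_betw = J - 1 = 2 and df_with = N - J = 27.
alpha, J, N = 0.05, 3, 30
f_crit = f.ppf(1 - alpha, J - 1, N - J)   # upper-tail critical value (one-tailed test)
print(round(f_crit, 2))                   # reject H0 when the observed F exceeds this
```

Because the alternative covers any pattern of mean differences, only the upper tail of the F distribution is used.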
If the F test statistic exceeds the F critical value, and there are more than two groups, then it is not clear where the differences among the means lie. In this case, some MCP should be used to determine where the mean differences are in the groups; this is the topic of Chapter 12. When there are only two groups, it is obvious where the mean difference falls, that is, between groups 1 and 2. A researcher can simply look at the descriptive statistics to determine which group had the higher mean relative to the other group. For the two-group situation, it is also interesting to note that the F and t test statistics follow the rule of F = t², for a nondirectional alternative hypothesis in the independent t test. In other words, the one-way ANOVA with two groups and the independent t test will generate the same conclusion such that F = t². This result occurs when the numerator degrees of freedom for the F ratio is 1. In an actual ANOVA summary table (shown in the next section), except for the source of variation column, it is the values for each of the other entries generated from the data that are listed in the table. For example, instead of seeing SSbetw, we would see the computed value of SSbetw.
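The F = t² relationship for two groups can be checked directly; a sketch with SciPy and hypothetical scores:

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# With exactly two groups, the one-way ANOVA F statistic equals the squared
# independent t statistic (nondirectional test).  Scores are hypothetical.
group1 = np.array([23., 25., 28., 30., 26.])
group2 = np.array([31., 29., 34., 33., 35.])

F, p_from_f = f_oneway(group1, group2)
t, p_from_t = ttest_ind(group1, group2)   # pooled-variance (equal_var=True) t test

print(round(F, 3), round(t ** 2, 3))      # the two values agree
```

The p values from the two procedures also agree, so with two groups the choice between the independent t test and the one-way ANOVA is immaterial.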
11.4 ANOVA Model
In this section, we introduce the ANOVA linear model, the estimation of parameters of the model, effect size measures, confidence intervals (CIs), power, and an example, and finish up with expected mean squares.
11.4.1 Model
The one-factor ANOVA fixed-effects model can be written in terms of population parameters as

Yij = μ + αj + εij

where
Yij is the observed score on the dependent (or criterion) variable for individual i in group j
μ is the overall or grand population mean (i.e., regardless of group designation)
αj is the group effect for group j
εij is the random residual error for individual i in group j
The residual error can be due to individual differences, measurement error, and/or other factors not under investigation (i.e., other than the independent variable X). The population group effect and residual error are computed as

αj = μ.j − μ

and

εij = Yij − μ.j

respectively, where μ.j is the population mean for group j and the initial dot subscript indicates we have averaged across all i individuals in group j. That is, the group effect is equal to the difference between the population mean of group j and the overall population mean. The residual error is equal to the difference between an individual's observed score and the population mean of the group that the individual is a member of (i.e., group j). The group effect can also be thought of as the average effect of being a member of a particular group. A positive group effect implies a group mean greater than the overall mean, whereas a negative group effect implies a group mean less than the overall mean. Note that in a one-factor fixed-effects model, the population group effects sum to 0. The residual error in ANOVA represents that portion of Y not accounted for by X.
11.4.2 Estimation of the Parameters of the Model
Next we need to estimate the parameters of the model μ, αj, and εij. The sample estimates are represented by Ȳ.., aj, and eij, respectively, where the latter two are computed as

aj = Ȳ.j − Ȳ..

and

eij = Yij − Ȳ.j

respectively. Note that Ȳ.. represents the overall sample mean, where the double dot subscript indicates we have averaged across both the i and j subscripts, and Ȳ.j represents the sample mean for group j, where the initial dot subscript indicates we have averaged across all i individuals in group j.
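These sample estimates are simple to compute; a minimal sketch with hypothetical balanced scores (columns are the groups):

```python
import numpy as np

# Sample estimates for the model Y_ij = mu + alpha_j + epsilon_ij, using
# hypothetical balanced data (columns are the J = 3 groups).
Y = np.array([[70., 62., 81.],
              [75., 68., 84.],
              [80., 65., 87.]])

grand_mean = Y.mean()            # estimate of mu (Ybar..)
a = Y.mean(axis=0) - grand_mean  # group effects a_j = Ybar.j - Ybar..
e = Y - Y.mean(axis=0)           # residuals e_ij = Y_ij - Ybar.j

print(np.round(a, 3))            # in the balanced case the effects sum to zero
```

Mirroring the population model, the estimated group effects sum to zero, and the residuals within each group sum to zero as well.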
11.4.3 Effect Size Measures, Confidence Intervals, and Power
11.4.3.1 Effect Size Measures
There are various effect size measures to indicate the strength of association between X and Y, that is, the relative strength of the group effect. Let us briefly examine η², ω², and Cohen's (1988) f. First, η² (eta squared), ranging from 0 to +1.00, is known as the correlation ratio (a generalization of R²) and represents the proportion of variation in Y explained by the group mean differences in X. An eta squared of 0 suggests that none of the total variance in the dependent variable is due to differences between the groups. An eta squared of 1.00 indicates that all the variance in the dependent variable is due to the group mean differences. We find η² to be as follows:

η² = SSbetw / SStotal
It is well known that η² is a positively biased statistic (i.e., it overestimates the association). The bias is most evident for n's (i.e., group sample sizes) less than 30.
Another effect size measure is ω² (omega squared), interpreted similarly to eta squared (specifically, the proportion of variation in Y explained by the group mean differences in X) but which is less biased than η². We determine ω² through the following formula:

ω² = [SSbetw − (J − 1)MSwith] / (SStotal + MSwith)
A final effect size measure is f, developed by Cohen (1988). The effect f can take on values from 0 (when the means are equal) to an infinitely large positive value. This effect is interpreted as an approximate correlation index but can also be interpreted as the standard deviation of the standardized means (Cohen, 1988). We compute f through the following:

f = √[η² / (1 − η²)]
We can also use f to compute the effect size d, which you recall from the t test is interpreted as the standardized mean difference. The formulas for translating f to d are dependent on whether there is minimum, moderate, or maximum variability between the means of the groups. Interested readers are referred to Cohen (1988).
304 An Introduction to Statistical Concepts
These are the most common measures of effect size used for ANOVA models, both in statistics software and in print. Cohen's (1988) subjective standards can be used as follows to interpret these effect sizes: small effect, f = .1, η² or ω² = .01; medium effect, f = .25, η² or ω² = .06; and large effect, f = .40, η² or ω² = .14. Note that these are subjective standards developed for the behavioral sciences; your discipline may use other standards. For further discussion, see Keppel (1982), O'Grady (1982), Wilcox (1987), Cohen (1988), Keppel and Wickens (2004), and Murphy, Myors, and Wolach (2008; which includes software).
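The three formulas above are simple to compute by hand or by machine. As a minimal sketch (the function names are our own, not from any particular package), they can be written as:

```python
import math

def eta_squared(ss_betw, ss_total):
    # Proportion of variation in Y explained by group mean differences
    return ss_betw / ss_total

def omega_squared(ss_betw, ss_total, ms_with, J):
    # Less biased analogue of eta squared, for J groups
    return (ss_betw - (J - 1) * ms_with) / (ss_total + ms_with)

def cohens_f(eta2):
    # Cohen's (1988) f expressed in terms of eta squared
    return math.sqrt(eta2 / (1 - eta2))
```

Plugging in the sums of squares from any ANOVA summary table yields all three indices at once; the worked example later in this chapter uses exactly these computations.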
11.4.3.2 Confidence Intervals
CI procedures are often useful in providing an interval estimate of a population parameter (i.e., mean or mean difference); these allow us to determine the accuracy of the sample estimate. One can form CIs around any sample group mean from an ANOVA (provided in software such as SPSS), although CIs for means have more utility for MCPs, as discussed in Chapter 12. CI procedures have also been developed for several effect size measures (Fidler & Thompson, 2001; Smithson, 2001).
11.4.3.3 Power
As for power (the probability of correctly rejecting a false null hypothesis), one can consider either planned power (a priori) or observed power (post hoc), as discussed in previous chapters. In the ANOVA context, we know that power is primarily a function of α, sample size, and effect size. For planned power, one inputs each of these components either into a statistical table or power chart (nicely arrayed in texts such as Cohen, 1988, or Murphy et al., 2008), or into statistical software (such as Power and Precision, Ex-Sample, G*Power, or the software contained in Murphy et al., 2008). Planned power is most often used by researchers to determine adequate sample sizes in ANOVA models, which is highly recommended. Many disciplines recommend a minimum power value, such as .80. Thus, these methods are a useful way to determine the sample size that would generate a desired level of power. Observed power is determined by some statistics software, such as SPSS, and indicates the power that was actually observed in a completed study.
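The computation underlying such power software can be sketched with the noncentral F distribution. The following is our own illustration (not the code of any package named above), assuming scipy is available and using the standard noncentrality parameter λ = f²N for Cohen's effect size f and total sample size N:

```python
from scipy.stats import f as f_dist, ncf

def anova_power(f_effect, n_per_group, J, alpha=0.05):
    """Approximate power of the one-factor fixed-effects ANOVA F test
    for Cohen's effect size f, equal n's per group, and J groups."""
    N = n_per_group * J
    df1, df2 = J - 1, N - J
    crit = f_dist.ppf(1 - alpha, df1, df2)  # central F critical value
    lam = (f_effect ** 2) * N               # noncentrality parameter
    return ncf.sf(crit, df1, df2, lam)      # P(F exceeds crit under H1)
```

Looping over candidate group sizes until this function returns at least .80 mimics what a planned-power analysis in G*Power or similar software does; for instance, with four groups and a medium effect (f = .25), roughly 45 students per group are needed, consistent with Cohen's (1988) tables.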
11.4.4 Example
Consider now an example problem used throughout this chapter. Our dependent variable is the number of times a student attends statistics lab during one semester (or quarter), whereas the independent variable is the attractiveness of the lab instructor (assuming each instructor is of the same gender and is equally competent). The researcher is interested in whether the attractiveness of the instructor influences student attendance at the statistics lab. The attractiveness groups are defined as follows:

• Group 1, unattractive
• Group 2, slightly attractive
• Group 3, moderately attractive
• Group 4, very attractive
305 One-Factor Analysis of Variance: Fixed-Effects Model
Students were randomly assigned to one group at the beginning of the semester, and attendance was taken by the instructor. There were 8 students in each group for a total of 32. Students could attend a maximum of 30 lab sessions. In Table 11.3, we see the raw data and sample statistics (means and variances) for each group and overall (far right).

The results are summarized in the ANOVA summary table as shown in Table 11.4. The test statistic, F = 6.8177, is compared to the critical value, .05F3,28 = 2.95, obtained from Table A.4, using the .05 level of significance. To use the F table, find the numerator degrees of freedom, df_betw, which are represented by the columns, and then the denominator degrees of freedom, df_with, which are represented by the rows. The intersection of the two provides the F critical value. The test statistic exceeds the critical value, so we reject H0 and conclude that level of attractiveness is related to mean differences in statistics lab attendance. The exact probability value (p value) given by SPSS is .001.
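The omnibus test can be reproduced from the raw data of Table 11.3 with standard software. As a sketch, assuming scipy is available:

```python
from scipy.stats import f_oneway

# Number of statistics labs attended, by group (Table 11.3)
group1 = [15, 10, 12, 8, 21, 7, 13, 3]     # unattractive
group2 = [20, 13, 9, 22, 24, 25, 18, 12]   # slightly attractive
group3 = [10, 24, 29, 12, 27, 21, 25, 14]  # moderately attractive
group4 = [30, 22, 26, 20, 29, 28, 25, 15]  # very attractive

# One-factor fixed-effects ANOVA: F ≈ 6.8177, p ≈ .001
F, p = f_oneway(group1, group2, group3, group4)
```

This matches the summary table values to rounding, since f_oneway computes the same MS_betw/MS_with ratio described above.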
Table 11.3
Data and Summary Statistics for the Statistics Lab Example

Number of Statistics Labs Attended by Group

            Group 1:       Group 2:      Group 3:      Group 4:
            Unattractive   Slightly      Moderately    Very
                           Attractive    Attractive    Attractive    Overall
            15             20            10            30
            10             13            24            22
            12             9             29            26
            8              22            12            20
            21             24            27            29
            7              25            21            28
            13             18            25            25
            3              12            14            15
Means       11.1250        17.8750       20.2500       24.3750       18.4063
Variances   30.1250        35.2679       53.0714       25.9821       56.4425

Table 11.4
ANOVA Summary Table—Statistics Lab Example

Source            SS           df    MS          F
Between groups    738.5938      3    246.1979    6.8177a
Within groups     1011.1250    28    36.1116
Total             1749.7188    31

a .05F3,28 = 2.95.

Next we examine the group effects and residual errors. The group effects are estimated as follows, where the grand mean (irrespective of group membership; here 18.4063) is subtracted from the group mean (e.g., 11.125 for group 1). The subscript of a indicates the level or group of the independent variable (e.g., 1 = unattractive; 2 = slightly attractive; 3 = moderately attractive; 4 = very attractive). A negative group effect indicates that group had a smaller mean than the overall average and thus exerted a negative effect on the dependent variable (in our case, lower attendance in the statistics lab). A positive group effect indicates that group had a larger mean than the overall average and thus exerted a positive effect on the dependent variable (in our case, higher attendance in the statistics lab):
a1 = Ȳ.1 − Ȳ.. = 11.125 − 18.4063 = −7.2813

a2 = Ȳ.2 − Ȳ.. = 17.875 − 18.4063 = −0.5313

a3 = Ȳ.3 − Ȳ.. = 20.250 − 18.4063 = +1.8437

a4 = Ȳ.4 − Ȳ.. = 24.375 − 18.4063 = +5.9687
Thus, group 4 (very attractive) has the largest positive group effect (i.e., higher attendance than average), while group 1 (unattractive) has the largest negative group effect (i.e., lower attendance than average). In Chapter 12, we use the same data to determine which of these group means, or combination of group means, are statistically different. The residual errors (computed as the difference between the observed value and the group mean) for each individual by group are shown in Table 11.5 and discussed later in this chapter.
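The group-effect arithmetic above is easy to verify; a minimal sketch with equal n's (so the grand mean is the simple average of the group means):

```python
# Group means from Table 11.3 (equal n's, so the grand mean is the
# unweighted average of the group means)
means = {1: 11.125, 2: 17.875, 3: 20.250, 4: 24.375}

grand_mean = sum(means.values()) / len(means)           # 18.40625
effects = {j: m - grand_mean for j, m in means.items()}
# effects[1] ≈ -7.2813 (unattractive); effects[4] ≈ +5.9687 (very attractive)
```

Note that with equal n's the effects sum to zero, which is the side condition placed on the α_j in the fixed-effects model.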
Finally we determine the effect size measures. For illustrative purposes, all effect size measures that were previously discussed have been computed. In practice, only one effect size is usually computed and interpreted. First, the correlation ratio η² is computed as follows:

η² = SS_betw / SS_total = 738.5938 / 1749.7188 = .4221
Next ω² is found to be the following:

ω² = [SS_betw − (J − 1)MS_with] / (SS_total + MS_with) = [738.5938 − 3(36.1116)] / (1749.7188 + 36.1116) = .3529
Lastly f is computed as follows:

f = √[η² / (1 − η²)] = √[.4221 / (1 − .4221)] = .8546
Table 11.5
Residuals for the Statistics Lab Example by Group

Group 1    Group 2    Group 3    Group 4
 3.875      2.125    −10.250      5.625
−1.125     −4.875      3.750     −2.375
  .875     −8.875      8.750      1.625
−3.125      4.125     −8.250     −4.375
 9.875      6.125      6.750      4.625
−4.125      7.125       .750      3.625
 1.875       .125      4.750       .625
−8.125     −5.875     −6.250     −9.375
Recall Cohen's (1988) subjective standards that can be used to interpret these effect sizes: small effect, f = .1, η² or ω² = .01; medium effect, f = .25, η² or ω² = .06; and large effect, f = .40, η² or ω² = .14. Based on these effect size measures, all measures lead to the same conclusion: there is a large effect size for the influence of instructor attractiveness on lab attendance. Examining η² or ω², we can also state that 42% or 35%, respectively, of the variation in Y (attendance at the statistics lab) can be explained by X (attractiveness of the instructor). The effect f suggests a strong correlation.
In addition, if we rank the instructor group means from unattractive (with the lowest mean) to very attractive (with the highest mean), we see that the more attractive the instructor, the more inclined the student is to attend lab. While visual inspection of the means suggests descriptively that there are differences in statistics lab attendance by group, we examine MCPs with these same data in Chapter 12 to determine which groups are statistically significantly different from each other.
11.4.5 Expected Mean Squares
There is one more theoretical concept called expected mean squares to introduce in this chapter. The notion of expected mean squares provides the basis for determining the appropriate error term when forming an F ratio (recall this ratio is F = MS_betw/MS_with). That is, when forming an F ratio to test a certain hypothesis, how do we know which source of variation to use as the error term in the denominator? For instance, in the one-factor fixed-effects ANOVA model, how did we know to use MS_with as the error term in testing for differences between the groups? There is a good rationale, as becomes evident.
Before we get into expected mean squares, consider the definition of an expected value. An expected value is defined as the average value of a statistic that would be obtained with repeated sampling. Using the sample mean as an example statistic, the expected value of the mean would be the average value of the sample means obtained from an infinite number of samples. The expected value of a statistic is also known as the mean of the sampling distribution of that statistic. In this case, the expected value of the mean is the mean of the sampling distribution of the mean.
An expected mean square for a particular source of variation represents the average mean square value for that source obtained if the same study were to be repeated an infinite number of times. For instance, the expected value of MS_betw, denoted by E(MS_betw), is the average value of MS_betw over repeated samplings. At this point, you might be asking, "Why not only be concerned about the values of the mean square terms for my own little study?" Well, the mean square terms from your little study do represent a sample from a population of mean square terms. Thus, sampling distributions and sampling variability are as much a concern in ANOVA as they are in other situations previously described in this text.
Now we are ready to see what the expected mean square terms actually look like. Consider the two situations of H0 actually being true and H0 actually being false. If H0 is actually true, such that there really are no differences between the population group means, then the expected mean squares [represented in statistical notation as either E(MS_betw) or E(MS_with)] are as follows:

E(MS_betw) = σ_ε²

E(MS_with) = σ_ε²

and thus the ratio of expected mean squares is as follows:

E(MS_betw)/E(MS_with) = 1

where the expected value of F is then E(F) = df_with/(df_with − 2), and σ_ε² is the population variance of the residual errors. What this tells us is the following: if H0 is actually true, then each of the J samples really comes from the same population with mean μ.
If H0 is actually false, such that there really are differences between the population group means, then the expected mean squares are as follows:

E(MS_betw) = σ_ε² + n Σ_{j=1}^{J} α_j² / (J − 1)

E(MS_with) = σ_ε²

and thus the ratio of the expected mean squares is as follows:

E(MS_betw)/E(MS_with) > 1

where E(F) > df_with/(df_with − 2). If H0 is actually false, then the J samples do really come from different populations with different means μ_j.
There is a difference in the expected mean square between groups [i.e., E(MS_betw)] when H0 is actually true as compared to when H0 is actually false, as in the latter situation, there is a second term. The important part of this second term is Σ_{j=1}^{J} α_j², which represents the sum of the squared group effects. The larger this part becomes, the larger MS_betw is, and thus the larger the F ratio becomes. In comparing the two situations, we also see that E(MS_with) is the same whether H0 is actually true or false and thus represents a reliable estimate of σ_ε². This term is mean-free because it does not depend on group mean differences. Just to cover all of the possibilities, F could be less than 1 [or technically less than df_with/(df_with − 2)] due to sampling error, nonrandom samples, and/or assumption violations. For a mathematical proof of the E(MS) terms, see Kirk (1982, pp. 66–71).
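Short of a proof, the claim that E(MS_betw) = E(MS_with) = σ_ε² under a true H0 can be illustrated by simulation. The following sketch (our own, with an arbitrary seed and a four-group, n = 8 design) averages both mean squares over many replications of a null model:

```python
import random

random.seed(11)
J, n, sigma = 4, 8, 1.0    # groups, observations per group, error SD
reps = 2000
ms_betw_sum = 0.0
ms_with_sum = 0.0

for _ in range(reps):
    # Data generated under a TRUE H0: all J groups share the same mean (0)
    groups = [[random.gauss(0.0, sigma) for _ in range(n)] for _ in range(J)]
    gmeans = [sum(g) / n for g in groups]
    grand = sum(gmeans) / J                      # equal n's
    ss_betw = n * sum((m - grand) ** 2 for m in gmeans)
    ss_with = sum((y - m) ** 2
                  for g, m in zip(groups, gmeans) for y in g)
    ms_betw_sum += ss_betw / (J - 1)
    ms_with_sum += ss_with / (J * n - J)

avg_ms_betw = ms_betw_sum / reps   # both long-run averages should be
avg_ms_with = ms_with_sum / reps   # close to sigma**2 = 1 under H0
```

Adding a nonzero group effect to one group in the simulation inflates only the MS_betw average, mirroring the extra Σα_j² term in E(MS_betw) when H0 is false.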
Finally let us try to put all of this information together. In general, the F ratio represents the following:

F = (systematic variability + error variability)/(error variability)

where, for the one-factor fixed-effects model, systematic variability is variability between the groups and error variability is variability within the groups. The F ratio is formed in a particular way because we want to isolate the systematic variability in the numerator. For this model, the only appropriate F ratio is MS_betw/MS_with because it does serve to isolate the systematic variability (i.e., the variability between the groups). That is, the appropriate error term for testing a particular effect (e.g., mean differences between groups) is the mean square that is identical to the mean square of that effect, except that it lacks a term due to the effect of interest. For this model, the appropriate error term to use for testing differences between groups is the mean square identical to the numerator MS_betw, except it lacks a term due to the between-groups effect [i.e., n Σ_{j=1}^{J} α_j² / (J − 1)]; this, of course, is MS_with. It should also be noted that the F ratio is a ratio of two independent variance estimates, here MS_betw and MS_with.
11.5 Assumptions and Violation of Assumptions
There are three standard assumptions made in ANOVA models, which we are already familiar with from the independent t test. We see these assumptions often in the remainder of this text. The assumptions are concerned with independence, homogeneity of variance, and normality. We also mention some techniques appropriate for evaluating each assumption.
11.5.1 Independence
The first assumption is that observations are independent of one another (both within samples and across samples). In general, the assumption of independence for ANOVA designs can be met by (a) keeping the assignment of individuals to groups separate through the design of the experiment (specifically random assignment, not to be confused with random selection), and (b) keeping the individuals separate from one another through experimental control so that the scores on the dependent variable Y for group 1 do not influence the scores for group 2, and so forth for other groups of the independent variable. Zimmerman (1997) also stated that independence can be violated for supposedly independent samples due to some type of matching in the design of the experiment (e.g., matched pairs based on gender, age, and weight).
The use of independent random samples is crucial in ANOVA. The F ratio is very sensitive to violation of the independence assumption in terms of increased likelihood of a Type I and/or Type II error (e.g., Glass, Peckham, & Sanders, 1972). This effect can sometimes even be worse with larger samples (Keppel & Wickens, 2004). A violation of the independence assumption may affect the standard errors of the sample means and thus influence any inferences made about those means. One purpose of random assignment of individuals to groups is to achieve independence. If each individual is only observed once and individuals are randomly assigned to groups, then the independence assumption is usually met. If individuals work together during the study (e.g., through discussion groups or group work), then independence may be compromised. Thus, a carefully planned, controlled, and conducted research design is the key to satisfying this assumption.
The simplest procedure for assessing independence is to examine residual plots by group. If the independence assumption is satisfied, then the residuals should fall into a random display of points for each group. If the assumption is violated, then the residuals will fall into some type of pattern. The Durbin–Watson statistic (1950, 1951, 1971) can be used to test for autocorrelation. Violations of the independence assumption generally occur in three situations: (1) when observations are collected over time, (2) when observations are made within blocks, or (3) when observation involves replication. For severe violations of the independence assumption, there is no simple "fix" (e.g., Scariano & Davenport, 1987). For the example data, a plot of the residuals by group is shown in Figure 11.2, and there does appear to be a random display of points for each group.
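The Durbin–Watson statistic mentioned above is simple to compute directly from an ordered sequence of residuals; values near 2 are consistent with independence, values near 0 with positive autocorrelation, and values near 4 with negative autocorrelation. A minimal sketch:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observed order:
    sum of squared successive differences over the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den
```

Note the statistic is only meaningful when the residuals have a natural ordering, such as data collected over time.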
11.5.2 Homogeneity of Variance
The second assumption is that the variances of each population are equal. This is known as the assumption of homogeneity of variance or homoscedasticity. A violation of the homogeneity assumption can lead to bias in the SS_with term, as well as an increase in the Type I error rate and possibly an increase in the Type II error rate. Two sets of research studies have investigated violations of this assumption, classic work and more modern work.
The classic work largely resulted from Box (1954a) and Glass et al. (1972). Their results indicated that the effect of the violation was small with equal or nearly equal n's across the groups. There is a more serious problem if the larger n's are associated with the smaller variances (actual observed α > nominal α, which is a liberal result; for example, if a researcher desires a nominal alpha of .05, the alpha actually observed will be greater than .05), or if the larger n's are associated with the larger variances (actual observed α < nominal α, which is a conservative result). [Note that Bradley's (1978) criterion is used in this text, where the actual α should not exceed 1.1–1.5 times the nominal α.] Thus, the suggestion from the classic work was that heterogeneity was only a concern when there were unequal n's. However, the classic work only examined minor violations of the assumption (the ratio of largest variance to smallest variance being relatively small) and, unfortunately, has been largely adopted in textbooks and by users.
There has been some research conducted since that time by researchers such as Brown and Forsythe (1974) and Wilcox (1986, 1987, 1988, 1989), nicely summarized by Coombs, Algina, and Ottman (1996). In short, this more modern work indicates that the effect of heterogeneity is more severe than previously thought (e.g., poor power; α can be greatly affected), even with equal n's (although having equal n's does reduce the magnitude of the problem). Thus, F is not even robust to heterogeneity with equal n's (equal n's are sometimes referred to as a balanced design). Suggestions for dealing with such a violation include (a) using alternative procedures such as the Welch, Brown–Forsythe, and James procedures (e.g., Coombs et al., 1996; Glass & Hopkins, 1996; Keppel & Wickens, 2004; Myers & Well, 1995; Wilcox, 1996, 2003); (b) reducing α and testing at a more stringent alpha level (e.g., .01 rather than the common .05) (e.g., Keppel & Wickens, 2004; Weinberg & Abramowitz, 2002); or (c) transforming Y (such as √Y, 1/Y, or log Y) (e.g., Keppel & Wickens, 2004; Weinberg & Abramowitz, 2002). The alternative procedures will be more fully described later in this chapter.

Figure 11.2
Residual plot by group for statistics lab example (residuals for labs plotted against level of attractiveness).
In a plot of residuals versus each value of X, the consistency of the variance of the conditional residual distributions may be examined simply by eyeballing the plot. Another method for detecting violation of the homogeneity assumption is the use of formal statistical tests, as discussed in Chapter 9. The traditional homogeneity tests (e.g., Levene's test) are commonly available in statistical software but are not robust to nonnormality. Unfortunately, the more robust homogeneity tests are not readily available. For the example data, the residual plot of Figure 11.2 shows similar variances across the groups, and Levene's test suggests the variances are not different [F(3, 28) = .905, p = .451].
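The Levene result reported above can be reproduced from the raw scores. As a sketch, assuming scipy is available (its `center='mean'` option gives the classic mean-centered Levene test that SPSS reports; the default `center='median'` is the more robust Brown–Forsythe variant):

```python
from scipy.stats import levene

group1 = [15, 10, 12, 8, 21, 7, 13, 3]
group2 = [20, 13, 9, 22, 24, 25, 18, 12]
group3 = [10, 24, 29, 12, 27, 21, 25, 14]
group4 = [30, 22, 26, 20, 29, 28, 25, 15]

# Classic Levene test: W ≈ .905 on (3, 28) df, p ≈ .451
W, p = levene(group1, group2, group3, group4, center='mean')
```

Because p exceeds .05, the equal-variance assumption is retained for these data, in agreement with the eyeball check of Figure 11.2.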
11.5.3 Normality
The third assumption is that each of the populations follows the normal distribution (i.e., there is normality of the dependent variable for each category or group or level of the independent variable). The F test is relatively robust to moderate violations of this assumption (i.e., in terms of Type I and II error rates). Specifically, effects of the violation will be minimal except for small n's, for unequal n's, and/or for extreme nonnormality. Violation of the normality assumption may be a result of outliers. The simplest outlier detection procedure is to look for observations that are more than two or three standard deviations from their respective group mean. We recommend (and will illustrate later) inspection of residuals for examination of evidence of normality. Formal procedures for the detection of outliers are now available in many statistical packages.
The following graphical techniques can be used to detect violations of the normality assumption: (a) the frequency distributions of the scores or the residuals for each group (through stem-and-leaf plots, boxplots, histograms, or residual plots), (b) the normal probability or quantile–quantile (Q–Q) plot, or (c) a plot of group means versus group variances (which should be independent of one another). There are also several statistical procedures available for the detection of nonnormality [e.g., the Shapiro–Wilk (S–W) test, 1965]. Transformations can also be used to normalize the data. For instance, a nonlinear relationship between X and Y may result in violations of the normality and/or homoscedasticity assumptions. Readers interested in learning more about potential data transformations are referred to sources such as Bradley (1982), Box and Cox (1964), or Mosteller and Tukey (1977).
In the example data, the residuals shown in Figure 11.2 appear to be somewhat normal in shape, especially considering the groups have fairly small n's. This is suggested by the random display of points. In addition, for the residuals overall, skewness = −.2389 and kurtosis = −.0191, indicating a small departure from normality. Thus, it appears that all of our assumptions have been satisfied for the example data. We will delve further into examination of assumptions later as we illustrate how to use SPSS to conduct a one-way ANOVA.
A summary of the assumptions and the effects of their violation for the one-factor ANOVA design is presented in Table 11.6. Note that in some texts, the assumptions are written in terms of the residuals rather than the raw scores, but this makes no difference for our purposes.
11.6 Unequal n’s or Unbalanced Procedure
Up to this point in the chapter, we have only considered the equal n's or balanced case, where the number of observations is equal for each group. This was done only to make things simple for presentation purposes. However, we do not need to assume that the n's must be equal (as some textbooks incorrectly do). This section briefly describes the unequal n's or unbalanced case. For our purposes, the major statistical software can handle the analysis of this case for the one-factor ANOVA model without any special attention. Thus, interpretation of the analysis, the assumptions, and so forth are the same as with the equal n's case. However, once we get to factorial designs in Chapter 13, things become a bit more complicated for the unequal n's or unbalanced case.
11.7 Alternative ANOVA Procedures
There are several alternatives to the parametric one-factor fixed-effects ANOVA. These include the Kruskal and Wallis (1952) one-factor ANOVA, the Welch (1951) test, the Brown and Forsythe (1974) procedure, and the James (1951) procedures. You may recognize the Welch and Brown–Forsythe procedures as similar alternatives to the independent t test.
11.7.1 Kruskal–Wallis Test
The Kruskal–Wallis test makes no normality assumption about the population distributions, although it assumes similar distributional shapes, but still assumes equal population variances across the groups (although heterogeneity does have some effect on this test, it is less than with the parametric ANOVA). When the normality assumption is met, or nearly so (i.e., with mild nonnormality), the parametric ANOVA is slightly more powerful than the Kruskal–Wallis test (i.e., less likelihood of a Type II error). Otherwise the Kruskal–Wallis test is more powerful.
Table 11.6
Assumptions, Evidence to Examine, and Effects of Violations: One-Factor ANOVA Design

Assumption: Independence
Evidence to examine: scatterplot of residuals by group.
Effect of violation: increased likelihood of a Type I and/or Type II error in the F statistic; influences standard errors of means and thus inferences about those means.

Assumption: Homogeneity of variance
Evidence to examine: scatterplot of residuals by X; formal test of equal variances (e.g., Levene's test).
Effect of violation: bias in SS_with; increased likelihood of a Type I and/or Type II error; less effect with equal or nearly equal n's; effect decreases as n increases.

Assumption: Normality
Evidence to examine: graphs of residuals (or scores) by group (e.g., boxplots, histograms, stem-and-leaf plots); skewness and kurtosis of residuals; Q–Q plots of residuals; formal tests of normality of residuals; plot of group means by group variances.
Effect of violation: minimal effect with moderate violation; effect less severe with large n's, with equal or nearly equal n's, and/or with homogeneously shaped distributions.
The Kruskal–Wallis procedure works as follows. First, the observations on the dependent measure are rank ordered, regardless of group assignment (the ranking is done by the computer). That is, the observations are ranked from highest to lowest, disregarding group membership. The procedure essentially tests whether the mean ranks are different across the groups such that they are unlikely to represent random samples from the same population. Thus, according to the null hypothesis, the mean rank is the same for each group, whereas for the alternative hypothesis, the mean rank is not the same across groups. The test statistic is denoted by H and is compared to the critical value αχ²_{J−1}. The null hypothesis is rejected if the test statistic H exceeds the χ² critical value.

There are two situations to consider with this test. First, the χ² critical value is really only appropriate when there are at least three groups and at least five observations per group (i.e., the χ² is not an exact sampling distribution of H). The second situation is that when there are tied ranks, the sampling distribution of H can be affected. Typically a midranks procedure is used, which results in an overly conservative Kruskal–Wallis test. A correction for ties is commonly used. Unless the number of ties is relatively large, the effect of the correction is minimal.
Using the statistics lab data as an example, we perform the Kruskal–Wallis ANOVA. The test statistic H = 13.0610 is compared with the critical value .05χ²_3 = 7.81 from Table A.3, and the result is that H0 is rejected (p = .005). Thus, the Kruskal–Wallis result agrees with the result of the parametric ANOVA. This should not be surprising because the normality assumption apparently was met. Thus, we would probably not have done the Kruskal–Wallis test for the example data. We merely provide it for purposes of explanation and comparison.
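This result can be reproduced with standard software; as a sketch, assuming scipy is available (its kruskal function applies the tie correction discussed above):

```python
from scipy.stats import kruskal

group1 = [15, 10, 12, 8, 21, 7, 13, 3]
group2 = [20, 13, 9, 22, 24, 25, 18, 12]
group3 = [10, 24, 29, 12, 27, 21, 25, 14]
group4 = [30, 22, 26, 20, 29, 28, 25, 15]

# H ≈ 13.06, which exceeds the chi-square critical value 7.81
# (df = 3, alpha = .05), so H0 is rejected, agreeing with the
# parametric ANOVA on these data
H, p = kruskal(group1, group2, group3, group4)
```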
In summary, the Kruskal–Wallis test can be used as an alternative to the parametric one-factor ANOVA under nonnormality and/or when data on the dependent variable are ordinal. Under normality and with interval/ratio dependent variable data, the parametric ANOVA is more powerful than the Kruskal–Wallis test and thus is the preferred method.
11.7.2 Welch, Brown–Forsythe, and James Procedures
Next we briefly consider the following procedures for the heteroscedasticity condition: the Welch (1951) test, the Brown and Forsythe (1974) procedure, and the James (1951) first- and second-order procedures (more fully described by Coombs et al., 1996; Myers & Well, 1995; Wilcox, 1996, 2003). These procedures do not require homogeneity. Current research suggests that (a) under homogeneity, the F test is slightly more powerful than any of these procedures, and (b) under heterogeneity, each of these alternative procedures is more powerful than the F, although the choice among them depends on several conditions, making a recommendation among these alternative procedures somewhat complicated (e.g., Clinch & Keselman, 1982; Coombs et al., 1996; Tomarken & Serlin, 1986). The Kruskal–Wallis test is widely available in the major statistical software, and the Welch and Brown–Forsythe procedures are available in the SPSS one-way ANOVA module. Wilcox (1996, 2003) also provides assistance for these alternative procedures.
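To see what the Welch procedure actually computes, here is a sketch of the standard Welch (1951) formulation (our own implementation, not SPSS code), applied to the statistics lab data; it weights each group by the precision of its mean, n_j/s_j²:

```python
from scipy.stats import f as f_dist

def welch_anova(*groups):
    """Welch's (1951) heteroscedastic one-factor ANOVA.
    Returns (F*, df1, df2, p) per the standard textbook formulation."""
    J = len(groups)
    ns = [len(g) for g in groups]
    means = [sum(g) / n for g, n in zip(groups, ns)]
    svars = [sum((y - m) ** 2 for y in g) / (n - 1)
             for g, m, n in zip(groups, means, ns)]
    w = [n / v for n, v in zip(ns, svars)]            # precision weights
    W = sum(w)
    grand = sum(wj * m for wj, m in zip(w, means)) / W
    A = sum(wj * (m - grand) ** 2 for wj, m in zip(w, means)) / (J - 1)
    t = sum((1 - wj / W) ** 2 / (n - 1) for wj, n in zip(w, ns))
    F_star = A / (1 + 2 * (J - 2) / (J ** 2 - 1) * t)
    df1, df2 = J - 1, (J ** 2 - 1) / (3 * t)          # df2 is fractional
    return F_star, df1, df2, f_dist.sf(F_star, df1, df2)

# Statistics lab data (Table 11.3)
Fw, dfw1, dfw2, pw = welch_anova(
    [15, 10, 12, 8, 21, 7, 13, 3],
    [20, 13, 9, 22, 24, 25, 18, 12],
    [10, 24, 29, 12, 27, 21, 25, 14],
    [30, 22, 26, 20, 29, 28, 25, 15])
# Rejects H0 at the .05 level, agreeing with the classical F test here
```

Because the group variances in this example are not wildly unequal, Welch's test and the classical F test reach the same conclusion; the two diverge mainly when variances and group sizes differ substantially.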
11.8 SPSS and G*Power
Next we consider the use of SPSS for the statistics lab example. Instructions for determining the one-way ANOVA using SPSS are presented first, followed by additional steps for examining the assumptions for the one-way ANOVA. Next, instructions for computing the Kruskal–Wallis test and the Brown–Forsythe procedure are presented. Finally, we return to G*Power for this model.
One-Way ANOVA
Note that SPSS needs the data to be in a specific form for any of the following analyses to proceed, which is different from the layout of the data in Table 11.1. For a one-factor ANOVA, the dataset must consist of at least two variables or columns. One column or variable indicates the levels or categories of the independent variable, and the second is for the dependent variable. Each row then represents one individual, indicating the level or group that individual is a member of (1, 2, 3, or 4 in our example) and their score on the dependent variable. Thus, we wind up with two long columns of group values and scores, as shown in the following screenshot.
The “independent variable” is labeled “Group,” where each value represents the attractiveness of the statistics lab instructor to which the student was assigned. One, you recall, represented “unattractive.” Thus there were eight students randomly assigned to an “unattractive” instructor. Since each of these eight students was in the same group, each is coded with the same value (1, which represents that their group was assigned to an “unattractive” instructor). The “dependent variable” is “Labs” and represents the number of statistics labs the student attended. The other groups (2, 3, and 4) follow this pattern as well.
One-Factor Analysis of Variance: Fixed-Effects Model
Step 1. To conduct a one-way ANOVA, go to “Analyze” in the top pulldown menu, then select “General Linear Model,” and then select “Univariate.” Following the screenshot (step 1) as follows produces the “Univariate” dialog box.
One-way ANOVA: Step 1
Step 2. Click the dependent variable (e.g., number of statistics labs attended) and move it into the “Dependent Variable” box by clicking the arrow button. Click the independent variable (e.g., level of attractiveness) and move it into the “Fixed Factor(s)” box by clicking the arrow button. Next, click on “Options.”
Select the dependent variable from the list on the left and use the arrow to move it to the “Dependent Variable” box on the right. Select the independent variable from the list on the left and use the arrow to move it to the “Fixed Factor(s)” box on the right. Clicking on “Plots” will allow you to generate profile plots. Clicking on “Save” will allow you to save various forms of residuals, among other variables. Clicking on “Options” will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests).

One-way ANOVA: Step 2
Step 3. Clicking on “Options” will provide the option to select such information as “Descriptive Statistics,” “Estimates of effect size,” “Observed power,” and “Homogeneity tests” (i.e., Levene's test for equal variances) (those are the options that we typically utilize). Click on “Continue” to return to the original dialog box.
Select from the list on the left those variables that you wish to display means for and use the arrow to move them to the “Display Means for” box on the right.

One-way ANOVA: Step 3
Step 4. From the “Univariate” dialog box, click on “Plots” to obtain a profile plot of means. Click the independent variable (e.g., level of attractiveness labeled as “Group”) and move it into the “Horizontal Axis” box by clicking the arrow button (see screenshot step 4a). Then click on “Add” to move the variable into the “Plots” box at the bottom of the dialog box (see screenshot step 4b). Click on “Continue” to return to the original dialog box.
Select the independent variable from the list on the left and use the arrow to move it to the “Horizontal Axis” box on the right. Then click “Add” to move the variable into the “Plots” box at the bottom.

One-way ANOVA: Step 4a

One-way ANOVA: Step 4b
Step 5. From the “Univariate” dialog box, click on “Save” to select those elements that you want to save (in our case, we want to save the unstandardized residuals, which will be used later to examine the extent to which normality and independence are met). Then, from the “Univariate” dialog box, click on “OK” to generate the output.
One-way ANOVA: Step 5
Interpreting the output: Annotated results are presented in Table 11.7, and the profile plot is shown in Figure 11.3.
Table 11.7
Selected SPSS Results for the Statistics Lab Example
Between-Subjects Factors

Level of attractiveness   Value Label             N
1.00                      Unattractive            8
2.00                      Slightly attractive     8
3.00                      Moderately attractive   8
4.00                      Very attractive         8
Descriptive Statistics
Dependent Variable: Number of Statistics Labs Attended

Level of Attractiveness   Mean      Std. Deviation   N
Unattractive              11.1250   5.48862          8
Slightly attractive       17.8750   5.93867          8
Moderately attractive     20.2500   7.28501          8
Very attractive           24.3750   5.09727          8
Total                     18.4062   7.51283          32
Levene's Test of Equality of Error Variances (a)
Dependent Variable: Number of Statistics Labs Attended

F      df1   df2   Sig.
.905   3     28    .451

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a Design: intercept + group
The table labeled “Between-Subjects Factors” provides sample sizes for each of the categories of the independent variable (recall that the independent variable is the “between subjects factor”). The table labeled “Descriptive Statistics” provides basic descriptive statistics (means, standard deviations, and sample sizes) for each group of the independent variable.

The F test (and associated p value) for Levene's Test of Equality of Error Variances is reviewed to determine if equal variances can be assumed. In this case, we meet the assumption (as p is greater than α). Note that df1 is the degrees of freedom for the numerator (calculated as J − 1) and df2 is the degrees of freedom for the denominator (calculated as N − J).
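As a hedged aside, the same homogeneity check can be run outside SPSS: SciPy's `levene` with `center='mean'` corresponds to the Levene statistic SPSS reports. The group scores below are hypothetical, so the output will not match the .905 in the table; this sketch only shows the mechanics.

```python
# Levene's test for equality of variances, analogous to the SPSS
# "Homogeneity tests" option. center='mean' matches the classic
# Levene statistic; the four samples here are hypothetical.
from scipy import stats

g1 = [15, 10, 8, 12, 6, 14, 11, 13]
g2 = [20, 16, 25, 11, 18, 22, 14, 17]
g3 = [22, 19, 30, 12, 21, 17, 24, 18]
g4 = [27, 24, 29, 16, 25, 22, 28, 24]

stat, p = stats.levene(g1, g2, g3, g4, center='mean')
# A p value above alpha (e.g., .05) means equal variances are tenable.
print(round(stat, 3), round(p, 3))
```

With J = 4 groups of n = 8, the test has df1 = J − 1 = 3 and df2 = N − J = 28, exactly as in the SPSS table.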
Table 11.7 (continued)
Selected SPSS Results for the Statistics Lab Example
The row labeled “Group” is the independent variable or between-groups variable. The between-groups mean square (246.198) tells how much the group means vary. The degrees of freedom for between groups is J − 1 (3 in this example).

The p value for the omnibus F test is .001. This indicates there is a statistically significant difference in the mean number of statistics labs attended based on attractiveness of the instructor. The probability of observing these mean differences, or more extreme mean differences, by chance if the null hypothesis is really true (i.e., if the means really are equal) is substantially less than 1%. We reject the null hypothesis that all the population means are equal. For this example, this provides evidence to suggest that the number of statistics labs attended differs based on attractiveness of the instructor.

The omnibus F test is computed as F = MSbetw/MSwith = 246.198/36.112 = 6.818.

Partial eta squared is one measure of effect size: η²p = SSbetw/SStotal = 738.594/1749.719 = .422. We can interpret this to mean that approximately 42% of the variation in the dependent variable (in this case, number of statistics labs attended) is accounted for by the attractiveness of the statistics lab instructor.
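Because the F statistic and effect size use only the sums of squares and degrees of freedom shown in the table, they can be verified with a few lines of arithmetic. The Python below is purely a cross-check on the published values, not an SPSS feature.

```python
# Reproducing the omnibus F and (partial) eta squared from the sums
# of squares and degrees of freedom reported in Table 11.7.
ss_betw, df_betw = 738.594, 3
ss_with, df_with = 1011.125, 28
ss_total = 1749.719

ms_betw = ss_betw / df_betw   # 246.198
ms_with = ss_with / df_with   # about 36.112
F = ms_betw / ms_with         # about 6.818
eta_sq = ss_betw / ss_total   # about .422

print(round(F, 3), round(eta_sq, 3))
```

Note that SSbetw + SSwith = 738.594 + 1011.125 = 1749.719 = SStotal, which is the partitioning of the sums of squares discussed earlier in the chapter.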
Tests of Between-Subjects Effects
Dependent Variable: Number of Statistics Labs Attended

Source            Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power (b)
Corrected model   738.594 (a)               3    246.198       6.818     .001   .422                  20.453               .956
Intercept         10841.281                 1    10841.281     300.216   .000   .915                  300.216              1.000
Group             738.594                   3    246.198       6.818     .001   .422                  20.453               .956
Error             1011.125                  28   36.112
Total             12591.000                 32
Corrected total   1749.719                  31

a R squared = .422 (adjusted R squared = .360).
b Computed using alpha = .05.

The row labeled “Error” is within groups. The within-groups mean square tells us how much the observations within the groups vary (i.e., 36.112). The degrees of freedom for within groups is (N − J), or the total sample size minus the number of levels of the independent variable. The row labeled “Corrected total” is the sum of squares total. The degrees of freedom for the total is (N − 1), or the total sample size minus 1.
Observed power tells whether our test is powerful enough to detect mean differences if they really exist. Power of .956 indicates that the probability of rejecting the null hypothesis if it is really false is about 96%; this represents strong power.

R squared is listed as a footnote underneath the table. R squared is the ratio of sum of squares between to sum of squares total, R² = SSbetw/SStotal = 738.594/1749.719 = .422, and, in the case of one-way ANOVA, is also the squared simple bivariate Pearson correlation between the independent variable and the dependent variable.
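For readers curious where the .956 comes from, observed power can be approximated with the noncentral F distribution, using the noncentrality parameter SPSS reports in the table. The sketch below assumes SciPy and is only an illustration of the computation, not SPSS code.

```python
# Observed power = area beyond the critical F under a noncentral F
# distribution. The noncentrality parameter 20.453 is taken directly
# from the "Noncent. Parameter" column of Table 11.7.
from scipy.stats import f as f_dist, ncf

df1, df2, alpha = 3, 28, 0.05
lam = 20.453                              # noncentrality from Table 11.7
f_crit = f_dist.ppf(1 - alpha, df1, df2)  # critical value, about 2.95
power = ncf.sf(f_crit, df1, df2, lam)     # P(reject | H0 is false)
print(round(power, 3))                    # should be close to .956
```

The same machinery, with a different noncentrality convention, underlies the G*Power calculations shown later in this section.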
Table 11.7 (continued)
Selected SPSS Results for the Statistics Lab Example
Estimated Marginal Means

1. Grand Mean
Dependent Variable: Number of Statistics Labs Attended

Mean     Std. Error   95% CI Lower Bound   95% CI Upper Bound
18.406   1.062        16.230               20.582

2. Level of Attractiveness
Dependent Variable: Number of Statistics Labs Attended

Level of Attractiveness   Mean     Std. Error   95% CI Lower Bound   95% CI Upper Bound
Unattractive              11.125   2.125        6.773                15.477
Slightly attractive       17.875   2.125        13.523               22.227
Moderately attractive     20.250   2.125        15.898               24.602
Very attractive           24.375   2.125        20.023               28.727
The “Grand Mean” (in this case, 18.406) represents the overall mean, regardless of group membership, on the dependent variable. The 95% CI represents the CI of the grand mean. The table labeled “Level of Attractiveness” provides descriptive statistics for each of the categories of the independent variable (notice that these are the same means reported previously). In addition to means, the SE and 95% CI of the means are reported.

The Kruskal–Wallis procedure is shown here. The p value (denoted here as Asymp. sig. for asymptotic significance) is less than α; therefore, the null hypothesis is also rejected for this nonparametric test. The Welch and Brown–Forsythe robust ANOVA procedures are shown here. For both tests, the p value is less than α; therefore, the null hypothesis is also rejected for these robust tests.
Test Statistics (a,b)

              dv
Chi-square    13.061
df            3
Asymp. sig.   .005

a Kruskal–Wallis test.
b Grouping variable: group.

Robust Tests of Equality of Means
dv

                 Statistic (a)   df1   df2      Sig.
Welch            7.862           3     15.454   .002
Brown–Forsythe   6.818           3     25.882   .002

a Asymptotically F distributed.
FIGURE 11.3
Profile plot for statistics lab example: estimated marginal means of number of statistics labs attended (vertical axis, from 10.00 to 24.00) by level of attractiveness (horizontal axis, 1.00 to 4.00).
Examining Data for Assumptions
Normality
The residuals are computed by subtracting the group mean from the dependent variable value for each observation. For example, the mean number of labs attended for group 1 was 11.125. The residual for person 1 is then (15 − 11.125 = 3.875, or about 3.88). As we look at our raw data, we see a new variable has been added to our dataset, labeled RES_1. This is our residual. The residual will be used to review the assumptions of normality and independence.
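This residual computation is simple enough to mirror in a few lines of Python. The sketch below uses the group means from the SPSS descriptive statistics above; it is an illustration of what SPSS stores in RES_1, not SPSS code.

```python
# Unstandardized residual, as saved by SPSS in RES_1: the observed
# score minus its own group's mean. Person 1 (group 1) attended
# 15 labs; group 1's mean was 11.125, so the residual is 3.875.
group_mean = {1: 11.125, 2: 17.875, 3: 20.250, 4: 24.375}

def residual(group, score):
    """Unstandardized residual for one observation."""
    return score - group_mean[group]

print(residual(1, 15))   # 3.875
```

A score exactly at its group mean (e.g., 24.375 in group 4) has a residual of zero, which is why the residuals average out to zero within each group.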
Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important. For the one-way ANOVA, the distributional shape for the residuals should be a normal distribution. We can again use “Explore” to examine the extent to which the assumption of normality is met.

The general steps for accessing “Explore” have been presented in previous chapters and will not be repeated here. Click the residual and move it into the “Dependent List” box by clicking on the arrow button. The procedures for selecting normality statistics were presented in Chapter 6 and remain the same here: Click on “Plots” in the upper right corner. Place a checkmark in the boxes for “Normality plots with tests” and also for “Histogram.” Then click “Continue” to return to the main “Explore” dialog box. Then click “OK” to generate the output.
Select residuals from the list on the left and use the arrow to move them to the “Dependent List” box on the right. Then click on “Plots.”

Generating normality evidence
Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots.
Descriptives
Residual for labs

                                     Statistic   Std. Error
Mean                                 .0000       1.00959
95% Confidence interval for mean
  Lower bound                        −2.0591
  Upper bound                        2.0591
5% Trimmed mean                      .0260
Median                               .8125
Variance                             32.617
Std. deviation                       5.71112
Minimum                              −10.25
Maximum                              9.88
Range                                20.13
Interquartile range                  9.25
Skewness                             −.239       .414
Kurtosis                             −1.019      .809
The skewness statistic of the residuals is −.239 and kurtosis is −1.019, both within the range of an absolute value of 2.0, suggesting some evidence of normality.

The histogram of residuals is not exactly what most researchers would consider a classic normally shaped distribution, but it approaches a normal distribution, and there is nothing to suggest normality may be an unreasonable assumption.
Histogram of residual for labs (mean = −6.66E−16, std. dev. = 5.711, N = 32).
There are a few other statistics that can be used to gauge normality. The formal test of normality, the S–W test (SW) (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented in the following and suggests that our sample distribution for residuals is not statistically significantly different from what would be expected from a normal distribution (SW = .958, df = 32, p = .240).
Tests of Normality

                    Kolmogorov–Smirnov (a)         Shapiro–Wilk
                    Statistic   df   Sig.          Statistic   df   Sig.
Residual for labs   .112        32   .200*         .958        32   .240

a Lilliefors significance correction.
* This is a lower bound of the true significance.
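An equivalent S–W test can be run outside SPSS with SciPy. Since the raw residuals are not reproduced here, the sketch below uses simulated residuals, so its statistic and p value will differ from the SW = .958, p = .240 in the table; only the mechanics are being illustrated.

```python
# Shapiro-Wilk test of normality, analogous to the SPSS "Tests of
# Normality" output. The 32 residuals below are simulated stand-ins
# (mean 0, SD about 5.7, mimicking the example's residuals).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=5.7, size=32)

sw_stat, sw_p = stats.shapiro(residuals)
# p > alpha: no evidence the distribution differs from normal.
print(round(sw_stat, 3), round(sw_p, 3))
```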
Q–Q plots are also often examined to determine evidence of normality. Q–Q plots are graphs that plot quantiles of the theoretical normal distribution against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of residuals shown in the following suggests relative normality.
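The quantile pairing that underlies a Q–Q plot can likewise be computed with SciPy's `probplot`, with no plotting required. As above, the residuals here are simulated, so this is only a sketch of the technique rather than a reproduction of the SPSS figure.

```python
# probplot returns the (theoretical quantile, ordered value) pairs
# of a normal Q-Q plot plus a least-squares fit line. A correlation
# r near 1 means the points hug the diagonal, i.e., near-normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(0.0, 5.7, size=32)   # simulated residuals

(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(round(r, 3))
```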
Normal Q–Q plot of residual for labs: expected normal value (vertical axis) plotted against observed value (horizontal axis).
Examination of the following boxplot suggests a relatively normal distributional shape of residuals and no outliers.
Boxplot of residual for labs.
Considering the forms of evidence we have examined, the skewness and kurtosis statistics, the S–W test, the Q–Q plot, and the boxplot all suggest normality is a reasonable assumption. We can be reasonably assured we have met the assumption of normality of the dependent variable for each group of the independent variable.
Independence
The only assumption we have not tested for yet is independence. If subjects have been randomly assigned to conditions (in other words, the different levels of the independent
variable), the assumption of independence has been met. In this illustration, students were randomly assigned to instructor, and thus, the assumption of independence was met. However, we often use independent variables that do not allow random assignment, such as preexisting characteristics like education level (high school diploma, bachelor's, master's, or terminal degrees). We can plot residuals against levels of our independent variable using a scatterplot to get an idea of whether or not there are patterns in the data and thereby provide an indication of whether we have met this assumption. Remember that these variables were added to the dataset by saving the unstandardized residuals when we generated the ANOVA model.
Please note that some researchers do not believe that the assumption of independence can be tested. If there is not random assignment to groups, then these researchers believe this assumption has been violated, period. The plot that we generate will give us a general idea of patterns, however, in situations where random assignment was not performed.
The general steps for generating a simple scatterplot through “Scatter/dot” have been presented in a previous chapter (e.g., Chapter 10), and they will not be reiterated here. From the “Simple Scatterplot” dialog screen, click the residual variable and move it into the “Y Axis” box by clicking on the arrow. Click the independent variable (e.g., level of attractiveness) and move it into the “X Axis” box by clicking on the arrow. Then click “OK.” Double click on the graph in the output to activate the chart editor.
Scatterplot of residual for labs (vertical axis) against level of attractiveness (horizontal axis).
Interpreting independence evidence: In examining the scatterplot for evidence of independence, the points should fall relatively randomly above and below the reference line. In this example, our scatterplot suggests evidence of independence, with a relatively random display of points above and below the horizontal line at 0. Thus, had we not met the assumption of independence through random assignment of cases to groups, this would have provided evidence that independence was a reasonable assumption.
Nonparametric Procedures
Results from some of the recommended alternative procedures can be obtained from two other SPSS modules. Here we discuss the Kruskal–Wallis, Welch, and Brown–Forsythe procedures.
Kruskal–Wallis
Step 1: To conduct a Kruskal–Wallis test, go to “Analyze” in the top pulldown menu, then select “Nonparametric Tests,” then select “Legacy Dialogs,” and finally select “K Independent Samples.” Following the screenshot (step 1) as follows produces the “Tests for Several Independent Samples” dialog box.
Kruskal–Wallis: Step 1
Step 2: Next, from the main “Tests for Several Independent Samples” dialog box, click the dependent variable (e.g., number of statistics labs attended) and move it into the “Test Variable List” box by clicking on the arrow button. Next, click the grouping variable (e.g., attractiveness of instructor) and move it into the “Grouping Variable” box by clicking on the arrow button. You will notice that there are two question marks next to the name of your grouping variable. This is SPSS letting you know that you need to define (numerically) which categories of the grouping variable you want to include in the analysis (this must be done by identifying a range of values for all groups of interest). To do that, click on “Define Range.” We have four groups or levels of our independent variable (labeled 1, 2, 3, and 4 in our raw data); thus, enter 1 as the minimum and 4 as the maximum. In the lower left portion of the screen under “Test Type,” check “Kruskal-Wallis H” to generate this nonparametric test. Then click on “OK” to generate the results presented as follows.
Select the dependent variable from the list on the left and use the arrow to move it to the “Test Variable List” box on the right. Select the independent variable from the list on the left and use the arrow to move it to the “Grouping Variable” box on the right. Clicking on “Define Range” will allow you to define the numeric values of the categories for the independent variable. Select “Kruskal–Wallis H” as the “Test Type.”

Kruskal–Wallis: Step 2a

Kruskal–Wallis: Step 2b
Interpreting the output: The Kruskal–Wallis is literally an ANOVA of ranks. Thus, the null hypothesis is that the mean ranks of the groups of the independent variable will not be significantly different. In this example, the results (p = .005) suggest statistically significant differences in the mean ranks of the dependent variable by group of the independent variable.
The mean rank is the rank order, from smallest to largest, of the means of the dependent variable (statistics labs attended) by group (attractiveness of the lab instructor).

The p value (labeled “Asymp. sig.”) for the Kruskal–Wallis test is .005. This indicates there is a statistically significant difference in the mean ranks [i.e., rank order of the mean number of statistics labs attended by group (i.e., attractiveness of the instructor)]. The probability of observing these mean ranks, or more extreme mean ranks, by chance if the null hypothesis is really true (i.e., if the mean ranks are really equal) is substantially less than 1%. We reject the null hypothesis that all the population mean ranks are equal. For the example, this provides evidence to suggest that the number of statistics labs attended differs based on the attractiveness of the instructor.
Ranks
Number of statistics labs attended

Level of Attractiveness   N    Mean Rank
Unattractive              8    7.75
Slightly attractive       8    15.25
Moderately attractive     8    18.75
Very attractive           8    24.25
Total                     32

Test Statistics (a,b)

              Number of Statistics Labs Attended
Chi-square    13.061
df            3
Asymp. sig.   .005

a Kruskal–Wallis test.
b Grouping variable: Level of attractiveness.
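The same rank-based test is available outside SPSS as `scipy.stats.kruskal`. The tiny groups below are hypothetical (reproducing the actual H of 13.061 would require the raw Table 11.1 data); they are chosen to be completely separated so the rank pattern is easy to check by hand.

```python
# Kruskal-Wallis as an "ANOVA of ranks." With three completely
# separated hypothetical groups, the ranks 1-9 split into 1-3, 4-6,
# and 7-9, and the H statistic works out to 7.2 by hand.
from scipy import stats

g1, g2, g3 = [1, 2, 3], [4, 5, 6], [7, 8, 9]
H, p = stats.kruskal(g1, g2, g3)
print(round(H, 1), round(p, 3))   # H = 7.2, p below .05
```

As in SPSS, the p value comes from comparing H to a chi-square distribution with (number of groups − 1) degrees of freedom.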
Welch and Brown–Forsythe
Step 1: To conduct the Welch and Brown–Forsythe procedures, go to “Analyze” in the top pulldown menu, then select “Compare Means,” and then select “One-way ANOVA.” Following the screenshot (step 1) as follows produces the “One-way ANOVA” dialog box.
Welch and Brown–Forsythe: Step 1
Step 2: Click the dependent variable (e.g., number of stats labs attended) and move it into the “Dependent List” box by clicking the arrow button. Click the independent variable (e.g., level of attractiveness) and move it into the “Factor” box by clicking the arrow button. Next, click on “Options.”
Select the dependent variable from the list on the left and use the arrow to move it to the “Dependent List” box on the right. Select the independent variable from the list on the left and use the arrow to move it to the “Factor” box on the right. Clicking on “Options” will allow you to obtain a number of other statistics (including the Welch and Brown–Forsythe).

Welch and Brown–Forsythe: Step 2
Step 3: Clicking on “Options” will provide the option to select such information as “Descriptive,” “Homogeneity of variance test” (i.e., Levene's test for equal variances), “Brown-Forsythe,” “Welch,” and “Means plot.” Click on “Continue” to return to the original dialog box. From the “One-way ANOVA” dialog box, click on “OK” to generate the output.
Welch and Brown–Forsythe: Step 3
Interpreting the output: For illustrative purposes, and because the remainder of the one-way ANOVA results have been interpreted previously, only the results for the Welch and Brown–Forsythe procedures are displayed. Both tests suggest there are statistically significant differences between the groups in terms of the number of stats labs attended.
The p values for the Welch and Brown–Forsythe tests are .002. These indicate there is a statistically significant difference in the mean number of statistics labs attended per group (i.e., attractiveness of the instructor). The probability of observing the F statistics (7.862 and 6.818) or larger by chance if the means of the groups are really equal is substantially less than 1%. We reject the null hypothesis that all the population means are equal. For this example, this provides evidence to suggest that the number of statistics labs attended differs based on attractiveness of the instructor.

Robust Tests of Equality of Means
Number of Statistics Labs Attended

                 Statistic (a)   df1   df2      Sig.
Welch            7.862           3     15.454   .002
Brown–Forsythe   6.818           3     25.882   .002

a Asymptotically F distributed.
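A useful property of the Welch procedure is that it can be reproduced from the group means, standard deviations, and sample sizes alone. The Python below is our own sketch of the standard Welch formulas (not SPSS code); fed the summary statistics from Table 11.7, it recovers the tabled values.

```python
# Welch's heteroscedastic one-way test, computed from the group
# summary statistics reported earlier (means, SDs, n's).
means = [11.125, 17.875, 20.250, 24.375]
sds = [5.48862, 5.93867, 7.28501, 5.09727]
ns = [8, 8, 8, 8]
J = len(means)

w = [n / s**2 for n, s in zip(ns, sds)]   # precision weights n_j / s_j^2
W = sum(w)
grand = sum(wj * m for wj, m in zip(w, means)) / W   # weighted grand mean

num = sum(wj * (m - grand) ** 2 for wj, m in zip(w, means)) / (J - 1)
A = sum((1 - wj / W) ** 2 / (n - 1) for wj, n in zip(w, ns))
den = 1 + (2 * (J - 2) / (J**2 - 1)) * A

F_welch = num / den
df1 = J - 1
df2 = (J**2 - 1) / (3 * A)
print(round(F_welch, 3), df1, round(df2, 3))   # approximately 7.862 3 15.454
```

Matching the SPSS output (Welch F = 7.862, df1 = 3, df2 = 15.454) is a handy cross-check that the summary statistics and the robust test are mutually consistent.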
For further details on the use of SPSS for these procedures, be sure to examine books such as Page, Braver, and MacKinnon (2003) or Morgan, Leech, Gloeckner, and Barrett (2011).

A priori and post hoc power can again be determined using the specialized software described previously in this text (e.g., G*Power), or you can consult a priori power tables (e.g., Cohen, 1988). As an illustration, we use G*Power to compute the post hoc power of our test.
Post Hoc Power for One-Way ANOVA Using G*Power
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a one-way ANOVA. To find the one-way ANOVA, we select “Tests” in the top pulldown menu, then “Means,” and then “Many groups: ANOVA: One-way (one independent variable).” Once that selection is made, the “Test family” automatically changes to “F tests.”
Step 1
The “Type of Power Analysis” desired then needs to be selected. To compute post hoc power, we need to select “Post hoc: Compute achieved power—given α, sample size, and effect size.”
The default selection for “Test family” is “t tests.” Following the procedures presented in step 1 will automatically change the test family to “F tests.” The default selection for “Statistical test” is “Correlation: Point biserial model.” Following the procedures presented in step 1 will automatically change the statistical test to “ANOVA: Fixed effects, omnibus, one-way.”

Step 2

The “Input Parameters” for computing post hoc power must be specified (the default values are shown here), including: (1) effect size f, (2) alpha level, (3) total sample size, and (4) number of groups in the independent variable. Once the parameters are specified, click on “Calculate.”
The “Input Parameters” must then be specified. The first parameter is the effect size, f. In our example, the computed f effect size was .8546. The alpha level we used was .05, the total sample size was 32, and the number of groups (i.e., levels of the independent variable) was 4. Once the parameters are specified, click on “Calculate” to find the power statistics.
Post hoc power: Here are the post hoc power results.
The “Output Parameters” provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power for a one-way ANOVA with a computed effect size f of .8546, an alpha level of .05, a total sample size of 32, and 4 groups (or categories) in our independent variable.

Based on those criteria, the post hoc power was .98. In other words, with a one-way ANOVA, a computed effect size f of .8546, an alpha level of .05, a total sample size of 32, and 4 groups (or categories) in our independent variable, the post hoc power of our test was .98: the probability of rejecting the null hypothesis when it is really false (in this case, the probability of detecting that the means of the dependent variable differ across the levels of the independent variable) was 98%, which would be considered more than sufficient power (sufficient power is often .80 or above). Note that this value is slightly different from the observed power value reported in SPSS. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters).
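G*Power's post hoc power can be approximated directly from the noncentral F distribution. The sketch below assumes SciPy and G*Power's noncentrality convention (λ = f²·N), which differs from the noncentrality parameter SPSS uses; that difference in convention is precisely why the .98 here differs from the .956 observed power in Table 11.7.

```python
# Effect size f from eta squared, then G*Power-style post hoc power
# via the noncentral F distribution with lambda = f^2 * N.
import math
from scipy.stats import f as f_dist, ncf

ss_betw, ss_total = 738.594, 1749.719
eta_sq = ss_betw / ss_total
f_effect = math.sqrt(eta_sq / (1 - eta_sq))   # about .8546

N, J, alpha = 32, 4, 0.05
df1, df2 = J - 1, N - J
lam = f_effect**2 * N                         # about 23.37
f_crit = f_dist.ppf(1 - alpha, df1, df2)
power = ncf.sf(f_crit, df1, df2, lam)
print(round(f_effect, 4), round(power, 2))    # about .8546 and .98
```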
A Priori Power for One-Way ANOVA Using G*Power
For a priori power, we can determine the total sample size needed given an estimated effect size f, alpha level, desired power, and number of groups of our independent variable. In this example, had we estimated a moderate effect f of .25, an alpha of .05, desired power of .80, and 4 groups in the independent variable, we would need a total sample size of 180 (or 45 per group).
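The a priori question can be sketched as a search over total sample size using the same power function. This is an illustrative re-implementation of the idea, not G*Power itself, and it assumes equal group sizes.

```python
# A priori sample size: smallest total N (with equal groups) whose
# power reaches .80 for f = .25, alpha = .05, and 4 groups.
from scipy.stats import f as f_dist, ncf

def anova_power(f_effect, n_total, groups, alpha=0.05):
    """Power of the omnibus one-way ANOVA F test (lambda = f^2 * N)."""
    df1, df2 = groups - 1, n_total - groups
    lam = f_effect**2 * n_total
    f_crit = f_dist.ppf(1 - alpha, df1, df2)
    return ncf.sf(f_crit, df1, df2, lam)

groups, f_effect, target = 4, 0.25, 0.80
n = groups * 2                                  # start at 2 per group
while anova_power(f_effect, n, groups) < target:
    n += groups                                 # keep groups equal
print(n, n // groups)   # G*Power reported 180 total (45 per group)
```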
A priori power: Here are the a priori power results.
11.9 Template and APA-Style Write-Up
Finally we come to an example paragraph of the results for the statistics lab example. Recall that our graduate research assistant, Marie, was working on a research project for an independent study class to determine if there was a mean difference in the number
of statistics labs attended based on the attractiveness of the lab instructor. Her research question was as follows: Is there a mean difference in the number of statistics labs students attend based on the attractiveness of the lab instructor? Marie then generated a one-way ANOVA as the test of inference. A template for writing a research question for a one-way ANOVA is presented as follows. Please note that it is important to ensure the reader understands the levels or groups of the independent variable. This may be done parenthetically in the actual research question, as an operational definition, or specified within the methods section. In this example, parenthetically we could have stated the following: Is there a mean difference in the number of statistics labs students attend based on the attractiveness of the lab instructor (unattractive, slightly attractive, moderately attractive, very attractive)?
Is there a mean difference in [dependent variable] between [independent variable]?
It may be helpful to preface the results of the one-way ANOVA with information on an examination of the extent to which the assumptions were met (recall there are three assumptions: normality, homogeneity of variance, and independence). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference.
A one-way ANOVA was conducted to determine if the mean number
of statistics labs attended by students differed on the level of
attractiveness of the statistics lab instructor. The assumption of
normality was tested and met via examination of the residuals.
Review of the S-W test for normality (SW = .958, df = 32, p = .240)
and skewness (−.239) and kurtosis (−1.019) statistics suggested that
normality was a reasonable assumption. The boxplot suggested a rela-
tively normal distributional shape (with no outliers) of the residu-
als. The Q–Q plot and histogram suggested normality was reasonable.
According to Levene’s test, the homogeneity of variance assumption
was satisfied [F(3, 28) = .905, p = .451]. Random assignment of indi-
viduals to groups helped ensure that the assumption of independence
was met. Additionally, a scatterplot of residuals against the levels
of the independent variable was reviewed. A random display of points
around 0 provided further evidence that the assumption of indepen-
dence was met.
Here is an APA-style example paragraph of results for the one-way ANOVA (remember that this will be prefaced by the previous paragraph reporting the extent to which the assumptions of the test were met).
From Table 11.7, we see that the one-way ANOVA is statistically sig-
nificant (F = 6.818, df = 3, 28, p = .001), the effect size is rather
large (η2 = .422; suggesting about 42% of the variance of number of
statistics labs attended is due to differences in the attractive-
ness of the instructor), and observed power is quite strong (.956).
The means and standard deviations of the number of statistics labs
attended for each group of the independent variable were as follows:
11.125 (SD = 5.489) for the unattractive level, 17.875 (SD = 5.939) for
the slightly attractive level, 20.250 (SD = 7.285) for the moderately
attractive level, and 24.375 (SD = 5.097) for the very attractive level.
The means and profile plot (Figure 11.3) suggest that with increas-
ing instructor attractiveness, there was a corresponding increase
in mean lab attendance. For completeness, we also conducted several
alternative procedures. The Kruskal-Wallis test (χ2 = 13.061, df = 3,
p = .005), the Welch procedure (Fasymp = 7.862, df1 = 3, df2 = 15.454,
p = .002), and the Brown-Forsythe procedure (Fasymp = 6.818, df1 = 3,
df2 = 25.882, p = .002) also indicated a statistically significant
effect of instructor attractiveness on statistics lab attendance,
providing further support for the robustness of the one-way ANOVA
results.
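The analyses above were run in SPSS. As a cross-check sketch in Python with scipy, the same omnibus F test, Levene's test, and Kruskal-Wallis alternative can be run on attendance data; the four groups below are hypothetical stand-ins (the actual Table 11.7 data are not reproduced here):

```python
# Sketch: one-way ANOVA with assumption checks, mirroring the SPSS workflow.
# The four groups below are hypothetical, NOT the textbook's Table 11.7 data.
from scipy import stats

unattractive = [5, 9, 12, 14, 8, 11, 13, 17]
slightly     = [12, 15, 19, 22, 14, 20, 18, 23]
moderately   = [13, 21, 25, 27, 15, 22, 19, 20]
very         = [20, 25, 28, 30, 21, 26, 24, 21]
groups = [unattractive, slightly, moderately, very]

# Omnibus one-way ANOVA (F test)
f_stat, f_p = stats.f_oneway(*groups)

# Homogeneity of variance (Levene's test)
lev_stat, lev_p = stats.levene(*groups)

# Nonparametric alternative (Kruskal-Wallis)
kw_stat, kw_p = stats.kruskal(*groups)

print(f"ANOVA: F = {f_stat:.3f}, p = {f_p:.4f}")
print(f"Levene: W = {lev_stat:.3f}, p = {lev_p:.4f}")
print(f"Kruskal-Wallis: chi-sq = {kw_stat:.3f}, p = {kw_p:.4f}")
```

With J = 4 groups of n = 8, the F test here has df = 3, 28, matching the design of the chapter example.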
11.10 Summary
In this chapter, methods involving the comparison of multiple group means for a single independent variable were considered. The chapter began with a look at the characteristics of the one-factor fixed-effects ANOVA, including (a) control of the experimentwise error rate through an omnibus test, (b) one independent variable with two or more fixed levels, (c) individuals are randomly assigned to levels or groups and then exposed to only one level of the independent variable, and (d) the dependent variable is measured at least at the interval level. Next, a discussion of the theory underlying ANOVA was conducted. Here we examined the concepts of between- and within-groups variability, sources of variation, and partitioning the sums of squares. The ANOVA model was examined. Some discussion was also devoted to the ANOVA assumptions, their assessment, and how to deal with assumption violations. Finally, alternative ANOVA procedures were described. At this point, you should have met the following objectives: (a) be able to understand the characteristics and concepts underlying the one-factor ANOVA, (b) be able to determine and interpret the results of a one-factor ANOVA, and (c) be able to understand and evaluate the assumptions of the one-factor ANOVA. Chapter 12 considers a number of MCPs for further examination of sets of means. Chapter 13 returns to ANOVA and discusses models which have more than one independent variable.
Problems
Conceptual problems
11.1 Data for three independent random samples, each of size 4, are analyzed by a one-factor ANOVA fixed-effects model. If the values of the sample means are all equal, what is the value of MSbetw?
 a. 0
 b. 1
 c. 2
 d. 3
11.2 For a one-factor ANOVA fixed-effects model, which of the following is always true?
 a. dfbetw + dfwith = dftotal
 b. SSbetw + SSwith = SStotal
 c. MSbetw + MSwith = MStotal
 d. All of the above
 e. Both a and b
11.3 Suppose n1 = 19, n2 = 21, and n3 = 23. For a one-factor ANOVA, the dfwith would be
 a. 2
 b. 3
 c. 60
 d. 62
11.4 Suppose n1 = 19, n2 = 21, and n3 = 23. For a one-factor ANOVA, the dfbetw would be
 a. 2
 b. 3
 c. 60
 d. 62
11.5 Suppose n1 = 19, n2 = 21, and n3 = 23. For a one-factor ANOVA, the dftotal would be
 a. 2
 b. 3
 c. 60
 d. 62
11.6 Suppose n1 = 19, n2 = 21, and n3 = 23. For a one-factor ANOVA, the df for the numerator of the F ratio would be which one of the following?
 a. 2
 b. 3
 c. 60
 d. 62
11.7 In a one-factor ANOVA, H0 asserts that
 a. All of the population means are equal.
 b. The between-groups variance estimate and the within-groups variance estimate are both estimates of the same population residual variance.
 c. The within-groups sum of squares is equal to the between-groups sum of squares.
 d. Both a and b.
11.8 For a one-factor ANOVA comparing three groups with n = 10 in each group, the F ratio has degrees of freedom equal to
 a. 2, 27
 b. 2, 29
 c. 3, 27
 d. 3, 29
11.9 For a one-factor ANOVA comparing five groups with n = 50 in each group, the F ratio has degrees of freedom equal to
 a. 4, 245
 b. 4, 249
 c. 5, 245
 d. 5, 249
11.10 Which of the following is not necessary in ANOVA?
 a. Observations are from random and independent samples.
 b. The dependent variable is measured on at least the interval scale.
 c. Populations have equal variances.
 d. Equal sample sizes are necessary.
11.11 If you find an F ratio of 1.0 in a one-factor ANOVA, it means that
 a. Between-groups variation exceeds within-groups variation.
 b. Within-groups variation exceeds between-groups variation.
 c. Between-groups variation is equal to within-groups variation.
 d. Between-groups variation exceeds total variation.
11.12 Suppose students in grades 7, 8, 9, 10, 11, and 12 were compared on absenteeism. If ANOVA were used rather than multiple t tests, then the probability of a Type I error would be less. True or false?
11.13 Mean square is another name for variance or variance estimate. True or false?
11.14 In ANOVA, each independent variable is known as a level. True or false?
11.15 A negative F ratio is impossible. True or false?
11.16 Suppose that for a one-factor ANOVA with J = 4 and n = 10, the four sample means are all equal to 15. I assert that the value of MSwith is necessarily equal to 0. Am I correct?
11.17 With J = 3 groups, I assert that if you reject H0 in the one-factor ANOVA, you will necessarily conclude that all three group means are different. Am I correct?
11.18 The homoscedasticity assumption is that the populations from which each of the samples are drawn are normally distributed. True or false?
11.19 When analyzing mean differences among more than two samples, doing independent t tests on all possible pairs of means
 a. Decreases the probability of a Type I error
 b. Does not change the probability of a Type I error
 c. Increases the probability of a Type I error
 d. Cannot be determined from the information provided
11.20 Suppose for a one-factor fixed-effects ANOVA with J = 5 and n = 15, the five sample means are all equal to 50. I assert that the F test statistic cannot be significant. Am I correct?
11.21 The independence assumption in ANOVA is that the observations in the samples do not depend on one another. True or false?
11.22 For J = 2 and α = .05, if the result of the independent t test is significant, then the result of the one-factor fixed-effects ANOVA is uncertain. True or false?
11.23 A statistician conducted a one-factor fixed-effects ANOVA and found the F ratio to be less than 0. I assert this means the between-groups variability is less than the within-groups variability. Am I correct?
Computational problems
11.1 Complete the following ANOVA summary table for a one-factor ANOVA, where there are 4 groups receiving different headache medications, each with 16 observations, and α = .05.

Source    SS      df    MS    F    Critical Value and Decision
Between   9.75    —     —     —
Within    —       —     —
Total     18.75   —
11.2 A social psychologist wants to determine if type of music has any effect on the number of beers consumed by people in a tavern. Four taverns are selected that have different musical formats. Five people are randomly sampled in each tavern and their beer consumption monitored for 3 hours. Complete the following one-factor ANOVA summary table using α = .05.

Source    SS    df    MS     F      Critical Value and Decision
Between   —     —     7.52   5.01
Within    —     —     —
Total     —     —
11.3 A psychologist would like to know whether the season (fall, winter, spring, and summer) has any consistent effect on people's sexual activity. In the middle of each season, a psychologist selects a random sample of n = 25 students. Each individual is given a sexual activity questionnaire. A one-factor ANOVA was used to analyze these data. Complete the following ANOVA summary table (α = .05).

Source    SS     df    MS    F      Critical Value and Decision
Between   —      —     —     5.00
Within    960    —     —
Total     —      —
11.4 The following five independent random samples are obtained from five normally distributed populations with equal variances. The dependent variable is the number of bank transactions in 1 month, and the groups are five different banks.
Group 1 Group 2 Group 3 Group 4 Group 5
16 16 2 5 7
5 10 9 8 12
11 7 11 1 14
23 12 13 5 16
18 7 10 8 11
12 4 13 11 9
12 23 9 9 19
19 13 9 9 24
Use SPSS to conduct a one-factor ANOVA to determine if the group means are equal using α = .05. Test the assumptions, plot the group means, consider an effect size, interpret the results, and write an APA-style summary.
11.5 The following three independent random samples are obtained from three normally distributed populations with equal variances. The dependent variable is starting hourly wage, and the groups are the types of position (internship, co-op, work study).
Group 1: Internship Group 2: Co-op Group 3: Work Study
10 9 8
12 8 9
11 10 8
11 12 10
12 9 8
10 11 9
10 12 9
13 10 8
Use SPSS to conduct a one-factor ANOVA to determine if the group means are equal using α = .05. Test the assumptions, plot the group means, consider an effect size, interpret the results, and write an APA-style summary.
Interpretive problems
11.1 Using the survey 1 dataset from the website, use SPSS to conduct a one-factor fixed-effects ANOVA, including effect size, where political view is the grouping variable (i.e., independent variable) (J = 5) and the dependent variable is a variable of interest to you [the following variables look interesting: books, TV, exercise, drinks, GPA, GRE-Quantitative (GRE-Q), CDs, hair appointment]. Then write an APA-style paragraph describing the results.
11.2 Using the survey 1 dataset from the website, use SPSS to conduct a one-factor fixed-effects ANOVA, including effect size, where hair color is the grouping variable (i.e., independent variable) (J = 5) and the dependent variable is a variable of interest to you (the following variables look interesting: books, TV, exercise, drinks, GPA, GRE-Q, CDs, hair appointment). Then write an APA-style paragraph describing the results.
12
Multiple Comparison Procedures
Chapter Outline
12.1 Concepts of Multiple Comparison Procedures
    12.1.1 Contrasts
    12.1.2 Planned Versus Post Hoc Comparisons
    12.1.3 Type I Error Rate
    12.1.4 Orthogonal Contrasts
12.2 Selected Multiple Comparison Procedures
    12.2.1 Planned Analysis of Trend
    12.2.2 Planned Orthogonal Contrasts
    12.2.3 Planned Contrasts with Reference Group: Dunnett Method
    12.2.4 Other Planned Contrasts: Dunn (or Bonferroni) and Dunn–Sidak Methods
    12.2.5 Complex Post Hoc Contrasts: Scheffé and Kaiser–Bowden Methods
    12.2.6 Simple Post Hoc Contrasts: Tukey HSD, Tukey–Kramer, Fisher LSD, and Hayter Tests
    12.2.7 Simple Post Hoc Contrasts for Unequal Variances: Games–Howell, Dunnett T3 and C Tests
    12.2.8 Follow-Up Tests to Kruskal–Wallis
12.3 SPSS
12.4 Template and APA-Style Write-Up
Key Concepts
 1. Contrast
 2. Simple and complex contrasts
 3. Planned and post hoc comparisons
 4. Contrast- and family-based Type I error rates
 5. Orthogonal contrasts
In this chapter, our concern is with multiple comparison procedures (MCPs) that involve comparisons among the group means. Recall from Chapter 11 the one-factor analysis of variance (ANOVA) where the means from two or more samples were compared. What do we do if the omnibus F test leads us to reject H0? First, consider the situation where there are only two samples (e.g., assessing the effectiveness of two types of medication), and H0 has already been rejected in the omnibus test. Why was H0 rejected? The answer should be obvious. Those two sample means must be significantly different, as there is no other way that the omnibus H0 could have been rejected (e.g., one type of medication is significantly more effective than the other based on an inspection of the means).
Second, consider the situation where there are more than two samples (e.g., three types of medication), and H0 has already been rejected in the omnibus test. Why was H0 rejected? The answer is not so obvious. This situation is one where a multiple comparison procedure (MCP) would be quite informative. Thus, for situations where there are at least three groups and the ANOVA H0 has been rejected, some sort of MCP is necessary to determine which means or combination of means are different. Third, consider the situation where the researcher is not even interested in the ANOVA omnibus test but is only interested in comparisons involving particular means (e.g., certain medications are more effective than a placebo). This is a situation where an MCP is useful for evaluating those specific comparisons.
If the ANOVA omnibus H0 has been rejected, why not do all possible independent t tests? First let us return to a similar question from Chapter 11. There we asked about doing all possible pairwise independent t tests rather than an ANOVA. The answer there was to do an omnibus F test. The reasoning was related to the probability of making a Type I error (i.e., α), where the researcher incorrectly rejects a true null hypothesis. Although the α level for each t test can be controlled at a specified nominal level, say .05, what would happen to the overall α level for the set of t tests? The overall α level for the set of tests, often called the family-wise Type I error rate, would be larger than the α level for each of the individual t tests. The optimal solution, in terms of maintaining control over our overall α level as well as maximizing power, is to conduct one overall omnibus test. The omnibus test assesses the equality of all of the means simultaneously.
Let us apply the same concept to the situation involving multiple comparisons. Rather than doing all possible pairwise independent t tests, where the family-wise error rate could be quite large, one should use a procedure that controls the family-wise error rate in some way. This can be done with MCPs. As pointed out later in the chapter, there are two main methods for taking the Type I error rate into account.
This chapter is concerned with several important new concepts, such as a contrast, planned versus post hoc comparisons, the Type I error rate, and orthogonal contrasts. The remainder of the chapter consists of selected MCPs, including when and how to apply them. The terms comparison and contrast are used here synonymously. Also, MCPs are only applicable for comparing levels of an independent variable that are fixed, in other words, for fixed-effects independent variables and not for random-effects independent variables. Our objectives are that by the end of this chapter, you will be able to (a) understand the concepts underlying the MCPs, (b) select the appropriate MCP for a given research situation, and (c) determine and interpret the results of MCPs.
12.1 Concepts of Multiple Comparison Procedures
In the previous chapter, Marie, our very capable educational research graduate student, was embarking on a very exciting research adventure of her own. She continues to work toward completion of this project.
Marie is enrolled in an independent study class. As part of the course requirement, she has to complete a research study. In collaboration with the statistics faculty in her program, Marie designs an experimental study to determine if there is a mean difference in student attendance in the statistics lab based on the attractiveness of the statistics lab instructor. Marie's research question is as follows: Is there a mean difference in the number of statistics labs attended by students based on the attractiveness of the lab instructor? Marie determined that a one-way ANOVA was the best statistical procedure to use to answer her question. Marie has collected the data to analyze her research question and has conducted a one-way ANOVA, where she rejected the null hypothesis. Now, her task is to determine which groups (recall there were four statistics labs, each with an instructor with a different attractiveness rating) are statistically different on the outcome (i.e., number of statistics labs attended).
This section describes the most important characteristics of the MCPs. We begin by defining a contrast and then move into planned versus post hoc contrasts, the Type I error rates, and orthogonal contrasts.
12.1.1 Contrasts
A contrast is a weighted combination of the means. For example, one might wish to form contrasts involving the following means: (a) group 1 with group 2 and (b) the combination (or average) of groups 1 and 2 with group 3. Statistically a contrast is defined as

$$\psi_i = c_1\mu_{.1} + c_2\mu_{.2} + \cdots + c_J\mu_{.J}$$

where the cj represents contrast coefficients (or weights), which are positive, zero, and negative values used to define a particular contrast ψi, and the μ.j represents population group means. In other words, a contrast is simply a particular combination of the group means, depending on which means the researcher is interested in comparing. It should also be noted that to form a fair or legitimate contrast, Σcj = 0 for the equal n's or balanced case, and Σ(njcj) = 0 for the unequal n's or unbalanced case.
For example, suppose we wish to compare the means of groups 1 and 3 for J = 4 groups or levels, and we call this contrast 1. The contrast would be written as

$$\psi_1 = c_1\mu_{.1} + c_2\mu_{.2} + c_3\mu_{.3} + c_4\mu_{.4} = (+1)\mu_{.1} + (0)\mu_{.2} + (-1)\mu_{.3} + (0)\mu_{.4} = \mu_{.1} - \mu_{.3}$$
What hypotheses are we testing when we evaluate a contrast? The null and alternate hypotheses of any specific contrast can be written, respectively, simply as

$$H_0: \psi_i = 0$$

and

$$H_1: \psi_i \neq 0$$

Thus we are testing whether a particular combination of means, as defined by the contrast coefficients, are different. How does this relate back to the omnibus F test? The null and alternate hypotheses for the omnibus F test can be written in terms of contrasts as

$$H_0: \text{all } \psi_i = 0$$

$$H_1: \text{at least one } \psi_i \neq 0$$
Here the omnibus test is used to determine whether any contrast that could be formulated for the set of J means is significant or not.
Contrasts can be divided into simple or pairwise contrasts, and complex or nonpairwise contrasts. A simple or pairwise contrast is a comparison involving only two means. Take as an example the situation where there are J = 3 groups. There are three possible distinct pairwise contrasts that could be formed: (a) μ.1 − μ.2 = 0 (comparing the mean of group 1 to the mean of group 2), (b) μ.1 − μ.3 = 0 (comparing the mean of group 1 to the mean of group 3), and (c) μ.2 − μ.3 = 0 (comparing the mean of group 2 to the mean of group 3). It should be obvious that a pairwise contrast involving groups 1 and 2 is the same contrast whether it is written as μ.1 − μ.2 = 0 or as μ.2 − μ.1 = 0.
In terms of contrast coefficients, these three contrasts could be written in the form of a table as follows:

                     c1    c2    c3
ψ1: μ.1 − μ.2 = 0    +1    −1     0
ψ2: μ.1 − μ.3 = 0    +1     0    −1
ψ3: μ.2 − μ.3 = 0     0    +1    −1

where each contrast (i.e., ψ1, ψ2, ψ3) is read across the table (left to right) to determine its contrast coefficients (i.e., c1, c2, c3). For example, the first contrast, ψ1, does not involve group 3 because that contrast coefficient is 0 (see c3 for ψ1), but does involve groups 1 and 2 because those contrast coefficients are not 0 (see c1 and c2 for ψ1). The contrast coefficients are +1 for group 1 (see c1) and −1 for group 2 (see c2); consequently we are interested in examining the difference between the means of groups 1 and 2.
Written in long form so that we can see where the contrast coefficients come from, the three contrasts are as follows:

$$\psi_1 = (+1)\mu_{.1} + (-1)\mu_{.2} + (0)\mu_{.3} = \mu_{.1} - \mu_{.2}$$
$$\psi_2 = (+1)\mu_{.1} + (0)\mu_{.2} + (-1)\mu_{.3} = \mu_{.1} - \mu_{.3}$$
$$\psi_3 = (0)\mu_{.1} + (+1)\mu_{.2} + (-1)\mu_{.3} = \mu_{.2} - \mu_{.3}$$

An easy way to remember the number of possible unique pairwise contrasts that could be written is ½[J(J − 1)]. Thus for J = 3, the number of possible unique pairwise contrasts is 3, whereas for J = 4, the number of such contrasts is 6 (or ½[4(4 − 1)] = ½(4)(3) = ½(12) = 6).
A complex contrast is a comparison involving more than two means. Continuing with the example of J = 3 groups, we might be interested in testing the contrast of μ.1 − (½)(μ.2 + μ.3), which could also be written as

$$\mu_{.1} - \frac{\mu_{.2} + \mu_{.3}}{2}$$

This contrast is a comparison of the mean for group 1 (i.e., μ.1) with the average of the means for groups 2 and 3 [i.e., (μ.2 + μ.3)/2]. In terms of contrast coefficients, this contrast would be written as seen here:

                                c1     c2      c3
ψ4: μ.1 − (μ.2 + μ.3)/2 = 0     +1    −1/2    −1/2

Written in long form so that we can see where the contrast coefficients come from, this complex contrast is as follows:

$$\psi_4 = (+1)\mu_{.1} + (-1/2)\mu_{.2} + (-1/2)\mu_{.3} = \mu_{.1} - (1/2)\mu_{.2} - (1/2)\mu_{.3} = \mu_{.1} - \frac{\mu_{.2} + \mu_{.3}}{2}$$

The number of unique complex contrasts is greater than ½[J(J − 1)] when J is at least 4. In other words, the number of such contrasts that could be formed can be quite large when there are more than three groups. It should be noted that the total number of unique pairwise and complex contrasts is [1 + ½(3^J − 1) − 2^J] (Keppel, 1982). Thus for J = 4, one could form 25 total contrasts.
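These counting formulas are easy to verify computationally. A minimal sketch (function names are mine, not the book's):

```python
# Counting contrasts for J groups (formulas as cited in the text from Keppel, 1982).

def n_pairwise(J: int) -> int:
    # Number of unique pairwise contrasts: (1/2) * J * (J - 1)
    return J * (J - 1) // 2

def n_total_contrasts(J: int) -> int:
    # Total unique pairwise + complex contrasts: 1 + (3**J - 1)/2 - 2**J
    return 1 + (3 ** J - 1) // 2 - 2 ** J

for J in (3, 4, 5):
    print(J, n_pairwise(J), n_total_contrasts(J))
```

For J = 4 this reproduces the 6 pairwise and 25 total contrasts noted above, and shows how quickly the total grows (90 for J = 5).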
Many of the MCPs are based on the same test statistic, which we introduce here as the "standard t." The standard t ratio for a contrast is given as follows:

$$t = \frac{\psi'}{s_{\psi'}}$$

where s_ψ′ represents the standard error of the contrast as follows:

$$s_{\psi'} = \sqrt{MS_{error}\sum_{j=1}^{J}\frac{c_j^2}{n_j}}$$

where the prime (i.e., ′) indicates that this is a sample estimate of the population value of the contrast (i.e., based on sample data), and nj refers to the number of observations in group j.
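As an illustration, the standard t can be computed directly from summary statistics. The sketch below uses the group means, standard deviations, and n = 8 per group reported in the Chapter 11 example; the choice of contrast (group 1 versus group 3) is mine, purely for illustration:

```python
# Sketch: computing the standard t ratio for a contrast from summary statistics.
# Means/SDs are the Chapter 11 example values; the contrast choice is illustrative.
import math

means = [11.125, 17.875, 20.250, 24.375]   # unattractive ... very attractive
sds   = [5.489, 5.939, 7.285, 5.097]
n     = [8, 8, 8, 8]

# With equal n's, MS_error equals the mean of the group variances
ms_error = sum(sd ** 2 for sd in sds) / len(sds)

c = [1, 0, -1, 0]                # contrast: group 1 vs. group 3
assert abs(sum(c)) < 1e-12       # legitimate contrast: coefficients sum to 0

psi_hat = sum(cj * mj for cj, mj in zip(c, means))                     # sample contrast
se_psi = math.sqrt(ms_error * sum(cj ** 2 / nj for cj, nj in zip(c, n)))
t = psi_hat / se_psi
print(f"psi' = {psi_hat:.3f}, s_psi' = {se_psi:.3f}, t = {t:.3f}")
```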
12.1.2 Planned Versus Post Hoc Comparisons
This section examines specific types of contrasts or comparisons. One way of classifying contrasts is whether the contrasts are formulated prior to the research or following a significant omnibus F test. Planned contrasts (also known as specific or a priori contrasts) involve particular comparisons that the researcher is interested in examining prior to data collection. These planned contrasts are generally based on theory, previous research, and/or specific hypotheses. Here the researcher is interested in certain specific contrasts a priori, where the number of such contrasts is usually small. Planned contrasts are done without regard to the result of the omnibus F test (i.e., whether or not the overall F test is statistically significant). In other words, the researcher is interested in certain specific contrasts, but not in the omnibus F test that examines all possible contrasts. In this situation, the researcher need not even examine the overall F test, much less the multitude of possible contrasts; the concern is only with a few contrasts of substantive interest. In addition, the researcher may not be as concerned with the family-wise error rate for planned comparisons because only a few of them will actually be carried out. Fewer planned comparisons are usually conducted (due to their specificity) than post hoc comparisons (due to their generality), so planned contrasts generally yield narrower confidence intervals (CIs), are more powerful, and have a higher likelihood of a Type I error than post hoc comparisons.
Post hoc contrasts are formulated such that the researcher provides no advance specification of the actual contrasts to be tested. This type of contrast is done only following a statistically significant omnibus F test. Post hoc is Latin for "after the fact," referring to contrasts tested after a statistically significant omnibus F in the ANOVA. Here the researcher may want to take the family-wise error rate into account somehow to achieve better overall Type I error protection. Post hoc contrasts are also known as unplanned, a posteriori, or postmortem contrasts. It should be noted that most MCPs are not derived or based on finding a statistically significant F in the ANOVA.
12.1.3 Type I Error Rate
How does the researcher deal with the family-wise Type I error rate? Depending on the MCP selected, one may either set α for each contrast or set α for a family of contrasts. In the former category, α is set for each individual contrast. The MCPs in this category are known as contrast-based. We designate the α level for contrast-based procedures as αpc, as it represents the per contrast Type I error rate. Thus αpc represents the probability of making a Type I error for that particular contrast. In the latter category, α is set for a family or set of contrasts. The MCPs in this category are known as family-wise. We designate the α level for family-wise procedures as αfw, as it represents the family-wise Type I error rate. Thus αfw represents the probability of making at least one Type I error in the family or set of contrasts.
For orthogonal (or independent or unrelated) contrasts, the following property holds:

$$\alpha_{fw} = 1 - (1 - \alpha_{pc})^c$$

where c = J − 1 orthogonal contrasts (as defined in the next section). For nonorthogonal (or related or oblique) contrasts, this property is more complicated, so we simply say the following:

$$\alpha_{fw} \leq c\,\alpha_{pc}$$

These properties should be familiar from the discussion in Chapter 11, where we were looking at the probability of a Type I error in the use of multiple independent t tests.
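A quick numeric sketch makes these bounds concrete. For αpc = .05 and c = 3 orthogonal contrasts, the family-wise rate is 1 − (1 − .05)³ ≈ .143, which sits below the additive bound cαpc = .15:

```python
# Family-wise Type I error rate for c orthogonal contrasts
# at per-contrast level alpha_pc.

def alpha_fw_orthogonal(alpha_pc: float, c: int) -> float:
    # Exact rate when the contrasts are orthogonal (independent)
    return 1 - (1 - alpha_pc) ** c

alpha_pc = 0.05
for c in (1, 3, 5, 10):
    fw = alpha_fw_orthogonal(alpha_pc, c)
    bound = c * alpha_pc  # additive bound used for nonorthogonal contrasts
    print(f"c = {c:2d}: alpha_fw = {fw:.4f}  (<= {bound:.2f})")
```

Note how quickly αfw grows with c, which is exactly why uncontrolled multiple t tests inflate the Type I error rate.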
12.1.4 Orthogonal Contrasts
Let us begin this section by defining orthogonal contrasts. A set of contrasts is orthogonal if they represent nonredundant and independent (if the usual ANOVA assumptions are met) sources of variation. For J groups, you will only be able to construct J − 1 orthogonal contrasts in a set. However, more than one set of orthogonal contrasts may exist. Note that although the contrasts within each set are orthogonal, contrasts across such sets may not be orthogonal.
For purposes of simplicity, we first consider the equal n's or balanced case (in other words, the sample sizes are the same for each group). With equal observations per group, two contrasts are defined to be orthogonal if the products of their contrast coefficients sum to 0. That is, two contrasts are orthogonal if the following holds:

$$\sum_{j=1}^{J}(c_j c_{j'}) = c_1 c_{1'} + c_2 c_{2'} + \cdots + c_J c_{J'} = 0$$

where j and j′ represent two distinct contrasts. Thus we see that orthogonality depends on the contrast coefficients, the cj, and not the group means, the μ.j.
For example, if J = 3, then we can form a set of two orthogonal contrasts. One such set is as follows. In this set of contrasts, the first contrast (ψ1) compares the mean of group 1 (c1 = +1) to the mean of group 2 (c2 = −1). The second contrast (ψ2) compares the average of the means of group 1 (c1 = +1/2) and group 2 (c2 = +1/2) to the mean of group 3 (c3 = −1):
                                     c1      c2      c3
ψ1: μ.1 − μ.2 = 0                    +1      −1       0
ψ2: (1/2)μ.1 + (1/2)μ.2 − μ.3 = 0    +1/2    +1/2    −1
Σ(cj cj′)                            +1/2    −1/2     0    = 0

Thus, plugging these values into our equation produces the following:

$$\sum_{j=1}^{J}(c_j c_{j'}) = c_1 c_{1'} + c_2 c_{2'} + c_3 c_{3'} = (+1)(+1/2) + (-1)(+1/2) + (0)(-1) = (+1/2) + (-1/2) + 0 = 0$$

If the sum of the contrast coefficient products for a set of contrasts is equal to 0, then we define this as an orthogonal set of contrasts.
A set of two contrasts that are not orthogonal is the following, where we see that the contrast coefficient products do not sum to 0:

                     c1    c2    c3
ψ3: μ.1 − μ.2 = 0    +1    −1     0
ψ4: μ.1 − μ.3 = 0    +1     0    −1
Σ(cj cj′)            +1     0     0    = +1

Thus, plugging these values into our equation produces the following, where we see that the products of the contrast coefficients do not sum to 0:

$$\sum_{j=1}^{J}(c_j c_{j'}) = c_1 c_{1'} + c_2 c_{2'} + c_3 c_{3'} = (+1)(+1) + (-1)(0) + (0)(-1) = (+1) + 0 + 0 = +1$$
Consider a situation where there are three groups and we decide to form three pairwise contrasts, knowing full well that they cannot all be orthogonal to one another. For this set of contrasts, the first contrast (ψ1) compares the mean of group 1 (c1 = +1) to the mean of group 2 (c2 = −1). The second contrast (ψ2) compares the mean of group 2 (c2 = +1) to the mean of group 3 (c3 = −1), and the third contrast (ψ3) compares the mean of group 1 (c1 = +1) to the mean of group 3 (c3 = −1):

                     c1    c2    c3
ψ1: μ.1 − μ.2 = 0    +1    −1     0
ψ2: μ.2 − μ.3 = 0     0    +1    −1
ψ3: μ.1 − μ.3 = 0    +1     0    −1

Say that the group population means are μ.1 = 30, μ.2 = 24, and μ.3 = 20. We find ψ1 = 6 for the first contrast (i.e., ψ1: μ.1 − μ.2 = 30 − 24 = 6) and ψ2 = 4 for the second contrast (i.e., ψ2: μ.2 − μ.3 = 24 − 20 = 4). Because these three contrasts are not orthogonal and contain totally redundant information about these means, ψ3 = 10 for the third contrast by definition (i.e., ψ3: μ.1 − μ.3 = 30 − 20 = 10). Thus the third contrast contains no additional information beyond that contained in the first two contrasts.
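The redundancy in this nonorthogonal set can be confirmed numerically. Using the population means from the example, the third contrast is fully determined by the first two:

```python
# Redundancy in a nonorthogonal set of pairwise contrasts (J = 3),
# using the population means from the example in the text.
mu = [30, 24, 20]

c1 = [1, -1, 0]    # psi1: mu.1 - mu.2
c2 = [0, 1, -1]    # psi2: mu.2 - mu.3
c3 = [1, 0, -1]    # psi3: mu.1 - mu.3

def contrast(coeffs, means):
    # Weighted combination of the means
    return sum(c * m for c, m in zip(coeffs, means))

psi1, psi2, psi3 = (contrast(c, mu) for c in (c1, c2, c3))
print(psi1, psi2, psi3)  # → 6 4 10
```

The redundancy is structural: the third coefficient vector is the elementwise sum of the first two, so ψ3 = ψ1 + ψ2 no matter what the means are.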
Finally, for the unequal n's or unbalanced case, two contrasts are orthogonal if the following holds:

$$\sum_{j=1}^{J}\frac{c_j c_{j'}}{n_j} = 0$$

The denominator nj makes it more difficult to find an orthogonal set of contrasts that is of any interest to the applied researcher (see Pedhazur, 1997, for an example).
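Both orthogonality conditions reduce to a single weighted sum, so they are easy to check mechanically. A sketch (the function name is mine, not the book's):

```python
# Checking orthogonality of two contrasts, for balanced and unbalanced designs.

def is_orthogonal(c, c_prime, n=None, tol=1e-12):
    """True if the (n-weighted) sum of coefficient products is 0."""
    if n is None:                      # equal n's (balanced) case
        total = sum(a * b for a, b in zip(c, c_prime))
    else:                              # unequal n's (unbalanced) case
        total = sum(a * b / nj for a, b, nj in zip(c, c_prime, n))
    return abs(total) < tol

# Balanced case, J = 3: the orthogonal set from the text
print(is_orthogonal([1, -1, 0], [0.5, 0.5, -1]))   # True
# Balanced case: a nonorthogonal pair
print(is_orthogonal([1, -1, 0], [1, 0, -1]))       # False
```

This also shows why the unbalanced case is harder: a pair that is orthogonal under equal n's can lose its orthogonality as soon as the group sizes differ.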
12.2 Selected Multiple Comparison Procedures
This section considers a selection of MCPs. These represent the "best" procedures in some sense, in terms of ease of utility, popularity, and control of Type I and Type II error rates. Other procedures are briefly mentioned. In the interest of consistency, each procedure is discussed in the hypothesis testing situation based on a test statistic. Most, but not all, of these procedures can also be formulated as CIs (sometimes called a critical difference), although these will not be discussed here. The first few procedures discussed are for planned comparisons, whereas the remainder of the section is devoted to post hoc comparisons. For each MCP, we describe its major characteristics and then present the test statistic with an example using the data from Chapter 11.
Unless otherwise specified, each MCP makes the standard assumptions of normality, homogeneity of variance, and independence of observations. Some of the procedures do have additional restrictions, such as equal n's per group. Throughout this section, we also presume that a two-tailed alternative hypothesis is of interest, although some of the MCPs can also be used with a one-tailed alternative hypothesis. In general, the MCPs are fairly robust to nonnormality (but not for extreme cases), but are not as robust to departures from homogeneity of variance or from independence (see Pavur, 1988).
12.2.1 planned analysis of Trend
Trend analysis is a planned MCP useful when the groups represent different quantitative levels of a factor (i.e., an interval or ratio level independent variable). Examples of such a factor might be age, drug dosage, and different amounts of instruction, practice, or trials. Here the researcher is interested in whether the sample means vary with a change in the amount of the independent variable. We define trend analysis in the form of orthogonal polynomials and assume that the levels of the independent variable are equally spaced (i.e., same distances between the levels of the independent variable, such as 100, 200, 300, and 400 cc) and that the number of observations per group is the same. This is the standard case; other cases are briefly discussed at the end of this section.
Orthogonal polynomial contrasts use the standard t test statistic, which is compared to the critical values of $\pm\,{}_{\alpha/2}t_{df(\mathrm{error})}$ obtained from the t table in Table A.2. The form of the contrasts is a bit different and requires a bit of discussion. Orthogonal polynomial contrasts incorporate two concepts, orthogonal contrasts (recall these are unrelated or independent contrasts) and polynomial regression. For J groups, there can be only J − 1 orthogonal contrasts in a set. In polynomial regression, we have terms in the model for a linear trend, a quadratic trend, a cubic trend, and so on. For example, linear trend is represented by a straight line (no bends), quadratic trend by a curve with one bend (e.g., U or upside-down U shapes), and cubic trend by a curve with two bends (e.g., S shape).
Now put those two ideas together. A set of orthogonal contrasts can be formed where the first contrast evaluates a linear trend, the second a quadratic trend, the third a cubic trend, and so forth. Thus for J groups, the highest-order polynomial that can be formed is of order J − 1. With four groups, for example, one could form a set of three orthogonal contrasts to assess linear, quadratic, and cubic trends.
You may be wondering just how these contrasts are formed. For J = 4 groups, the contrast coefficients for the linear, quadratic, and cubic trends are as follows:

             c1    c2    c3    c4
ψlinear      −3    −1    +1    +3
ψquadratic   +1    −1    −1    +1
ψcubic       −1    +3    −3    +1
where the contrasts can be written out as follows:

$$\psi_{\mathrm{linear}} = (-3)\mu_{.1} + (-1)\mu_{.2} + (+1)\mu_{.3} + (+3)\mu_{.4}$$

$$\psi_{\mathrm{quadratic}} = (+1)\mu_{.1} + (-1)\mu_{.2} + (-1)\mu_{.3} + (+1)\mu_{.4}$$

$$\psi_{\mathrm{cubic}} = (-1)\mu_{.1} + (+3)\mu_{.2} + (-3)\mu_{.3} + (+1)\mu_{.4}$$
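The orthogonality of these three coefficient sets can be checked directly: each set sums to zero, and every pairwise dot product is zero. A minimal sketch (ours, not the authors'):

```python
# Verify that the linear, quadratic, and cubic contrast coefficient sets
# for J = 4 groups are mutually orthogonal: each sums to zero and every
# pairwise dot product is zero.
linear = [-3, -1, 1, 3]
quadratic = [1, -1, -1, 1]
cubic = [-1, 3, -3, 1]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for c in (linear, quadratic, cubic):
    assert sum(c) == 0          # coefficients of each contrast sum to zero

assert dot(linear, quadratic) == 0   # pairwise orthogonality
assert dot(linear, cubic) == 0
assert dot(quadratic, cubic) == 0
```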
These contrast coefficients, for a number of different values of J, can be found in Table A.6. If you look in the table of contrast coefficients for values of J greater than 6, you see that the coefficients for the higher-order polynomials are not included. As an example, for J = 7, coefficients only up through a quintic trend are included. Although they could easily be derived and tested, these higher-order polynomials are usually not of interest to the researcher. In fact, it is rare to find anyone interested in polynomials beyond the cubic because they are difficult to understand and interpret (although statistically sophisticated, they say little to the applied researcher, as the results must be interpreted in values that are highly complex). The contrasts are typically tested sequentially, beginning with the linear trend and proceeding to higher-order trends (quadratic, then cubic).
Using the example data on the attractiveness of the lab instructors from Chapter 11, let us test for linear, quadratic, and cubic trends. Trend analysis may be relevant for these data because the groups do represent different quantitative levels of an attractiveness factor. Because J = 4, we can use the contrast coefficients given previously.
The following are the computations, based on these mean values, to test the trend analysis. The critical values (where df_error is calculated as N − J, or 32 − 4 = 28) are determined to be as follows:

$$\pm\,{}_{\alpha/2}t_{df(\mathrm{error})} = \pm\,{}_{.025}t_{28} = \pm 2.048$$
The standard error for linear trend is computed as follows (where n_j = 8 for each of the J = 4 groups; MS_error was computed in the previous chapter and found to be 36.1116). Recall that the contrast equation for the linear trend is ψ_linear = (−3)μ.1 + (−1)μ.2 + (+1)μ.3 + (+3)μ.4, and thus these are the c_j values in the following equation (−3, −1, +1, and +3, respectively):
$$s_{\psi'} = \sqrt{MS_{\mathrm{error}} \sum_{j=1}^{J} \frac{c_j^2}{n_j}} = \sqrt{36.1116\left[\frac{(-3)^2}{8} + \frac{(-1)^2}{8} + \frac{(+1)^2}{8} + \frac{(+3)^2}{8}\right]} = \sqrt{36.1116\left(\frac{9}{8} + \frac{1}{8} + \frac{1}{8} + \frac{9}{8}\right)} = 9.5015$$
The standard error for quadratic trend is determined similarly. Recall that the contrast equation for the quadratic trend is ψ_quadratic = (+1)μ.1 + (−1)μ.2 + (−1)μ.3 + (+1)μ.4, and thus these are the c_j values in the following equation (+1, −1, −1, and +1, respectively):
$$s_{\psi'} = \sqrt{MS_{\mathrm{error}} \sum_{j=1}^{J} \frac{c_j^2}{n_j}} = \sqrt{36.1116\left[\frac{(+1)^2}{8} + \frac{(-1)^2}{8} + \frac{(-1)^2}{8} + \frac{(+1)^2}{8}\right]} = \sqrt{36.1116\left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right)} = 4.2492$$
The standard error for cubic trend is computed similarly. Recall that the contrast equation for the cubic trend is ψ_cubic = (−1)μ.1 + (+3)μ.2 + (−3)μ.3 + (+1)μ.4, and thus these are the c_j values in the following equation (−1, +3, −3, and +1, respectively):
$$s_{\psi'} = \sqrt{MS_{\mathrm{error}} \sum_{j=1}^{J} \frac{c_j^2}{n_j}} = \sqrt{36.1116\left[\frac{(-1)^2}{8} + \frac{(+3)^2}{8} + \frac{(-3)^2}{8} + \frac{(+1)^2}{8}\right]} = \sqrt{36.1116\left(\frac{1}{8} + \frac{9}{8} + \frac{9}{8} + \frac{1}{8}\right)} = 9.5015$$
Recall the following data and means for each group (as presented in the previous chapter):

Number of Statistics Labs Attended by Group

            Group 1:       Group 2:       Group 3:       Group 4:
            Unattractive   Slightly       Moderately     Very
                           Attractive     Attractive     Attractive    Overall
            15             20             10             30
            10             13             24             22
            12              9             29             26
             8             22             12             20
            21             24             27             29
             7             25             21             28
            13             18             25             25
             3             12             14             15
Means       11.1250        17.8750        20.2500        24.3750       18.4063
Variances   30.1250        35.2679        53.0714        25.9821       56.4425
Thus, using the contrast coefficients (represented by the constant c values in the numerator of each term) and the values of the means for each of the four groups (represented by Ȳ.1, Ȳ.2, Ȳ.3, Ȳ.4), the test statistics are computed as follows:
$$t_{\mathrm{linear}} = \frac{(-3)\bar{Y}_{.1} + (-1)\bar{Y}_{.2} + (+1)\bar{Y}_{.3} + (+3)\bar{Y}_{.4}}{s_{\psi'}} = \frac{(-3)(11.1250) + (-1)(17.8750) + (+1)(20.2500) + (+3)(24.3750)}{9.5015} = 4.4335$$

$$t_{\mathrm{quadratic}} = \frac{(+1)\bar{Y}_{.1} + (-1)\bar{Y}_{.2} + (-1)\bar{Y}_{.3} + (+1)\bar{Y}_{.4}}{s_{\psi'}} = \frac{(+1)(11.1250) + (-1)(17.8750) + (-1)(20.2500) + (+1)(24.3750)}{4.2492} = -0.6178$$

$$t_{\mathrm{cubic}} = \frac{(-1)\bar{Y}_{.1} + (+3)\bar{Y}_{.2} + (-3)\bar{Y}_{.3} + (+1)\bar{Y}_{.4}}{s_{\psi'}} = \frac{(-1)(11.1250) + (+3)(17.8750) + (-3)(20.2500) + (+1)(24.3750)}{9.5015} = 0.6446$$
The t test statistic for the linear trend exceeds the t critical value. Thus we see that there is a statistically significant linear trend in the means but no significant higher-order trend (in other words, no significant quadratic or cubic trend). This should not be surprising, as shown in the profile plot of the means in Figure 12.1, where there is a very strong linear trend, and that is about it. In other words, there is a steady increase in mean attendance as the level of attractiveness of the instructor increases. Always plot the means so that you can interpret the results of the contrasts.
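The trend computations above can be reproduced in a short Python sketch (ours, not the authors'; the means, MS_error, and group size are the values given in the text):

```python
import math

# Reproduce the trend-analysis t statistics from the attractiveness example.
means = [11.1250, 17.8750, 20.2500, 24.3750]   # groups 1-4
ms_error, n = 36.1116, 8                        # from the previous chapter

def contrast_t(coeffs, means, ms_error, n):
    """t = psi-hat / s_psi, with s_psi = sqrt(MS_error * sum(c_j^2 / n_j))."""
    psi_hat = sum(c * m for c, m in zip(coeffs, means))
    se = math.sqrt(ms_error * sum(c * c / n for c in coeffs))
    return psi_hat / se

t_linear = contrast_t([-3, -1, 1, 3], means, ms_error, n)      # ~ 4.4335
t_quadratic = contrast_t([1, -1, -1, 1], means, ms_error, n)   # ~ -0.6178
t_cubic = contrast_t([-1, 3, -3, 1], means, ms_error, n)       # ~ 0.6446
# Only t_linear exceeds the critical value of +-2.048.
```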
Let us make some final points about orthogonal polynomial contrasts. First, be particularly careful about extrapolating beyond the range of the levels investigated. The trend may or may not be the same outside of this range; that is, given only those sample means, we have no way of knowing what the trend is outside of the range of levels investigated. Second, in the unequal n's or unbalanced case, it becomes difficult to formulate a set of orthogonal contrasts that make any sense to the researcher. See the discussion in the next section on planned orthogonal contrasts, as well as Kirk (1982). Third, when the levels are not equally spaced, this needs to be taken into account in the contrast coefficients (see Kirk, 1982).
12.2.2 Planned Orthogonal Contrasts
Planned orthogonal contrasts (POC) are an MCP where the contrasts are defined ahead of time by the researcher (i.e., planned) and the set of contrasts is orthogonal (or unrelated). The POC method is a contrast-based procedure where the researcher is not concerned with control of the family-wise Type I error rate across the set of contrasts. Because the set of contrasts is orthogonal, the number of contrasts should be small, and concern with the family-wise error rate is lessened.
Computationally, planned orthogonal contrasts use the standard t test statistic, which is compared to the critical values of $\pm\,{}_{\alpha/2}t_{df(\mathrm{error})}$ obtained from the t table in Table A.2. Using the example dataset from Chapter 11, let us find a set of orthogonal contrasts and complete the computations. Since J = 4, we can find at most a set of three (or J − 1) orthogonal contrasts. One orthogonal set that seems reasonable for these data is as follows:
                                         c1     c2     c3     c4
ψ1: (μ.1 + μ.2)/2 − (μ.3 + μ.4)/2 = 0   +1/2   +1/2   −1/2   −1/2
ψ2: μ.1 − μ.2 = 0                       +1     −1      0      0
ψ3: μ.3 − μ.4 = 0                        0      0     +1     −1
FIGURE 12.1
Profile plot for statistics lab example. (Number of labs attended, roughly 10 to 25, plotted against group 1 through group 4.)
Here we see that the first contrast compares the average of the two least attractive groups (i.e., unattractive and slightly attractive) with the average of the two most attractive groups (i.e., moderately attractive and very attractive), the second contrast compares the means of the two least attractive groups (i.e., unattractive and slightly attractive), and the third contrast compares the means of the two most attractive groups (moderately attractive and very attractive). Note that the design is balanced (i.e., the equal n's case, as all groups had a sample size of 8). What follows are the computations. The critical values are as follows:

$$\pm\,{}_{\alpha/2}t_{df(\mathrm{error})} = \pm\,{}_{.025}t_{28} = \pm 2.048$$
The standard error for contrast 1 is computed as follows (where n_j = 8 for each of the J = 4 groups; MS_error was computed in the previous chapter and found to be 36.1116). The equation for contrast 1 is ψ1: (μ.1 + μ.2)/2 − (μ.3 + μ.4)/2 = 0, and thus the c_j values in the following equation are +1/2, +1/2, −1/2, and −1/2, respectively (these values are then squared, which results in the value of .25):
$$s_{\psi'} = \sqrt{MS_{\mathrm{error}} \sum_{j=1}^{J} \frac{c_j^2}{n_j}} = \sqrt{36.1116\left(\frac{.25}{8} + \frac{.25}{8} + \frac{.25}{8} + \frac{.25}{8}\right)} = 2.1246$$
Similarly, the standard errors for contrasts 2 and 3 are computed as follows:

$$s_{\psi'} = \sqrt{MS_{\mathrm{error}} \sum_{j=1}^{J} \frac{c_j^2}{n_j}} = \sqrt{36.1116\left(\frac{1}{8} + \frac{1}{8}\right)} = 3.0046$$
The test statistics are computed as follows:

$$t_1 = \frac{(+\tfrac{1}{2})\bar{Y}_{.1} + (+\tfrac{1}{2})\bar{Y}_{.2} + (-\tfrac{1}{2})\bar{Y}_{.3} + (-\tfrac{1}{2})\bar{Y}_{.4}}{s_{\psi'}} = \frac{(+\tfrac{1}{2})(11.1250) + (+\tfrac{1}{2})(17.8750) + (-\tfrac{1}{2})(20.2500) + (-\tfrac{1}{2})(24.3750)}{2.1246} = -3.6772$$

$$t_2 = \frac{\bar{Y}_{.1} - \bar{Y}_{.2}}{s_{\psi'}} = \frac{11.1250 - 17.8750}{3.0046} = -2.2466$$

$$t_3 = \frac{\bar{Y}_{.3} - \bar{Y}_{.4}}{s_{\psi'}} = \frac{20.2500 - 24.3750}{3.0046} = -1.3729$$
The result for contrast 1 is that the combined less attractive groups have statistically significantly lower attendance, on average, than the combined more attractive groups. The result for contrast 2 is that the two less attractive groups are statistically significantly different from one another, on average. The result for contrast 3 is that the means of the two more attractive groups are not statistically significantly different from one another.
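These three contrast t statistics can be checked with a brief sketch (ours, not the authors'; means and MS_error are the values given in the text):

```python
import math

# Reproduce the three planned orthogonal contrast t statistics.
means = [11.1250, 17.8750, 20.2500, 24.3750]   # groups 1-4
ms_error, n = 36.1116, 8

def contrast_t(coeffs, means, ms_error, n):
    """t = psi-hat / sqrt(MS_error * sum(c_j^2 / n_j))."""
    psi_hat = sum(c * m for c, m in zip(coeffs, means))
    se = math.sqrt(ms_error * sum(c * c / n for c in coeffs))
    return psi_hat / se

t1 = contrast_t([0.5, 0.5, -0.5, -0.5], means, ms_error, n)  # ~ -3.6772
t2 = contrast_t([1, -1, 0, 0], means, ms_error, n)           # ~ -2.2466
t3 = contrast_t([0, 0, 1, -1], means, ms_error, n)           # ~ -1.3729
```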
There is a practical problem with this procedure because (a) the contrasts that are of interest to the researcher may not necessarily be orthogonal, or (b) the researcher may not be interested in all of the contrasts of a particular orthogonal set. Another problem, already mentioned, occurs when the design is unbalanced, where an orthogonal set of contrasts may be constructed at the expense of meaningful contrasts. Our advice is simple:

1. If the contrasts you are interested in are not orthogonal, then use another MCP.
2. If you are not interested in all of the contrasts of an orthogonal set, then use another MCP.
3. If your design is not balanced and the orthogonal contrasts formed are not meaningful, then use another MCP.
In each case, you need a different planned MCP. We recommend using one of the following procedures discussed later in this chapter: the Dunnett, Dunn (Bonferroni), or Dunn–Sidak procedure.
We defined the POC as a contrast-based procedure. One could also consider an alternative family-wise method where the α level is divided among the contrasts in the set. This procedure is defined by αpc = αfw/c, where c is the number of orthogonal contrasts in the set (i.e., c = J − 1). As we show later, this borrows a concept from the Dunn (Bonferroni) procedure. If the variances are not equal across the groups, several approximate solutions have been proposed that take the individual group variances into account (see Kirk, 1982).
12.2.3 Planned Contrasts with Reference Group: Dunnett Method
A third method of planned comparisons is attributed to Dunnett (1955). It is designed to test pairwise contrasts where a reference group (e.g., a control or baseline group) is compared to each of the other J − 1 groups. Thus a family of prespecified pairwise contrasts is to be evaluated. The Dunnett method is a family-wise MCP and is slightly more powerful than the Dunn procedure (another planned family-wise MCP). The test statistic is the standard t, except that the standard error is simplified as follows:
$$s_{\psi'} = \sqrt{MS_{\mathrm{error}}\left(\frac{1}{n_c} + \frac{1}{n_j}\right)}$$
where c is the reference group and j is the group to which it is being compared. The test statistic is compared to the critical values $\pm\,{}_{\alpha/2}t_{df(\mathrm{error}),\,J-1}$ obtained from the Dunnett table located in Table A.7.
Using the example dataset, compare group 1, the unattractive group (used as a reference or baseline group), to each of the other three groups. The contrasts are as follows:
                     c1    c2    c3    c4
ψ1: μ.1 − μ.2 = 0    +1    −1     0     0
ψ2: μ.1 − μ.3 = 0    +1     0    −1     0
ψ3: μ.1 − μ.4 = 0    +1     0     0    −1
The following are the computations. The critical values are as follows: $\pm\,{}_{\alpha/2}t_{df(\mathrm{error}),\,J-1} = \pm\,{}_{.025}t_{28,3} \approx \pm 2.48$. The standard error is computed as follows (where n_c = 8 for the reference group; n_j = 8 for each of the other groups; MS_error was computed in the previous chapter and found to be 36.1116):
$$s_{\psi'} = \sqrt{MS_{\mathrm{error}}\left(\frac{1}{n_c} + \frac{1}{n_j}\right)} = \sqrt{36.1116\left(\frac{1}{8} + \frac{1}{8}\right)} = 3.0046$$
The test statistics for the three contrasts (i.e., group 1 to group 2, group 1 to group 3, and group 1 to group 4) are computed as follows:

Unattractive to slightly attractive:
$$t_1 = \frac{\bar{Y}_{.1} - \bar{Y}_{.2}}{s_{\psi'}} = \frac{11.1250 - 17.8750}{3.0046} = -2.2466$$

Unattractive to moderately attractive:
$$t_2 = \frac{\bar{Y}_{.1} - \bar{Y}_{.3}}{s_{\psi'}} = \frac{11.1250 - 20.2500}{3.0046} = -3.0370$$

Unattractive to very attractive:
$$t_3 = \frac{\bar{Y}_{.1} - \bar{Y}_{.4}}{s_{\psi'}} = \frac{11.1250 - 24.3750}{3.0046} = -4.4099$$
Comparing the test statistics to the critical values, we see that the second group (i.e., slightly attractive) is not statistically significantly different from the baseline group (i.e., unattractive), but the third (moderately attractive) and fourth (very attractive) groups are significantly different from the baseline group.
If the variance of the reference group is different from the variances of the other J − 1 groups, then a modification of this method is described in Dunnett (1964). For related procedures that are less sensitive to unequal group variances, see Wilcox (1987) or Wilcox (1996) (e.g., a variation of the Dunnett T3 procedure).
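The Dunnett comparisons above can be sketched briefly (ours, not the authors'; means, MS_error, group sizes, and the critical value 2.48 are the values given in the text):

```python
import math

# Dunnett reference-group comparisons: group 1 (unattractive) is the reference.
ms_error = 36.1116
n_c = n_j = 8
ref_mean = 11.1250
comparison_means = [17.8750, 20.2500, 24.3750]   # groups 2, 3, 4

# Simplified standard error: sqrt(MS_error * (1/n_c + 1/n_j))
se = math.sqrt(ms_error * (1 / n_c + 1 / n_j))   # ~ 3.0046

t_stats = [(ref_mean - m) / se for m in comparison_means]
# Compare |t| to the Dunnett critical value of about 2.48:
significant = [abs(t) > 2.48 for t in t_stats]   # [False, True, True]
```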
12.2.4 Other Planned Contrasts: Dunn (or Bonferroni) and Dunn–Sidak Methods
The Dunn (1961) procedure (commonly attributed to Dunn because the developer is unknown), also often called the Bonferroni procedure (because it is based on the Bonferroni inequality), is a planned family-wise MCP. It is designed to test either pairwise or complex contrasts for balanced or unbalanced designs. Thus this MCP is very flexible and may be used to test any planned contrast of interest. The Dunn method uses the standard t test statistic with one important exception: the α level is split up among the set of planned contrasts. Typically the per-contrast α level (denoted as αpc) is set at α/c, where c is the number of contrasts. That is, αpc = αfw/c. According to this rationale, the family-wise Type I error rate (denoted as αfw) will be maintained at α. For example, if αfw = .05 is desired and there are five contrasts to be tested, then each contrast would be tested at the .01 level of significance (.05/5 = .01). We are reminded that α need not be distributed equally among the set of contrasts, as long as the sum of the individual αpc terms is equal to αfw (Keppel & Wickens, 2004; Rosenthal & Rosnow, 1985).
Computationally, the Dunn method uses the standard t test statistic, which is compared to the critical values of $\pm\,{}_{\alpha/c}t_{df(\mathrm{error})}$ for a two-tailed test obtained from the table in Table A.8. The table takes the number of contrasts into account without requiring you to physically split up the α. Using the example dataset from Chapter 11, for comparison purposes, let us test the same set of three orthogonal contrasts we evaluated with the POC method. These contrasts are as follows:
                                         c1     c2     c3     c4
ψ1: (μ.1 + μ.2)/2 − (μ.3 + μ.4)/2 = 0   +1/2   +1/2   −1/2   −1/2
ψ2: μ.1 − μ.2 = 0                       +1     −1      0      0
ψ3: μ.3 − μ.4 = 0                        0      0     +1     −1
Following are the computations, with the critical values:

$$\pm\,{}_{\alpha/c}t_{df(\mathrm{error})} = \pm\,{}_{.05/3}t_{28} \approx \pm 2.539$$
The standard error for contrast 1 is computed as follows:

$$s_{\psi'} = \sqrt{MS_{\mathrm{error}} \sum_{j=1}^{J} \frac{c_j^2}{n_j}} = \sqrt{36.1116\left(\frac{.25}{8} + \frac{.25}{8} + \frac{.25}{8} + \frac{.25}{8}\right)} = 2.1246$$
Similarly, the standard error for contrasts 2 and 3 is computed as follows:

$$s_{\psi'} = \sqrt{MS_{\mathrm{error}} \sum_{j=1}^{J} \frac{c_j^2}{n_j}} = \sqrt{36.1116\left(\frac{1}{8} + \frac{1}{8}\right)} = 3.0046$$
The test statistics are computed as follows:

$$t_1 = \frac{(+\tfrac{1}{2})\bar{Y}_{.1} + (+\tfrac{1}{2})\bar{Y}_{.2} + (-\tfrac{1}{2})\bar{Y}_{.3} + (-\tfrac{1}{2})\bar{Y}_{.4}}{s_{\psi'}} = \frac{(+\tfrac{1}{2})(11.1250) + (+\tfrac{1}{2})(17.8750) + (-\tfrac{1}{2})(20.2500) + (-\tfrac{1}{2})(24.3750)}{2.1246} = -3.6772$$

$$t_2 = \frac{\bar{Y}_{.1} - \bar{Y}_{.2}}{s_{\psi'}} = \frac{11.1250 - 17.8750}{3.0046} = -2.2466$$

$$t_3 = \frac{\bar{Y}_{.3} - \bar{Y}_{.4}}{s_{\psi'}} = \frac{20.2500 - 24.3750}{3.0046} = -1.3729$$
Notice that the test statistic values have not changed from the POC, but the critical value has changed. For this set of contrasts, then, we see the same results as were obtained via the POC procedure, with the exception of contrast 2, which is now nonsignificant (i.e., only contrast 1 is significant). The reason for this difference lies in the critical values used, which were ±2.048 for the POC method and ±2.539 for the Dunn method. Here we see the conservative nature of the Dunn procedure: because the critical value is larger than with the POC method, it is a bit more difficult to reject H0.
The Dunn procedure is slightly conservative (i.e., not as powerful) in that the true αfw may be less than the specified nominal α level. For example, if the nominal alpha (specified by the researcher) is .05, then the true alpha may be less than .05. Thus when using the Dunn, you may be less likely to reject the null hypothesis (i.e., less likely to find a statistically significant contrast). A less conservative (i.e., more powerful) modification is known as the Dunn–Sidak procedure (Dunn, 1974; Sidak, 1967) and uses slightly different critical values. For more information, see Kirk (1982), Wilcox (1987), and Keppel and Wickens (2004). The Bonferroni modification can also be applied to other MCPs.
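The contrast between the two adjustments can be illustrated with the per-contrast α levels themselves. A small sketch (ours; the Sidak formula 1 − (1 − α)^(1/c) is the standard one, not spelled out in the text above):

```python
# Per-contrast alpha under the Dunn (Bonferroni) split vs. the Dunn-Sidak
# adjustment, for the family-wise alpha of .05 and c = 3 contrasts used here.
alpha_fw, c = 0.05, 3

alpha_bonferroni = alpha_fw / c                 # ~ .0167
alpha_sidak = 1 - (1 - alpha_fw) ** (1 / c)     # ~ .0170

# Sidak's per-contrast alpha is slightly larger, hence slightly more powerful.
assert alpha_sidak > alpha_bonferroni
```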
12.2.5 Complex Post Hoc Contrasts: Scheffé and Kaiser–Bowden Methods
Another early MCP, due to Scheffé (1953), is quite versatile. The Scheffé procedure can be used for any possible type of comparison, orthogonal or nonorthogonal, pairwise or complex, planned or post hoc, where the family-wise error rate is controlled. The Scheffé method is so general that the tests are quite conservative (i.e., less powerful), particularly for the pairwise contrasts. This is so because the family of contrasts for the Scheffé method consists of all possible linear comparisons. To control the Type I error rate for such a large family, the procedure has to be conservative (i.e., making it less likely to reject the null hypothesis if it is really true). Thus we recommend the Scheffé method only for complex post hoc comparisons.
The Scheffé procedure is the only MCP that is necessarily consistent with the results of the F ratio in ANOVA. If the F ratio is statistically significant, then at least one contrast in the entire family of contrasts will be significant with the Scheffé method. Do not forget, however, that this family can be quite large, and you may not even be interested in the contrast(s) that wind up being significant. If the F ratio is not statistically significant, then none of the contrasts in the family will be significant with the Scheffé method.
The test statistic for the Scheffé method is the standard t again. This is compared to the critical value $\sqrt{(J-1)({}_{\alpha}F_{J-1,\,df(\mathrm{error})})}$ taken from the F table in Table A.4. In other words, the square root of the F critical value is adjusted by J − 1, which serves to increase the Scheffé critical value and make the procedure a more conservative one.
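This adjustment is easy to see numerically. A minimal sketch (ours; the .05 F(3, 28) critical value of 2.95 is the one used in the text):

```python
import math

# Scheffe critical value: sqrt((J - 1) * F_crit).
J, f_crit = 4, 2.95
scheffe_crit = math.sqrt((J - 1) * f_crit)   # ~ 2.97, vs. t critical of 2.048
```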
Consider a few example contrasts with the Scheffé method. Using the example dataset from Chapter 11, for comparison purposes, we test the same set of three orthogonal contrasts that were evaluated with the POC method. These contrasts are again as follows:
                                         c1     c2     c3     c4
ψ1: (μ.1 + μ.2)/2 − (μ.3 + μ.4)/2 = 0   +1/2   +1/2   −1/2   −1/2
ψ2: μ.1 − μ.2 = 0                       +1     −1      0      0
ψ3: μ.3 − μ.4 = 0                        0      0     +1     −1
The following are the computations. The critical value is as follows:

$$\sqrt{(J-1)({}_{\alpha}F_{J-1,\,df(\mathrm{error})})} = \sqrt{3({}_{.05}F_{3,28})} = \sqrt{3(2.95)} = 2.97$$
Standard error for contrast 1:

$$s_{\psi'} = \sqrt{MS_{\mathrm{error}} \sum_{j=1}^{J} \frac{c_j^2}{n_j}} = \sqrt{36.1116\left(\frac{.25}{8} + \frac{.25}{8} + \frac{.25}{8} + \frac{.25}{8}\right)} = 2.1246$$
Standard error for contrasts 2 and 3:

$$s_{\psi'} = \sqrt{MS_{\mathrm{error}}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)} = \sqrt{36.1116\left(\frac{1}{8} + \frac{1}{8}\right)} = 3.0046$$
The test statistics are computed as follows:

$$t_1 = \frac{(+\tfrac{1}{2})\bar{Y}_{.1} + (+\tfrac{1}{2})\bar{Y}_{.2} + (-\tfrac{1}{2})\bar{Y}_{.3} + (-\tfrac{1}{2})\bar{Y}_{.4}}{s_{\psi'}} = \frac{(+\tfrac{1}{2})(11.1250) + (+\tfrac{1}{2})(17.8750) + (-\tfrac{1}{2})(20.2500) + (-\tfrac{1}{2})(24.3750)}{2.1246} = -3.6772$$

$$t_2 = \frac{\bar{Y}_{.1} - \bar{Y}_{.2}}{s_{\psi'}} = \frac{11.1250 - 17.8750}{3.0046} = -2.2466$$

$$t_3 = \frac{\bar{Y}_{.3} - \bar{Y}_{.4}}{s_{\psi'}} = \frac{20.2500 - 24.3750}{3.0046} = -1.3729$$
Using the Scheffé method, these results are precisely the same as those obtained via the Dunn procedure. There is somewhat of a difference in the critical values, which were 2.97 for the Scheffé method, 2.539 for the Dunn method, and 2.048 for the POC method. Here we see that the Scheffé procedure is even more conservative than the Dunn procedure, thus making it a bit more difficult to reject H0.
For situations where the group variances are unequal, a modification of the Scheffé method less sensitive to unequal variances has been proposed by Brown and Forsythe (1974). Kaiser and Bowden (1983) found that the Brown–Forsythe procedure may cause the actual α level to exceed the nominal α level, and thus we recommend the Kaiser–Bowden modification. For more information, see Kirk (1982), Wilcox (1987), and Wilcox (1996).
12.2.6 Simple Post Hoc Contrasts: Tukey HSD, Tukey–Kramer, Fisher LSD, and Hayter Tests
Tukey's (1953) honestly significant difference (HSD) test is one of the most popular post hoc MCPs. The HSD test is a family-wise procedure and is most appropriate for considering all pairwise contrasts with equal n's per group (i.e., a balanced design). The HSD test is sometimes referred to as the studentized range test because it is based on the sampling distribution of the studentized range statistic developed by William Sealy Gosset (forced to use the pseudonym "Student" by his employer, the Guinness brewery). For the traditional approach, the first step in the analysis is to rank order the means from largest (Ȳ.1) to smallest (Ȳ.J). The test statistic, or studentized range statistic, is computed as follows:
$$q_i = \frac{\bar{Y}_{.j} - \bar{Y}_{.j'}}{s_{\psi'}} \quad \text{where} \quad s_{\psi'} = \sqrt{\frac{MS_{\mathrm{error}}}{n}}$$
where i identifies the specific contrast, j and j′ designate the two group means to be compared, and n represents the number of observations per group (equal n's per group is required). The test statistic is compared to the critical value $\pm\,{}_{\alpha}q_{df(\mathrm{error}),\,J}$, where df_error is equal to J(n − 1). The table for these critical values is given in Table A.9.
The first contrast involves a test of the largest pairwise difference in the set of J means (q1) (i.e., largest vs. smallest means). If these means are not significantly different, then the analysis stops because no other pairwise difference could be significant. If these means are different, then we proceed to test the second pairwise difference involving the largest mean (i.e., q2). Contrasts involving the largest mean are continued until a nonsignificant difference is found. Then the analysis picks up with the second largest mean and compares it with the smallest mean. Contrasts involving the second largest mean are continued until a nonsignificant difference is detected. The analysis continues with the next largest mean and the smallest mean, and so on, until it is obvious that no other pairwise contrast could be significant.
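The pairwise q statistics behind this stepdown procedure can be sketched as follows (ours, not the authors'; for simplicity this sketch evaluates every pair rather than applying the early-stopping rules, using the means, MS_error, n, and critical value from the example):

```python
import math

# All pairwise studentized-range statistics for the attractiveness example.
means = {"unattractive": 11.1250, "slightly attractive": 17.8750,
         "moderately attractive": 20.2500, "very attractive": 24.3750}
ms_error, n = 36.1116, 8
se = math.sqrt(ms_error / n)    # ~ 2.1246
q_crit = 3.87                   # .05 q(28, 4) from the table

groups = sorted(means, key=means.get, reverse=True)  # largest mean first
significant = []
for i, g in enumerate(groups):
    for g2 in groups[i + 1:]:
        q = (means[g] - means[g2]) / se
        if q > q_crit:
            significant.append((g, g2))
# significant pairs: very vs. unattractive, moderately vs. unattractive
```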
Finally, consider an example using the HSD procedure with the attractiveness data. The following are the computations. The critical values are as follows:

$$\pm\,{}_{\alpha}q_{df(\mathrm{error}),\,J} = \pm\,{}_{.05}q_{28,4} \approx \pm 3.87$$
The standard error is computed as follows, where n represents the sample size per group:

$$s_{\psi'} = \sqrt{\frac{MS_{\mathrm{error}}}{n}} = \sqrt{\frac{36.1116}{8}} = 2.1246$$
The test statistics are computed as follows:

Very attractive to unattractive:
$$q_1 = \frac{\bar{Y}_{.4} - \bar{Y}_{.1}}{s_{\psi'}} = \frac{24.3750 - 11.1250}{2.1246} = 6.2365$$

Very attractive to slightly attractive:
$$q_2 = \frac{\bar{Y}_{.4} - \bar{Y}_{.2}}{s_{\psi'}} = \frac{24.3750 - 17.8750}{2.1246} = 3.0594$$

Moderately attractive to unattractive:
$$q_3 = \frac{\bar{Y}_{.3} - \bar{Y}_{.1}}{s_{\psi'}} = \frac{20.2500 - 11.1250}{2.1246} = 4.2949$$

Moderately attractive to slightly attractive:
$$q_4 = \frac{\bar{Y}_{.3} - \bar{Y}_{.2}}{s_{\psi'}} = \frac{20.2500 - 17.8750}{2.1246} = 1.1179$$

Slightly attractive to unattractive:
$$q_5 = \frac{\bar{Y}_{.2} - \bar{Y}_{.1}}{s_{\psi'}} = \frac{17.8750 - 11.1250}{2.1246} = 3.1771$$
Comparing the test statistic values to the critical value, these results indicate that the group means are significantly different for groups 1 (unattractive) and 4 (very attractive) and for groups 1 (unattractive) and 3 (moderately attractive). Just for completeness, we examine the final possible pairwise contrast involving groups 3 and 4. However, we already know from the results of previous contrasts that these means cannot possibly be significantly different. The test statistic result for this contrast is as follows:
Very attractive to moderately attractive:
$$q_6 = \frac{\bar{Y}_{.4} - \bar{Y}_{.3}}{s_{\psi'}} = \frac{24.3750 - 20.2500}{2.1246} = 1.9415$$
Occasionally researchers need to summarize the results of their pairwise comparisons. Table 12.1 shows the results of Tukey HSD contrasts for the example data. For ease of interpretation, the means are ordered from lowest to highest. The first row consists of the results for those contrasts that involve group 1. Thus the mean for group 1 (unattractive) is statistically different from those of groups 3 (moderately attractive) and 4 (very attractive) only. None of the other pairwise contrasts were shown to be significant. Such a table could also be developed for other pairwise MCPs.
The HSD test has exact control of the family-wise error rate assuming normality, homogeneity, and equal n's (better than Dunn or Dunn–Sidak). The HSD procedure is more powerful than the Dunn or Scheffé procedure for testing all possible pairwise contrasts, although Dunn is more powerful for less than all possible pairwise contrasts. The HSD technique is the recommended MCP as a pairwise method in the equal n's situation. The HSD test is reasonably robust to nonnormality, but not in extreme cases, and is not as robust as the Scheffé MCP.
There are several alternatives to the HSD for the unequal n's case. These include the Tukey–Kramer modification (Kramer, 1956; Tukey, 1953), which assumes normality and homogeneity. The Tukey–Kramer test statistic is the same as the Tukey HSD except that the standard error is computed as follows (note that when requesting Tukey in SPSS, the program knows which standard error to calculate):
$$s_{\psi'} = \sqrt{MS_{\mathrm{error}}\,\frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The critical value is determined in the same way as with the Tukey HSD procedure.
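A small sketch of this standard error (ours; the unequal group sizes below are hypothetical, while MS_error is the example's value):

```python
import math

def tukey_kramer_se(ms_error, n1, n2):
    """Tukey-Kramer standard error: sqrt((MS_error / 2) * (1/n1 + 1/n2))."""
    return math.sqrt((ms_error / 2) * (1 / n1 + 1 / n2))

# With equal n's it reduces to the Tukey HSD standard error sqrt(MS_error/n):
equal_case = tukey_kramer_se(36.1116, 8, 8)        # ~ 2.1246
unequal_case = tukey_kramer_se(36.1116, 8, 6)      # hypothetical n's of 8 and 6
```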
Table 12.1
Tukey HSD Contrast Test Statistics and Results

                           Group 1:       Group 2:     Group 3:     Group 4:
                           Unattractive   Slightly     Moderately   Very
                                          Attractive   Attractive   Attractive
Group 1 (mean = 11.1250)   —              3.1771       4.2949*      6.2365*
Group 2 (mean = 17.8750)                  —            1.1179       3.0594
Group 3 (mean = 20.2500)                               —            1.9415
Group 4 (mean = 24.3750)                                            —

*p < .05; .05q28,4 = 3.87.
Fisher's (1949) least significant difference (LSD) test, also known as the protected t test, was the first MCP developed and is a pairwise post hoc procedure. It is a sequential procedure in which a significant ANOVA F is followed by the LSD test, where all (or perhaps some) pairwise t tests are examined. The standard t test statistic is compared with the critical values of $\pm\,{}_{\alpha/2}t_{df(\mathrm{error})}$. The LSD test has precise control of the family-wise error rate for the three-group situation, assuming normality and homogeneity; but for more than three groups, the protection deteriorates rather rapidly. In that case, a modification due to Hayter (1986) is suggested for more adequate protection. The Hayter test appears to have more power than the Tukey HSD and excellent control of family-wise error (Keppel & Wickens, 2004).
12.2.7 Simple Post Hoc Contrasts for Unequal Variances: Games–Howell, Dunnett T3 and C Tests
When the group variances are unequal, several alternative procedures are available. These alternatives include the Games and Howell (1976) and Dunnett T3 and C (1980) procedures. According to Wilcox (1996, 2003), T3 is recommended for n < 50, Games–Howell for n > 50, and C performs about the same as Games–Howell. For further details on these methods, see Kirk (1982), Wilcox (1987, 1996, 2003), Hochberg (1988), and Benjamini and Hochberg (1995).
12.2.8 Follow-Up Tests to Kruskal–Wallis
Recall from Chapter 11 the nonparametric equivalent to ANOVA, the Kruskal–Wallis test. Several post hoc procedures are available to follow up a statistically significant overall Kruskal–Wallis test. The procedures discussed here are the nonparametric equivalents to the Scheffé and Tukey HSD methods. One may form pairwise or complex contrasts as in the parametric case. The test statistic is Z and is computed as follows:
$$Z = \frac{\psi_i'}{s_{\psi'}}$$
where the standard error in the denominator is computed as

$$s_{\psi'} = \sqrt{\frac{N(N+1)}{12} \sum_{j=1}^{J} \frac{c_j^2}{n_j}}$$
and where N is the total number of observations. For the Scheffé method, the test statistic Z is compared to the critical value $\sqrt{{}_{\alpha}\chi^2_{J-1}}$ obtained from the χ² table in Table A.3. For the Tukey HSD procedure, the test statistic Z is compared to the critical value ${}_{\alpha}q_{df(\mathrm{error}),\,J}/\sqrt{2}$ obtained from the table of critical values for the studentized range statistic in Table A.9.
Let�us�use�the�attractiveness�data�to�illustrate��Do�not�forget�that�we�use�the�ranked�data�
as�described�in�Chapter�11��The�rank�means�for�the�groups�are�as�follows:�group�1�(unat-
tractive)�=�7�7500,�group�2�(slightly�attractive)�=�15�2500,�group�3�(moderately�attractive)�=�
18�7500,�and�group�4�(very�attractive)�=�24�2500��Here�we�only�examine�two�contrasts�and�
then�compare�the�results�for�both�the�Scheffé�and�Tukey�HSD�methods��The�first�contrast�
362 An Introduction to Statistical Concepts
compares the two low-attractiveness groups (i.e., groups 1 and 2), whereas the second contrast compares the two low-attractiveness groups with the two high-attractiveness groups (i.e., groups 3 and 4). In other words, we examine a pairwise contrast and a complex contrast, respectively. The results are given here. The critical values are as follows:
Scheffé: $\sqrt{{}_{\alpha}\chi^2_{J-1}} = \sqrt{{}_{.05}\chi^2_{3}} = \sqrt{7.8147} = 2.7955$

Tukey: ${}_{\alpha}q_{df(\text{error}),J}/\sqrt{2} = {}_{.05}q_{28,4}/\sqrt{2} \approx 3.87/\sqrt{2} \approx 2.7365$
The standard error for contrast 1 is computed as
$$s_{\psi'} = \sqrt{\frac{N(N+1)}{12}\sum_{j=1}^{J}\frac{c_j^2}{n_j}} = \sqrt{\frac{32(33)}{12}\left(\frac{1}{8}+\frac{1}{8}\right)} = \sqrt{22} = 4.6904$$
The standard error for contrast 2 is calculated as follows:
$$s_{\psi'} = \sqrt{\frac{N(N+1)}{12}\sum_{j=1}^{J}\frac{c_j^2}{n_j}} = \sqrt{\frac{32(33)}{12}\left(\frac{.25}{8}+\frac{.25}{8}+\frac{.25}{8}+\frac{.25}{8}\right)} = \sqrt{11} = 3.3166$$
The test statistics are computed as follows:
$$Z_1 = \frac{\overline{Y}_1 - \overline{Y}_2}{s_{\psi'}} = \frac{7.75 - 15.25}{4.6904} = -1.5990$$

$$Z_2 = \frac{\tfrac{1}{2}\overline{Y}_1 + \tfrac{1}{2}\overline{Y}_2 - \tfrac{1}{2}\overline{Y}_3 - \tfrac{1}{2}\overline{Y}_4}{s_{\psi'}} = \frac{\tfrac{1}{2}(7.75) + \tfrac{1}{2}(15.25) - \tfrac{1}{2}(18.75) - \tfrac{1}{2}(24.25)}{3.3166} = -3.0151$$
For both procedures, we find a statistically significant difference with the second contrast but not with the first. These results agree with most of the other parametric procedures for these particular contrasts. That is, the less attractive groups are not significantly different (only significant with POC), whereas the two less attractive groups are significantly different from the two more attractive groups (significant with all procedures). One could also devise nonparametric equivalent MCPs for methods other than the Scheffé and Tukey procedures.
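As a check on the hand calculations above, the two Kruskal–Wallis follow-up contrasts can be sketched in a few lines of Python. The rank means, group sizes, and critical values are taken from the text; the function name is our own:

```python
import math

# Rank means for the attractiveness example (from Chapter 11), n = 8 per group
rank_means = [7.75, 15.25, 18.75, 24.25]
sizes = [8, 8, 8, 8]
N = sum(sizes)  # 32 total observations

def kw_contrast_z(coeffs):
    """Z = psi' / s_psi', where s_psi' = sqrt[N(N + 1)/12 * sum(c_j^2 / n_j)]."""
    psi = sum(c * m for c, m in zip(coeffs, rank_means))
    se = math.sqrt(N * (N + 1) / 12 * sum(c ** 2 / n for c, n in zip(coeffs, sizes)))
    return psi / se

z1 = kw_contrast_z([1, -1, 0, 0])           # pairwise contrast: group 1 vs. group 2
z2 = kw_contrast_z([0.5, 0.5, -0.5, -0.5])  # complex contrast: groups 1, 2 vs. groups 3, 4

# Critical values as given in the text (Tables A.3 and A.9)
scheffe_crit = math.sqrt(7.8147)  # sqrt(.05 chi-square with J - 1 = 3 df) = 2.7955
tukey_crit = 3.87 / math.sqrt(2)  # .05 q(28, 4) / sqrt(2), approximately 2.7365
```

Here |Z1| ≈ 1.599 falls below both critical values while |Z2| ≈ 3.015 exceeds both, matching the conclusions in the text.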
12.3 SPSS
In our last section, we examine what SPSS has to offer in terms of MCPs. Here we use the general linear model module (although the one-way ANOVA module can also be used). The steps for requesting a one-way ANOVA were presented in the previous chapter and will not be reiterated here. Rather, we will assume all the previously mentioned options have been selected. The last step, therefore, is selection of one or more planned (a priori) or
363Multiple Comparison Procedures
post hoc MCPs. For purposes of this illustration, the Tukey will be selected. However, you are encouraged to examine other MCPs for this dataset.
Step 1: From the “Univariate” dialog box, click on “Post Hoc” to select various post hoc MCPs or click on “Contrasts” to select various planned MCPs (see screenshot step 1).
(Screenshot annotations: Clicking on “Contrasts” will allow you to conduct certain planned MCPs. Clicking on “Post Hoc” will allow you to select various post hoc MCPs.)
Step 2 (post hoc MCP): Click on the name of the independent variable in the “Factor(s)” list box in the top left and move it to the “Post Hoc Tests for” box in the top right by clicking on the arrow key. Check an appropriate MCP for your situation by placing a checkmark in the box next to the desired MCP. In this example, we will select “Tukey.” Click on “Continue” to return to the original dialog box. Click on “OK” to generate the output.
(Screenshot annotations: Select the independent variable of interest from the list on the left and use the arrow to move it to the “Post Hoc Tests for” box on the right. One group of MCPs listed is for instances when the homogeneity of variance assumption is met; another group is for instances when it is not met.)
Step 3a (planned MCP): To obtain trend analysis contrasts, click the “Contrasts” button from the “Univariate” dialog box (see screenshot step 1). From the “Contrasts” dialog box, click the “Contrasts” pulldown and scroll down to “Polynomial.”
Step 3b: Click “Change” to select “Polynomial” and move it to be displayed in parentheses next to the independent variable. Recall that this type of contrast will allow testing of linear, quadratic, and cubic contrasts. Other specific planned contrasts are also available. Then click “Continue” to return to the “Univariate” dialog box.
Interpreting the output: Annotated results from the Tukey HSD procedure, as one example MCP, are shown in Table 12.2. Note that CIs for each mean difference are given to the right of each contrast.
Table 12.2
Tukey HSD SPSS Results for the Statistics Lab Example

Multiple Comparisons
Dependent Variable: Number of Statistics Labs Attended
Tukey HSD

                                                    Mean                           95% Confidence Interval
(I) Level of           (J) Level of                 Difference                     Lower       Upper
Attractiveness         Attractiveness               (I–J)       Std. Error  Sig.   Bound       Bound
Unattractive           Slightly attractive          –6.7500     3.00465     .135   –14.9536    1.4536
                       Moderately attractive        –9.1250*    3.00465     .025   –17.3286    –.9214
                       Very attractive              –13.2500*   3.00465     .001   –21.4536    –5.0464
Slightly attractive    Unattractive                 6.7500      3.00465     .135   –1.4536     14.9536
                       Moderately attractive        –2.3750     3.00465     .858   –10.5786    5.8286
                       Very attractive              –6.5000     3.00465     .158   –14.7036    1.7036
Moderately attractive  Unattractive                 9.1250*     3.00465     .025   .9214       17.3286
                       Slightly attractive          2.3750      3.00465     .858   –5.8286     10.5786
                       Very attractive              –4.1250     3.00465     .526   –12.3286    4.0786
Very attractive        Unattractive                 13.2500*    3.00465     .001   5.0464      21.4536
                       Slightly attractive          6.5000      3.00465     .158   –1.7036     14.7036
                       Moderately attractive        4.1250      3.00465     .526   –4.0786     12.3286

Based on observed means.
The error term is Mean Square(error) = 36.112.

Annotations from the output:
“Mean difference” is simply the difference between the means of the two groups compared. For example, the mean difference of group 1 and group 2 is calculated as 11.1250 – 17.8750 = –6.7500.
“Sig.” denotes the observed p value and provides the results of the contrasts. There are only two statistically significant contrasts. There is a statistically significant mean difference between: (1) group 1 (unattractive) and group 3 (moderately attractive); and (2) group 1 (unattractive) and group 4 (very attractive). Note that there are only six unique contrast results: ½[J(J – 1)] = ½[4(4 – 1)] = ½(12) = 6. However, there are redundant results presented in the table. For example, the comparison of groups 1 and 2 (presented in results row 1) is the same as the comparison of groups 2 and 1 (presented in results row 2).
The standard error calculated in SPSS uses the harmonic mean of the group sizes (Tukey–Kramer modification): s_Ψ′ = √[2(MS_error)/ñ], where ñ = J/Σ(1/n_j) = 4/(1/8 + 1/8 + 1/8 + 1/8) = 8, so s_Ψ′ = √[2(36.112)/8] = √9.028 = 3.00465.

Descriptive Statistics
Dependent Variable: Number of Statistics Labs Attended

Level of Attractiveness   Mean      Std. Deviation   N
Unattractive              11.1250   5.48862           8
Slightly attractive       17.8750   5.93867           8
Moderately attractive     20.2500   7.28501           8
Very attractive           24.3750   5.09727           8
Total                     18.4062   7.51283          32

Recall the means of the groups as presented in the previous chapter.
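The annotated quantities in Table 12.2 can be verified directly. The sketch below (variable names ours) reproduces the standard error, an example mean difference, and the count of unique pairwise contrasts:

```python
import math

ms_error = 36.112     # Mean Square(error) from the ANOVA
sizes = [8, 8, 8, 8]  # group sizes
J = len(sizes)

# Harmonic mean of the group sizes (Tukey-Kramer modification)
n_tilde = J / sum(1 / n for n in sizes)        # = 8
std_error = math.sqrt(2 * ms_error / n_tilde)  # = 3.00465, the "Std. Error" column

mean_diff_12 = 11.1250 - 17.8750               # group 1 vs. group 2 = -6.7500
unique_contrasts = J * (J - 1) // 2            # (1/2)[J(J - 1)] = 6
```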
12.4 Template and APA-Style Write-Up
In terms of an APA-style write-up, the MCP results for the Tukey HSD test for the statistics lab example are as follows.
Recall that our graduate research assistant, Marie, was working on a research project for an independent study class to determine if there was a mean difference in the number of statistics labs attended based on the attractiveness of the lab instructor. Her research question was the following: Is there a mean difference in the number of statistics labs students attended based on the attractiveness of the lab instructor? Marie then generated a one-way ANOVA as the test of inference. The APA-style example paragraph of results for the one-way ANOVA, prefaced by the extent to which the assumptions of the test were met, was presented in the previous chapter. Thus only the results of the MCP (specifically the Tukey HSD) are presented here.
Post hoc analyses were conducted given the statistically significant
omnibus ANOVA F test. Specifically, Tukey HSD tests were conducted
on all possible pairwise contrasts. The following pairs of groups
were found to be significantly different (p < .05): groups 1 (unat-
tractive; M = 11.125, SD = 5.4886) and 3 (moderately attractive; M =
20.2500, SD = 7.2850), and groups 1 (unattractive) and 4 (very attrac-
tive; M = 24.3750, SD = 5.0973). In other words, students enrolled in
the least attractive instructor group attended statistically signifi-
cantly fewer statistics labs than students enrolled in either of the
two most attractive instructor groups.
12.5 Summary
In this chapter, methods involving the comparison of multiple group means for a single independent variable were considered. The chapter began with a look at the characteristics of multiple comparisons including (a) the definition of a contrast, (b) planned and post hoc comparisons, (c) contrast-based and family-wise Type I error rates, and (d) orthogonal contrasts. Next, we moved into a lengthy discussion of recommended MCPs.
Figure 12.2 is a flowchart to assist you in making decisions about which MCP to use. Not every statistician will agree with every decision on the flowchart, as there is not total consensus about which MCP is appropriate in every single situation. Nonetheless, this is simply a guide. Whether you use it in its present form or adapt it for your own needs, we hope you find the figure to be useful in your own research.
At this point, you should have met the following objectives: (a) be able to understand the concepts underlying the MCPs, (b) be able to select the appropriate MCP for a given research situation, and (c) be able to determine and interpret the results of MCPs. Chapter 13 returns to ANOVA again and discusses models for which there is more than one independent variable.
Problems
Conceptual Problems
12.1 The Tukey HSD procedure requires equal n's and equal means. True or false?
12.2 Applying the Dunn procedure, given a nominal family-wise error rate of .10 and two contrasts, what is the per contrast alpha?
 a. .01
 b. .05
 c. .10
 d. .20
[Figure 12.2 flowchart, rendered here as a summary: beginning at Start, the decision nodes are Continuous?, Reject F?, Planned?, Orthogonal?, Control only?, Many contrasts?, Pairwise?, Equal n's?, and Equal variances? The recommended MCPs at the end points are POC, Trend analysis, Dunnett, Dunn (Bonferroni)/Dunn–Sidak, Scheffé, Kaiser–Bowden, Tukey HSD/Fisher LSD/Hayter, Tukey–Kramer, and Games–Howell/Dunnett T3/Dunnett C.]

Figure 12.2
Flowchart of recommended MCPs.
12.3 Which of the following linear combinations of population means is not a legitimate contrast?
 a. (μ1 + μ2 + μ3)/3 − μ4
 b. μ1 − μ4
 c. (μ1 + μ2)/2 − (μ3 + μ4)
 d. μ1 − μ2 + μ3 − μ4
12.4 When a one-factor fixed-effects ANOVA results in a significant F ratio for J = 2, one should follow the ANOVA with which one of the following procedures?
 a. Tukey HSD method
 b. Scheffé method
 c. Hayter method
 d. None of the above
12.5 If a family-based error rate for α is desired, and hypotheses involving all pairs of means are to be tested, which method of multiple comparisons should be selected?
 a. Tukey HSD
 b. Scheffé
 c. Planned orthogonal contrasts
 d. Trend analysis
 e. None of the above
12.6 A priori comparisons are which one of the following?
 a. Are planned in advance of the research
 b. Often arise out of theory and prior research
 c. May be done without examining the F ratio
 d. All of the above
12.7 For planned contrasts involving the control group, the Dunn procedure is most appropriate. True or false?
12.8 Which is not a property of planned orthogonal contrasts?
 a. The contrasts are independent.
 b. The contrasts are post hoc.
 c. The sum of the cross products of the contrast coefficients equals 0.
 d. If there are J groups, there are J − 1 orthogonal contrasts.
12.9 Which MCP is most flexible in the contrasts that can be tested?
 a. Planned orthogonal contrasts
 b. Newman–Keuls
 c. Dunnett
 d. Tukey HSD
 e. Scheffé
12.10 Post hoc tests are necessary after an ANOVA given which one of the following?
 a. H0 is rejected.
 b. There are more than two groups.
 c. H0 is rejected and there are more than two groups.
 d. You should always do post hoc tests after an ANOVA.
12.11 Post hoc tests are done after ANOVA to determine why H0 was not rejected. True or false?
12.12 Holding the α level and the number of groups constant, as the df(error) increases, the critical value of q decreases. True or false?
12.13 The Tukey HSD procedure maintains the family-wise Type I error rate at α. True or false?
12.14 The Dunnett procedure assumes equal numbers of observations per group. True or false?
12.15 For complex post hoc contrasts with unequal group variances, which of the following MCPs is most appropriate?
 a. Kaiser–Bowden
 b. Dunnett
 c. Tukey HSD
 d. Scheffé
12.16 The number of levels of the independent variable is 6. How many orthogonal contrasts can be tested?
 a. 1
 b. 3
 c. 5
 d. 6
12.17 A researcher is interested in testing the following contrasts in a J = 6 study: group 1 versus 2, group 3 versus 4, and group 5 versus 6. I assert that these contrasts are orthogonal. Am I correct?
12.18 I assert that rejecting H0 in a one-factor fixed-effects ANOVA with J = 3 indicates that all three pairs of group means are necessarily statistically significantly different using the Scheffé procedure. Am I correct?
12.19 For complex post hoc contrasts with equal group variances, which of the following MCPs is most appropriate?
 a. Planned orthogonal contrasts
 b. Dunnett
 c. Tukey HSD
 d. Scheffé
12.20 A researcher finds a statistically significant omnibus F test. For which one of the following will there be at least one statistically significant MCP?
 a. Kaiser–Bowden
 b. Dunnett
 c. Tukey HSD
 d. Scheffé
12.21 If the difference between two sample means is 1000, I assert that H0 will necessarily be rejected with the Tukey HSD. Am I correct?
12.22 Suppose all J = 4 of the sample means are equal to 100. I assert that it is possible to find a significant contrast with some MCP. Am I correct?
Computational Problems
12.1 A one-factor fixed-effects ANOVA is performed on data for 10 groups of unequal sizes, and H0 is rejected at the .01 level of significance. Using the Scheffé procedure, test the contrast that

$$\overline{Y}_{.2} - \overline{Y}_{.5} = 0$$

at the .01 level of significance given the following information: df(with) = 40, Ȳ.2 = 10.8, n2 = 8, Ȳ.5 = 15.8, n5 = 8, and MS(with) = 4.
12.2 A one-factor fixed-effects ANOVA is performed on data from three groups of equal size (n = 10), and H0 is rejected at the .01 level. The following values were computed: MS(with) = 40 and the sample means are Ȳ.1 = 4.5, Ȳ.2 = 12.5, and Ȳ.3 = 13.0. Use the Tukey HSD method to test all possible pairwise contrasts.
12.3 A one-factor fixed-effects ANOVA is performed on data from three groups of equal size (n = 20), and H0 is rejected at the .05 level. The following values were computed: MS(with) = 60 and the sample means are Ȳ.1 = 50, Ȳ.2 = 70, and Ȳ.3 = 85. Use the Tukey HSD method to test all possible pairwise contrasts.
12.4 Using the data from Chapter 11, Computational Problem 4, conduct a trend analysis at the .05 level.
12.5 Consider the situation where there are J = 4 groups of subjects. Answer the following questions:
 a. Construct a set of orthogonal contrasts and show that they are orthogonal.
 b. Is the following contrast legitimate? Why or why not?

$$H_0: \mu_{.1} - (\mu_{.2} + \mu_{.3} + \mu_{.4}) = 0$$

 c. Using the same means, how might the contrast in part (b) be altered to yield a legitimate contrast?
Interpretive Problems
12.1 For the interpretive problem you selected in Chapter 11 (using the survey 1 dataset on the website), select an a priori MCP, apply it using SPSS, and write an APA-style paragraph describing the results.
12.2 For the interpretive problem you selected in Chapter 11 (using the survey 1 dataset on the website), select a post hoc MCP, apply it using SPSS, and write an APA-style paragraph describing the results.
13
Factorial Analysis of Variance:
Fixed-Effects Model
Chapter Outline
13.1 Two-Factor ANOVA Model
  13.1.1 Characteristics of the Model
  13.1.2 Layout of Data
  13.1.3 ANOVA Model
  13.1.4 Main Effects and Interaction Effects
  13.1.5 Assumptions and Violation of Assumptions
  13.1.6 Partitioning the Sums of Squares
  13.1.7 ANOVA Summary Table
  13.1.8 Multiple Comparison Procedures
  13.1.9 Effect Size Measures, Confidence Intervals, and Power
  13.1.10 Example
  13.1.11 Expected Mean Squares
13.2 Three-Factor and Higher-Order ANOVA
  13.2.1 Characteristics of the Model
  13.2.2 ANOVA Model
  13.2.3 ANOVA Summary Table and Example
  13.2.4 Triple Interaction
13.3 Factorial ANOVA With Unequal n's
13.4 SPSS and G*Power
13.5 Template and APA-Style Write-Up
Key Concepts
1. Main effects
2. Interaction effects
3. Partitioning the sums of squares
4. The ANOVA model
5. Main-effects contrasts and simple and complex interaction contrasts
6. Nonorthogonal designs
The last two chapters have dealt with the one-factor analysis of variance (ANOVA) model and various multiple comparison procedures (MCPs) for that model. In this chapter, we continue our discussion of ANOVA models by extending the one-factor case to the two- and three-factor models. This chapter seeks an answer to the following question: What should we do if there are multiple factors for which we want to make comparisons of the means? In other words, the researcher is interested in the effect of two or more independent variables or factors on the dependent (or criterion) variable. This chapter is most concerned with two- and three-factor models, but the extension to more than three factors, when warranted, is fairly simple.
For example, suppose that a researcher is interested in the effects of textbook choice and time of day on statistics achievement. Thus, one independent variable would be the textbook selected for the course, and the second independent variable would be the time of day the course was offered. The researcher hypothesizes that certain texts may be more effective in terms of achievement than others and that student learning may be greater at certain times of the day. For the time-of-day variable, one might expect that students would not do as well in an early morning section or a late evening section as at other times of the day. In the example study, say that the researcher is interested in comparing three textbooks (A, B, and C) and three times of the day (early morning, mid-afternoon, and evening sections). Students would be randomly assigned to sections of statistics based on a combination of textbook and time of day. One group of students might be assigned to the section offered in the evening using textbook A. These results would be of interest to statistics instructors for selecting a textbook and optimal time of the day.
Most of the concepts used in this chapter are the same as those covered in Chapters 11 and 12. In addition, new concepts include main effects, interaction effects, MCPs for main and interaction effects, and nonorthogonal designs. Our objectives are that by the end of this chapter, you will be able to (a) understand the characteristics and concepts underlying factorial ANOVA, (b) determine and interpret the results of factorial ANOVA, and (c) understand and evaluate the assumptions of factorial ANOVA.
13.1 Two-Factor ANOVA Model
Marie, the educational research graduate student that we have been following, successfully conducted an experiment and used (as we saw in a previous chapter) one-way ANOVA to answer her research question. As we will see in this chapter, Marie will be extending her analysis to include an additional independent variable.
As we learned in Chapter 11, Marie is enrolled in an independent study class. As part of the course requirement, she was required to complete a research study. In collaboration with the statistics faculty in her program, Marie designed an experimental study to determine if there was a mean difference in student attendance in the statistics lab based on the attractiveness of the statistics lab instructor. Marie had also included an additional component to this experiment, the time of day that the course was taken (afternoon or evening), and she is now ready to examine these data. Marie's research question is the following: Is there a mean difference in the number of statistics labs attended by students based on the attractiveness of the lab instructor and time of day that the course is offered? With two independent variables, Marie determines that a factorial ANOVA is the best statistical procedure to use to answer her question. Her next task is to collect and analyze the data to address her research question.
This section describes the distinguishing characteristics of the two-factor ANOVA model, the layout of the data, the linear model, main effects and interactions, assumptions of the model and their violation, partitioning the sums of squares, the ANOVA summary table, MCPs, effect size measures, confidence intervals (CIs), power, an example, and expected mean squares.
13.1.1 Characteristics of the Model
The first characteristic of the two-factor ANOVA model should be obvious by now; this model considers the effect of two factors or independent variables on a dependent variable. Each factor consists of two or more levels (or categories). This yields what we call a factorial design because more than a single factor is included. We see then that the two-factor ANOVA is an extension of the one-factor ANOVA. Why would a researcher want to complicate things by considering a second factor? Three reasons come to mind. First, the researcher may have a genuine interest in studying the second factor. Rather than studying each factor separately in two analyses, the researcher includes both factors in the same analysis. This allows a test not only of the effect of each individual factor, known as main effects, but of the effect of both factors collectively. This latter effect is known as an interaction effect and provides information about whether the two factors are operating independent of one another (i.e., no interaction exists) or whether the two factors are operating together to produce some additional impact (i.e., an interaction exists). If two separate analyses were conducted, one for each independent variable, no information would be obtained about the interaction effect. As becomes evident, assuming a factorial ANOVA with two independent variables, the researcher will test three hypotheses: one for each factor or main effect individually and a third for the interaction between the factors. Factorial ANOVA models with more than two independent variables will, accordingly, test for additional main effects and interactions. This chapter spends considerable time discussing interactions.
A second reason for including an additional factor is an attempt to reduce the error (or within-groups) variation, which is variation that is unexplained by the first factor. The use of a second factor provides a more precise estimate of error variance. For this reason, a two-factor design is generally more powerful than two one-factor designs, as the second factor and the interaction serve to control for additional extraneous variability. A third reason for considering two factors simultaneously is to provide greater generalizability of the results and to provide a more efficient and economical use of observations and resources. Thus, the results can be generalized to more situations, and the study will be more cost efficient in terms of time and money.
In addition, for the two-factor ANOVA, every level of the first factor (hereafter known as factor A) is paired with every level of the second factor (hereafter known as factor B). In other words, every combination of factors A and B is included in the design of the study, yielding what is referred to as a fully crossed design. If some combinations are not included, then the design is not fully crossed and may form some sort of a nested design (see Chapter 16). Individuals (or objects or subjects) are randomly assigned to one combination of the two factors. In other words, each individual responds to only one combination of the factors. If individuals respond to more than one combination of the factors, this would be some sort of repeated measures design, which we examine in Chapter 15. In this chapter, we only consider models where all factors are fixed. Thus, the overall design is known as a fixed-effects model. If one or both factors are random, then the design is not a fixed-effects model, which we discuss in Chapter 15. It is also a condition for factorial ANOVA that the dependent variable is measured at least at the interval level and the independent variables are categorical (either nominal or ordinal).
In this section of the chapter, for simplicity's sake, we impose the restriction that the number of observations is the same for each factor combination. This yields what is known as an orthogonal design, where the effects due to the factors (separately and collectively) are independent or unrelated. We leave the discussion of the unequal n's factorial ANOVA until later in this chapter. In addition, there must be at least two observations per factor combination so as to have within-groups variation.
In summary, the characteristics of the two-factor ANOVA fixed-effects model are as follows: (a) two independent variables (both of which are categorical) each with two or more levels, (b) the levels of both independent variables are fixed by the researcher, (c) subjects are randomly assigned to only one combination of these levels, (d) the two factors are fully crossed, and (e) the dependent variable is measured at least at the interval level. In the context of experimental design, the two-factor ANOVA is often referred to as the completely randomized factorial design.
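A fully crossed design can be enumerated directly: every level of factor A is paired with every level of factor B. The brief sketch below uses the hypothetical textbook and time-of-day levels from the running example:

```python
from itertools import product

# Hypothetical levels from the textbook/time-of-day example
textbooks = ["A", "B", "C"]                  # factor A, J = 3 levels
times = ["morning", "afternoon", "evening"]  # factor B, K = 3 levels

# Fully crossed: every combination of the two factors forms a cell,
# and each subject is randomly assigned to exactly one cell.
cells = list(product(textbooks, times))      # J x K = 9 cells
```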
13.1.2 Layout of Data
Before we get into the theory and analysis of the data, let us examine one form in which the data can be placed, known as the layout of the data. We designate each observation as Y_ijk, where the j subscript tells us what level (or category) of factor A (e.g., textbook) the observation belongs to, the k subscript tells us what level of factor B (e.g., time of day) the observation belongs to, and the i subscript tells us the observation or identification number within that combination of factor A and factor B. For instance, Y_321 would mean that this is the third observation in the second level of factor A and the first level of factor B. The first subscript ranges over i = 1, …, n; the second subscript ranges over j = 1, …, J; and the third subscript ranges over k = 1, …, K. Note also that the latter two subscripts denote the cell of an observation. Using the same example, we are referring to the third observation in the 21 cell. Thus, there are J levels of factor A, K levels of factor B, and n subjects in each cell, for a total of JKn = N observations. For now, we consider the case where there are n subjects in each cell in order to simplify matters; this is referred to as the equal n's case. Later in this chapter, we consider the unequal n's case.
The layout of the sample data is shown in Table 13.1. Here we see that each row represents the observations for a particular level of factor A (textbook) and that each column represents the observations for a particular level of factor B (time). At the bottom of each column are the column means (Ȳ..k), to the right of each row are the row means (Ȳ.j.), and in the lower right-hand corner is the overall mean (Ȳ...). We also need the cell means (Ȳ.jk), which are shown at the bottom of each cell. Thus, the layout is one form in which to think about the data.
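The Y_ijk layout maps naturally onto a nested-list structure. The sketch below (with made-up scores for a small 2 × 2 design) computes the cell, row, column, and overall means just described:

```python
from statistics import mean

# Made-up scores for a J = 2 by K = 2 design with n = 3 per cell;
# data[j][k][i] mirrors the Y_ijk subscripts (observation i within cell jk).
data = [
    [[3, 5, 4], [6, 8, 7]],    # level 1 of factor A
    [[5, 7, 6], [9, 11, 10]],  # level 2 of factor A
]
J, K, n = 2, 2, 3
N = J * K * n  # JKn = N total observations

cell_means = [[mean(data[j][k]) for k in range(K)] for j in range(J)]  # Ybar.jk
row_means = [mean(data[j][0] + data[j][1]) for j in range(J)]          # Ybar.j.
col_means = [mean(data[0][k] + data[1][k]) for k in range(K)]          # Ybar..k
grand_mean = mean(
    sum((data[j][k] for j in range(J) for k in range(K)), [])
)                                                                      # Ybar...
```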
13.1.3 ANOVA Model
This section introduces the ANOVA linear model, as well as estimation of the parameters of the model. The two-factor ANOVA model is a form of the general linear model (GLM) like the one-factor ANOVA model of Chapter 11. The two-factor ANOVA fixed-effects model can be written in terms of population parameters as

$$Y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \varepsilon_{ijk}$$

where
Y_ijk is the observed score on the criterion (i.e., dependent) variable for individual i in level j of factor A (e.g., text) and level k of factor B (e.g., time) (or in the jk cell)
μ is the overall or grand population mean (i.e., regardless of cell designation)
α_j is the main effect for level j of factor A (row or text effect)
β_k is the main effect for level k of factor B (column or time effect)
(αβ)_jk is the interaction effect for the combination of level j of factor A and level k of factor B
ε_ijk is the random residual error for individual i in cell jk
The residual error can be due to individual differences, measurement error, and/or other factors not under investigation.
The population effects and residual error can be computed as follows:
$$\alpha_j = \mu_{.j.} - \mu$$

$$\beta_k = \mu_{..k} - \mu$$

$$(\alpha\beta)_{jk} = \mu_{.jk} - (\mu_{.j.} + \mu_{..k} - \mu)$$

$$\varepsilon_{ijk} = Y_{ijk} - \mu_{.jk}$$
That is, the row effect is equal to the difference between the population mean of level j of factor A (a particular text) and the overall population mean, the column effect is equal to the difference between the population mean of level k of factor B (a particular time) and the overall population mean, the interaction effect is the effect of being in a certain combination of the levels of factor A and factor B (a particular text used at a particular time), whereas the residual error is equal to the difference between an individual's observed
Table 13.1
Layout for the Two-Factor ANOVA

                        Level of Factor B
Level of Factor A    1         2         …    K         Row Mean
1                    Y111      Y112      …    Y11K      Ȳ.1.
                     ⋮         ⋮              ⋮
                     Yn11      Yn12      …    Yn1K
                     Ȳ.11      Ȳ.12      …    Ȳ.1K
2                    Y121      Y122      …    Y12K      Ȳ.2.
                     ⋮         ⋮              ⋮
                     Yn21      Yn22      …    Yn2K
                     Ȳ.21      Ȳ.22      …    Ȳ.2K
⋮                    ⋮         ⋮              ⋮
J                    Y1J1      Y1J2      …    Y1JK      Ȳ.J.
                     ⋮         ⋮              ⋮
                     YnJ1      YnJ2      …    YnJK
                     Ȳ.J1      Ȳ.J2      …    Ȳ.JK
Column mean          Ȳ..1      Ȳ..2      …    Ȳ..K      Ȳ...

(The cell means Ȳ.jk are shown at the bottom of each cell.)
score and the population mean of cell jk. The row, column, and interaction effects can also be thought of as the average effect of being a member of a particular row (i.e., a student who is assigned to textbook A, B, or C), column (i.e., a student who attends class in the afternoon or evening), or cell (i.e., a student assigned to textbook A, B, or C who attends class in the afternoon or evening), respectively. It should also be noted that the sum of the row effects is equal to 0, the sum of the column effects is equal to 0, and the sum of the interaction effects is equal to 0 (both across rows and across columns). This implies, for example, that if there are any nonzero row effects, then the row effects will balance out around 0 with some positive and some negative effects.
You may be wondering why the interaction effect looks a little different than the main effects. We have given you the version that is solely a function of population means. A more intuitively convincing conceptual version of this effect is as follows:

$$(\alpha\beta)_{jk} = \mu_{.jk} - \alpha_j - \beta_k - \mu$$

which is written in similar fashion to the row and column effects. Here we see that the interaction effect [(αβ)_jk] is equal to the population cell mean (μ.jk) minus the following: (a) the row effect, α_j; (b) the column effect, β_k; and (c) the overall population mean, μ. In other words, the interaction is solely a function of cell means without regard to, or controlling for, its row effect, column effect, or the overall mean.
To estimate the parameters of the model [μ, αj, βk, (αβ)jk, and εijk], the least squares method of estimation is used as the most appropriate for GLMs (e.g., regression, ANOVA). These sample estimates are represented by Ȳ..., aj, bk, (ab)jk, and eijk, respectively, where the latter four are computed as follows, respectively:

aj = Ȳ.j. − Ȳ...

bk = Ȳ..k − Ȳ...

(ab)jk = Ȳ.jk − (Ȳ.j. + Ȳ..k − Ȳ...)

eijk = Yijk − Ȳ.jk
Note that

Ȳ... represents the overall sample mean
Ȳ.j. represents the sample mean for level j of factor A (a particular text)
Ȳ..k represents the sample mean for level k of factor B (a particular time)
Ȳ.jk represents the sample mean for cell jk (a particular text at a particular time)
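These estimation formulas are easy to verify numerically. The following is a minimal sketch (assuming numpy is available; the tiny balanced data set is hypothetical, not from the text):

```python
import numpy as np

# Hypothetical balanced data: Y[i, j, k] = observation i in cell (j, k),
# with J = 2 levels of factor A, K = 2 levels of factor B, n = 2 per cell
Y = np.array([[[10., 14.], [20., 24.]],
              [[12., 16.], [22., 26.]]])   # shape (n, J, K)

grand = Y.mean()                        # overall mean, estimates mu
a = Y.mean(axis=(0, 2)) - grand         # row effects a_j
b = Y.mean(axis=(0, 1)) - grand         # column effects b_k
cell = Y.mean(axis=0)                   # cell means
ab = cell - (a[:, None] + b[None, :] + grand)   # interaction effects (ab)_jk
e = Y - cell                            # residuals e_ijk

# As noted in the text, each set of effects balances out around 0
print(a, b, ab.sum())
```

For these additive data the estimated interaction effects all come out 0, and a and b each sum to 0, as the text states.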
For the two-factor ANOVA model, there are three sets of hypotheses, one for each of the main effects and one for the interaction effect. The null and alternative hypotheses, respectively, for testing the main effect of factor A (text) are as follows:

H01: μ.1. = μ.2. = … = μ.J.

H11: not all the μ.j. are equal

The hypotheses for testing the main effect of factor B (time) are noted as follows:

H02: μ..1 = μ..2 = … = μ..K

H12: not all the μ..k are equal
377Factorial Analysis of Variance: Fixed-Effects Model
Finally, the hypotheses for testing the interaction effect (text with time) are as follows:

H03: (μ.jk − μ.j. − μ..k + μ...) = 0 for all j and k

H13: not all the (μ.jk − μ.j. − μ..k + μ...) = 0

The null hypotheses can also be written in terms of row, column, and interaction effects (which may make more intuitive sense to you) as

H01: α1 = α2 = … = αJ = 0

H02: β1 = β2 = … = βK = 0

H03: (αβ)jk = 0 for all j and k
As in the one-factor model, all of the alternative hypotheses are written in a general form to cover the multitude of possible mean differences that could arise. These range from only two of the means being different to all of the means being different from one another. Also, because of the way the alternative hypotheses have been written, only a nondirectional alternative is appropriate. If one of the null hypotheses is rejected, then consider an MCP so as to determine which means, or combination of means, are significantly different (discussed later).
13.1.4 Main Effects and Interaction Effects

Finally we come to a formal discussion of main effects and interaction effects. A main effect of factor A (text) is defined as the effect of factor A, averaged across the levels of factor B (time), on the dependent variable Y (achievement). More precisely, it represents the unique effect of factor A on the outcome Y, controlling statistically for factor B. A similar statement may be made for the main effect of factor B.
As far as the concept of interaction is concerned, things are a bit more complex. An interaction can be defined in any of the following ways: An interaction is said to exist if (a) certain combinations of the two factors produce effects beyond the effects of the two factors when those two factors are considered separately; (b) the mean differences among the levels of factor A are not constant across, and thus depend on, the levels of factor B; (c) there is a joint effect of factors A and B on Y; or (d) there is a unique effect that could not be predicted from knowledge of only the main effects. Let us mention two fairly common examples of interaction effects. The first is known as an aptitude-treatment interaction (ATI). This means that the effectiveness of a particular treatment depends on the aptitude of the individual. In other words, some treatments are more effective for individuals with a high aptitude, and other treatments are more effective for those with a low aptitude. A second example is an interaction between treatment and gender. Here some treatments may be more effective for males, and others may be more effective for females. This is often considered in gender studies research.
For some graphical examples of main and interaction effects, take a look at the various plots in Figure 13.1. Each plot represents the graph of a particular set of cell means (the mean of the dependent variable for a cell, that is, the combination of a particular category of factor A and a particular category of factor B), sometimes referred to as a profile plot. On the X axis are the levels of factor A (text), the Y axis provides the cell means on the dependent variable Y (achievement), and the separate lines in the body of the plot represent the levels of factor B (time) (although the specific placement of the two factors here is arbitrary; alternatively factor B could be plotted on the X axis, and factor A, as the separate lines). Profile plots provide information about the possible existence of a main effect for A, a main effect for B, and/or an interaction effect. A main effect for factor A can be examined by taking the means for each level of A and averaging them across the levels of B. If these marginal means for the levels of A are the same or nearly so, this would indicate no main effect for factor A. A main effect for factor B can be assessed by taking the means for each level of B and averaging them across the levels of A. If these marginal means for the levels of B are the same or nearly so, this would imply no main effect for factor B. An interaction
FIGURE 13.1
Display of possible two-factor ANOVA effects. (Eight profile plots, panels (a) through (h): two levels of factor A on the X axis, cell means of Y on the Y axis, with separate lines for the two levels of factor B.)
effect is determined by whether the cell means for the levels of A are constant across the levels of B (or vice versa). This is easily viewed in a profile plot by checking to see whether or not the lines are parallel. Parallel lines indicate no interaction, whereas nonparallel lines suggest that an interaction may exist. Of course, the statistical significance of the main and interaction effects is a matter to be determined by the F test statistics (coming up). The profile plots only give you a rough idea as to the possible existence of the effects. For instance, lines that are nearly parallel will probably not show up as a significant interaction. It is suggested that the plot can be simplified if the factor with the most levels is shown on the X axis. This cuts down on the number of lines drawn.
The plots shown in Figure 13.1 represent the eight different sets of results possible for a two-factor design, that is, from no effects to all three effects being evident. To simplify matters, only two levels of each factor are used. Figure 13.1a indicates that there is no main effect either for factor A or B, and there is no interaction effect. The lines are horizontal (no A effect), lie nearly on top of one another (no B effect), and are parallel (no interaction effect). Figure 13.1b suggests the presence of an effect due to factor A only; the lines are not horizontal because the mean for A1 is greater than the mean for A2, but they lie nearly on top of one another (no B effect) and are parallel (no interaction). In Figure 13.1c, we see a separation between the lines for the levels of B (B1 being greater than B2); thus, a main effect for B is likely, but the lines are horizontal (no A effect) and are parallel (no interaction).

For Figure 13.1d, there are no main effects (the means for the levels of A are the same, and the means for the levels of B are the same), but an interaction is indicated by the lack of parallel lines. Figure 13.1e suggests a main effect for both factors as shown by mean differences (A1 less than A2, and B1 greater than B2), but no interaction (the lines are parallel). In Figure 13.1f, we see a main effect for A (A1 less than A2) and an interaction effect, but no main effect for B (little separation between the lines for factor B). For Figure 13.1g, there appear to be a main effect for B (B1 greater than B2) and an interaction, but no main effect for A. Finally, in Figure 13.1h, we see the likelihood of two main effects (A1 less than A2, and B1 greater than B2) and an interaction. Although these are clearly the only possible outcomes from a two-factor design, the precise pattern will differ depending on the obtained cell means. In other words, if your study yields a significant effect only for factor A, your profile plot need not look exactly like Figure 13.1b, but it will retain the same general pattern and interpretation.
In many statistics texts, a big deal is made about the type of interaction shown in the profile plot. They make a distinction between an ordinal interaction and a disordinal interaction. An ordinal interaction is said to exist when the lines are not parallel and they do not cross; ordinal here means the same relative order of the cell means is maintained across the levels of one of the factors. For example, the means for level 1 of factor B are always greater than the means for level 2 of B, regardless of the level of factor A. A disordinal interaction is said to exist when the lines are not parallel and they do cross. For example, the mean for B1 is greater than the mean for B2 at A1, but the opposite is true at A2. Dwelling on the distinction between the two types of interaction is not recommended, as it can depend on how the plot is drawn (i.e., which factor is plotted on the X axis). That is, when factor A is plotted on the X axis, a disordinal interaction may be shown, and when factor B is plotted on the X axis, an ordinal interaction may be shown. The purpose of the profile plot is to simplify interpretation of the results; worrying about the type of interaction may merely serve to confuse that interpretation.
Let� us� take� a� moment� to� discuss� how� to� deal� with� an� interaction� effect�� Consider� two�
possible�situations,�one�where�there�is�a�significant�interaction�effect�and�one�where�there�
is� no� such� effect�� If� there� is� no� significant� interaction� effect,� then� the� findings� regarding�
380 An Introduction to Statistical Concepts
the� main� effects� can� be� generalized� with� greater� confidence�� In� this� situation,� the� main�
effects� are� known� as� additive effects,� and� an� additive� linear� model� with� no� interaction�
term�could�actually�be�used�to�describe�the�data��For�example,�the�results�might�be�that�for�
factor�A,�the�level�1�means�always�exceed�those�of�level�2�by�10�points,�across�all�levels�of�
factor�B��Thus,�we�can�make�a�blanket�statement�about�the�constant�added�benefits�of�A1�
over�A2,�regardless�of�the�level�of�factor�B��In�addition,�for�the�no-interaction�situation,�the�
main�effects�are�statistically�independent�of�one�another;�that�is,�each�of�the�main�effects�
serves�as�an�independent�predictor�of�Y�
If there is a significant interaction effect, then the findings regarding the main effects cannot be generalized with such confidence. In this situation, the main effects are not additive, and the interaction term must be included in the linear model. For example, the results might be that (a) the mean for A1 is greater than A2 when considering B1, but (b) the mean for A1 is less than A2 when considering B2. Thus, we cannot make a blanket statement about the constant added benefits of A1 over A2, because it depends on the level of factor B. In addition, for the interaction situation, the main effects are not statistically independent of one another; that is, each of the main effects does not serve as an independent predictor of Y. In order to predict Y well, information is necessary about the levels of factors A and B. Thus, in the presence of a significant interaction, generalizations about the main effects must be qualified. A profile plot should be examined so that a proper graphical interpretation of the interaction and main effects can be made. A significant interaction serves as a warning that one cannot generalize statements about a main effect for A over all levels of B. If you obtain a significant interaction, this is an important result. Do not ignore it and simply go ahead and interpret the main effects.
13.1.5 Assumptions and Violation of Assumptions

In Chapter 11, we described in detail the assumptions for the one-factor ANOVA. In the two-factor model, the assumptions are again concerned with independence, homogeneity of variance, and normality. A summary of the effects of their violation is provided in Table 13.2. The same methods for detecting violations described in Chapter 11 can be used for this model.

There are only two different wrinkles for the two-factor model as compared to the one-factor model. First, as the effect of heterogeneity is small with balanced designs
Table 13.2
Assumptions and Effects of Violations for the Two-Factor ANOVA Design

Assumption                    Effect of Assumption Violation
1. Independence               • Increased likelihood of a Type I and/or Type II error in the F statistic
                              • Influences standard errors of means and thus inferences about those means
2. Homogeneity of variance    • Bias in SSwith
                              • Increased likelihood of a Type I and/or Type II error
                              • Less effect with balanced or nearly balanced design
                              • Effect decreases as n increases
3. Normality                  • Minimal effect with moderate violation
                              • Minimal effect with balanced or nearly balanced design
                              • Effect decreases as n increases
(equal n's per cell) or nearly balanced designs, and/or with larger n's, this is a reason to strive for such a design. Unfortunately, there is very little research on this problem, except the classic Box (1954b) article for a no-interaction model with one observation per cell. There are limited solutions for dealing with a violation of the homogeneity assumption, such as the Welch (1951) test, the Johansen (1980) procedure, and variations described by Wilcox (1996, 2003). Transformations are not usually used, as they may destroy an additive linear model and create interactions that did not previously exist. Nonparametric techniques are not commonly used with the two-factor model, although see the description of the Brunner, Dette, and Munk (1997) procedure in Wilcox (2003). Second, the effect of nonnormality seems to be much the same as that of heterogeneity (Miller, 1997).
13.1.6 Partitioning the Sums of Squares

As pointed out in Chapter 11, partitioning the sums of squares is an important concept in ANOVA. We will illustrate with a two-factor model, but this can be extended to more than two factors. Let us begin with the total sum of squares in Y, denoted here as SStotal. The term SStotal represents the amount of total variation among all of the observations without regard to row, column, or cell membership. The next step is to partition the total variation into variation between the levels of factor A (denoted by SSA), variation between the levels of factor B (denoted by SSB), variation due to the interaction of the levels of factors A and B (denoted by SSAB), and variation within the cells combined across cells (denoted by SSwith). In the two-factor ANOVA, then, we can partition SStotal into

SStotal = SSA + SSB + SSAB + SSwith

Then computational formulas are used by statistical software to actually compute these sums of squares.
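For a balanced design, this partition can be checked numerically with the definitional formulas. A sketch assuming numpy is available (the data here are randomly generated, not the chapter example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, J, K = 4, 3, 2                      # hypothetical balanced design
Y = rng.normal(20.0, 3.0, size=(n, J, K))

grand = Y.mean()
row = Y.mean(axis=(0, 2))              # marginal means for the J levels of A
col = Y.mean(axis=(0, 1))              # marginal means for the K levels of B
cell = Y.mean(axis=0)                  # J x K cell means

SS_A = n * K * ((row - grand) ** 2).sum()
SS_B = n * J * ((col - grand) ** 2).sum()
SS_AB = n * ((cell - row[:, None] - col[None, :] + grand) ** 2).sum()
SS_with = ((Y - cell) ** 2).sum()
SS_total = ((Y - grand) ** 2).sum()

# The four pieces reassemble the total variation exactly
print(np.isclose(SS_total, SS_A + SS_B + SS_AB + SS_with))   # True
```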
13.1.7 ANOVA Summary Table

The next step is to assemble the ANOVA summary table. The purpose of the summary table is to simply summarize the ANOVA. A general form of the summary table for the two-factor model is shown in Table 13.3. The first column lists the sources of variation in the model. We note that the total variation is divided into a within-groups source and a general between-groups source, which is then subdivided into sources due to A, B, and the AB interaction. This is in keeping with the spirit of the one-factor model, where total variation was divided into a between-groups source (just one effect because there is only one factor and no interaction term) and a within-groups source. The second column provides the computed sums of squares.
Table 13.3
Two-Factor ANOVA Summary Table

Source    SS         df               MS        F
A         SSA        J − 1            MSA       MSA/MSwith
B         SSB        K − 1            MSB       MSB/MSwith
AB        SSAB       (J − 1)(K − 1)   MSAB      MSAB/MSwith
Within    SSwith     N − JK           MSwith
Total     SStotal    N − 1
The third column gives the degrees of freedom for each source. As always, degrees of freedom have to do with the number of observations that are free to vary in a particular context. Because there are J levels of factor A, the number of degrees of freedom for the A source is equal to J − 1. As there are J means and we know the overall mean, only J − 1 of the means are free to vary. This is the same rationale we have been using throughout this text. As there are K levels of factor B, there are K − 1 degrees of freedom for the B source. For the AB interaction source, we take the product of the degrees of freedom for the main effects. Thus, we have as degrees of freedom for AB the product (J − 1)(K − 1). The degrees of freedom within groups are equal to the total number of observations minus the number of cells, N − JK. Finally, the degrees of freedom total can be written simply as N − 1.
Next, the sum of squares terms are divided by the appropriate degrees of freedom to generate the mean squares terms. Thus, for instance, MSA = SSA/dfA. Finally, in the last column of the ANOVA summary table, we have the F values, which represent the summary statistics for ANOVA. There are three hypotheses that we are interested in testing, one for each of the two main effects and one for the interaction effect, so there will be three F test statistics. For the factorial fixed-effects model, each F value is computed by taking the MS for the source that you are interested in testing and dividing it by MSwith. Thus, for each hypothesis, the same error term is used in forming the F ratio (i.e., MSwith). We return to the two-factor model for cases where the effects are not fixed in Chapter 15.

Each of the F test statistics is then compared with the appropriate F critical value so as to make a decision about the relevant null hypothesis. These critical values are found in the F table of Table A.4 as follows: for the test of factor A, αF(J−1, N−JK); for the test of factor B, αF(K−1, N−JK); and for the test of the interaction, αF((J−1)(K−1), N−JK). Thus, with a two-factor model, testing two main effects and one interaction, there are three F tests and three decisions that must be made. Each significance test is one-tailed so as to be consistent with the alternative hypothesis. The null hypothesis is rejected if the F test statistic exceeds the F critical value.
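Instead of Table A.4, these critical values can also be obtained from software. A sketch using scipy (assumed available), with the J = 4, K = 2, N = 32 design used later in this chapter's example:

```python
from scipy.stats import f

alpha, J, K, N = 0.05, 4, 2, 32
df_with = N - J * K                                     # 24 degrees of freedom within

crit_A = f.ppf(1 - alpha, J - 1, df_with)               # test of factor A
crit_B = f.ppf(1 - alpha, K - 1, df_with)               # test of factor B
crit_AB = f.ppf(1 - alpha, (J - 1) * (K - 1), df_with)  # test of the AB interaction

print(round(crit_A, 2), round(crit_B, 2))  # 3.01 4.26, matching Table A.4
```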
Recall that these F tests are omnibus tests that tell only if there is an overall main effect or interaction effect. If the F test statistic does exceed the F critical value, and there is more than one degree of freedom for the source being tested, then it is not clear precisely why the null hypothesis was rejected. For example, if there are three levels of factor A and the null hypothesis for A is rejected, then we are not sure where the mean differences lie among the levels of A. In this case, some MCP should be used to determine where the mean differences are; this is the topic of the next section.
13.1.8 Multiple Comparison Procedures

In this section, we extend the concepts related to multiple comparison procedures (MCPs) covered in Chapter 12 to the two-factor ANOVA model. This model includes main and interaction effects; consequently, you can examine contrasts of both main and interaction effects. In general, the procedures described in Chapter 12 can be applied to the two-factor situation. Things become more complicated, as we have row and column means (i.e., marginal means) and cell means. Thus, we have to be careful about which means are being considered.

Let us begin with contrasts of the main effects. If the effect for factor A is significant, and there are more than two levels of factor A, then we can form contrasts that compare the levels of factor A ignoring factor B. Here we would be comparing the means for the levels of factor A, which are marginal means as opposed to cell means. Considering each factor separately is strongly advised; considering the factors simultaneously is to be avoided. Some statistics texts suggest that you consider the design as a one-factor model with JK levels when using MCPs to examine main effects. This is inconsistent with the design and the intent of separating effects, and is not recommended.
For contrasts involving the interaction, our recommendation is to begin with a complex interaction contrast if there are more than four cells in the model. Thus, for example, in a 4 × 4 design that consists of four levels of factor A (method of instruction) and four levels of factor B (instructor), one possibility is to test both 4 × 2 complex interaction contrasts. An example of one such contrast is as follows [where, e.g., (Ȳ.11 + Ȳ.21 + Ȳ.31 + Ȳ.41) is the sum of the cell means of each level of factor A for level 1 of factor B and (Ȳ.12 + Ȳ.22 + Ȳ.32 + Ȳ.42) is the sum of the cell means of each level of factor A for level 2 of factor B]:

Ψ′ = (Ȳ.11 + Ȳ.21 + Ȳ.31 + Ȳ.41)/4 − (Ȳ.12 + Ȳ.22 + Ȳ.32 + Ȳ.42)/4
with a standard error of the following:

sΨ′ = √[MSwith Σ(j=1 to J) Σ(k=1 to K) (c²jk/njk)]
where njk is the number of observations in cell jk. This contrast would examine the interaction between the four methods of instruction and the first two instructors. A second complex interaction contrast could consider the interaction between the four methods of instruction and the other two instructors.

If the complex interaction contrast is significant, then follow this up with a simple interaction contrast that involves only four cell means. This is a single degree of freedom contrast because it involves only two levels of each factor (known as a tetrad difference). An example of such a contrast is the following:

Ψ′ = (Ȳ.11 − Ȳ.21) − (Ȳ.12 − Ȳ.22)
Most�of�the�MCPs�described�in�Chapter�12�can�be�used�for�testing�main�effects�and�inter-
action�effects�(although�there�is�some�debate�about�the�appropriate�use�of�interaction�con-
trasts;�see�Boik,�1979;�Marascuilo�&�Levin,�1970,�1976)��Keppel�and�Wickens�(2004)�consider�
interaction�contrasts�in�much�detail��Finally,�some�statistics�texts�suggest�the�use�of�simple�
main�effects�in�testing�a�significant�interaction��These�involve�comparing,�for�example,�the�
levels�of�factor�A�at�a�particular�level�of�factor�B�and�are�generally�conducted�by�further�
partitioning�the�sums�of�squares��However,�the�simple�main�effects�sums�of�squares�repre-
sent�a�portion�of�a�main�effect�plus�the�interaction�effect��Thus,�the�simple�main�effect�does�
not�really�help�us�to�understand�the�interaction,�and�is�not�recommended�here�
13.1.9 Effect Size Measures, Confidence Intervals, and Power

Various measures of effect size have been proposed. Let us examine two commonly used measures, which assume equal variances across the cells. First is partial eta squared, η², which represents the proportion of variation in Y explained by the effect of interest (i.e., by factor A or factor B or the AB interaction). This is the estimate of effect size that can be requested when using SPSS for factorial ANOVA. We determine partial η² as follows:
partial η²A = SSA/(SSA + SSwith)

partial η²B = SSB/(SSB + SSwith)

partial η²AB = SSAB/(SSAB + SSwith)
Another effect size measure is the omega squared statistic, ω². We can determine ω² as follows:

ω²A = [SSA − (J − 1)MSwith]/(SStotal + MSwith)

ω²B = [SSB − (K − 1)MSwith]/(SStotal + MSwith)

ω²AB = [SSAB − (J − 1)(K − 1)MSwith]/(SStotal + MSwith)
Using Cohen's (1988) subjective standards, these effect sizes can be interpreted as follows: small effect, η² or ω² = .01; medium effect, η² or ω² = .06; and large effect, η² or ω² = .14. For further discussion, see Keppel (1982), O'Grady (1982), Wilcox (1987), Cohen (1988), Fidler and Thompson (2001), Keppel and Wickens (2004), and Murphy, Myors, and Wolach (2008; with software).
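As a sketch, these formulas can be applied to the summary table of the chapter example given later (Table 13.5):

```python
# Sums of squares and mean squares from Table 13.5 (statistics lab example)
SS_A, SS_B, SS_AB = 738.5938, 712.5313, 21.8438
SS_with, SS_total, MS_with = 276.7500, 1749.7188, 11.5313
J, K = 4, 2

partial_eta2_A = SS_A / (SS_A + SS_with)
partial_eta2_B = SS_B / (SS_B + SS_with)
partial_eta2_AB = SS_AB / (SS_AB + SS_with)

omega2_A = (SS_A - (J - 1) * MS_with) / (SS_total + MS_with)
omega2_B = (SS_B - (K - 1) * MS_with) / (SS_total + MS_with)
omega2_AB = (SS_AB - (J - 1) * (K - 1) * MS_with) / (SS_total + MS_with)

print(round(partial_eta2_A, 3), round(omega2_A, 3))
```

Both main effects come out large by Cohen's standards (partial η² of roughly .73 and .72; ω² of roughly .40 each), while ω² for the nonsignificant AB interaction is slightly negative, which is conventionally interpreted as 0.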
As mentioned in Chapter 11, CIs can be used for providing interval estimates of a population mean or mean difference; this gives us information about the accuracy of a sample estimate. In the case of the two-factor model, we can form CIs for row means, column means, cell means, and the overall mean, as well as any possible contrast formed through an MCP. Note also that CIs have been developed for η² and ω² (Fidler & Thompson, 2001; Smithson, 2001).
As also mentioned in Chapter 11, power can be determined either in the planned (a priori) or observed (post hoc) power context. For planned power, we typically use tables or power charts (e.g., Cohen, 1988, or Murphy et al., 2008) or software (e.g., Power and Precision, Ex-Sample, G*Power, or the Murphy et al., 2008, software). These are particularly useful in terms of determining adequate sample sizes when designing a study. Observed power is reported by statistics software, such as SPSS, to indicate the actual power in a given study.
13.1.10 Example

Consider the following illustration of the two-factor design. Here we expand on the example presented in Chapter 11 by adding a second factor to the model. Our dependent variable will again be the number of times a student attends statistics lab during one semester (or quarter), factor A is the attractiveness of the lab instructor (assuming each instructor is of the same gender and is equally competent), and factor B is the time of day the lab is offered. Thus, the researcher is interested in whether the attractiveness of the instructor, the time of day, or the interaction of attractiveness and time influences student attendance in the statistics lab. The attractiveness levels are defined again as (a) unattractive, (b) slightly attractive, (c) moderately attractive, and (d) very attractive. The time of day levels are defined as (a) afternoon lab and (b) evening lab. Students were randomly assigned to a combination of lab instructor and lab time at the beginning of the semester, and attendance was taken by the instructor. There were four students in each cell and eight cells (four levels of attractiveness and two categories of time, thus 4 × 2 or eight combinations of instructor and time) for a total of 32 observations. Students could attend a maximum of 30 lab sessions. Table 13.4 depicts the raw data and sample means for each cell (given beneath each cell), column, row, and overall.
The results are summarized in the ANOVA summary table as shown in Table 13.5. The F test statistics are compared to the following critical values obtained from Table A.4 (α = .05): .05F3,24 = 3.01 for the A (i.e., attractiveness) and AB (i.e., attractiveness by time of day) effects, and .05F1,24 = 4.26 for the B (time of day) effect. The test statistics exceed the critical values for the A and B effects only, so we can reject these H0 and conclude that both the level of attractiveness and the time of day are related to mean differences in statistics lab attendance. The interaction was shown not to be a significant effect. If you would like to see an example of a two-factor design where the interaction is significant, take a look at the end-of-chapter problems, Computational Problem 13.5.
Table 13.4
Data for the Statistics Lab Example: Number of Statistics Labs Attended, by Level of Attractiveness and Time of Day

                                 Time of Day
Level of Attractiveness     Afternoon    Evening    Row Mean
Unattractive                15           10         11.1250
                            12            8
                            21            7
                            13            3
      (cell means)          15.2500       7.0000
Slightly attractive         20           13         17.8750
                            22            9
                            24           18
                            25           12
      (cell means)          22.7500      13.0000
Moderately attractive       24           10         20.2500
                            29           12
                            27           21
                            25           14
      (cell means)          26.2500      14.2500
Very attractive             30           22         24.3750
                            26           20
                            29           25
                            28           15
      (cell means)          28.2500      20.5000
Column mean                 23.1250      13.6875    18.4063 (overall mean)
Next we estimate the main and interaction effects. The main effects for the levels of A are estimated to be the following:

Unattractive: a1 = Ȳ.1. − Ȳ... = 11.1250 − 18.4063 = −7.2813

Slightly attractive: a2 = Ȳ.2. − Ȳ... = 17.8750 − 18.4063 = −0.5313

Moderately attractive: a3 = Ȳ.3. − Ȳ... = 20.2500 − 18.4063 = 1.8437

Very attractive: a4 = Ȳ.4. − Ȳ... = 24.3750 − 18.4063 = 5.9687
The main effects for the levels of B (time of day) are estimated to be as follows:

Afternoon: b1 = Ȳ..1 − Ȳ... = 23.1250 − 18.4063 = 4.7187

Evening: b2 = Ȳ..2 − Ȳ... = 13.6875 − 18.4063 = −4.7187
Finally, the interaction effects for the combinations of the levels of factors A (attractiveness) and B (time of day) are as follows:

(ab)11 = Ȳ.11 − (Ȳ.1. + Ȳ..1 − Ȳ...) = 15.2500 − (11.1250 + 23.1250 − 18.4063) = −0.5937
(ab)12 = Ȳ.12 − (Ȳ.1. + Ȳ..2 − Ȳ...) = 7.0000 − (11.1250 + 13.6875 − 18.4063) = 0.5938
(ab)21 = Ȳ.21 − (Ȳ.2. + Ȳ..1 − Ȳ...) = 22.7500 − (17.8750 + 23.1250 − 18.4063) = 0.1563
(ab)22 = Ȳ.22 − (Ȳ.2. + Ȳ..2 − Ȳ...) = 13.0000 − (17.8750 + 13.6875 − 18.4063) = −0.1562
(ab)31 = Ȳ.31 − (Ȳ.3. + Ȳ..1 − Ȳ...) = 26.2500 − (20.2500 + 23.1250 − 18.4063) = 1.2813
(ab)32 = Ȳ.32 − (Ȳ.3. + Ȳ..2 − Ȳ...) = 14.2500 − (20.2500 + 13.6875 − 18.4063) = −1.2813
(ab)41 = Ȳ.41 − (Ȳ.4. + Ȳ..1 − Ȳ...) = 28.2500 − (24.3750 + 23.1250 − 18.4063) = −0.8437
(ab)42 = Ȳ.42 − (Ȳ.4. + Ȳ..2 − Ȳ...) = 20.5000 − (24.3750 + 13.6875 − 18.4063) = 0.8438
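These hand computations can be verified from the cell means in Table 13.4. A sketch assuming numpy is available (exact fractions are used internally, so the results match the text to rounding):

```python
import numpy as np

# Cell means from Table 13.4: rows = four attractiveness levels,
# columns = time of day (afternoon, evening)
cell = np.array([[15.25,  7.00],
                 [22.75, 13.00],
                 [26.25, 14.25],
                 [28.25, 20.50]])

grand = cell.mean()               # overall mean, 18.40625 (balanced design)
row = cell.mean(axis=1)           # marginal means for attractiveness
col = cell.mean(axis=0)           # marginal means for time of day

a = row - grand                   # main effects for attractiveness
b = col - grand                   # main effects for time of day
ab = cell - row[:, None] - col[None, :] + grand   # interaction effects

print(np.round(a, 4))
print(np.round(b, 4))
```

Each set of effects sums to 0, and the interaction effects sum to 0 across both rows and columns, as required.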
The profile plot shown in Figure 13.2 graphically depicts these effects. The main effect for attractiveness (factor A) was statistically significant and has more than two levels, so let us
Table 13.5
Two-Factor ANOVA Summary Table: Statistics Lab Example

Source    SS           df    MS          F
A         738.5938     3     246.1979    21.3504 a
B         712.5313     1     712.5313    61.7911 b
AB        21.8438      3     7.2813      0.6314 a
Within    276.7500     24    11.5313
Total     1749.7188    31

a .05F3,24 = 3.01.
b .05F1,24 = 4.26.
consider one example of an MCP, the Tukey HSD test. Recall from Chapter 12 that the HSD test is a family-wise procedure most appropriate for considering all pairwise contrasts with a balanced design (which is the case for these data). The following are the computations:
Critical value (obtained from Table A.9):

αq(dfwith, J) = .05q24,4 = 3.901

Standard error:

sΨ′ = √(MSwith/n) = √(11.5313/8) = 1.2006
Test statistics:

q1 = (Ȳ.4. − Ȳ.1.)/sΨ′ = (24.3750 − 11.1250)/1.2006 = 11.0361

q2 = (Ȳ.4. − Ȳ.2.)/sΨ′ = (24.3750 − 17.8750)/1.2006 = 5.4140

q3 = (Ȳ.4. − Ȳ.3.)/sΨ′ = (24.3750 − 20.2500)/1.2006 = 3.4358
FIGURE 13.2
Profile plot for example data. [Estimated marginal means of number of statistics labs attended, plotted by level of attractiveness (unattractive, slightly attractive, moderately attractive, very attractive) on the horizontal axis, with separate lines for time of day (afternoon, evening).]
388 An Introduction to Statistical Concepts
$$q_4 = \frac{\bar{Y}_{.3.} - \bar{Y}_{.1.}}{s_{\Psi'}} = \frac{20.2500 - 11.1250}{1.2006} = 7.6004$$

$$q_5 = \frac{\bar{Y}_{.3.} - \bar{Y}_{.2.}}{s_{\Psi'}} = \frac{20.2500 - 17.8750}{1.2006} = 1.9782$$

$$q_6 = \frac{\bar{Y}_{.2.} - \bar{Y}_{.1.}}{s_{\Psi'}} = \frac{17.8750 - 11.1250}{1.2006} = 5.6222$$
Recall that we compare the test statistic value to the critical value to make our hypothesis-testing decision. If the test statistic value exceeds the critical value, we reject the null hypothesis and conclude that those means differ. For these tests, the results indicate that the means for the levels of factor A (attractiveness) are statistically significantly different for levels 1 and 4 (i.e., the test statistic value is 11.0361, and the critical value is 3.901), 2 and 4, 1 and 3, and 1 and 2. Thus, level 1 (unattractive) is significantly different from the other three levels of attractiveness, and levels 2 and 4 (slightly attractive vs. very attractive) are also significantly different. The only levels that are not statistically different are levels 2 and 3 (q5 = 1.9782) and levels 3 and 4 (q3 = 3.4358).

These results are somewhat different from those found with the one-factor model in Chapters 11 and 12 (where the significantly different levels were only 1 vs. 4 and 1 vs. 3). The MSwith has been reduced with the introduction of the second factor from 36.1116 to 11.5313 because SSwith has been reduced from 1011.1250 to 276.7500. Although the SS and MS for the attractiveness factor remain unchanged, this resulted in the F test statistic being considerably larger (increased from 6.8177 to 21.3504), although observed power was quite high in both models. Recall that this is one of the benefits we mentioned earlier about the use of additional factors in the model. Also, although the effect of factor B (time of day) was significant, there are only two levels of time of day, and, thus, we need not carry out any multiple comparisons (attendance is better in the afternoon section). Finally, since the interaction was not significant, it is not necessary to consider any related contrasts.
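As a cross-check on the hand computations, here is a Python sketch of the same Tukey HSD tests (NumPy only; the marginal means, MSwith, and n = 8 observations per marginal mean come from the worked example, and the critical value 3.901 is the one read from Table A.9):

```python
import numpy as np
from itertools import combinations

means = {1: 11.1250, 2: 17.8750, 3: 20.2500, 4: 24.3750}  # factor A marginal means
ms_with, n = 11.5313, 8                                   # error mean square, n per marginal mean
se = np.sqrt(ms_with / n)                                 # standard error, about 1.2006

# Studentized range statistic for every pairwise contrast
q = {(i, j): (means[j] - means[i]) / se
     for i, j in combinations(sorted(means), 2)}

critical = 3.901  # .05 q(24, 4) from Table A.9
significant = {pair for pair, stat in q.items() if abs(stat) > critical}
print(sorted(significant))
```

If SciPy is available, the critical value could instead be computed from the studentized range distribution rather than read from a table.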
Finally, we can estimate the effect size measures. The partial η²s are determined to be the following:
$$\eta_A^2 = \frac{SS_A}{SS_A + SS_{with}} = \frac{738.5938}{738.5938 + 276.7500} = .7274$$

$$\eta_B^2 = \frac{SS_B}{SS_B + SS_{with}} = \frac{712.5313}{712.5313 + 276.7500} = .7203$$

$$\eta_{AB}^2 = \frac{SS_{AB}}{SS_{AB} + SS_{with}} = \frac{21.8438}{21.8438 + 276.7500} = .0732$$
We calculate ω² to be the following:
$$\omega_A^2 = \frac{SS_A - (J-1)MS_{with}}{SS_{total} + MS_{with}} = \frac{738.5938 - 3(11.5313)}{1749.7188 + 11.5313} = .3997$$

$$\omega_B^2 = \frac{SS_B - (K-1)MS_{with}}{SS_{total} + MS_{with}} = \frac{712.5313 - 1(11.5313)}{1749.7188 + 11.5313} = .3980$$

$$\omega_{AB}^2 = \frac{SS_{AB} - (J-1)(K-1)MS_{with}}{SS_{total} + MS_{with}} = \frac{21.8438 - 3(11.5313)}{1749.7188 + 11.5313} = 0$$

(The numerator for ω²AB is slightly negative, so the value is reported as 0.)
Based on these effect size measures, one would conclude that there is a large effect for instructor attractiveness and for time of day, but no effect for the time-attractiveness interaction.
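The effect size formulas above translate directly into a few lines of code. This Python sketch simply plugs in the values from Table 13.5; the max(0, ...) truncation mirrors the convention of reporting a negative ω² as 0:

```python
# Sums of squares and degrees of freedom from Table 13.5
ss = {"A": 738.5938, "B": 712.5313, "AB": 21.8438}
df = {"A": 3, "B": 1, "AB": 3}          # (J-1), (K-1), (J-1)(K-1)
ss_with, ms_with, ss_total = 276.7500, 11.5313, 1749.7188

# Partial eta squared: SS_effect / (SS_effect + SS_with)
eta2 = {src: ss[src] / (ss[src] + ss_with) for src in ss}

# Omega squared: [SS_effect - df_effect * MS_with] / (SS_total + MS_with),
# truncated at 0 when the numerator is negative
omega2 = {src: max(0.0, (ss[src] - df[src] * ms_with) / (ss_total + ms_with))
          for src in ss}
```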
13.1.11 Expected Mean Squares
As we asked in Chapter 11 for the one-factor fixed-effects model, for the two-factor fixed-effects model being considered here, we again ask the question, "How do we know which source of variation to use as the error term in the denominator?" That is, for the two-factor fixed-effects ANOVA model, how did we know to use MSwith as the error term in testing for the main effects and the interaction effect? As we learned in Chapter 11, an expected mean square for a particular source of variation represents the average mean square value for that source obtained if the same study were to be replicated an infinite number of times. For instance, the expected value of MSA, denoted by E(MSA), is the average value of MSA over repeated samplings.

Let us examine what the expected mean square terms actually look like for our two-factor fixed-effects model. Consider the two situations of (a) all of the H0 actually being true and (b) all of the H0 actually being false. If all of the H0 are actually true, such that there really are no main effects or an interaction effect, then the expected mean squares are as follows:
$$E(MS_A) = \sigma_\varepsilon^2$$
$$E(MS_B) = \sigma_\varepsilon^2$$
$$E(MS_{AB}) = \sigma_\varepsilon^2$$
$$E(MS_{with}) = \sigma_\varepsilon^2$$
and thus using MSwith as the error term will produce F values around 1.

If all of the H0 are actually false, such that there really are main effects and an interaction effect, then the expected mean squares are as follows:
$$E(MS_A) = \sigma_\varepsilon^2 + nK\sum_{j=1}^{J}\alpha_j^2/(J-1)$$

$$E(MS_B) = \sigma_\varepsilon^2 + nJ\sum_{k=1}^{K}\beta_k^2/(K-1)$$

$$E(MS_{AB}) = \sigma_\varepsilon^2 + n\sum_{j=1}^{J}\sum_{k=1}^{K}(\alpha\beta)_{jk}^2/[(J-1)(K-1)]$$

$$E(MS_{with}) = \sigma_\varepsilon^2$$
and thus using MSwith as the error term will produce F values greater than 1.

There is a difference in the main and interaction effects between when H0 is actually true as compared to when H0 is actually false because in the latter situation, there is a second term. The important parts of this second term are α, β, and αβ, which represent the effects for A, B, and AB, respectively. The larger this part becomes, the larger the F ratio becomes. In comparing the two situations, we also see that E(MSwith) is the same whether H0 is actually true or false, and thus represents a reliable estimate of σ²ε. This term is mean-free because it does not depend on any mean differences.
Finally, let us put all of this information together. In general, the F ratio represents

$$F = (\text{systematic variability} + \text{error variability})/(\text{error variability})$$

where, for the two-factor fixed-effects model, systematic variability is variability due to the main or interaction effects (i.e., between sources) and error variability is variability within. The F ratio is formed in a particular way because we want to isolate the systematic variability in the numerator. For this model, the only appropriate error term to use for each F ratio is MSwith because it does serve to isolate the systematic variability.
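The behavior of these expected mean squares can be illustrated by simulation. In the hypothetical Python sketch below, data are generated with all H0 true (no effects at all), so the average MSA and the average MSwith should both settle near σ²ε and their ratio near 1; the design dimensions mirror the lab example, and σε = 3 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(13)
J, K, n, sigma = 4, 2, 8, 3.0       # arbitrary design mirroring the example
reps = 4000

ms_a_vals, ms_with_vals = [], []
for _ in range(reps):
    # All H0 true: no main or interaction effects, pure error
    y = rng.normal(0.0, sigma, size=(J, K, n))
    grand = y.mean()
    a_means = y.mean(axis=(1, 2))
    cell_means = y.mean(axis=2)
    ss_a = n * K * ((a_means - grand) ** 2).sum()
    ss_with = ((y - cell_means[..., None]) ** 2).sum()
    ms_a_vals.append(ss_a / (J - 1))
    ms_with_vals.append(ss_with / (J * K * (n - 1)))

# Both averages estimate sigma**2 (= 9), so their ratio is near 1
print(np.mean(ms_a_vals), np.mean(ms_with_vals))
```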
13.2 Three-Factor and Higher-Order ANOVA
13.2.1 Characteristics of the Model
All of the characteristics we discussed for the two-factor model apply to the three-factor model, with one obvious exception: there are three factors rather than two. This will result in three main effects (one for each factor, known as A, B, and C), three two-way interactions (known as AB, AC, and BC), and one three-way interaction (known as ABC). The only new concept is the three-way interaction, which may be stated as follows: "Is the AB interaction constant across all levels of factor C?" This may also be stated as "AC across the levels of B" or as "BC across the levels of A." These each have the same interpretation, as there is only one way of testing the three-way interaction. In short, the three-way interaction can be thought of as the two-way interaction behaving differently across the levels of the third factor.
We do not explicitly consider models with more than three factors (cf., Keppel & Wickens, 2004; Marascuilo & Serlin, 1988; Myers & Well, 1995). However, be warned that such models do exist and that they will necessitate more main effects, more two-way interactions, more three-way interactions, as well as higher-order interactions, and thus more complex interpretations. Conceptually, the only change is to add these additional effects to the model.
13.2.2 ANOVA Model
The model for the three-factor design is

$$Y_{ijkl} = \mu + \alpha_j + \beta_k + \gamma_l + (\alpha\beta)_{jk} + (\alpha\gamma)_{jl} + (\beta\gamma)_{kl} + (\alpha\beta\gamma)_{jkl} + \varepsilon_{ijkl}$$
where
Yijkl is the observed score on the criterion (i.e., dependent) variable for individual i in level j of factor A, level k of factor B, and level l of factor C (or in the jkl cell)
μ is the overall or grand population mean (i.e., regardless of cell designation)
αj is the effect for level j of factor A
βk is the effect for level k of factor B
γl is the effect for level l of factor C
(αβ)jk is the interaction effect for the combination of level j of factor A and level k of factor B
(αγ)jl is the interaction effect for the combination of level j of factor A and level l of factor C
(βγ)kl is the interaction effect for the combination of level k of factor B and level l of factor C
(αβγ)jkl is the interaction effect for the combination of level j of factor A, level k of factor B, and level l of factor C
εijkl is the random residual error for individual i in cell jkl
Given that there are three main effects, three two-way interactions, and one three-way interaction, there will be accompanying null and alternative hypotheses for each of these effects. At this point in your statistics career, the hypotheses should be obvious (simply expand on the hypotheses at the beginning of this chapter).
13.2.3 ANOVA Summary Table and Example
The ANOVA summary table for the three-factor model is shown in Table 13.6, with the usual columns for sources of variation, sums of squares, degrees of freedom, mean squares, and F. A quick three-factor example dataset and the resulting ANOVA summary table from SPSS are shown in Table 13.7. Note that the only statistically significant effects are the main effect for B and the AC interaction (p < .01).
Table 13.6
Three-Factor�ANOVA�Summary�Table
Source   SS        df                      MS       F
A        SSA       J − 1                   MSA      MSA/MSwith
B        SSB       K − 1                   MSB      MSB/MSwith
C        SSC       L − 1                   MSC      MSC/MSwith
AB       SSAB      (J − 1)(K − 1)          MSAB     MSAB/MSwith
AC       SSAC      (J − 1)(L − 1)          MSAC     MSAC/MSwith
BC       SSBC      (K − 1)(L − 1)          MSBC     MSBC/MSwith
ABC      SSABC     (J − 1)(K − 1)(L − 1)   MSABC    MSABC/MSwith
Within   SSwith    N − JKL                 MSwith
Total    SStotal   N − 1
Table 13.7
Three-Factor Analysis of Variance Example–Raw Data and SPSS ANOVA Summary Table
Raw�Data:
A1B1C1:�8,�10,�12,�9
A1B1C2:�23,�17,�21,�19
A1B2C1:�22,�19,�16,�24
A1B2C2:�33,�31,�27,�30
A2B1C1:�16,�19,�21,�24
A2B1C2:�6,�8,�11,�13
A2B2C1:�27,�30,�31,�33
A2B2C2:�16,�19,�21,�25
SPSS ANOVA Summary Table:

Source            Type III Sum of Squares   df   Mean Square   F         Sig.
A                 .031                      1    .031          .004      .953
B                 871.531                   1    871.531       100.200   .000
C                 .031                      1    .031          .004      .953
A * B             .031                      1    .031          .004      .953
A * C             830.281                   1    830.281       95.457    .000
B * C             .031                      1    .031          .004      .953
A * B * C         .281                      1    .281          .032      .859
Error             208.750                   24   8.698
Corrected total   1910.969                  31
The row labeled "A" is the first independent variable or factor or between-groups variable. The between-groups mean square for factor A (.031) provides an indication of the variation in the dependent variable attributable to factor A. The degrees of freedom for the sum of squares between groups for factor A is J − 1 (df = 1 in this example, indicating 2 levels for factor A). Similar interpretations are made for the other main effects and interactions.

The omnibus F test for the main effect for factor A (and computed similarly for the other main effects and interactions) is computed as

$$F = \frac{MS_A}{MS_{with}} = \frac{.031}{8.698} = .004$$

The p value for the omnibus F test of the main effect for factor A is .953. This indicates there is not a statistically significant difference in the dependent variable based on factor A, averaged across the levels of factors B and C. In other words, there is not a unique effect of factor A on the dependent variable, controlling for factors B and C. The probability of observing these mean differences or more extreme mean differences by chance if the null hypothesis is really true (i.e., if the population means really are equal) is about 95%. We fail to reject the null hypothesis that the population means of factor A are equal. For this example, this provides evidence to suggest that the dependent variable does not differ, on average, across the levels of factor A, when controlling for factors B and C.

The row labeled "Error" is within groups. The within-groups sum of squares tells us how much variation there is within the cells, combined across the cells (i.e., 208.750). The degrees of freedom for the sum of squares within groups is (N − JKL), or the sample size minus the number of cells formed by the independent variables [i.e., 32 − (2)(2)(2) = 24].

The row labeled "corrected total" is the sum of squares total. The degrees of freedom for the total is (N − 1), or the sample size minus one.
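The sums of squares that SPSS reports in Table 13.7 can be reproduced by hand. This Python sketch (NumPy only) computes the main-effect sums of squares for the balanced three-factor dataset listed above; the same marginal-mean logic extends to the interaction terms:

```python
import numpy as np

# Raw data from Table 13.7, shape (A levels, B levels, C levels, n per cell)
y = np.array([[[[8, 10, 12, 9],   [23, 17, 21, 19]],
               [[22, 19, 16, 24], [33, 31, 27, 30]]],
              [[[16, 19, 21, 24], [6, 8, 11, 13]],
               [[27, 30, 31, 33], [16, 19, 21, 25]]]], dtype=float)

grand = y.mean()
N = y.size

def ss_main(axis_kept):
    # SS for one main effect in a balanced design: sum over levels of
    # (observations per level) * (marginal mean - grand mean)^2
    axes = tuple(i for i in range(4) if i != axis_kept)
    marg = y.mean(axis=axes)
    per_level = N // marg.size
    return per_level * ((marg - grand) ** 2).sum()

ss_a, ss_b, ss_c = ss_main(0), ss_main(1), ss_main(2)
print(round(ss_b, 3))
```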
13.2.4 Triple Interaction
Everything else about the three-factor design follows from the two-factor model. The assumptions are the same, MSwith is the error term used for testing each of the hypotheses in the fixed-effects model, and the MCPs are easily utilized. The main new feature is the three-way interaction. If this interaction is significant, then this means that the two-way interaction is different across the levels of the third factor. This result will need to be taken into account prior to interpreting the two-way interactions and the main effects.
Although the inclusion of additional factors in the design should result in a reduction in MSwith, there is a price to pay for the study of additional factors. Although the analysis is simple for the computer, you must consider the possibility of significant higher-order interactions. If you find, for example, that the four-way interaction is significant, how do you deal with it? First you have to interpret this interaction, which could be difficult if it is unexpected. Then you may have difficulty in dealing with the interpretation of your other effects. Our advice is simple. Do not include additional factors just because they sound interesting. Only include those factors that are theoretically or empirically important. Then if a significant higher-order interaction occurs, you will be in a better position to understand it because you will have already thought about its consequences. Reporting that an interaction is significant, but not interpretable, is not sound research (for additional discussion on this topic, see Keppel & Wickens, 2004).
13.3 Factorial ANOVA With Unequal n’s
Up until this point in the chapter, we have only considered the equal n's or balanced case. That is, the model used was one where the number of observations in each cell was equal. This served to make the formulas and equations easier to deal with. However, we do not need to assume that the n's are equal. In this section, we discuss ways to deal with the unequal n's (or unbalanced) case for the two-factor model, although these notions can be transferred to higher-order models as well.
When n's are unequal, things become a bit trickier, as the main effects and the interaction effect are not orthogonal. In other words, the sums of squares cannot be partitioned into independent effects, and, thus, the individual SS do not necessarily add up to the SStotal. As a result, several computational approaches have been developed. In the old days, prior to the availability of high-speed computers, the standard approach was to use unweighted means analysis. This is essentially an analysis of means, rather than raw scores, which are unweighted by cell size. This approach is only an approximate procedure. Due to the availability of quality statistical software, the unweighted means approach is no longer necessary. A rather silly approach, and one that we do not condone, is to delete enough data until you have an equal n's model.
There are three more modern approaches to this case. Each of these approaches really tests different hypotheses and thus may result in different results and conclusions: (a) the sequential approach (also known as the hierarchical sums of squares approach), (b) the partially sequential approach (also known as the partially hierarchical, or experimental design, or method of fitting constants approach), and (c) the regression approach (also known as the marginal means or unique approach). There has been considerable debate over the years about the relative merits of each approach (e.g., Applebaum & Cramer, 1974; Carlson & Timm, 1974; Cramer & Applebaum, 1980; Overall, Lee, & Hornick, 1981; Overall & Spiegel, 1969; Timm & Carlson, 1975). In the following, we describe what each approach is actually testing.
In the sequential approach, the effects being tested are as follows:

$$\alpha \mid \mu$$
$$\beta \mid \mu, \alpha$$
$$\alpha\beta \mid \mu, \alpha, \beta$$
This indicates, for example, that the effect for factor B (β) is adjusted for, or controls for (as denoted by the vertical line), the overall mean (μ) and the main effect due to factor A (α). Thus, each effect is adjusted for prior effects in the sequential order given (i.e., α, β, αβ). Here the α effect is given theoretical or practical priority over the β effect. In SAS and SPSS, this is the Type I sum of squares method.
In the partially sequential approach, the effects being tested are as follows:

$$\alpha \mid \mu, \beta$$
$$\beta \mid \mu, \alpha$$
$$\alpha\beta \mid \mu, \alpha, \beta$$
There is a difference here because each main effect controls for the other main effect, but not for the interaction effect. In SAS and SPSS, this is the Type II sum of squares method. (Strictly speaking, it is the sequential, Type I, method whose sums of squares are guaranteed to add up to the total sum of squares; the Type II sums of squares generally do not.) Notice in the sequential and partially sequential approaches that the interaction is not taken into account in estimating the main effects, which is only fine if there is no interaction effect.
In the regression approach, the effects being tested are as follows:

$$\alpha \mid \mu, \beta, \alpha\beta$$
$$\beta \mid \mu, \alpha, \alpha\beta$$
$$\alpha\beta \mid \mu, \alpha, \beta$$
In this approach, each effect controls for each of the other effects. In SAS and SPSS, this is the Type III sum of squares method (and is the default selection in SPSS). Many statisticians (e.g., Glass & Hopkins, 1996; Keppel & Wickens, 2004; Mickey, Dunn, & Clark, 2004), including the authors of this text, recommend exclusive use of the regression approach because each effect is estimated taking the other effects into account. The hypotheses tested in the sequential and partially sequential approaches are seldom of interest and are difficult to interpret (Carlson & Timm, 1974; Kirk, 1982; Overall et al., 1981; Timm & Carlson, 1975). The regression approach seems to be conceptually closest to the traditional ANOVA in that each effect is estimated controlling for all other effects. When the n's are equal, each of these three approaches tests the same hypotheses and yields the same results.
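The three approaches can be made concrete with a small regression demonstration. In the Python sketch below, the unbalanced 2 × 2 dataset is hypothetical (invented purely for illustration), and each sum of squares is computed as the drop in residual error when an effect-coded term is added to the model:

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 dataset (cell sizes 3, 2, 4, 3)
y = np.array([23., 25, 22, 18, 16, 30, 29, 33, 31, 20, 21, 24])
a = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2])   # factor A level
b = np.array([1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2])   # factor B level

A = np.where(a == 1, 1.0, -1.0)      # effect (deviation) coding
B = np.where(b == 1, 1.0, -1.0)
AB = A * B
ones = np.ones_like(y)

def sse(*cols):
    # Residual sum of squares after a least-squares fit on the given columns
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return ((y - X @ beta) ** 2).sum()

ss_total = sse(ones)                 # intercept-only residual = total SS
sse_full = sse(ones, A, B, AB)

# Sequential (Type I): each effect adjusted only for effects entered before it
ss_a_seq = sse(ones) - sse(ones, A)
ss_b_seq = sse(ones, A) - sse(ones, A, B)
ss_ab_seq = sse(ones, A, B) - sse_full

# Regression (Type III): each effect adjusted for all other effects
ss_a_reg = sse(ones, B, AB) - sse_full
ss_b_reg = sse(ones, A, AB) - sse_full
ss_ab_reg = sse(ones, A, B) - sse_full
```

With equal n's, the sequential and regression sums of squares coincide; with unequal n's, the main-effect terms generally differ, while the interaction term, which is adjusted for both main effects under either scheme, is identical.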
13.4 SPSS and G*Power
Next we consider the use of SPSS for the statistics lab example. Instructions for determining the factorial ANOVA using SPSS are presented first, followed by additional steps for examining the assumptions for factorial ANOVA. Finally, we examine a priori and post hoc power for this model using G*Power.
Factorial ANOVA
In this section, we take a look at SPSS for the statistics lab example. As already noted in Chapter 11, SPSS needs the data to be in a specific form for the analysis to proceed, which is different from the layout of the data in Table 13.1. For a two-factor ANOVA, the dataset must consist of three variables or columns: one for the level of factor A, one for the level of factor B, and the third for the dependent variable. Each row still represents one individual, indicating the levels of factors A and B that individual is a member of, and their score on the dependent variable. As seen in the following screenshot, for a two-factor ANOVA, the SPSS data are in the form of two columns that represent the group values (i.e., the two independent variables) and one column that represents the scores or values of the dependent variable.
The first independent variable is labeled "Group," where each value represents the attractiveness of the statistics lab instructor to which the student was assigned. Group 1, you recall, represented "unattractive." Thus there were eight students randomly assigned to an "unattractive" instructor. Since each of these eight students was in the same group, each is coded with the same value (1, which represents that they were assigned to an "unattractive" instructor). The other groups (2, 3, and 4) follow this pattern as well.

The second independent variable is labeled "Time," where each value represents the time of day of the course. One represents "afternoon" and two represents "evening."

The dependent variable is "Labs" and represents the number of statistics labs the student attended.
Step 1: To conduct a factorial ANOVA, go to "Analyze" in the top pulldown menu, then select "General Linear Model," and then select "Univariate." Following the screenshot (Step 1) will produce the "Univariate" dialog box.

[Screenshot: Factorial ANOVA, Step 1]
Step 2: Click the dependent variable (e.g., number of statistics labs attended) and move it into the "Dependent Variable" box by clicking the arrow button. Click the first independent variable (e.g., level of attractiveness) and move it into the "Fixed Factor(s)" box by clicking the arrow button. Follow this same step to move the second independent variable into the "Fixed Factor(s)" box. Next, click on "Options."

[Screenshot: Factorial ANOVA, Step 2. Select the dependent variable from the list on the left and use the arrow to move it to the "Dependent Variable" box on the right; select the independent variables and move them to the "Fixed Factor(s)" box. Clicking on "Contrasts" will allow you to conduct certain planned MCPs; clicking on "Plots" will allow you to generate profile plots; clicking on "Post Hoc" will allow you to generate post hoc MCPs; clicking on "Save" will allow you to save various forms of residuals, among other variables; clicking on "Options" will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests).]
Step 3: Clicking on "Options" will provide the option to select such information as "Descriptive Statistics," "Estimates of effect size," "Observed power," "Homogeneity tests" (i.e., Levene's test for equal variances), and "Spread versus level plots" (those are the options that we typically utilize). Click on "Continue" to return to the original dialog box.

[Screenshot: Factorial ANOVA, Step 3. Select from the list on the left those variables that you wish to display means for and use the arrow to move them to the "Display Means for" box on the right.]
Step 4: From the "Univariate" dialog box, click on "Plots" to obtain a profile plot of means. Click the independent variable (e.g., level of attractiveness, labeled as "Group") and move it into the "Horizontal Axis" box by clicking the arrow button (see screenshot Step 4a). (Tip: Placing the independent variable that has the most categories or levels on the horizontal axis of the profile plots will make for easier interpretation of the graph.) Then click the second independent variable (e.g., "Time") and move it into the "Separate Lines" box by clicking the arrow button (see screenshot Step 4a). Then click on "Add" to move the variable into the "Plots" box at the bottom of the dialog box (see screenshot Step 4b). Click on "Continue" to return to the original dialog box.

[Screenshot: Factorial ANOVA, Step 4a. Select one independent variable from the list on the left and use the arrow to move it to the "Horizontal Axis" box on the right; select the second independent variable and use the arrow to move it to the "Separate Lines" box.]

[Screenshot: Factorial ANOVA, Step 4b. Click "Add" to move the variable into the "Plots" box at the bottom.]
Step 5: From the "Univariate" dialog box, click on "Post Hoc" to select various post hoc MCPs, or click on "Contrasts" to select various planned MCPs (see screenshot Step 1). From the "Post Hoc Multiple Comparisons for Observed Means" dialog box, click on the names of the independent variables in the "Factor(s)" list box in the top left (e.g., "Group" and "Time") and move them to the "Post Hoc Tests for" box in the top right by clicking on the arrow key. Check an appropriate MCP for your situation by placing a checkmark in the box next to the desired MCP. In this example, we will select "Tukey." Click on "Continue" to return to the original dialog box.

[Screenshot: Factorial ANOVA, Step 5. The dialog lists MCPs for instances when the homogeneity of variance assumption is met and, separately, MCPs for instances when it is not met.]
Step 6: From the "Univariate" dialog box, click on "Save" to select those elements that you want to save (in our case, we want to save the unstandardized residuals, which will be used later to examine the extent to which normality and independence are met). From the "Univariate" dialog box, click on "OK" to generate the output.

[Screenshot: Factorial ANOVA, Step 6]

Interpreting the output: Annotated results are presented in Table 13.8, and the profile plot is shown in Figure 13.2. Note that in order to test interaction contrasts in SPSS, syntax is required rather than the use of the point-and-click features used primarily in this text (cf., Page, Braver, & MacKinnon, 2003). Note also that the SPSS ANOVA summary table will include additional sources of variation that we find not to be useful (i.e., corrected model, intercept, total); thus, they are not annotated in Table 13.8.
Table 13.8
Selected SPSS Results for the Statistics Lab Example
Descriptive Statistics
Dependent Variable: Number of Statistics Labs Attended

Level of Attractiveness   Time of Day   Mean      Std. Deviation   N
Unattractive              Afternoon     15.2500   4.03113          4
                          Evening       7.0000    2.94392          4
                          Total         11.1250   5.48862          8
Slightly attractive       Afternoon     22.7500   2.21736          4
                          Evening       13.0000   3.74166          4
                          Total         17.8750   5.93867          8
Moderately attractive     Afternoon     26.2500   2.21736          4
                          Evening       14.2500   4.78714          4
                          Total         20.2500   7.28501          8
Very attractive           Afternoon     28.2500   1.70783          4
                          Evening       20.5000   4.20317          4
                          Total         24.3750   5.09727          8
Total                     Afternoon     23.1250   5.65538          16
                          Evening       13.6875   6.09611          16
                          Total         18.4062   7.51283          32
Between-Subjects Factors

                          Value   Value Label             N
Level of attractiveness   1.00    Unattractive            8
                          2.00    Slightly attractive     8
                          3.00    Moderately attractive   8
                          4.00    Very attractive         8
Time of day               1.00    Afternoon               16
                          2.00    Evening                 16
The table labeled "Between-Subjects Factors" provides sample sizes for each of the categories of the independent variables (recall that the independent variables are the "between subjects factors"). The table labeled "Descriptive Statistics" provides basic descriptive statistics (means, standard deviations, and sample sizes) for each cell of the design.
Levene's Test of Equality of Error Variances a
Dependent Variable: Number of Statistics Labs Attended

F      df1   df2   Sig.
.579   7     24    .766

Note: Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a Design: Intercept + Group + Time + Group * Time.

The F test (and associated p value) for Levene's Test for Equality of Error Variances is reviewed to determine if equal variances can be assumed. In this case, we meet the assumption (as p is greater than α). Note that df1 is calculated as (JK − 1) and df2 is calculated as (N − JK).
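Those degrees of freedom follow directly from the design, as a trivial check shows (J = 4, K = 2, N = 32 for the lab example):

```python
J, K, N = 4, 2, 32          # levels of factors A and B, total sample size
df1 = J * K - 1             # number of cells minus 1
df2 = N - J * K             # total sample size minus number of cells
print(df1, df2)
```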
Tests of Between-Subjects Effects
Dependent Variable: Number of Statistics Labs Attended

Source            Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power b
Corrected model   1472.969 a                7    210.424       18.248    .000   .842                  127.737              1.000
Intercept         10841.281                 1    10841.281     940.165   .000   .975                  940.165              1.000
Group             738.594                   3    246.198       21.350    .000   .727                  64.051               1.000
Time              712.531                   1    712.531       61.791    .000   .720                  61.791               1.000
Group * Time      21.844                    3    7.281         .631      .602   .073                  1.894                .162
Error             276.750                   24   11.531
Total             12591.000                 32
Corrected total   1749.719                  31

a R squared = .842 (adjusted R squared = .796).
b Computed using alpha = .05.
Observed power tells whether our test is powerful enough to detect mean differences if they really exist. Power of 1.000 indicates the maximum probability of rejecting the null hypothesis if it is really false (i.e., very strong power).

R squared is listed as a footnote underneath the table. R squared is the ratio of sum of squares between (i.e., combined SS for the main effects and for the interaction) divided by sum of squares total:

$$R^2 = \frac{SS_{betw}}{SS_{total}} = \frac{738.594 + 712.531 + 21.844}{1749.719} = .842$$

The row labeled "Error" is for within groups. The within-groups sum of squares tells us how much variation there is within the cells, combined across the cells (i.e., 276.750). The degrees of freedom for within groups is (N − JK), or the sample size minus the number of cells formed by the independent variables [i.e., 32 − (4)(2) = 24].

The row labeled "Corrected Total" is the sum of squares total. The degrees of freedom for the total is (N − 1), or the total sample size minus 1.

The omnibus F test for the main effect for "Group" (i.e., attractiveness) (and computed similarly for the other main effects and interactions) is computed as

$$F = \frac{MS_A}{MS_{with}} = \frac{246.198}{11.531} = 21.350$$

The p value for the omnibus F test for the main effect for attractiveness is .000. This indicates there is a statistically significant difference in the dependent variable based on attractiveness, averaged across time of day (afternoon and evening). In other words, there is a unique effect of attractiveness on the number of stat labs attended, controlling for time of day. The probability of observing these mean differences or more extreme mean differences by chance if the null hypothesis is really true (i.e., if the population means are really equal) is less than 1%. We reject the null hypothesis that the population means of attractiveness are equal. For our example, this provides evidence to suggest that the number of stat labs differs, on average, across the levels of attractiveness, when controlling for time of day.
1. Grand Mean
Dependent Variable: Number of Statistics Labs Attended

                         95% Confidence Interval
Mean     Std. Error   Lower Bound   Upper Bound
18.406   .600         17.167        19.645

2. Level of Attractiveness
Dependent Variable: Number of Statistics Labs Attended

                                                 95% Confidence Interval
Level of Attractiveness   Mean     Std. Error   Lower Bound   Upper Bound
Unattractive              11.125   1.201        8.647         13.603
Slightly attractive       17.875   1.201        15.397        20.353
Moderately attractive     20.250   1.201        17.772        22.728
Very attractive           24.375   1.201        21.897        26.853

3. Time of Day
Dependent Variable: Number of Statistics Labs Attended

                                     95% Confidence Interval
Time of Day   Mean     Std. Error   Lower Bound   Upper Bound
Afternoon     23.125   .849         21.373        24.877
Evening       13.688   .849         11.935        15.440

The "Grand Mean" (in this case, 18.406) represents the overall mean, regardless of group membership, on the dependent variable. The 95% CI represents the CI of the grand mean. The table labeled "Level of attractiveness" provides descriptive statistics for each of the categories of the first independent variable; in addition to means, the SE and 95% CI of the means are reported. The table labeled "Time of day" provides descriptive statistics for each of the categories of the second independent variable; in addition to means, the SE and 95% CI of the means are reported.
4. Level of Attractiveness * Time of Day
Dependent Variable: Number of Statistics Labs Attended
                                                                   95% Confidence Interval
Level of Attractiveness   Time of Day   Mean     Std. Error   Lower Bound   Upper Bound
Unattractive              Afternoon     15.250   1.698        11.746        18.754
                          Evening        7.000   1.698         3.496        10.504
Slightly attractive       Afternoon     22.750   1.698        19.246        26.254
                          Evening       13.000   1.698         9.496        16.504
Moderately attractive     Afternoon     26.250   1.698        22.746        29.754
                          Evening       14.250   1.698        10.746        17.754
Very attractive           Afternoon     28.250   1.698        24.746        31.754
                          Evening       20.500   1.698        16.996        24.004
The table labeled “Level
of attractiveness *
Time of day” provides
descriptive statistics for
each of the categories of
the first independent
variable by the second
independent variable
(i.e., cell means) (notice
that these are the same
means reported
previously). In addition
to means, the SE and
95% CI of the means are
reported.
403 Factorial Analysis of Variance: Fixed-Effects Model
Multiple Comparisons
Number of Statistics Labs Attended
Tukey HSD
                                                                               95% Confidence Interval
(I) Level of            (J) Level of            Mean Difference
Attractiveness          Attractiveness          (I – J)     Std. Error   Sig.   Lower Bound   Upper Bound
Unattractive            Slightly attractive     –6.7500*    1.69788      .003   –11.4338      –2.0662
                        Moderately attractive   –9.1250*    1.69788      .000   –13.8088      –4.4412
                        Very attractive         –13.2500*   1.69788      .000   –17.9338      –8.5662
Slightly attractive     Unattractive            6.7500*     1.69788      .003   2.0662        11.4338
                        Moderately attractive   –2.3750     1.69788      .512   –7.0588       2.3088
                        Very attractive         –6.5000*    1.69788      .004   –11.1838      –1.8162
Moderately attractive   Unattractive            9.1250*     1.69788      .000   4.4412        13.8088
                        Slightly attractive     2.3750      1.69788      .512   –2.3088       7.0588
                        Very attractive         –4.1250     1.69788      .098   –8.8088       .5588
Very attractive         Unattractive            13.2500*    1.69788      .000   8.5662        17.9338
                        Slightly attractive     6.5000*     1.69788      .004   1.8162        11.1838
                        Moderately attractive   4.1250      1.69788      .098   –.5588        8.8088
Note: Based on observed means.
The error term is mean square(error) = 11.531.
* The mean difference is significant at the .05 level.
“Mean difference” is simply the difference between the means of the two levels of attractiveness being compared. For example, the mean difference of level 1 and level 2 is calculated as 11.1250 – 17.8750 = –6.7500.
The standard error calculated in SPSS
uses the harmonic mean
(Tukey–Kramer modification):
“Sig.” denotes the observed p values and provides
the results of the contrasts. There are four
statistically significant mean differences between:
(1) group 1 (unattractive) and group 2 (slightly
attractive); (2) group 1 (unattractive) and group 3
(moderately attractive); (3) group 1 (unattractive)
and group 4 (very attractive); and (4) group 2
(slightly attractive) and 4 (very attractive).
Note that there are only six unique contrast results: ½[J(J – 1)] = ½[4(4 – 1)] = ½(12) = 6.
Thus there are redundant results presented in the table. For example, the comparison of group 1 and 2 (presented in results row 1) is the same as the comparison of group 2 and 1 (presented in results row 2).
sΨ′ = √(2 · MSerror / ñ), where ñ is the harmonic mean of the two sample sizes:
ñ = 2 / (1/n1 + 1/n2) = 2 / (1/8 + 1/8) = 8
sΨ′ = √[2(11.531)/8] = √2.88275 = 1.69788
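The standard error above can be reproduced with a few lines of code. This is our own sketch of the arithmetic, assuming equal group sizes of n = 8 per attractiveness level and the mean square error of 11.531 from the text.

```python
import math

# Standard error of a pairwise mean difference (Tukey-Kramer form),
# based on the harmonic mean of the two group sizes.
ms_error = 11.531
n1 = n2 = 8                           # observations per attractiveness level
n_harmonic = 2 / (1 / n1 + 1 / n2)    # harmonic mean sample size (8 here)
se = math.sqrt(2 * ms_error / n_harmonic)
print(round(se, 3))                   # 1.698 (SPSS reports 1.69788)

# Number of unique pairwise contrasts among J = 4 levels:
J = 4
print(J * (J - 1) // 2)               # 6
```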
Number of Statistics Labs Attended
Tukey HSD a,b
                                         Subset
Level of Attractiveness   N    1         2         3
Unattractive              8    11.1250
Slightly attractive       8              17.8750
Moderately attractive     8              20.2500   20.2500
Very attractive           8                        24.3750
Sig.                           1.000     .512      .098
Means for groups in homogeneous subsets are displayed.
Note: Based on observed means.
The error term is mean square(error) = 11.531.
a Uses Harmonic Mean Sample Size = 8.000.
b Alpha = .05.
This table displays the means for the groups that are not statistically significantly different. For example, in subset 2 the means for group 2 (slightly attractive) and group 3 (moderately attractive) are displayed, indicating that those group means are “homogeneous” or not significantly different.
Spread vs. level plots are plots of the dependent variable standard deviations (or variances) against the cell means.
These plots can be used to determine what to do when the homogeneity of variance assumption has been violated
(remember, we already have evidence of meeting the homogeneity of variance assumption). In addition to Levene’s
test, homogeneity is suggested when the spread vs. level plots provide a random display of points
(i.e., no systematic pattern).
If the plot suggests a linear relationship between the standard deviation and mean, transforming the data by taking
the log of the dependent variable values may be a solution to the heterogeneity (since the calculation of logarithms
requires positive values, this assumes all the data values are positive).
If there is a linear relationship between the variance and mean, transforming the data by taking the square root of
the dependent variable values may be a solution to the heterogeneity (since the calculation of square roots requires
positive values, this assumes all the data values are positive).
[Figure: Spread vs. level plot of number of statistics labs attended: spread (standard deviation) plotted against level (mean) for the Group * Time cells.]
[Figure: Spread vs. level plot of number of statistics labs attended: spread (variance) plotted against level (mean) for the Group * Time cells.]
Examining Data for Assumptions
Normality
We will use the residuals (which were requested and created through the “Save” option when generating our factorial ANOVA) to examine the extent to which normality was met.
The residuals are computed by subtracting the cell mean from the dependent variable value for each observation. For example, the cell mean for time 1 group 1 was 15.25. Thus the residual for the first person is: (15 – 15.25 = –.25). As we look at our raw data, we see a new variable has been added to our dataset labeled RES_1. This is our residual. The residual will be used to review the assumptions of normality and independence.
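The way SPSS forms these residuals can be mimicked directly. In this sketch the first score (15) and the cell mean (15.25) come from the text; the remaining scores are hypothetical, chosen only so that the cell mean works out.

```python
# Residual = observed score minus its cell (group-by-time) mean.
cell_scores = [15, 12, 18, 16]                   # first score from the text; rest hypothetical
cell_mean = sum(cell_scores) / len(cell_scores)  # 15.25, as in the text
residuals = [y - cell_mean for y in cell_scores]
print(residuals[0])                              # -0.25, matching the text
```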
Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important. For factorial ANOVA, the distributional shape for the residuals should be a normal distribution. We can again use “Explore” to examine the extent to which the assumption of normality is met.
The general steps for accessing “Explore” have been presented in previous chapters, and will not be repeated here. Click the residual and move it into the “Dependent List” box by clicking on the arrow button. The procedures for selecting normality statistics were presented in Chapter 6, and remain the same here: Click on “Plots” in the upper right corner. Place a checkmark in the boxes for “Normality plots with tests” and also for “Histogram.” Then click “Continue” to return to the main “Explore” dialog box. Then click “OK” to generate the output.
Select residuals from
the list on the left and
use the arrow to move
to the “Dependent
List” box on the
right. Then click
on “Plots.”
Generating normality
evidence
Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality including skewness and kurtosis, histograms, and boxplots.
Descriptives
Residual for labs                                        Statistic   Std. Error
Mean                                                     .0000       .52819
95% Confidence interval for mean      Lower bound        –1.0772
                                      Upper bound        1.0772
5% Trimmed mean                                          –.0747
Median                                                   –.2500
Variance                                                 8.927
Std. deviation                                           2.98788
Minimum                                                  –5.50
Maximum                                                  6.75
Range                                                    12.25
Interquartile range                                      3.94
Skewness                                                 .400        .414
Kurtosis                                                 –.162       .809
The skewness statistic of the residuals is .400 and kurtosis is –.162, both within the range of an absolute value of 2.0, suggesting some evidence of normality.
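The screening rule applied here (absolute values of skewness and kurtosis within 2.0) can be expressed as a small helper. This is our own sketch of the rule of thumb, not an SPSS procedure.

```python
# Flag whether skewness and kurtosis statistics fall within |2.0|,
# the rule of thumb used in the text as evidence of normality.
def within_normal_range(skewness, kurtosis, cutoff=2.0):
    return abs(skewness) <= cutoff and abs(kurtosis) <= cutoff

# Values from the Descriptives output for the residuals:
print(within_normal_range(0.400, -0.162))  # True
```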
As suggested by the skewness statistic, the histogram of residuals is slightly positively skewed, but it approaches a normal distribution and there is nothing to suggest normality may be an unreasonable assumption.
[Figure: Histogram of residual for labs (mean = –3.33E-16, std. dev. = 2.988, N = 32).]
There are a few other statistics that can be used to gauge normality. The formal test of normality, the Shapiro–Wilk (S–W) test (SW) (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented as follows and suggests that our sample distribution for residuals is not statistically significantly different than what would be expected from a normal distribution (SW = .977, df = 32, p = .701).
Tests of Normality
                      Kolmogorov–Smirnov a           Shapiro–Wilk
                      Statistic   df   Sig.          Statistic   df   Sig.
Residual for labs     .094        32   .200*         .977        32   .701
a Lilliefors significance correction.
* This is a lower bound of the true significance.
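Outside SPSS, the same S–W test is available in SciPy (assuming `scipy` is installed); this is an illustrative sketch, and the residual values below are placeholders, not the chapter's actual data.

```python
from scipy import stats

# Shapiro-Wilk test of normality on a small set of residuals.
# A p value above .05 offers no evidence against normality.
residuals = [-0.25, 1.50, -2.00, 0.75, -1.25, 2.25, 0.50, -1.50]  # placeholder data
stat, p = stats.shapiro(residuals)
print(round(stat, 3), round(p, 3))
```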
Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality. Q–Q plots are graphs that plot quantiles of the theoretical normal distribution against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of residuals shown as follows suggests relative normality.
[Figure: Normal Q–Q plot of residual for labs: expected normal value plotted against observed value.]
Examination of the following boxplot suggests a relatively normal distributional shape of residuals and no outliers.
[Figure: Boxplot of residual for labs.]
The forms of evidence we have examined, skewness and kurtosis statistics, the S–W test, the Q–Q plot, and the boxplot, all suggest normality is a reasonable assumption. We can be reasonably assured that we have met the assumption of normality of the dependent variable for each group of the independent variable.
Independence
The only assumption we have not tested for yet is independence. As we discussed in reference to the one-way ANOVA, if subjects have been randomly assigned to conditions (or to the different combinations of the levels of the independent variables in a factorial ANOVA), the assumption of independence has been met. In this illustration, students were randomly assigned to instructor and time of day, and, thus, the assumption of independence was met. However, we often use independent variables that do not allow random assignment, such as preexisting characteristics like education level (high school diploma, bachelor's, master's, or doctoral degrees). We can plot residuals against levels of our independent variables in a scatterplot to get an idea of whether or not there are patterns in the data and thereby provide an indication of whether we have met this assumption. Given we have multiple independent variables in the factorial ANOVA, we will split the scatterplot by levels of one independent variable (“Group”) and then generate a bivariate scatterplot for “Time” by residual. Remember that the residual was added to the dataset by saving it when we generated the factorial ANOVA model.
Please note that some researchers do not believe that the assumption of independence can be tested. If there is not random assignment to groups, then these researchers believe this assumption has been violated, period. The plot that we generate will give us a general idea of patterns, however, in situations where random assignment was not performed or not possible.
Splitting the file: The first step is to split our file by the levels of one of our independent variables (e.g., “Group”). To do that, go to “Data” in the top pulldown menu and then select “Split File.”
Generating independence evidence: Step 1
Select independent
variable from the list
on the left and use
the arrow to move to
the “Group Based
on” box on the right.
Then click on “Ok.”
Generating
independence
evidence:
Step 2
Generating the scatterplot: The general steps for generating a simple scatterplot through “Scatter/dot” have been presented in a previous chapter (e.g., Chapter 10), and they will not be reiterated here. From the “Simple Scatterplot” dialog screen, click the residual variable and move it into the “Y Axis” box by clicking on the arrow. Click the independent variable that was not used to split the file (e.g., “Time”) and move it into the “X Axis” box by clicking on the arrow. Then click “OK.”
Interpreting independence evidence: In examining the scatterplots for evidence of independence, the points should fall relatively randomly above and below a horizontal line at 0. (You may recall in Chapter 11 that we added a reference line to the graph using Chart Editor. To add a reference line, double click on the graph in the output to activate the chart editor. Select “Options” in the top pulldown menu, then “Y axis reference line.” This will bring up the “Properties” dialog box. Change the value of the position to be “0.” Then click on “Apply” and “Close” to generate the graph with a horizontal line at 0.)
In this example, our scatterplot for each level of attractiveness generally suggests evidence of independence with a relatively random display of residuals above and below the horizontal line at 0 for each category of time. Thus, had we not met the assumption of independence through random assignment of cases to groups, this would have provided evidence that independence was a reasonable assumption.
[Figure: Scatterplots of residual for labs against time of day, one panel for each level of attractiveness (unattractive, slightly attractive, moderately attractive, very attractive).]
Post Hoc Power for Factorial ANOVA Using G*Power
Main effects: When there are multiple independent variables, power must be computed in G*Power for each main effect and for each interaction. We will illustrate the main effect for attractiveness of instructor, but note that computing post hoc power for the other main effect(s) and interaction(s) is similarly obtained.
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a factorial ANOVA. To find the factorial ANOVA, we select “Tests” in the top pulldown menu, then “Means,” and then “Many groups: ANOVA: Main effects and interactions (two or more independent variables).” Once that selection is made, the “Test family” automatically changes to “F tests.”
Step 1
The “Type of Power Analysis” desired then needs to be selected. To compute post hoc power, we need to select “Post hoc: Compute achieved power—given α, sample size, and effect size.”
The default selection for “Statistical Test”
is “Correlation: Point biserial
model.” Following the procedures presented in
Step 1 will automatically change the statistical test
to “ANOVA: Fixed effects, special,
main effects and interactions”
(two or more independent variables).
The default selection
for “Test Family”
is “t tests.”
Following the
procedures
presented in Step 1
will automatically
change the test
family to “F test.”
Click on “Determine” to pop out the effect size calculator box (shown below). This will allow you to compute f given partial eta squared.
Once the
parameters are
specified, click on
“Calculate.”
The “Input Parameters” for computing
post hoc power must be specified (the default
values are shown here) including:
Step 2
1. Effect size f
2. Alpha level
3. Total sample size
4. Numerator df
5. Number of groups
The “Input Parameters” must then be specified. We compute the effect size f last, so skip that for the moment. In our example, the alpha level we used was .05, and the total sample size was 32. The numerator df for attractiveness (recall that we are computing post hoc power for the main effect of attractiveness here) is equal to the number of categories of this variable (i.e., 4) minus 1; thus, there are three degrees of freedom for attractiveness. The number of groups is equal to the product of the number of levels or categories of the independent variables or (J)(K). In this example, the number of groups or cells then equals (J)(K) = (4)(2) = 8.
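The two derived inputs above can be sketched in code; this is our own illustration of the arithmetic, not a G*Power feature.

```python
# Numerator df and number of groups for the G*Power input parameters.
J = 4                        # levels of attractiveness
K = 2                        # levels of time of day
df_attract = J - 1           # numerator df for the attractiveness main effect
n_groups = J * K             # number of cells in the design
print(df_attract, n_groups)  # 3 8
```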
We skipped filling in the first parameter, the effect size f, for a reason. SPSS only provided a partial eta squared effect size. Thus, we will use the pop-out effect size calculator in G*Power to compute the effect size f (we saved this parameter for last as the calculation is based on the previous values just entered). To pop out the effect size calculator, click on “Determine” which is displayed under “Input Parameters.” In the pop-out effect size calculator, click on the radio button for “Direct” and then enter the partial eta squared value for attractiveness that was calculated in SPSS (i.e., .842). Clicking on “Calculate” in the pop-out effect size calculator will calculate the effect size f. Then click on “Calculate and Transfer to Main Window” to transfer the calculated effect size (i.e., 2.3084874) to the “Input Parameters.” Once the parameters are specified, click on “Calculate” to find the power statistics.
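The conversion the “Direct” calculator performs is the standard one, f = √(η² / (1 − η²)). A minimal sketch of that arithmetic (our own illustration, not G*Power itself):

```python
import math

# Convert partial eta squared to Cohen's effect size f:
# f = sqrt(eta2 / (1 - eta2)).
def f_from_partial_eta_squared(eta2):
    return math.sqrt(eta2 / (1 - eta2))

print(round(f_from_partial_eta_squared(0.842), 3))  # 2.308 (G*Power shows 2.3084874)
print(round(f_from_partial_eta_squared(0.075), 3))  # 0.285 (interaction effect size)
```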
Post hoc power
Here are the post hoc
power results.
The “Output Parameters” provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power for a two-factor ANOVA with a computed effect size f of 2.308, an alpha level of .05, total sample size of 32, numerator degrees of freedom of 3, and 8 groups or cells.
Based on those criteria, the post hoc power for the main effect of attractiveness was 1.00. In other words, with a factorial ANOVA, computed effect size f of 2.308, alpha level of .05, total sample size of 32, numerator degrees of freedom of 3, and 8 groups (or cells), the post hoc power of our main effect was 1.00: the probability of rejecting the null hypothesis (in this case, that the means of the dependent variable are equal for each level of the independent variable) when it is really false was 1.00, which would be considered maximum power (sufficient power is often .80 or above). Note that this value is the same as that reported in SPSS. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters).
Interactions: Calculation of power for interactions is conducted similarly. The input of .075 for partial eta squared results in the following output for interaction power. The post hoc power of the interaction effect for this test was .204: the probability of rejecting the null hypothesis (in this case, that the means of the dependent variable are equal for each cell) when it is really false was about 20%, which would be considered very low power (sufficient power is often .80 or above). Note that this value is not the same as that reported in SPSS.
Here are the post hoc
power results for the
attractiveness by time
of day interaction.
Post hoc power:
Interaction
A Priori Power for Factorial ANOVA Using G*Power
For a priori power, we can determine the total sample size needed for the main effects and/or interactions given an estimated effect size f, alpha level, desired power, numerator degrees of freedom (i.e., number of categories of our independent variable or interaction, depending on which a priori power is of interest), and number of groups or cells (i.e., the product of the number of levels of the independent variables). We follow Cohen's (1988) conventions for effect size (i.e., small, f = .10; moderate, f = .25; large, f = .40). In this example, had we estimated a moderate effect f of .25, alpha of .05, desired power of .80, numerator degrees of freedom of 3 (four groups in attractiveness and two levels of time of day, thus (4 − 1)(2 − 1) = 3), and number of groups of 8 (i.e., four categories of attractiveness and two levels of time of day, or 4 × 2 = 8), we would need a total sample size of 179 (or about 22 or 23 individuals per cell).
A priori power:
Interaction
Here are the a
priori power
results.
13.5 Template and APA-Style Write-Up
Finally we come to an example paragraph of the results for the two-factor statistics lab example. Recall that our graduate research assistant, Marie, was working on a research project for an independent study class to determine if there was a mean difference in the number of statistics labs attended based on the attractiveness of the lab instructor (four categories) and time of day the lab was attended (afternoon or evening). Her research question was the following: Is there a mean difference in the number of statistics labs students attended based on the attractiveness of the lab instructor and time of day the lab was attended? Marie then generated a factorial ANOVA as the test of inference. A template for writing a research question for a factorial ANOVA is presented as follows:
Is there a mean difference in [dependent variable] based on [inde-
pendent variable 1] and [independent variable 2]?
This is illustrated assuming a two-factor model, but it can easily be extended to more than two factors. As we noted in Chapter 11, it is important to ensure the reader understands the levels or groups of the independent variables. This may be done parenthetically in the actual research question, as an operational definition, or specified within the methods section. In this example, parenthetically we could have stated the following: Is there a mean difference in the number of statistics labs students attend based on the attractiveness of the lab instructor (unattractive, slightly attractive, moderately attractive, very attractive) and time of day the lab was attended (afternoon or evening)?
It may be helpful to preface the results of the factorial ANOVA with information on an examination of the extent to which the assumptions were met (recall there are three assumptions: normality, homogeneity of variance, and independence). This assists the reader in understanding that you were thorough in data screening prior to conducting the test of inference:
A factorial ANOVA was conducted to determine if the mean number of statistics labs attended by students differed based on the level of attractiveness of the statistics lab instructor (unattractive, slightly attractive, moderately attractive, very attractive) and the time of day the lab was attended (afternoon or evening). The assumption of normality was tested and met via examination of the residuals. Review of the S–W test for normality (SW = .977, df = 32, p = .701) and skewness (.400) and kurtosis (−.162) statistics suggested that normality was a reasonable assumption. The boxplot suggested a relatively normal distributional shape (with no outliers) of the residuals. The Q–Q plot and histogram suggested normality was reasonable. According to Levene's test, the homogeneity of variance assumption was satisfied [F(7, 24) = .579, p = .766]. Random assignment of individuals to groups helped ensure that the assumption of independence was met. Additionally, scatterplots of residuals against the levels of the independent variables were reviewed. A random display of points around 0 provided further evidence that the assumption of independence was met.
Here is an APA-style example paragraph of results for the factorial ANOVA (remember that this will be prefaced by the previous paragraph reporting the extent to which the assumptions of the test were met):
From Table 13.8, we see that the interaction of attractiveness by time of day is not statistically significant, but there are statistically significant main effects for both attractiveness and time of day (F_attract = 21.350, df = 3, 24, p = .001; F_time = 61.791, df = 1, 24, p = .001). Effect sizes are large for both attractiveness and time (partial η²_attract = .727; partial η²_time = .720), and observed power for attractiveness and time is maximal (i.e., 1.000).
Post hoc analyses were conducted given the statistically significant omnibus ANOVA F tests. The profile plot (Figure 13.2) summarizes these differences. Tukey HSD tests were conducted on all possible pairwise contrasts. For the main effect of attractiveness, Tukey HSD post hoc comparisons revealed that the unattractive level had statistically significantly lower attendance than all the other levels of attractiveness and that the slightly attractive level had statistically significantly lower attendance than the very attractive level. More specifically, the following pairs of groups were found to be significantly different (p < .05):
• Groups 1 (unattractive; M = 11.125, SD = 5.4886) and 2 (slightly attractive; M = 17.875, SD = 5.9387)
• Groups 1 (unattractive) and 3 (moderately attractive; M = 20.2500, SD = 7.2850)
• Groups 1 (unattractive) and 4 (very attractive; M = 24.3750, SD = 5.0973)
• Groups 2 (slightly attractive) and 4 (very attractive)
In other words, students enrolled in the least attractive instructor group attended statistically significantly fewer statistics labs than students enrolled in any of the three more attractive instructor groups. For the main effect of time of day, Tukey HSD post hoc comparisons revealed that the students enrolled in the afternoon (M = 23.125, SD = 5.655) had statistically significantly higher statistics lab attendance than students in the evening (M = 13.688, SD = 6.096).
13.6 Summary
This chapter considered methods involving the comparison of means for multiple independent variables. The chapter began with a look at the characteristics of the factorial ANOVA, including (a) two or more independent variables each with two or more fixed levels; (b) subjects are randomly assigned to cells and then exposed to only one combination of the independent variables; (c) the factors are fully crossed such that all possible combinations of the factors' levels are included in the design; and (d) the dependent variable is measured at the interval level or better. The ANOVA model was examined and followed by a discussion of main effects and, in particular, the interaction effect. Some discussion was also devoted to the ANOVA assumptions. The ANOVA summary table was shown along with partitioning the sums of squares. MCPs were then extended to factorial models. Then effect size measures, CIs, power, and expected mean squares were considered. Finally, several approaches were given for the unequal n's case with factorial models. At this point, you should have met the following objectives: (a) be able to understand the characteristics and concepts underlying factorial ANOVA, (b) be able to determine and interpret the results of factorial ANOVA, and (c) be able to understand and evaluate the assumptions of factorial ANOVA. In Chapter 14, we introduce the analysis of covariance.
Problems
Conceptual problems
13.1 You are given a two-factor design with the following cell means (cell 11 = 25; cell 12 = 75; cell 21 = 50; cell 22 = 50; cell 31 = 75; cell 32 = 25). Assume that the within-cell variation is small. Which one of the following conclusions seems most probable?
 a. The row means are significantly different.
 b. The column means are significantly different.
 c. The interaction is significant.
 d. All of the above.
13.2 In a two-factor ANOVA, one independent variable has five levels and the second has four levels. If each cell has seven observations, what is dfwith?
 a. 20
 b. 120
 c. 139
 d. 140
13.3 In a two-factor ANOVA, one independent variable has three levels or categories and the second has three levels or categories. What is dfAB, the interaction degrees of freedom?
 a. 3
 b. 4
 c. 6
 d. 9
13.4 Which of the following conclusions would result in the greatest generalizability of the main effect for factor A across the levels of factor B? The interaction between the independent variables A and B was …
 a. Not significant at the .25 level
 b. Significant at the .10 level
 c. Significant at the .05 level
 d. Significant at the .01 level
 e. Significant at the .001 level
13.5 In a two-factor fixed-effects ANOVA tested at an alpha of .05, the following p values were found: main effect for factor A, p = .06; main effect for factor B, p = .09; and interaction AB, p = .02. What can be interpreted from these results?
 a. There is a statistically significant main effect for factor A.
 b. There is a statistically significant main effect for factor B.
 c. There is a statistically significant main effect for factors A and B.
 d. There is a statistically significant interaction effect.
13.6 In a two-factor fixed-effects ANOVA, FA = 2, dfA = 3, dfB = 6, dfAB = 18, and dfwith = 56. The null hypothesis for factor A can be rejected
 a. At the .01 level
 b. At the .05 level, but not at the .01 level
 c. At the .10 level, but not at the .05 level
 d. None of the above
13.7 In ANOVA, the interaction of two factors is certainly present when
 a. The two factors are positively correlated.
 b. The two factors are negatively correlated.
 c. Row effects are not consistent across columns.
 d. Main effects do not account for all of the variation in Y.
 e. Main effects do account for all of the variation in Y.
13.8 For a design with four factors, how many interactions will there be?
 a. 4
 b. 8
 c. 11
 d. 12
 e. 16
13.9 Degrees of freedom for the AB interaction are equal to which one of the following?
 a. dfA − dfB
 b. (dfA)(dfB)
 c. dfwith − (dfA + dfB)
 d. dftotal − dfwith
13.10 A two-factor experiment means that the design necessarily includes which one of the following?
 a. Two independent variables
 b. Two dependent variables
 c. An interaction between the independent and dependent variables
 d. Exactly two separate groups of subjects
13.11 Two independent variables are said to interact when which one of the following occurs?
 a. Both variables are equally influenced by a third variable.
 b. These variables are differentially affected by a third variable.
 c. Each factor produces a change in the subjects' scores.
 d. The effect of one variable depends on the second variable.
13.12 If there is an interaction between the independent variables textbook and time of day, this means that the textbook used has the same effect at different times of the day. True or false?
An Introduction to Statistical Concepts
13.13 If the AB interaction is significant, then at least one of the two main effects must be significant. True or false?
13.14 I assert that a two-factor experiment (factors A and B) yields no more information than two one-factor experiments (factor A in experiment 1 and factor B in experiment 2). Am I correct?
13.15 For a two-factor fixed-effects model, if the degrees of freedom for testing factor A = 2, 24, then I assert that the degrees of freedom for testing factor B will necessarily be = 2, 24. Am I correct?

Questions 13.16 through 13.18 are based on the following ANOVA summary table (fixed effects):
Source    df    MS      F
A          2    45    4.5
B          1    70    7.0
AB         2   170   17.0
Within    60    10
13.16 For which source of variation is the null hypothesis rejected at the .01 level of significance?
 a. A
 b. B
 c. AB
 d. All of the above
13.17 How many cells are there in the design?
 a. 1
 b. 2
 c. 3
 d. 5
 e. None of the above
13.18 The total sample size for the design is which one of the following?
 a. 66
 b. 68
 c. 70
 d. None of the above
Questions 13.19 through 13.21 are based on the following ANOVA summary table (fixed effects):

Source    df    MS     F
A          2   164   5.8
B          1    80   2.8
AB         2    68   2.4
Within     9    28
13.19 For which source of variation is the null hypothesis rejected at the .01 level of significance?
 a. A
 b. B
 c. AB
 d. All of the above
13.20 How many cells are there in the design?
 a. 1
 b. 2
 c. 3
 d. 6
 e. None of the above
13.21 The total sample size for the design is which one of the following?
 a. 10
 b. 15
 c. 20
 d. 25
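For the two summary tables above, the number of cells and the total sample size follow directly from the degrees-of-freedom column. A short Python sketch of that bookkeeping (the function name is ours, not the book's):

```python
# Recover the design from a two-factor fixed-effects ANOVA summary table,
# using only the degrees-of-freedom column.
def design_info(df_a, df_b, df_ab, df_with):
    J = df_a + 1                              # levels of factor A
    K = df_b + 1                              # levels of factor B
    cells = J * K                             # one cell per A-by-B combination
    df_total = df_a + df_b + df_ab + df_with  # df sum across all sources
    N = df_total + 1                          # since df_total = N - 1
    return J, K, cells, N

# Second table above (questions 13.19-13.21): df of 2, 1, 2, and 9
print(design_info(2, 1, 2, 9))   # (3, 2, 6, 15)
```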
Computational problems
13.1 Complete the following ANOVA summary table for a two-factor fixed-effects ANOVA, where there are two levels of factor A (drug) and three levels of factor B (dosage). Each cell includes 26 students and α = .05.

Source     SS       df   MS   F   Critical Value   Decision
A            6.15   —    —    —   —                —
B           10.60   —    —    —   —                —
AB           9.10   —    —    —   —                —
Within      —       —    —
Total      250.85   —
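One way to check hand computations for a problem like 13.1 is a short script that fills in the df, MS, and F columns from the given sums of squares and design; the critical value and decision still come from the F table. A sketch in Python (function and variable names are ours):

```python
# Complete a balanced two-factor fixed-effects ANOVA summary table from the
# given sums of squares and the design: J levels of A, K levels of B, and
# n subjects per cell. Each row is (SS, df, MS, F).
def complete_table(ss_a, ss_b, ss_ab, ss_total, J, K, n):
    N = J * K * n
    df_a, df_b = J - 1, K - 1
    df_ab = df_a * df_b
    df_with = N - J * K
    ss_with = ss_total - (ss_a + ss_b + ss_ab)   # sums of squares are additive
    ms_with = ss_with / df_with
    rows = {}
    for name, ss, df in (("A", ss_a, df_a), ("B", ss_b, df_b), ("AB", ss_ab, df_ab)):
        rows[name] = (ss, df, ss / df, (ss / df) / ms_with)
    rows["Within"] = (ss_with, df_with, ms_with, None)
    rows["Total"] = (ss_total, N - 1, None, None)
    return rows

# Problem 13.1: J = 2 (drug), K = 3 (dosage), n = 26 students per cell
for name, row in complete_table(6.15, 10.60, 9.10, 250.85, J=2, K=3, n=26).items():
    print(name, row)
```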
13.2 Complete the following ANOVA summary table for a two-factor fixed-effects ANOVA, where there are three levels of factor A (program) and two levels of factor B (gender). Each cell includes four students and α = .01.

Source    SS     df   MS   F   Critical Value   Decision
A         3.64   —    —    —   —                —
B          .57   —    —    —   —                —
AB        2.07   —    —    —   —                —
Within    —      —    —
Total     8.18   —
13.3 Complete the following ANOVA summary table for a two-factor fixed-effects ANOVA, where there are two levels of factor A (undergraduate vs. graduate) and two levels of factor B (gender). Each cell includes four students and α = .05.

Source     SS       df   MS   F   Critical Value   Decision
A           14.06   —    —    —   —                —
B           39.06   —    —    —   —                —
AB           1.56   —    —    —   —                —
Within      —       —    —
Total      723.43   —
13.4 Conduct a two-factor fixed-effects ANOVA to determine if there are any effects due to A (task type), B (task difficulty), or the AB interaction (α = .01). Conduct Tukey HSD post hoc comparisons, if necessary. The following are the scores from the individual cells of the model:
 A1B1: 41, 39, 25, 25, 37, 51, 39, 101
 A1B2: 46, 54, 97, 93, 51, 36, 29, 69
 A1B3: 113, 135, 109, 96, 47, 49, 68, 38
 A2B1: 86, 38, 45, 45, 60, 106, 106, 31
 A2B2: 74, 96, 101, 124, 48, 113, 139, 131
 A2B3: 152, 79, 135, 144, 52, 102, 166, 155
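For a balanced design such as this one, the sums of squares can be computed directly from the cell data. The following Python sketch (our own helper code, not from the text) partitions the total variation; the critical values and any Tukey HSD comparisons would still be obtained from the tables:

```python
# Partition the total variation for the balanced two-factor design of
# problem 13.4, directly from the cell scores.
from statistics import mean

cells = {
    ("A1", "B1"): [41, 39, 25, 25, 37, 51, 39, 101],
    ("A1", "B2"): [46, 54, 97, 93, 51, 36, 29, 69],
    ("A1", "B3"): [113, 135, 109, 96, 47, 49, 68, 38],
    ("A2", "B1"): [86, 38, 45, 45, 60, 106, 106, 31],
    ("A2", "B2"): [74, 96, 101, 124, 48, 113, 139, 131],
    ("A2", "B3"): [152, 79, 135, 144, 52, 102, 166, 155],
}
a_levels = sorted({a for a, _ in cells})
b_levels = sorted({b for _, b in cells})
n = len(cells["A1", "B1"])                      # per-cell sample size (balanced)
scores = [y for ys in cells.values() for y in ys]
grand = mean(scores)

# Marginal and cell means
a_means = {a: mean([y for (aa, _), ys in cells.items() if aa == a for y in ys]) for a in a_levels}
b_means = {b: mean([y for (_, bb), ys in cells.items() if bb == b for y in ys]) for b in b_levels}
cell_means = {jk: mean(ys) for jk, ys in cells.items()}

ss_a = n * len(b_levels) * sum((a_means[a] - grand) ** 2 for a in a_levels)
ss_b = n * len(a_levels) * sum((b_means[b] - grand) ** 2 for b in b_levels)
ss_cells = n * sum((m - grand) ** 2 for m in cell_means.values())
ss_ab = ss_cells - ss_a - ss_b                  # interaction SS by subtraction
ss_with = sum((y - cell_means[jk]) ** 2 for jk, ys in cells.items() for y in ys)

df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
df_ab, df_with = df_a * df_b, len(scores) - len(cells)
ms_with = ss_with / df_with
for name, ss, df in (("A", ss_a, df_a), ("B", ss_b, df_b), ("AB", ss_ab, df_ab)):
    print(name, round(ss, 2), df, round((ss / df) / ms_with, 2))
```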
13.5 An experimenter is interested in the effects of strength of reinforcement (factor A), type of reinforcement (factor B), and sex of the adult administering the reinforcement (factor C) on children's behavior. Each factor consists of two levels. Thirty-two children are randomly assigned to eight cells (i.e., four per cell), one for each of the factor combinations. Using the scores from the individual cells of the model that follow, conduct a three-factor fixed-effects ANOVA (α = .05). If there are any significant interactions, graph and interpret the interactions.
 A1B1C1: 3, 6, 3, 3
 A1B1C2: 4, 5, 4, 3
 A1B2C1: 7, 8, 7, 6
 A1B2C2: 7, 8, 9, 8
 A2B1C1: 1, 2, 2, 2
 A2B1C2: 2, 3, 4, 3
 A2B2C1: 5, 6, 5, 6
 A2B2C2: 10, 10, 9, 11
13.6 A replication study dataset of the example from this chapter is given as follows (A = attractiveness, B = time; same levels). Using the scores from the individual cells of the model that follow, conduct a two-factor fixed-effects ANOVA (α = .05). Are the results different as compared to the original dataset?
 A1B1: 10, 8, 7, 3
 A1B2: 15, 12, 21, 13
 A2B1: 13, 9, 18, 12
 A2B2: 20, 22, 24, 25
 A3B1: 24, 29, 27, 25
 A3B2: 10, 12, 21, 14
 A4B1: 30, 26, 29, 28
 A4B2: 22, 20, 25, 15
Interpretive problems
13.1 Building on the interpretive problem from Chapter 11, utilize the survey 1 dataset from the website. Use SPSS to conduct a two-factor fixed-effects ANOVA, including effect size, where political view is factor A (as in Chapter 11, J = 5), gender is factor B (a new factor, K = 2), and the dependent variable is the same one you used previously in Chapter 11. Then write an APA-style paragraph summarizing the results.
13.2 Building on the interpretive problem from Chapter 11, use the survey 1 dataset from the website. Use SPSS to conduct a two-factor fixed-effects ANOVA, including effect size, where hair color is factor A (i.e., one independent variable) (J = 5), gender is factor B (a new factor, K = 2), and the dependent variable is a variable of interest to you (the following variables look interesting: books, TV, exercise, drinks, GPA, GRE-Q, CDs, hair appointment). Then write an APA-style paragraph describing the results.
14
Introduction to Analysis of Covariance:
One-Factor Fixed-Effects Model
With Single Covariate
Chapter Outline
14.1 Characteristics of the Model
14.2 Layout of Data
14.3 ANCOVA Model
14.4 ANCOVA Summary Table
14.5 Partitioning the Sums of Squares
14.6 Adjusted Means and Related Procedures
14.7 Assumptions and Violation of Assumptions
 14.7.1 Independence
 14.7.2 Homogeneity of Variance
 14.7.3 Normality
 14.7.4 Linearity
 14.7.5 Fixed Independent Variable
 14.7.6 Independence of the Covariate and the Independent Variable
 14.7.7 Covariate Measured Without Error
 14.7.8 Homogeneity of Regression Slopes
14.8 Example
14.9 ANCOVA Without Randomization
14.10 More Complex ANCOVA Models
14.11 Nonparametric ANCOVA Procedures
14.12 SPSS and G*Power
14.13 Template and APA-Style Paragraph
Key Concepts
 1. Statistical adjustment
 2. Covariate
 3. Adjusted means
 4. Homogeneity of regression slopes
 5. Independence of the covariate and the independent variable
We have now considered several different analysis of variance (ANOVA) models. As we moved through Chapter 13, we saw that the inclusion of additional factors helped to reduce the residual or uncontrolled variation. These additional factors served as "experimental design controls" in that their inclusion in the design helped to reduce the uncontrolled variation. In fact, this could be the reason an additional factor is included in a factorial design.
In this chapter, a new type of variable, known as a covariate, is incorporated into the analysis. Rather than serving as an "experimental design control," the covariate serves as a "statistical control" where uncontrolled variation is reduced statistically in the analysis. Thus, a model where a covariate is used is known as analysis of covariance (ANCOVA). We are most concerned with the one-factor fixed-effects model here, although this model can be generalized to any of the other ANOVA designs considered in this text. That is, any of the ANOVA models discussed in the text can also include a covariate and thus become an ANCOVA model.
Most of the concepts used in this chapter have already been covered in the text. In addition, new concepts include statistical adjustment, covariate, adjusted means, and two important assumptions: homogeneity of regression slopes and independence of the covariate and the independent variable. Our objectives are that by the end of this chapter, you will be able to (a) understand the characteristics and concepts underlying ANCOVA; (b) determine and interpret the results of ANCOVA, including adjusted means and multiple comparison procedures (MCPs); and (c) understand and evaluate the assumptions of ANCOVA.
14.1 Characteristics of the Model
For the past few chapters, we have been following Marie, the educational research graduate student who, as part of her independent study course, conducted an experiment to examine statistics lab attendance. She has examined attendance based on attractiveness of instructor (Chapters 11 and 12) and based on attractiveness and time of day (Chapter 13). As we will see in this chapter, Marie will be continuing to examine data generated from a different experiment of students enrolled in statistics courses, now controlling for aptitude.
As we learned in previous chapters, Marie is enrolled in an independent study class. Her previous study was so successful that Marie, again in collaboration with the statistics faculty in her program, has designed another experimental study to determine if there is a mean difference in statistics quiz performance based on the teaching method utilized (traditional lecture method or innovative instruction). Twelve students were randomly assigned to two different sections of the same class. One section was taught using traditional lecture methods, and the second was taught with more innovative instruction, which included, for example, small-group and self-directed instruction. Prior to random assignment to sections, participants were also measured on aptitude toward statistics. Marie is now ready to examine these data. Marie's research question is the following: Is there a mean difference in statistics quiz scores based on teaching method, controlling for aptitude toward statistics? With one independent variable and one covariate to control for, Marie determines that an ANCOVA is the best statistical procedure to use to answer her question. Her next task is to analyze the data to address her research question.
In this section, we describe the distinguishing characteristics of the one-factor fixed-effects ANCOVA model. However, before we begin an extended discussion of these characteristics, consider the following example (a situation similar to the one in which we find Marie). Imagine a situation where a statistics professor is scheduled to teach two sections of introductory statistics. The professor, being a cunning researcher, decides to perform a little experiment where section 1 is taught using the traditional lecture method and section 2 is taught with more innovative methods using extensive graphics, computer simulations, and computer-assisted and calculator-based instruction, as well as using mostly small-group and self-directed instruction. The professor is interested in which section performs better in the course.
Before the study/course begins, the professor thinks about whether there are other variables related to statistics performance that should somehow be taken into account in the design. An obvious one is ability in quantitative methods. From previous research and experience, the professor knows that ability in quantitative methods is highly correlated with performance in statistics and decides to give a measure of quantitative ability in the first class and use that as a covariate in the analysis. A covariate (e.g., quantitative ability) is defined as a source of variation not controlled for in the design of the experiment but that the researcher believes to affect the dependent variable (e.g., course performance). The covariate is used to statistically adjust the dependent variable. For instance, if section 1 has higher quantitative ability than section 2 going into the study, then it would be wise to take this into account in the analysis. Otherwise section 1 might outperform section 2 due to its higher quantitative ability rather than due to the method of instruction. This is precisely the point of the ANCOVA. Some of the more typical examples of covariates in education and the behavioral sciences are pretest (where the dependent variable is the posttest), prior achievement, weight, IQ, aptitude, age, experience, previous training, motivation, and grade point average (GPA).
Let us now begin with the characteristics of the ANCOVA model. The first set of characteristics is obvious because they carry over from the one-factor fixed-effects ANOVA model. There is a single independent variable or factor with two or more levels or categories (thus the independent variable continues to be either nominal or ordinal in measurement scale). The levels of the independent variable are fixed by the researcher rather than randomly sampled from a population of levels. Once the levels of the independent variable are selected, subjects or individuals are somehow assigned to these levels or groups. Each subject is then exposed to only one level of the independent variable (although ANCOVA with repeated measures is also possible, it is not discussed here). In our example, method of statistics instruction is the independent variable with two levels or groups, the traditional lecture method and the cutting-edge method.
Situations where the researcher is able to randomly assign subjects to groups are known as true experimental designs. Situations where the researcher does not have control over which level a subject is assigned to are known as quasi-experimental designs. This lack of control may occur for one of two reasons. First, the groups may be already in place when the researcher arrives on the scene; these groups are referred to as intact groups (e.g., based on class assignments made by students at the time of registration). Second, it may be theoretically impossible for the researcher to assign subjects to groups (e.g., income level). Thus, a distinction is typically made about whether or not the researcher can control the assignment of subjects to groups. The distinction between the use of ANCOVA in true and quasi-experimental situations has been quite controversial over the past few decades; we look at it in more detail later in this chapter. For further information on true experimental designs and quasi-experimental designs,
we suggest you consider Campbell and Stanley (1966), Cook and Campbell (1979), and Shadish, Cook, and Campbell (2002). In our example again, if assignment of students to sections is random, then we have a true experimental design. If assignment of students to sections is not random, perhaps already assigned at registration, then we have a quasi-experimental design.
One final item in the first set of characteristics has to do with the measurement scales of the variables. In the ANCOVA, it is assumed the dependent variable is measured at the interval level or better. If the dependent variable is measured at the ordinal level, then nonparametric procedures described toward the end of this chapter should be considered. It is also assumed that the covariate is measured at the interval level or better. Lastly, as indicated previously, the independent variable must be a grouping or categorical variable.
The remaining characteristics have to do with the uniqueness of the ANCOVA. As already mentioned, the ANCOVA is a form of statistical control developed specifically to reduce unexplained error variation. The covariate (sometimes known as a concomitant variable, as it accompanies or is associated with the dependent variable) is a source of variation not controlled for in the design of the experiment but believed to affect the dependent variable. In a factorial design, for example, a factor could be included to reduce error variation. However, this represents an experimental design form of control as it is included as a factor in the model.
In ANCOVA, the dependent variable is adjusted statistically to remove the effects of the portion of uncontrolled variation represented by the covariate. The group means on the dependent variable are adjusted so that they now represent groups with the same means on the covariate. The ANCOVA is essentially an ANOVA on these "adjusted means." This needs further explanation. Consider first the situation of the randomized true experiment where there are two groups. Here it is unlikely that the two groups will be statistically different on any variable related to the dependent measure. The two groups should have roughly equivalent means on the covariate, although 5% of the time, we would expect a significant difference due to chance at α = .05. Thus, we typically do not see preexisting differences between the two groups on the covariate in a true experiment; that is the value and beauty of random assignment, especially as it relates to ANCOVA. However, the relationship between the covariate and the dependent variable is important. If these variables are linearly related (discussed later), then the use of the covariate in the analysis will serve to reduce the unexplained variation in the model. The greater the magnitude of the correlation, the more uncontrolled variation can be removed, as shown by a reduction in mean square error.
Consider next the situation of the quasi-experiment, that is, without randomization. Here it is more likely that the two groups will be statistically different on the covariate as well as other variables related to the dependent variable. Thus, there may indeed be a preexisting difference between the two groups on the covariate. If the groups do differ on the covariate and we ignore it by conducting an ANOVA, our ability to get a precise estimate of the group effects will be reduced as the group effect will be confounded with the effect of the covariate. For instance, if a significant group difference is revealed by the ANOVA, we would not be certain if there was truly a group effect or whether the effect was due to preexisting group differences on the covariate, or some combination of group and covariate effects. The ANCOVA takes the covariate mean difference into account as well as the linear relationship between the covariate and the dependent variable.
Thus, the covariate is used to (a) reduce error variation, (b) take any preexisting group mean difference on the covariate into account, (c) take into account the relationship between the covariate and the dependent variable, and (d) yield a more precise and less
biased estimate of the group effects. If error variation is reduced, the ANCOVA will be more powerful and require smaller sample sizes than the ANOVA (Keppel & Wickens, 2004; Mickey, Dunn, & Clark, 2004; Myers & Well, 1995). If error variation is not reduced, the ANOVA is more powerful. A more extensive comparison of ANOVA versus ANCOVA is given in Chapter 16. In addition, as shown later, one degree of freedom is lost from the error term for each covariate used. This results in a larger critical value for the F test and makes it a bit more difficult to find a statistically significant F test statistic. This is the major cost of using a covariate. If the covariate is not effective in reducing error variance, then we are worse off than if we had ignored the covariate. Important references on ANCOVA include Elashoff (1969) and Huitema (1980).
14.2 Layout of Data
Before we get into the theory and subsequent analysis of the data, let us examine the layout of the data. We designate each observation on the dependent or criterion variable as Yij, where the j subscript tells us what group or level the observation belongs to and the i subscript tells us the observation or identification number within that group. The first subscript ranges over i = 1, …, nj, and the second subscript ranges over j = 1, …, J. Thus, there are J levels of the independent variable and nj subjects in group j. We designate each observation on the covariate as Xij, where the subscripts have the same meaning.
The layout of the data is shown in Table 14.1. Here we see that each pair of columns represents the observations for a particular group or level of the independent variable on the dependent variable (i.e., Y) and the covariate (i.e., X). At the bottom of the pair of columns for each group j are the group means (Ȳ.j, X̄.j). Although the table shows n observations for each group, we need not make such a restriction, as this was done only for purposes of simplifying the table.
14.3 ANCOVA Model
The ANCOVA model is a form of the general linear model (GLM), much like the models shown in the last few chapters of this text. The one-factor ANCOVA fixed-effects model can be written in terms of population parameters as follows:
Table 14.1
Layout for the One-Factor ANCOVA

Level of the Independent Variable
     1            2          …         J
Y11  X11     Y12  X12     …     Y1J  X1J
Y21  X21     Y22  X22     …     Y2J  X2J
…    …       …    …       …     …    …
Yn1  Xn1     Yn2  Xn2     …     YnJ  XnJ
Ȳ.1  X̄.1     Ȳ.2  X̄.2     …     Ȳ.J  X̄.J
Yij = μY + αj + βw(Xij − μX) + εij

where
 Yij is the observed score on the dependent variable for individual i in group j
 μY is the overall or grand population mean (i.e., regardless of group designation) for the dependent variable Y
 αj is the group effect for group j
 βw is the within-groups regression slope from the regression of Y on X (i.e., the covariate)
 Xij is the observed score on the covariate for individual i in group j
 μX is the overall or grand population mean (i.e., regardless of group designation) for the covariate X
 εij is the random residual error for individual i in group j

The residual error can be due to individual differences, measurement error, and/or other factors not under investigation. As you would expect, the least squares sample estimators for each of these parameters are as follows: Ȳ for μY, X̄ for μX, aj for αj, bw for βw, and eij for εij. Just like in the ANOVA, the sum of the group effects is equal to 0. This implies that if there are any nonzero group effects, then the group effects will balance out around 0 with some positive and some negative effects.
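Although the text leaves the computations to statistical software, the estimator bw has a simple closed form: it is the pooled within-groups slope, obtained by pooling each group's covariate-by-outcome cross-products and covariate sums of squares. A minimal Python sketch with hypothetical data (the function name is ours):

```python
# Pooled within-groups regression slope b_w: pool each group's covariate-by-
# outcome cross-products and covariate sums of squares, then divide.
from statistics import mean

def within_groups_slope(groups):
    """groups: one (covariate scores, outcome scores) pair per group."""
    sp_xy = 0.0  # pooled sum of (X - Xbar_j)(Y - Ybar_j)
    ss_x = 0.0   # pooled sum of (X - Xbar_j)^2
    for xs, ys in groups:
        xbar, ybar = mean(xs), mean(ys)
        sp_xy += sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        ss_x += sum((x - xbar) ** 2 for x in xs)
    return sp_xy / ss_x

# Hypothetical data: within each group, Y rises one point per point of X,
# so the pooled slope is 1 even though the groups differ in level.
print(within_groups_slope([([1, 2, 3], [11, 12, 13]),
                           ([4, 5, 6], [2, 3, 4])]))   # 1.0
```

Note that pooling deviations about each group's own means is what makes this a within-groups slope; a single regression over all subjects combined would confound the group differences with the slope.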
The hypotheses consist of testing the equality of the adjusted means (denoted μ′.j and discussed later) as follows:

H0: μ′.1 = μ′.2 = … = μ′.J
H1: not all the μ′.j are equal
14.4 ANCOVA Summary Table
We turn our attention to the familiar summary table, this time for the one-factor ANCOVA model. A general form of the summary table is shown in Table 14.2. Under the first column, you see the following sources: adjusted between-groups variation, adjusted within-groups variation, variation due to the covariate, and total variation. The second column notes the sums of squares terms for each source (i.e., SSbetw(adj), SSwith(adj), SScov, and SStotal). Recall that the between source represents the independent variable being systematically studied and the within source represents the error or residual.
The third column gives the degrees of freedom for each source. For the adjusted between-groups source (i.e., the independent variable controlling for the covariate), because there are J group means, the dfbetw(adj) is J − 1, the same as in the one-factor ANOVA model. For the adjusted within-groups source, because there are N total observations and J groups, we
Table 14.2
One-Factor ANCOVA Summary Table

Source                         SS            df         MS            F
Between adjusted               SSbetw(adj)   J − 1      MSbetw(adj)   MSbetw(adj)/MSwith(adj)
Within adjusted (i.e., error)  SSwith(adj)   N − J − 1  MSwith(adj)
Covariate                      SScov         1          MScov         MScov/MSwith(adj)
Total                          SStotal       N − 1
would expect the degrees of freedom within to be N − J, because that was the case in the one-factor ANOVA model. However, as we pointed out earlier in the characteristics of the ANCOVA model, a price is paid for the use of a covariate. The price here is that we lose one degree of freedom from the within term for a single covariate, so that dfwith(adj) is N − J − 1. For multiple covariates, we lose one degree of freedom for each covariate used (see later discussion). This degree of freedom has gone to the covariate source such that dfcov is equal to 1. Finally, for the total source, as there are N total observations, the dftotal is the usual N − 1.
The fourth column gives the mean squares for each source of variation. As always, the mean squares represent the sum of squares divided by their respective degrees of freedom. Thus, MSbetw(adj) = SSbetw(adj)/(J − 1), MSwith(adj) = SSwith(adj)/(N − J − 1), and MScov = SScov/1. The last column in the ANCOVA summary table is for the F values. Thus, for the one-factor fixed-effects ANCOVA model, the F value tests for differences between the adjusted means (i.e., to test for differences in the mean of the dependent variable based on the levels of the independent variable when controlling for the covariate) and is computed as F = MSbetw(adj)/MSwith(adj). A second F value, which is obviously not included in the ANOVA model, is the test of the covariate. To be specific, this F statistic is actually testing the hypothesis H0: βw = 0. If the slope is equal to 0, then the covariate and the dependent variable are unrelated. This F value is equal to F = MScov/MSwith(adj). If the F test for the covariate is not statistically significant (and has a negligible effect size), the researcher may want to consider removing that covariate from the model.
The critical value for the test of difference between the adjusted means is αFJ−1,N−J−1. The critical value for the test of the covariate is αF1,N−J−1. The null hypotheses in each case are rejected if the F test statistic exceeds the F critical value. The critical values are found in the F table of Table A.4.
If the F test statistic for the adjusted means exceeds the F critical value, and there are more than two groups, then it is not clear exactly how the means are different. In this case, some MCP may be used to determine which means are different (see later discussion). For the test of the covariate (i.e., the within-groups regression slope), we hope that the F test statistic does exceed the F critical value. Otherwise the power and precision of the test of the adjusted means in ANCOVA will be lower than the test of the unadjusted means in ANOVA because the covariate is not significantly related to the dependent variable. [As stated previously, if the F test for the covariate is not statistically significant (and has a negligible effect size), the researcher may want to consider removing that covariate from the model.]
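The degrees-of-freedom bookkeeping in Table 14.2 is easy to mechanize once software has produced the three sums of squares. A Python sketch with hypothetical values (function and variable names are ours):

```python
# Build the Table 14.2 summary for a one-factor ANCOVA with a single
# covariate: one within-groups degree of freedom is given up to the
# covariate, so df_with(adj) = N - J - 1. Each row is (SS, df, MS, F).
def ancova_table(ss_betw_adj, ss_with_adj, ss_cov, N, J):
    df_betw, df_with, df_cov = J - 1, N - J - 1, 1
    ms_betw = ss_betw_adj / df_betw
    ms_with = ss_with_adj / df_with
    ms_cov = ss_cov / df_cov
    return {
        "Between adjusted": (ss_betw_adj, df_betw, ms_betw, ms_betw / ms_with),
        "Within adjusted": (ss_with_adj, df_with, ms_with, None),
        "Covariate": (ss_cov, df_cov, ms_cov, ms_cov / ms_with),
        "Total": (ss_betw_adj + ss_with_adj + ss_cov, N - 1, None, None),
    }

# Hypothetical sums of squares for N = 30 observations in J = 3 groups
for name, row in ancova_table(120.0, 260.0, 80.0, N=30, J=3).items():
    print(name, row)
```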
14.5 Partitioning the Sums of Squares
As seen already, the partitioning of the sums of squares is the backbone of all GLMs, whether we are dealing with an ANOVA model, an ANCOVA model, or a linear regression model. As always, the first step is to partition the total variation into its relevant parts or sources of variation. As we have learned from the previous section, the sources of variation for the one-factor ANCOVA model are adjusted between groups (i.e., the independent variable), adjusted within groups (i.e., error), and the covariate. This is written as

SStotal = SSbetw(adj) + SSwith(adj) + SScov

From this point, the statistical software is used to handle the remaining computations.
14.6 Adjusted Means and Related Procedures
In this section, we formally define the adjusted mean, briefly examine several MCPs, and very briefly consider power, confidence intervals (CIs), and effect size measures.
We have spent considerable time already discussing the analysis of the adjusted means. Now it is time to define them. The adjusted mean is denoted by Ȳ′.j and estimated by

Ȳ′.j = Ȳ.j − bw(X̄.j − X̄..)

Here it should be noted that the adjusted mean is simply equal to the unadjusted mean (i.e., Ȳ.j) minus the adjustment [i.e., bw(X̄.j − X̄..)]. The adjustment is a function of the within-groups regression slope (i.e., bw) and the difference between the group mean and the overall mean for the covariate (i.e., the difference being the group effect, X̄.j − X̄..). No adjustment will be made if (a) bw = 0 (i.e., X and Y are unrelated), or (b) the group means on the covariate are all the same. Thus, in both cases, Ȳ.j = Ȳ′.j. In all other cases, at least some adjustment will be made for some of the group means (although not necessarily for all of the group means).
You may be wondering how this adjustment actually works. Let us assume the covariate and the dependent variable are positively correlated such that bw is also positive, and there are two treatment groups with equal n's that differ on the covariate. If group 1 has a higher mean on both the covariate and the dependent variable than group 2, then the adjusted means will be closer together than the unadjusted means. For our first example, we have the following conditions:

bw = 1, Ȳ.1 = 50, Ȳ.2 = 30, X̄.1 = 20, X̄.2 = 10, X̄.. = 15

The adjusted means are determined as follows:

Ȳ′.1 = Ȳ.1 − bw(X̄.1 − X̄..) = 50 − 1(20 − 15) = 45
Ȳ′.2 = Ȳ.2 − bw(X̄.2 − X̄..) = 30 − 1(10 − 15) = 35

This is shown graphically in Figure 14.1a. In looking at the covariate X, we see that group 1 has a higher mean (X̄.1 = 20) than group 2 (X̄.2 = 10) by 10 points. The vertical line represents the overall mean on the covariate (X̄.. = 15). In looking at the dependent variable Y, we see that group 1 has a higher mean (Ȳ.1 = 50) than group 2 (Ȳ.2 = 30) by 20 points. The diagonal lines represent the regression lines for each group, with bw = 1.0. The points at which the regression lines intersect (or cross) the vertical line (X̄.. = 15) represent on the Y scale the values of the adjusted means. Here we see that the adjusted mean for group 1 (Ȳ′.1 = 45) is larger than the adjusted mean for group 2 (Ȳ′.2 = 35) by 10 points. Thus, because of the preexisting difference on the covariate, the adjusted means here are somewhat closer together than the unadjusted means (10 points vs. 20 points, respectively).
If group 1 has a higher mean on the covariate and a lower mean on the dependent variable than group 2, then the adjusted means will be further apart than the unadjusted means. As a second example, we have the following slightly different conditions:

bw = 1, Ȳ.1 = 30, Ȳ.2 = 50, X̄.1 = 20, X̄.2 = 10, X̄.. = 15

Then the adjusted means become as follows:

Ȳ′.1 = Ȳ.1 − bw(X̄.1 − X̄..) = 30 − 1(20 − 15) = 25
Ȳ′.2 = Ȳ.2 − bw(X̄.2 − X̄..) = 50 − 1(10 − 15) = 55

This is shown graphically in Figure 14.1b, where the unadjusted means differ by 20 points and the adjusted means differ by 30 points. There are obviously other possible situations.
Let us briefly examine MCPs for use in the ANCOVA situation. Most of the procedures described in Chapter 12 can be adapted for use with a covariate, although a few procedures are not mentioned here as critical values do not currently exist. The adapted procedures involve a different form of the standard error of a contrast. The contrasts are formed based on adjusted means, of course. Let us briefly outline just a few procedures. Each of the test statistics has as its numerator the contrast, ψ′, such as ψ′ = Ȳ′.1 − Ȳ′.2. The standard errors do differ somewhat depending on the specific MCP, just as they do in ANOVA.

The example procedures briefly described here are easily translated from the ANOVA context into the ANCOVA context. The Dunn (or the Bonferroni) method is appropriate to use for a small number of planned contrasts (still utilizing the critical values from Table A.8). The Scheffé procedure can be used for unplanned complex contrasts with equal group variances (again based on the F table in Table A.4). The Tukey HSD test is most desirable for unplanned pairwise contrasts with equal n's per group.

[Figure 14.1 Graphs of ANCOVA adjustments: panels (a) and (b) plot Y against the covariate X for groups 1 and 2, with a vertical line at X̄.. and each group's regression line crossing it at that group's adjusted mean.]

There has been some discussion in the literature about the appropriateness of this test in ANCOVA. Most statisticians currently argue that the procedure is only appropriate when the covariate is fixed, when in fact it is almost always random. As a result, the Bryant and Paulson (1976) generalization of the Tukey procedure has been developed for the random covariate case. The test statistic is compared to the critical value αqX, df(error), J taken from Table A.10, where X is the number of covariates. If the group sizes are unequal, the harmonic mean can be used in ANCOVA (Huitema, 1980). A generalization of the Tukey–Bryant procedure for unequal n's ANCOVA was developed by Hochberg and Varon-Salomon (1984) (also see Hochberg & Tamhane, 1987; Miller, 1997).
Finally, a very brief comment about power, CIs, and effect size measures for the one-factor ANCOVA model. In short, these procedures work exactly the same as in the factorial ANOVA model, except that they are based on adjusted means (Cohen, 1988), and as we will see in SPSS, partial eta squared is still the effect size computed. There really is nothing more to say than that.
14.7 Assumptions and Violation of Assumptions
The introduction of a covariate requires several assumptions beyond the traditional ANOVA assumptions. For the familiar assumptions (e.g., independence of observations, homogeneity, and normality), the discussion is kept to a minimum as these have already been described in Chapters 11 and 13. The new assumptions are as follows: (a) linearity, (b) independence of the covariate and the independent variable, (c) the covariate is measured without error, and (d) homogeneity of the regression slopes. In this section, we describe each assumption, how each assumption can be evaluated, the effects that a violation of the assumption might have, and how one might deal with a serious violation. Later in the chapter, when we illustrate how to use SPSS to generate the ANCOVA, we will specifically test for the assumptions of independence of observations, homogeneity of variance, normality, linearity, independence of the covariate and the independent variable, and homogeneity of regression slopes.
14.7.1 Independence
As we learned previously, the assumption of independence of observations can be met by (a) keeping the assignment of individuals to groups (i.e., to the levels or categories of the independent variable) separate through the design of the experiment (specifically random assignment, not to be confused with random selection), and (b) keeping the individuals separate from one another through experimental control so that the scores on the dependent variable Y are independent across subjects (both within and across groups).
As in previous ANOVA models, the use of independent random samples is also crucial in the ANCOVA. The F ratio is very sensitive to violation of the independence assumption in terms of increased likelihood of a Type I and/or Type II error. A violation of the independence assumption may affect the standard errors of the sample adjusted means and thus influence any inferences made about those means. One purpose of random assignment of individuals to groups is to achieve independence. If each individual is only observed once and individuals are randomly assigned to groups, then the independence assumption is usually met. Random assignment is important for valid interpretation of both the F test and MCPs. Otherwise, the F test and adjusted means may be biased.
The simplest procedure for assessing independence is to examine residual plots by group. If the independence assumption is satisfied, then the residuals should fall into a random display of points. If the assumption is violated, then the residuals will fall into some type of cyclical pattern. As discussed in Chapter 11, the Durbin–Watson statistic (Durbin & Watson, 1950, 1951, 1971) can be used to test for autocorrelation. Violations of the independence assumption generally occur in the three situations we mentioned in Chapter 11: time series data, observations within blocks, or replication. For severe violations of the independence assumption, there is no simple "fix," such as the use of transformations or nonparametric tests (see Scariano & Davenport, 1987).
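For readers who want to compute the Durbin–Watson statistic directly from a column of residuals, here is a minimal Python/NumPy sketch (our illustration; the text itself relies on SPSS for this diagnostic):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals. Values near 2 suggest no
    autocorrelation; values near 0 or 4 suggest positive or negative
    autocorrelation, respectively."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# A perfectly alternating (negatively autocorrelated) residual series
# pushes the statistic well above 2:
dw = durbin_watson([1.0, -1.0, 1.0, -1.0])
```

For ordered residuals from each group, values of `dw` far from 2 would be consistent with the cyclical patterns described above.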
14.7.2 Homogeneity of Variance
The second assumption is that the variances of each population are the same, known as the homogeneity of variance assumption. A violation of this assumption may lead to bias in the SSwith term, as well as an increase in the Type I error rate, and possibly an increase in the Type II error rate. A summary of Monte Carlo research on ANCOVA assumption violations by Harwell (2003) indicates that the effect of the violation is negligible with equal or nearly equal n's across the groups. There is a more serious problem if the larger n's are associated with the smaller variances (actual or observed α > nominal or stated α selected by the researcher, which is a liberal result), or if the larger n's are associated with the larger variances (actual α < nominal α, which is a conservative result).

In a plot of Y versus the covariate X for each group, the variability of the distributions may be examined for evidence of the extent to which this assumption is met. Another method for detecting violation of the homogeneity assumption is the use of formal statistical tests (e.g., Levene's test), as discussed in Chapter 11 and as we illustrate using SPSS later in this chapter. Several solutions are available for dealing with a violation of the homogeneity assumption. These include the use of variance-stabilizing transformations or other ANCOVA models that are less sensitive to unequal variances, such as nonparametric ANCOVA procedures (described at the end of this chapter).
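Levene's test statistic can be computed by hand as a one-way ANOVA on the absolute deviations of each score from its group mean. A NumPy sketch (our illustration with made-up data; SPSS produces this test automatically):

```python
import numpy as np

def levene_W(*groups):
    """Levene's test statistic: a one-way ANOVA F computed on the
    absolute deviations of each score from its group mean."""
    z = [np.abs(np.asarray(g, float) - np.mean(g)) for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand = np.concatenate(z).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in z)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in z)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Two groups with identical spread yield a statistic of 0
# (no evidence against homogeneity of variance):
W = levene_W([1, 2, 3], [4, 5, 6])
```

Large values of the statistic (relative to the F critical value with k − 1 and N − k degrees of freedom) signal unequal variances.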
14.7.3 Normality
The third assumption is that each of the populations follows the normal distribution. Based on the classic work by Box and Anderson (1962) and Atiqullah (1964), as well as the summarization of modern Monte Carlo work by Harwell (2003), the F test is relatively robust to nonnormal Y distributions, "minimizing the role of a normally distributed X" (Harwell, 2003, p. 62). Thus, we need only really be concerned with serious nonnormality (although "serious nonnormality" is a subjective call made by the researcher).

The following graphical techniques can be used to detect violation of the normality assumption: (a) frequency distributions (such as stem-and-leaf plots, boxplots, or histograms) or (b) normal probability plots. There are also several statistical procedures available for the detection of nonnormality [e.g., the Shapiro–Wilk (S–W) test, 1965]. If the assumption of normality is violated, transformations can also be used to normalize the data, as previously discussed in Chapter 11. In addition, one can use one of the rank ANCOVA procedures previously mentioned.
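As a rough illustration of the normal probability plot idea, one can correlate the ordered data with standard normal quantiles; a correlation near 1 is consistent with normality. This Python sketch uses only the standard library and is our substitute illustration of the plot's logic, not the Shapiro–Wilk test itself (the data are made up):

```python
from statistics import NormalDist, mean

def normal_plot_correlation(data):
    """Correlation between the ordered data and standard normal
    quantiles at plotting positions (i - 0.5)/n. Values near 1
    suggest the normal probability plot is close to a straight line."""
    x = sorted(data)
    n = len(x)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = mean(x), mean(q)
    num = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - mq) ** 2 for b in q)) ** 0.5
    return num / den

r = normal_plot_correlation([2.1, 1.9, 2.0, 2.2, 1.8, 2.05, 1.95])
```

Markedly low correlations (how low is "too low" depends on n) point toward the kind of serious nonnormality discussed above.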
14.7.4 Linearity
The next assumption is that the regression of Y (i.e., the dependent variable) on X (i.e., the covariate) is linear. If the relationship between Y and X is not linear, then use of the usual ANCOVA procedure is not appropriate, just as linear regression (see Chapter 17) would not be appropriate in cases of nonlinearity. In ANCOVA (as well as in correlation and linear regression), we fit a straight line to the data points in a scatterplot. When the relationship is nonlinear, a straight line will not fit the data particularly well. In addition, the magnitude of the linear correlation will be smaller. If the relationship is not linear, the estimate of the group effects will be biased, and the adjustments made in SSwith and SSbetw will be smaller.

Violations of the linearity assumption can generally be detected by looking at scatterplots of Y versus X, overall and for each group or category of the independent variable. Once a serious violation of the linearity assumption has been detected, there are two alternatives that can be used: transformations and nonlinear ANCOVA. Transformations on one or both variables can be used to achieve linearity (Keppel & Wickens, 2004). The second option is to use nonlinear ANCOVA methods as described by Huitema (1980) and Keppel and Wickens (2004).
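One informal numerical companion to the scatterplot check (our sketch, not a procedure from the text) is to compare the residual sum of squares from a straight-line fit against a curved (e.g., quadratic) fit; a large improvement flags nonlinearity:

```python
import numpy as np

def sse_of_polyfit(x, y, degree):
    """Residual sum of squares after fitting a polynomial of the
    given degree by least squares."""
    coefs = np.polyfit(x, y, degree)
    resid = np.asarray(y, float) - np.polyval(coefs, x)
    return float(resid @ resid)

# A clearly curved relationship: Y = X^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 2
sse_linear = sse_of_polyfit(x, y, 1)     # substantial lack of fit
sse_quadratic = sse_of_polyfit(x, y, 2)  # essentially zero
```

When the drop from `sse_linear` to `sse_quadratic` is large, fitting a straight line, and hence the usual ANCOVA, is suspect.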
14.7.5 Fixed Independent Variable
The fifth assumption states that the levels of the independent variable are fixed by the researcher. This results in a fixed-effects model rather than a random-effects model. As in the one-factor ANOVA model, the one-factor ANCOVA model is the same computationally in the fixed- and random-effects cases. The summary of Monte Carlo research by Harwell (2003) indicates that the impact of a random effect on the F test is minimal.
14.7.6 Independence of the Covariate and the Independent Variable
A condition of the ANCOVA model (although not an assumption) requires that the covariate and the independent variable be independent. That is, the covariate is not influenced by the independent or treatment variable. If the covariate is affected by the treatment itself, then the use of the covariate in the analysis either (a) may remove part of the treatment effect or produce a spurious (inflated) treatment effect or (b) may alter the covariate scores as a result of the treatment being administered prior to obtaining the covariate data. The obvious solution to this potential problem is to obtain the covariate scores prior to the administration of the treatment. In other words, be alert prior to the study for possible covariate candidates. There are many researchers who argue that, because of this assumption, ANCOVA is only appropriate in the case of a true experiment where random assignment of cases to groups was performed. Thus, in a true experiment, the treatment (i.e., independent variable) and covariate are not related by default of random assignment, and, thereby, the assumption of independence of the covariate and independent variable is met. If randomization is not possible, closely matching participants on the covariate may also help to ensure the assumption is not violated.
Let us consider an example where this condition is obviously violated. A psychologist is interested in which of several hypnosis treatments is most successful in reducing or eliminating cigarette smoking. A group of heavy smokers is randomly assigned to the hypnosis treatments. After the treatments have been completed, the researcher suspects that some patients are more susceptible to hypnosis (i.e., are more suggestible) than others. By using suggestibility as a covariate after the study is completed, the researcher would not be able to determine whether group differences were a result of hypnosis treatment, suggestibility, or some combination. Thus, the measurement of suggestibility after the hypnosis treatments have been administered would be ill-advised. An extended discussion of this condition is given in Maxwell and Delaney (1990).
Evidence of the extent to which this assumption is met can be obtained by examining mean differences on the covariate across the levels of the independent variable. If the independent variable has only two levels, an independent t test would be appropriate. If the independent variable has more than two categories, a one-way ANOVA would suffice. If the groups are not statistically different on the covariate, then that lends evidence that the assumption of independence of the covariate and the independent variable has been met. If the groups are statistically different on the covariate, then the groups are not likely to be equivalent.
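The one-way ANOVA on the covariate can be computed from scratch. A NumPy sketch (the function name is ours) using the statistics aptitude scores from the chapter example (Table 14.4) as illustration:

```python
import numpy as np

def one_way_F(*groups):
    """One-way ANOVA F statistic: MS between / MS within."""
    groups = [np.asarray(g, float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Aptitude (covariate) scores by instructional group:
F = one_way_F([4, 3, 5, 6, 7, 9], [1, 3, 2, 4, 5, 7])
```

Here F is about 2.57, which falls below the .05 critical value of 4.96 for 1 and 10 degrees of freedom, so the groups are not statistically different on the covariate, consistent with the random assignment used in the example.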
14.7.7 Covariate Measured Without Error
An assumption that we have not yet discussed in this text is that the covariate is measured without error. This is of special concern in education and the behavioral sciences, where variables are often measured with considerable measurement error. In randomized experiments, bw (i.e., the within-groups regression slope from the regression of the dependent variable, Y, on the covariate, X) will be underestimated so that less of the covariate effect is removed from the dependent variable (i.e., the adjustments will be smaller). In addition, the reduction in the unexplained variation will not be as great, and the F test will not be as powerful. The F test is generally conservative in terms of Type I error (the actual observed α will be less than the nominal α selected by the researcher; the nominal alpha is often .05). However, the treatment effects will not be biased. In quasi-experimental designs, bw will also be underestimated, with similar effects. However, the treatment effects may be seriously biased. A method by Porter (1967) is suggested for this situation.
There is considerable discussion about the effects of measurement error (e.g., Cohen & Cohen, 1983; Huitema, 1980; Keppel & Wickens, 2004; Lord, 1960, 1967, 1969; Mickey et al., 2004; Pedhazur, 1997; Porter, 1967; Reichardt, 1979; Weisberg, 1979). Obvious violations of this assumption can be detected by computing the reliability of the covariate prior to the study or from previous research. This is the minimum that should be done. One may also want to consider the validity of the covariate as well, where validity may be defined as the extent to which an instrument measures what it was intended to measure. While this is the first mention in the text of measurement error, it is certainly important that all measures included in a model, regardless of which statistical procedure is being conducted, are measured such that the scores provide high reliability and validity.
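Reliability of a covariate is commonly summarized with an internal-consistency estimate such as Cronbach's alpha. A minimal NumPy sketch (the persons-by-items score matrix below is hypothetical):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a (persons x items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    X = np.asarray(item_scores, float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Two items that agree perfectly yield alpha = 1 (perfect reliability):
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

A covariate with low reliability will produce the underestimated bw and weakened adjustments described above.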
14.7.8 Homogeneity of Regression Slopes
The final assumption puts forth that the slope of the regression line between the dependent variable and covariate is the same for each category of the independent variable. Here we assume that β1 = β2 = … = βJ. This is an important assumption because it allows us to use bw, the sample estimator of βw, as the within-groups regression slope. Assuming that the group slopes are parallel allows us to test for group intercept differences, which is all we are really doing when we test for differences among the adjusted means. Without this assumption of homogeneity of regression slopes, groups can differ on both the regression slope and intercept, and βw cannot legitimately be used. If the slopes differ, then the regression lines interact in some way. As a result, the size of the group differences in Y (i.e., the dependent variable) will depend on the value of X (i.e., the covariate). For example, treatment 1 may be most effective on the dependent variable for low values of the covariate, treatment 2 for middle values of the covariate, and treatment 3 for high values of the covariate. Thus, we do not have constant differences on the dependent variable between the groups of the independent variable across the values of the covariate. A straightforward interpretation is not possible, which is the same situation as in factorial ANOVA when the interaction between factor A and factor B is found to be significant. Thus, unequal slopes in ANCOVA represent a type of interaction.
There are other potential outcomes if this assumption is violated. Without homogeneous regression slopes, the use of βw can yield biased adjusted means and can affect the F test. Earlier simulation studies by Peckham (1968) and Glass, Peckham, and Sanders (1972) suggest that for the one-factor fixed-effects model, the effects will be minimal. Later analytical research by Rogosa (1980) suggests that there is little effect on the F test for balanced designs with equal variances, but the F is less robust for mild heterogeneity. However, a summary of modern Monte Carlo work by Harwell (2003) indicates that the effect of slope heterogeneity on the F test is (a) negligible with equal n's and equal covariate means (randomized studies), (b) modest with equal n's and unequal covariate means (nonrandomized studies), and (c) modest with unequal n's.
A formal statistical procedure is often conducted to test for homogeneity of slopes using statistical software such as SPSS (discussed later in this chapter), although the eyeball method (i.e., see if the slopes look about the same by reviewing scatterplots of the dependent variable and covariate for each category of the independent variable) can be a good starting point. Some alternative tests for equality of slopes when the variances are unequal are provided by Tabatabai and Tan (1985).
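The formal test can be framed as a model comparison: a model that forces a common within-groups slope versus one that gives each group its own slope. A NumPy sketch for two groups (our implementation, not SPSS output), applied here to the chapter's statistics instruction data from Table 14.4:

```python
import numpy as np

def slope_homogeneity_F(x1, y1, x2, y2):
    """F test comparing a common-slope (ANCOVA) model with a
    separate-slopes model for two groups. Numerator df = 1,
    denominator df = N - 4 (two intercepts and two slopes)."""
    def ss_parts(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        sxy = ((x - x.mean()) * (y - y.mean())).sum()
        ssx = ((x - x.mean()) ** 2).sum()
        ssy = ((y - y.mean()) ** 2).sum()
        return ssy - sxy ** 2 / ssx, sxy, ssx, ssy
    sse1, sxy1, ssx1, ssy1 = ss_parts(x1, y1)
    sse2, sxy2, ssx2, ssy2 = ss_parts(x2, y2)
    sse_separate = sse1 + sse2                # each group its own slope
    sse_common = (ssy1 + ssy2) - (sxy1 + sxy2) ** 2 / (ssx1 + ssx2)
    n_total = len(x1) + len(x2)
    return (sse_common - sse_separate) / (sse_separate / (n_total - 4))

# Statistics instruction example: both group slopes equal 0.8143,
# so the test statistic is essentially 0 (slopes homogeneous).
F = slope_homogeneity_F([4, 3, 5, 6, 7, 9], [1, 2, 3, 4, 5, 6],
                        [1, 3, 2, 4, 5, 7], [1, 2, 4, 5, 6, 6])
```

A large F here would signal the slope-by-group interaction described above and call for one of the alternatives that follow.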
Several alternatives are available if the homogeneity of slopes assumption is violated. The first is to use the concomitant variable not as a covariate but as a blocking variable. This will work because this assumption is not made for the randomized block design (see Chapter 16). A second option, and not a very desirable one, is to analyze each group separately with its own slope, or subsets of the groups having equal slopes. A third possibility is to utilize interaction terms between the covariate and the independent variable and conduct a regression analysis (see Agresti & Finlay, 1986). A fourth option is to use the Johnson and Neyman (1936) technique, whose purpose is to determine the values of X (i.e., the covariate) that are related to significant group differences on Y (i.e., the dependent variable). This procedure is beyond the scope of this text, and the interested reader is referred to Huitema (1980) or Wilcox (1987). A fifth option is to use more modern robust methods (e.g., Maxwell & Delaney, 1990; Wilcox, 2003).

A summary of the ANCOVA assumptions is presented in Table 14.3.
14.8 Example
Consider the following illustration of what we have covered in this chapter. Our dependent variable is the score on a statistics quiz (with a maximum possible score of 6), the covariate is the score on an aptitude test for statistics taken at the beginning of the course (with a maximum possible score of 10), and the independent variable is the section of statistics taken (where group 1 receives the traditional lecture method and group 2 receives the modern innovative method that includes components such as small-group and self-directed instruction). Thus, the researcher is interested in whether the method of instruction influences student performance in statistics, controlling for statistics aptitude (assume we have developed an aptitude measure that is relatively error-free). Students are randomly assigned to one of the two groups at the beginning of the semester, when the measure of statistics aptitude is administered.
Table 14.3
Assumptions and Effects of Violations—One-Factor ANCOVA

1. Independence
   • Increased likelihood of a Type I and/or Type II error in F
   • Affects standard errors of means and inferences about those means
2. Homogeneity of variance
   • Bias in SSwith; increased likelihood of a Type I and/or Type II error
   • Negligible effect with equal or nearly equal n's
   • Otherwise more serious problem if the larger n's are associated with the smaller variances (increased α) or larger variances (decreased α)
3. Normality
   • F test relatively robust to nonnormal Y, minimizing the role of nonnormal X
4. Linearity
   • Reduced magnitude of rXY
   • Straight line will not fit data well
   • Estimate of group effects biased
   • Adjustments made in SS smaller
5. Fixed-effect
   • Minimal impact
6. Covariate and factor are independent
   • May reduce/increase group effects; may alter covariate scores
7. Covariate measured without error
   • True experiment: bw underestimated; adjustments smaller; reduction in unexplained variation smaller; F less powerful; reduced likelihood of Type I error
   • Quasi-experiment: bw underestimated; adjustments smaller; group effects seriously biased
8. Homogeneity of slopes
   • Negligible effect with equal n's in true experiment
   • Modest effect with equal n's in quasi-experiment
   • Modest effect with unequal n's
There are 6 students in each group for a total of 12. The layout of the data is shown in Table 14.4, where we see the data and sample statistics (means, variances, slopes, and correlations).

Table 14.4
Data and Summary Statistics for the Statistics Instruction Example

                  Group 1                 Group 2                 Overall
Statistic         Quiz (Y)  Aptitude (X)  Quiz (Y)  Aptitude (X)  Quiz (Y)  Aptitude (X)
                  1         4             1         1
                  2         3             2         3
                  3         5             4         2
                  4         6             5         4
                  5         7             6         5
                  6         9             6         7
Means             3.5000    5.6667        4.0000    3.6667        3.7500    4.6667
Variances         3.5000    4.6667        4.4000    4.6667        3.6591    5.3333
bYX                    0.8143                  0.8143                  0.5966
rXY                    0.9403                  0.8386                  0.7203
Adjusted means         2.6857                  4.8143

The results are summarized in the ANCOVA summary table as shown in the top panel of Table 14.5. The ANCOVA test statistics are compared to the critical value .05F1,9 = 5.12 obtained from Table A.4, using the .05 level of significance. Both test statistics exceed the critical value, so we reject H0 in each case. We conclude that (a) the quiz score means do differ for the two statistics groups when adjusted (or controlling) for aptitude in statistics, and (b) the slope of the regression of Y (i.e., dependent variable) on X (i.e., covariate) is statistically significantly different from 0 (i.e., the test of the covariate). Just to be complete, the results for the ANOVA on Y are shown in the bottom panel of Table 14.5. We see that in the analysis of the unadjusted means (i.e., the ANOVA), there is no significant group difference. Thus, the adjustment (i.e., ANCOVA, which controlled for aptitude toward statistics) yielded a different statistical result. The covariate also "did its thing" in that a reduction in MSwith resulted due to the strong relationship between the covariate and the dependent variable (i.e., rXY = 0.7203 overall).

Table 14.5
One-Factor ANCOVA and ANOVA Summary Tables—Statistics Instruction Example

Source              SS        df    MS        F
ANCOVA
  Between adjusted  10.8127    1    10.8127   11.3734 a
  Within adjusted    8.5560    9     0.9507
  Covariate         20.8813    1    20.8813   21.9641 a
  Total             40.2500   11
ANOVA
  Between            0.7500    1     0.7500    0.1899 b
  Within            39.5000   10     3.9500
  Total             40.2500   11

a .05F1,9 = 5.12 (critical value).
b .05F1,10 = 4.96 (critical value).
Let us next examine the group quiz score means, as shown in Table 14.4. Here we see that with the unadjusted quiz score means (i.e., prior to controlling for the covariate), there is a 0.5000 point difference in favor of group 2 (the innovative teaching method), whereas for the adjusted quiz score means (i.e., the ANCOVA results which controlled for aptitude), there is a 2.1286 point difference in favor of group 2. In other words, the adjustment (i.e., controlling for statistics aptitude) in this case resulted in a greater difference between the adjusted quiz score means than between the unadjusted quiz score means. Since there are only two groups, an MCP is unnecessary (although we illustrate this in the SPSS section).
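For readers who want to verify the arithmetic, the "between adjusted" F in Table 14.5 can be reproduced as a comparison of two least-squares models: the full model (group indicator plus covariate) versus the reduced model (covariate only). A NumPy sketch using the Table 14.4 data (a check on the numbers, not the SPSS procedure; small discrepancies from the table reflect rounding there):

```python
import numpy as np

# Table 14.4 data: quiz scores (Y), aptitude (X), group indicator (D)
y = np.array([1, 2, 3, 4, 5, 6, 1, 2, 4, 5, 6, 6], dtype=float)
x = np.array([4, 3, 5, 6, 7, 9, 1, 3, 2, 4, 5, 7], dtype=float)
d = np.array([0] * 6 + [1] * 6, dtype=float)  # 0 = group 1, 1 = group 2

def sse(design, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

ones = np.ones_like(y)
sse_full = sse(np.column_stack([ones, d, x]), y)   # group + covariate
sse_reduced = sse(np.column_stack([ones, x]), y)   # covariate only

df_error = len(y) - 3                              # 12 - 3 = 9
F_adjusted = (sse_reduced - sse_full) / (sse_full / df_error)
# F_adjusted comes out near the 11.37 "between adjusted" F of Table 14.5,
# and sse_full matches the "within adjusted" SS of about 8.557.
```

The same model-comparison logic underlies the SPSS output shown later in the chapter.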
14.9 ANCOVA Without Randomization
As referenced previously in the discussion of assumptions, there has been a great deal of discussion and controversy over the years, particularly in education and the behavioral sciences, about the use of the ANCOVA in situations where randomization is not conducted. Randomization is defined as an experiment where individuals are randomly assigned to groups (or cells in a factorial design). In the Campbell and Stanley (1966) system of experimental design, these designs are known as true experiments. (Do not confuse random assignment with random selection, the latter of which deals with how the cases are sampled from the population.)

In certain situations, randomization either has not occurred or is not possible due to circumstances in the study. The best example is the situation where there are intact groups, which are groups that have been formed prior to the researcher arriving on the scene. Either the researcher chooses not to randomly assign these individuals to groups through a reassignment (e.g., it is just easier to keep the groups in their current form) or the researcher cannot randomly assign them (legally, ethically, or otherwise). When randomization does not occur, the resulting designs are known as quasi-experimental. For instance, in classroom research, the researcher is almost never able to come into a school and randomly assign students to groups. Once students are given their class assignments at the beginning of the year, this cannot be altered. On occasion, the researcher might be able to pull a few students out of several classrooms, randomly assign them to small groups, and conduct a true experiment. In general, this is possible only on a very small scale and for short periods of time.
Let us briefly consider the issues as they relate to ANCOVA, as not all statisticians agree. In true experiments (i.e., with randomization), there is no cause for concern (except for dealing with the statistical assumptions). The ANCOVA is more powerful and has greater precision for true experiments than for quasi-experiments. So if you have a choice, go with a true experimental situation (which is a big if). In a true experiment, the probability that the groups differ on the covariate or any other concomitant variable is equal to α. That is, the likelihood that the group means will be different on the covariate is small, and, thus, the adjustment in the group means may be small. The payoff is in the possibility that the error term will be greatly reduced.
In quasi-experiments, as it relates to ANCOVA, there are several possible causes for concern. Although this is the situation where the researcher needs the most help, this is also the situation where less help is available. Here it is more likely that there will be statistically significant differences among the group means on the covariate. Thus, the adjustment in the group means can be substantial (assuming that bw is different from 0). Because there are significant mean differences on the covariate, any of the following may occur: (a) it is likely that the groups may be different on other important characteristics as well, which have not been controlled for either statistically or experimentally; (b) the homogeneity of regression slopes assumption is less likely to be met; (c) adjusting for the covariate may remove part of the treatment effect; (d) equating groups on the covariate may be an extrapolation beyond the range of possible values that occur for a particular group (e.g., the examples by Lord, 1967, 1969, on trying to equate men and women, or by Ferguson & Takane, 1989, on trying to equate mice and elephants; these groups should not be equated on the covariate because their distributions on the covariate do not overlap); (e) although the slopes may be equal for the range of Xs obtained, when extrapolating beyond the range of scores, the slopes may not be equal; (f) the standard errors of the adjusted means may increase, making tests of the adjusted means not significant; and (g) there may be differential growth in the groups confounding the results (e.g., adult vs. child groups).
Although one should be cautious about the use of ANCOVA in quasi-experiments, this is not to suggest that ANCOVA should never be used in such situations. Just be extra careful and do not go too far in terms of interpreting your results. If at all possible, replicate your study. For further discussion, see Huitema (1980) or Porter and Raudenbush (1987).
14.10 More Complex ANCOVA Models
The one-factor ANCOVA model can be extended to more complex models in the same way as we expanded the one-factor ANOVA model. Thus, we can consider ANCOVA designs that involve any of the following characteristics: (a) factorial designs (i.e., having more than one factor or independent variable); (b) fixed-, random-, and mixed-effects designs; (c) repeated measures and split-plot (mixed) designs; (d) hierarchical designs; and (e) randomized block designs. Conceptually there is nothing new for these types of ANCOVA designs, and you should have no trouble getting a statistical package to do such analyses. For further information on these designs, see Huitema (1980), Keppel (1982), Kirk (1982), Myers and Well (1995), Page, Braver, and MacKinnon (2003), or Keppel and Wickens (2004). One can also utilize multiple covariates in an ANCOVA design; for further information, see Huitema (1980), Kirk (1982), Myers and Well (1995), Page et al. (2003), or Keppel and Wickens (2004).
14.11 Nonparametric ANCOVA Procedures
In situations where the assumptions of normality, homogeneity of variance, and/or linearity have been seriously violated, one alternative is to consider nonparametric ANCOVA procedures. Some rank ANCOVA procedures have been proposed by Quade (1967), Puri and Sen (1969), Conover and Iman (1982), and Rutherford (1992). For a description of such procedures, see these references as well as Huitema (1980), Harwell (2003), or Wilcox (2003).
Introduction to Analysis of Covariance
14.12 SPSS and G*Power
Next we consider SPSS for the statistics instruction example. As noted in previous chapters, SPSS needs the data to be in a specific form for the analysis to proceed, which is different from the layout of the data in Table 14.1. For a one-factor ANCOVA with a single covariate, the dataset must contain three variables or columns: one for the level of the factor or independent variable, one for the covariate, and a third for the dependent variable. The following screenshot presents an example of the dataset for the statistics quiz score example. Each row still represents one individual, displaying the level of the factor (or independent variable) of which they are a member, as well as their scores on the covariate and the dependent variable.
In the dataset, the dependent variable is “quiz” and represents the statistics quiz score; the covariate is “aptitude,” measured prior to the course beginning; and the independent variable is labeled “Group,” where each value represents the instructional method to which the student was assigned (i.e., 1 = traditional and 2 = innovative).
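To make the “long” layout concrete, the sketch below builds the same three-column structure in Python: one row per student, with columns for group, aptitude, and quiz. The twelve values are hypothetical placeholders, not the actual Table 14.1 data, which is not reproduced in this section.

```python
# Hypothetical rows in the long format SPSS expects for a one-factor ANCOVA
# with a single covariate: (group, aptitude, quiz), one row per individual.
rows = [
    (1, 5, 3), (1, 7, 5), (1, 4, 2), (1, 6, 4), (1, 8, 6), (1, 4, 1),
    (2, 3, 3), (2, 5, 5), (2, 2, 2), (2, 4, 4), (2, 6, 6), (2, 2, 4),
]

# Each level of the independent variable appears once per member of that group.
for g in (1, 2):
    n = sum(1 for group, _, _ in rows if group == g)
    print(f"group {g}: n = {n}")
```

With six students per instructional method, each group value simply repeats six times down the column, exactly as in the SPSS Data View.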
Step 1: To conduct an ANCOVA, go to “Analyze” in the top pulldown menu, then select “General Linear Model,” and then select “Univariate.” Following the steps in the screenshot (step 1) produces the “Univariate” dialog box.
[Screenshot: ANCOVA, Step 1 — the “Analyze” > “General Linear Model” > “Univariate” menu path.]
An Introduction to Statistical Concepts
Step 2: From the “Univariate” dialog box (see screenshot step 2), click the dependent variable (e.g., quiz score) and move it into the “Dependent Variable” box by clicking the arrow button. Click the independent variable (e.g., group) and move it into the “Fixed Factor(s)” box by clicking the arrow button. Click the covariate (e.g., aptitude) and move it into the “Covariate(s)” box by clicking the arrow button. Next, click on “Options.”
[Screenshot: ANCOVA, Step 2 — the “Univariate” dialog box, with the dependent variable, fixed factor, and covariate moved into their respective boxes.] In this dialog, clicking on “Model” will allow you to change specifications of the model; clicking on “Plots” will allow you to generate profile plots; clicking on “Save” will allow you to save various forms of residuals, among other variables; and clicking on “Options” will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests).
Step 3: Clicking on “Options” will provide the option to select such information as “Descriptive Statistics,” “Estimates of effect size,” “Observed power,” and “Homogeneity tests.” While there, move the items that are listed in the “Factor(s) and Factor Interactions:” box into the “Display Means for:” box to generate adjusted means. Also, check the box “Compare Main Effects,” then click the pulldown for “Confidence interval adjustment” to choose among the LSD, Bonferroni, or Sidak MCPs of the adjusted means. For this illustration, we select “Bonferroni.” Notice that the “Post Hoc” option button from the main “Univariate” dialog box (see step 2) is not active; thus, you are restricted to the three MCPs just mentioned, which are accessible from this “Options” screen. Click on “Continue” to return to the original dialog box.
[Screenshot: ANCOVA, Step 3 — the “Options” dialog box: move the variables to display means for into the “Display Means for” box, check “Compare main effects,” and use the pulldown to select “Bonferroni.”]
Step 4: From the “Univariate” dialog box (see step 2), click on “Plots” to obtain a profile plot of means. Click the independent variable (e.g., statistics course section, “Group”) and move it into the “Horizontal Axis” box by clicking the arrow button (see screenshot step 4a). Then click on “Add” to move the variable into the “Plots” box at the bottom of the dialog box (see screenshot step 4b). Click on “Continue” to return to the original dialog box.
[Screenshot: ANCOVA, Step 4a — move the independent variable to the “Horizontal Axis” box.]
[Screenshot: ANCOVA, Step 4b — click “Add” to move the variable into the “Plots” box at the bottom.]
Step 5: Finally, in order to generate the appropriate sources of variation and results as recommended in this chapter, from the main “Univariate” dialog box (see step 2), you need to click on the “Model” button. Then select “Type I” from the “Sum of squares” pulldown menu. Click on “Continue” to return to the original dialog box.
You may be asking yourself why we need to utilize the Type I sum of squares, as up until this point in the text, we have always recommended the Type III (which is the default in SPSS). In a study conducted by Li and Lomax (2011), the following were confirmed with SPSS (as well as with SAS). First, when generating the Type I sum of squares, the covariate is extracted first, and then the treatment is estimated controlling for the covariate. The Type I sums of squares will also correctly add up to the total sum of squares. Second, when generating the Type III sum of squares, each effect is estimated controlling for each of the other effects. In other words, the covariate is computed controlling for the treatment, and the treatment is determined controlling for the covariate. The former is not of interest, as the treatment is administered after the covariate has been measured; thus, no such control is necessary. Also, the Type III sums of squares will not add up to the total sum of squares, as the covariate sum of squares will be different from that obtained using Type I. Thus, you do not want to estimate the covariate controlling for the treatment, and so you want to use Type I, not Type III, in the ANCOVA context.
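The sequential logic of Type I sums of squares can be demonstrated outside SPSS. The sketch below (a hand-rolled least-squares fit on simulated data, not SPSS output) extracts the covariate first, then the group effect controlling for the covariate, and confirms that the two Type I sums of squares add up to the model sum of squares.

```python
# Illustration of Type I (sequential) sums of squares for a one-factor ANCOVA.
# Data are simulated; the tiny OLS solver is a sketch, not a library routine.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def sse(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    return sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2
               for r, yi in zip(X, y))

# Simulated data: group dummy (0/1), covariate x, outcome y.
group = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
x     = [5, 7, 4, 6, 8, 4, 3, 5, 2, 4, 6, 2]
y     = [3, 5, 2, 4, 6, 1, 3, 5, 2, 4, 6, 4]

sse0 = sse([[1] for _ in y], y)                           # intercept only
sse1 = sse([[1, xi] for xi in x], y)                      # + covariate
sse2 = sse([[1, xi, gi] for xi, gi in zip(x, group)], y)  # + group

ss_cov   = sse0 - sse1   # Type I SS for the covariate (entered first)
ss_group = sse1 - sse2   # Type I SS for group, controlling for the covariate
ss_model = sse0 - sse2   # model sum of squares
print(ss_cov, ss_group, ss_model)
```

Because each effect is extracted from what the previous terms left unexplained, `ss_cov + ss_group` equals `ss_model` exactly; Type III sums of squares, which adjust every effect for every other effect, do not share this additive property.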
[Screenshot: ANCOVA, Step 5 — “Type I” selected from the “Sum of squares” pulldown in the “Model” dialog box.]
Step 6: From the “Univariate” dialog box (see step 2), click on “Save” to select those elements that you want to save (here we want to save the unstandardized residuals for later use in order to examine the extent to which normality and independence are met). Click on “Continue” to return to the original dialog box. From the “Univariate” dialog box, click on “OK” to generate the output.
[Screenshot: ANCOVA, Step 6 — the “Save” dialog box with unstandardized residuals selected.]
Interpreting the output: Annotated results are presented in Table 14.6.
Table 14.6
Selected SPSS Results for the Statistics Instruction Example

Between-Subjects Factors
             Value Label                                   N
Group  1.00  Traditional lecture method of instruction     6
       2.00  Small group and self-directed instruction     6

Descriptive Statistics
Dependent variable: Quiz score
Group                                        Mean     Std. Deviation    N
Traditional lecture method of instruction    3.5000   1.87083           6
Small group and self-directed instruction    4.0000   2.09762           6
Total                                        3.7500   1.91288          12

Levene's Test of Equality of Error Variances(a)
Dependent variable: Quiz score
F        df1   df2   Sig.
6.768      1    10   .026
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a Design: Intercept + aptitude + group
The table labeled “Between-Subjects Factors” provides sample sizes for each of the categories of the independent variable (recall that the independent variable is the ‘between-subjects factor’). The table labeled “Descriptive Statistics” provides basic descriptive statistics (means, standard deviations, and sample sizes) for each level of the independent variable. The F test (and associated p value) for Levene’s Test of Equality of Error Variances is reviewed to determine if equal variances can be assumed. In this case, p (.026) is less than α (.05), so the assumption of equal error variances is in question; with equal group sizes, however, the F test is relatively robust to this violation. Note that df1 is the degrees of freedom for the numerator (calculated as J − 1) and df2 is the degrees of freedom for the denominator (calculated as N − J).
Tests of Between-Subjects Effects
Dependent variable: Quiz score
Source            Type I SS    df   Mean Square        F     Sig.   Partial Eta Sq.   Noncent. Parameter   Observed Power(b)
Corrected model    31.693(a)    2      15.846      16.667   .001         .787              33.333               .993
Intercept         168.750      1     168.750     177.483   .000         .952             177.483              1.000
Aptitude           20.881      1      20.881      21.961   .001         .709              21.961               .986
Group              10.812      1      10.812      11.372   .008         .558              11.372               .850
Error               8.557      9        .951
Total             209.000     12
Corrected total    40.250     11
a R Squared = .787 (Adjusted R Squared = .740)
b Computed using alpha = .05

R squared is listed as a footnote underneath the table. R squared is the sum of SS between and SS covariate divided by the sum of squares total:

R² = (SS_betw + SS_cov) / SS_total = (10.812 + 20.881) / 40.250 = .787

Partial eta squared is one measure of effect size:

η²_p = SS_betw / (SS_betw + SS_error) = 10.812 / (10.812 + 8.557) = .558

We can interpret this to say that approximately 56% of the variation in the dependent variable (in this case, statistics quiz score) is accounted for by the instructional method when controlling for aptitude.

The row labeled “Group” is the independent variable or between-groups variable. The between-groups mean square (10.812) tells how much the observations vary between groups. The degrees of freedom for between groups is J − 1 (or 2 − 1 = 1 here). The omnibus F test is computed as

F = MS_betw / MS_with = 10.812 / .951 = 11.37

The p value for the independent variable F test is .008. This indicates there is a statistically significant difference in quiz scores based on instructional method, controlling for aptitude. The probability of observing these mean differences or more extreme mean differences by chance if the null hypothesis is really true (i.e., if the means really are equal) is substantially less than 1%. We reject the null hypothesis that all the population adjusted means are equal. The p value for the covariate F test is .001. This indicates there is a statistically significant relationship between the covariate (aptitude) and quiz score.

The row labeled “Error” is within groups. The within-groups mean square tells us how much the observations within the groups vary (i.e., .951). The degrees of freedom for within groups is (N − J − 1), or the sample size minus the number of levels of the independent variable minus the number of covariates (one here). The row labeled “Corrected total” is the sum of squares total. The degrees of freedom for the total is (N − 1), or the sample size minus one.

Observed power tells whether our test is powerful enough to detect mean differences if they really exist. Power of .850 indicates that the probability of rejecting the null hypothesis if it is really false is about 85%, strong power.
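The three statistics annotated in the summary table can be reproduced directly from the sums of squares SPSS reports for this example:

```python
# Reproducing the annotated calculations from the ANCOVA summary table,
# using the sums of squares reported by SPSS for this example.
ss_betw, ss_cov, ss_error, ss_total = 10.812, 20.881, 8.557, 40.250
df_betw, df_error = 1, 9

F = (ss_betw / df_betw) / (ss_error / df_error)  # omnibus F for group
eta2_p = ss_betw / (ss_betw + ss_error)          # partial eta squared
r2 = (ss_betw + ss_cov) / ss_total               # R squared footnote

print(round(F, 2), round(eta2_p, 3), round(r2, 3))  # → 11.37 0.558 0.787
```

Note that because the group effect has one degree of freedom, its mean square equals its sum of squares, which is why 10.812 appears in both columns.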
Estimated Marginal Means

1. Grand Mean
Dependent variable: Quiz score
Mean       Std. Error   95% CI Lower Bound   95% CI Upper Bound
3.750(a)   .281         3.113                4.387
a Covariates appearing in the model are evaluated at the following values: Aptitude = 4.6667.

2. Group
Estimates
Dependent variable: Quiz score
Group                                        Mean       Std. Error   95% CI Lower Bound   95% CI Upper Bound
Traditional lecture method of instruction    2.686(a)   .423         1.729                3.642
Small group and self-directed instruction    4.814(a)   .423         3.858                5.771
a Covariates appearing in the model are evaluated at the following values: Aptitude = 4.6667.

The ‘Grand Mean’ (in this case, 3.750) represents the overall mean, regardless of group membership in the independent variable. The 95% CI represents the CI of the grand mean.

The table labeled “Group” provides descriptive statistics for each of the categories of the independent variable, controlling for the covariate (notice that these are NOT the same means reported previously; also note the table footnote). In addition to means, the SE and 95% CI of the means are reported.
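The adjusted means can be recovered by hand: each group mean is shifted along the pooled within-groups regression slope toward the grand mean of the covariate. The sketch below uses the group covariate means implied by the reported mean difference on aptitude (2.0) and grand mean (4.6667); the slope of about .814 is likewise inferred from the reported adjusted means rather than printed by SPSS in this table.

```python
# Adjusted mean for group j: Ybar_adj_j = Ybar_j - b_within * (Xbar_j - Xbar_grand)
# b_within (about .814) and the group aptitude means are inferred values.
b_within = 0.814
grand_aptitude = 4.6667
groups = {
    "traditional": {"mean": 3.5, "aptitude_mean": 5.6667},
    "innovative":  {"mean": 4.0, "aptitude_mean": 3.6667},
}
adjusted = {
    name: round(g["mean"] - b_within * (g["aptitude_mean"] - grand_aptitude), 3)
    for name, g in groups.items()
}
print(adjusted)  # traditional adjusted down, innovative adjusted up
```

Because the traditional group happened to have higher aptitude, its mean is adjusted downward (3.500 to 2.686), while the innovative group's mean is adjusted upward (4.000 to 4.814), matching the “Estimates” table.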
Pairwise Comparisons
Dependent variable: Quiz score
(I) Group                   (J) Group                   Mean Difference (I−J)   Std. Error   Sig.(a)   95% CI for Difference(a)
Traditional lecture         Small group and
method of instruction       self-directed instruction          −2.129*              .631       .008       −3.556, −.701
Small group and             Traditional lecture
self-directed instruction   method of instruction               2.129*              .631       .008         .701, 3.556
Based on estimated marginal means.
* The mean difference is significant at the .05 level.
a Adjustment for multiple comparisons: Bonferroni.

‘Mean difference’ is simply the difference between the adjusted group means of the two groups compared. For example, the mean difference of group 1 and group 2, controlling for the covariate, is calculated as 2.686 − 4.814 = −2.128 (rounded). Because there are only two groups of the independent variable, the values in the table are the same (in absolute value) for row 1 as compared to row 2 (the exception is that the CI for the difference is switched).

‘Sig.’ denotes the observed p value and provides the results of the Bonferroni post hoc procedure. There is a statistically significant adjusted mean difference between traditional instruction and innovative instruction (i.e., controlling for aptitude).

Because we had only two groups, requesting post hoc results really was not necessary. We could have reviewed the F test and then the adjusted means to determine which group had the higher adjusted mean. The pairwise comparison results will become more valuable when the ANCOVA includes independent variables with more than two categories.
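The Bonferroni adjustment applied to these pairwise p values is simple: each raw p value is multiplied by the number of comparisons (and capped at 1). The sketch below makes the point that with a single comparison, as in this two-group example, the reported p value is unchanged:

```python
# Bonferroni adjustment: multiply each raw p value by the number of
# comparisons, capping the result at 1.0.
def bonferroni(p_values):
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]

print(bonferroni([0.008]))              # one comparison: unchanged
print(bonferroni([0.008, 0.02, 0.04]))  # three comparisons: each tripled
print(bonferroni([0.60, 0.70]))         # capped at 1.0
```

With three or more groups there would be J(J − 1)/2 pairwise comparisons, and the adjustment would matter.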
Univariate Tests
Dependent variable: Quiz score
           Sum of Squares   df   Mean Square        F     Sig.   Partial Eta Sq.   Noncent. Parameter   Observed Power(a)
Contrast       10.812        1     10.812       11.372   .008         .558              11.372               .850
Error           8.557        9       .951
The F tests the effect of Group. This test is based on the linearly independent pairwise comparisons among the estimated marginal means.
a Computed using alpha = .05

The table labeled “Univariate Tests” is simply another version of the omnibus F test. In the case of one independent variable, the row labeled “Contrast” provides the same results for the independent variable as those presented in the summary table previously. The results from this table suggest there is a statistically significant difference in adjusted mean quiz score based on instructional method when controlling for aptitude.

The profile plot is a plot of the adjusted means (i.e., controlling for the covariate) against the categories of the independent variable. This provides a visual representation of the extent to which the quiz score means differ by instructional method when controlling for aptitude.
[Profile plot: estimated marginal means of quiz score (vertical axis, approximately 2.50 to 5.00) by group (traditional lecture method of instruction vs. small group and self-directed instruction). Covariates appearing in the model are evaluated at the following values: aptitude = 4.6667.]
Examining Data for Assumptions
The assumptions that we will test for in our ANCOVA model include (a) independence of observations, (b) homogeneity of variance (this was previously generated; thus, you can examine Table 14.6 for this assumption as it will not be reiterated here), (c) normality, (d) linearity, (e) independence of the covariate and the independent variable, and (f) homogeneity of regression slopes. We will examine the assumptions after generating the ANCOVA results. This is because many of the tests for assumptions are based on examination of the residuals, which were requested when generating the ANCOVA.
Independence
If subjects have been randomly assigned to conditions (in other words, the different levels of the independent variable), the assumption of independence has been met. In this illustration, students were randomly assigned to instructional method (i.e., traditional or innovative), and, thus, the assumption of independence was met. As we have learned in previous chapters, however, we often use independent variables that do not allow random assignment (e.g., intact groups). We can plot residuals against levels of the independent variable in a scatterplot to get an idea of whether or not there are patterns in the data and thereby provide an indication of the extent to which we have met this assumption. Remember that these variables were added to the dataset by saving the unstandardized residuals when we generated the ANCOVA model.

Note that some researchers do not believe that the assumption of independence can be tested. If there is not random assignment to groups, then these researchers believe this assumption has been violated, period. The plot that we generate will give us a general idea of patterns, however, in situations where random assignment was not performed.

The general steps for generating a simple scatterplot through “Scatter/dot” have been presented in a previous chapter (e.g., Chapter 10), and they will not be reiterated here. From the “Simple Scatterplot” dialog screen, click the residual variable and move it into the “Y Axis” box by clicking on the arrow. Click the independent variable (e.g., group) and move it into the “X Axis” box by clicking on the arrow. Then click “OK.”

Interpreting independence evidence: In examining the scatterplot for evidence of independence, the points should fall relatively randomly above and below the horizontal reference line at 0. In this example, the scatterplot does suggest evidence of independence, with relative randomness of points above and below the horizontal line at 0.
[Scatterplot: residual for quiz (vertical axis, −1.50 to 1.00) plotted against group (horizontal axis).]
Normality
Generating normality evidence: As alluded to earlier in the chapter, understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important. For the ANCOVA, the distributional shape for the residuals should be a normal distribution. We can again use “Explore” to examine the extent to which the assumption of normality is met.

The general steps for accessing “Explore” have been presented in previous chapters and will not be repeated here. From the “Explore” dialog menu (see following screenshot), click the residual and move it into the “Dependent List” box by clicking on the arrow button. The procedures for selecting normality statistics were presented in Chapter 6 and remain the same here: Click on “Plots” in the upper right corner. Place a checkmark in the boxes for “Normality plots with tests” and also for “Histogram.” Then click “Continue” to return to the main “Explore” dialog box. Then click “OK” to generate the output.

Interpreting normality evidence: We have already developed a good understanding of how to interpret some forms of evidence of normality, including skewness and kurtosis, histograms, and boxplots. Here we examine the output for these statistics again.

The skewness statistic of the residuals is −.237 and kurtosis is −1.024; both are within the range of an absolute value of 2.0, suggesting some evidence of normality (see the “Descriptives” output as follows).
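The “absolute value within 2.0” screening rule is easy to apply programmatically. The sketch below computes the bias-corrected skewness and excess-kurtosis statistics of the kind SPSS reports; the sample used is an arbitrary symmetric one, not the residuals from this example.

```python
import math

def skew_kurtosis(data):
    """Bias-corrected (Fisher) skewness and excess kurtosis, as in SPSS output."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    z3 = sum(((v - mean) / s) ** 3 for v in data)
    z4 = sum(((v - mean) / s) ** 4 for v in data)
    g1 = n / ((n - 1) * (n - 2)) * z3
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * z4 \
         - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return g1, g2

# A symmetric toy sample: skewness should be ~0, kurtosis mildly negative.
skew, kurt = skew_kurtosis([1, 2, 3, 4, 5, 6, 7])
print(abs(skew) < 2.0 and abs(kurt) < 2.0)  # → True
```

Applied to the saved residuals, this rule yields the same conclusion as above: both −.237 and −1.024 fall comfortably inside the ±2.0 range.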
Descriptives
Residual for quiz                                  Statistic   Std. Error
  Mean                                               .0000       .25461
  95% Confidence interval   Lower bound             −.5604
  for mean                  Upper bound              .5604
  5% Trimmed mean                                    .0056
  Median                                             .1357
  Variance                                           .778
  Std. deviation                                     .88200
  Minimum                                           −1.46
  Maximum                                            1.36
  Range                                              2.81
  Interquartile range                                1.51
  Skewness                                          −.237        .637
  Kurtosis                                         −1.024       1.232
The histogram of residuals is not what most would consider normal in shape, and this is largely an artifact of the small sample size. Because of this, we will rely more heavily on the other forms of normality evidence.
[Histogram of residual for quiz: mean = −5.69E−16, std. dev. = .882, N = 12.]
There are a few other statistics that can be used to gauge normality. The formal test of normality, the Shapiro–Wilk (S–W) test (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented as follows and suggests that our sample distribution for residuals is not statistically significantly different from what would be expected from a normal distribution (SW = .965, df = 12, p = .854).

Tests of Normality
                    Kolmogorov–Smirnov(a)            Shapiro–Wilk
                    Statistic   df    Sig.           Statistic   df    Sig.
Residual for quiz   .124        12    .200*          .965        12    .854
a Lilliefors significance correction.
* This is a lower bound of the true significance.
Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality. Q–Q plots are graphs that plot quantiles of the theoretical normal distribution against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of residuals shown as follows suggests relative normality.
458 An Introduction to Statistical Concepts
[Normal Q–Q plot of residual for quiz: expected normal quantiles plotted against observed values, both ranging from approximately −2 to 2.]
Examination of the following boxplot suggests a relatively normal distributional shape of residuals and no outliers.
[Boxplot of residual for quiz, values ranging from approximately −1.50 to 1.50.]
Considering the forms of evidence we have examined (skewness and kurtosis statistics, the histogram, the S–W test, the Q–Q plot, and the boxplot), all suggest that normality is a reasonable assumption. We can be reasonably assured that we have met the assumption of normality of the residuals.
Linearity
Recall that the assumption of linearity means that the regression of the dependent variable (i.e., “quiz” in this illustration) on the covariate (i.e., “aptitude”) is linear. Evidence of the extent to which this assumption is met can be obtained by examining scatterplots of the dependent variable versus the covariate, both overall and for each category or group of the independent variable.

Linearity evidence: Overall. The general steps for generating a simple scatterplot through “Scatter/dot” have been presented in a previous chapter (e.g., Chapter 10), and they will not be reiterated here. To generate the overall scatterplot, from the “Simple Scatterplot” dialog screen, click the dependent variable and move it into the “Y Axis” box by clicking on the arrow. Click the covariate (e.g., aptitude) and move it into the “X Axis” box by clicking on the arrow. Then click “OK.”

Interpreting evidence of linearity (overall): In examining the scatterplot for overall evidence of linearity, the points should fall relatively linearly (in other words, we should not see a curvilinear or some other nonlinear relationship). In this example, our scatterplot suggests we have evidence of overall linearity, as there is a relatively clear pattern of points suggesting a positive and linear relationship between the dependent variable and the covariate.
[Scatterplot: quiz score (vertical axis) by aptitude (horizontal axis), all cases; R² linear = 0.519.]
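The “R² linear” value SPSS prints on the scatterplot is simply the squared Pearson correlation between the covariate and the dependent variable. A sketch with hypothetical paired scores (not the chapter's data):

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy / math.sqrt(sxx * syy)) ** 2

print(round(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]), 3))
```

Values near 1 indicate a strong linear fit; the .519 on the overall plot indicates a moderately strong linear relationship between aptitude and quiz score.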
Linearity evidence: By group of independent variable. To generate the scatterplot of the dependent variable and covariate for each group of the independent variable, we must first split the data file. To do this, go to “Data” in the top pulldown menu. Then select “Split File.”
[Screenshot: linearity evidence by group of independent variable — the “Data” > “Split File” menu path.]
From the “Split File” dialog screen, select the radio button for “Organize output by groups,” and then click the independent variable and move it into the “Groups Based on” box by clicking on the arrow. Then click “OK.”
[Screenshot: the “Split File” dialog box with “Organize output by groups” selected and the independent variable in the “Groups Based on” box.]
After splitting the file, the next step is to generate the scatterplot of the dependent variable by covariate. Because we have split the file, there will be two scatterplots generated: one for the traditional teaching method and one for the innovative teaching method. The general steps for generating a simple scatterplot through “Scatter/dot” have been presented in a previous chapter (e.g., Chapter 10), and they will not be repeated here. Because we have just generated the overall scatterplot, the selections made previously will remain, and, thus, from the “Simple Scatterplot” dialog screen, simply click “OK” to generate the output.
Interpreting evidence of linearity (by group of independent variable): In examining the scatterplots for evidence of linearity by group of the independent variable, our interpretation should remain the same: the points should fall relatively linearly (in other words, we should not see a curvilinear or some other nonlinear relationship). In this example, our scatterplots suggest we have evidence of linearity by group of the independent variable, as there is a relatively clear pattern of points suggesting a positive and linear relationship between the dependent variable and the covariate for each group of the independent variable.
[Scatterplots by group: quiz score by aptitude for the traditional lecture method of instruction (R² linear = 0.884) and for small group and self-directed instruction (R² linear = 0.703).]
Independence of Covariate and Independent Variable
Recall the assumption of independence of the covariate and independent variable. In other words, the levels of the independent variable should not differ on the covariate. If subjects have been randomly assigned to conditions (in other words, the different levels of the independent variable), the assumption of independence of the covariate and independent variable has likely been met. In this illustration, students were randomly assigned to teaching method (i.e., traditional or innovative), and, thus, the assumption of independence of the covariate and independent variable was likely met. As we have learned in previous chapters, however, we often use independent variables that do not allow random assignment. Evidence of the extent to which this assumption is met can be obtained by examining mean differences on the covariate based on the independent variable. If the independent variable has only two levels, an independent t test would be appropriate. If the independent variable has more than two categories, a one-way ANOVA would suffice. If the groups are not statistically different on the covariate, then that lends evidence that the assumption of independence of the covariate and the independent variable has been met.

We have two levels of our independent variable; thus, we will generate an independent t test. The general steps for generating an independent t test have been presented in Chapter 8, and they will not be reiterated here. From the “Independent Samples T Test” dialog screen, click the covariate (e.g., aptitude) and move it into the “Test Variable(s)” box by clicking on the arrow. Click the independent variable (e.g., group) and move it into the “Grouping Variable” box by clicking on the arrow. Click the “Define Groups” box and enter “1” for “Group 1” and “2” for “Group 2.” Then click “Continue” to return to the main “Independent Samples T Test” dialog screen, and click on “OK” to generate the output.

Interpreting independence of covariate and independent variable evidence: In examining the independent t test results, evidence of independence of the covariate and independent variable is provided when the test results are not statistically significant. In this example, our results suggest we have evidence of independence of the covariate and independent variable, as the results are not statistically significant, t(10) = 1.604, p = .140. Thus, we have likely met this assumption through random assignment of cases to groups, and this provides further confirmation that we have not violated the assumption of independence of the covariate and independent variable.
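The pooled-variance independent t test underlying this check can be sketched directly. The two samples below are hypothetical aptitude scores (the chapter's raw data are not reproduced here), so the printed t value is illustrative only; the degrees-of-freedom formula, n₁ + n₂ − 2, matches the df of 10 reported above.

```python
import math

def pooled_t(a, b):
    """Pooled-variance independent t test; returns (t statistic, df)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    sp2 = (ssa + ssb) / (na + nb - 2)        # pooled variance estimate
    se = math.sqrt(sp2 * (1 / na + 1 / nb))  # standard error of the difference
    return (ma - mb) / se, na + nb - 2

# Hypothetical aptitude scores for two groups of six students each.
t, df = pooled_t([6, 7, 4, 5, 8, 4], [3, 5, 2, 4, 6, 2])
print(round(t, 3), df)
```

A nonsignificant t (as in the SPSS output, t(10) = 1.604, p = .140) supports the claim that the groups do not differ on the covariate.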
Independent Samples Test
Aptitude                       Levene's Test for            t-Test for Equality of Means
                               Equality of Variances
                               F       Sig.       t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI of the Difference
Equal variances assumed        .000    1.000      1.604   10       .140              2.00000      1.24722            −.77898, 4.77898
Equal variances not assumed                       1.604   10.000   .140              2.00000      1.24722            −.77898, 4.77898
Homogeneity of Regression Slopes
Step 1: In order to test the homogeneity of slopes assumption, you will need to rerun the ANCOVA analysis. Keep every screen the same as before, with one exception. Return to the main “Univariate” dialog box (see step 2) and click on “Model.” From the “Model” dialog box, click on the “Custom” button to build a custom model that includes the interaction between the independent variable and the covariate. To do this, under the “Build Terms” pulldown in the middle of the dialog box, select “Main effects.”
[Screenshot: generating homogeneity of regression slopes evidence, Step 1 — the “Model” dialog box with “Custom” selected.]
Step 2: Click the independent variable and move it into the “Model” box by clicking on the arrow button. Next, click the covariate and move it into the “Model” box by clicking on the arrow button. This will place “Group” and “Aptitude” in the “Model” box on the right of the screen.
[Screenshot: generating homogeneity of regression slopes evidence, Step 2 — the main effects moved into the “Model” box.]
464 An Introduction to Statistical Concepts
Step 3: Then from the “Build Terms” pulldown menu, select “Interaction.”
[Screenshot: generating homogeneity of regression slopes evidence, Step 3 — “Interaction” selected in the “Build Terms” pulldown.]
Step 4: Click both variables at the same time (e.g., using the shift key) and use the arrow key to move the interaction of Aptitude * Group into the “Model” box on the right. There should now be three terms in the “Model” box: the interaction and the two main effects. Then click “Continue” to return to the main “Univariate” dialog box. Then click “OK” to generate the output.
[Screenshot, Step 4: Generating homogeneity of regression slopes evidence. For the interaction, select both the independent variable and covariate from the list on the left and use the arrow to move them to the “Model” box on the right.]
Interpreting homogeneity of regression slopes evidence: Selected results, specifically the ANCOVA summary table which presents the results for the homogeneity of slopes test, are presented as follows. Here the only thing that we care about is the test of the interaction, which we want to be nonsignificant [and we find this to be the case: F(1, 8) = .000, p = 1.000]. This indicates that we have met the homogeneity of regression slopes assumption.
Tests of Between-Subjects Effects
Dependent Variable: Quiz Score

Source            Type I Sum    df   Mean      F         Sig.    Partial Eta  Noncent.    Observed
                  of Squares         Square                      Squared      Parameter   Power(b)
Corrected model    31.693(a)     3    10.564     9.876   .005    .787          29.629      .955
Intercept         168.750        1   168.750   157.763   .000    .952         157.763     1.000
Group                .750        1      .750      .701   .427    .081            .701      .115
Aptitude           30.943        1    30.943    28.928   .001    .783          28.928      .997
Group*Aptitude       .000        1      .000      .000  1.000    .000            .000      .050
Error                8.557       8     1.070
Total              209.000      12
Corrected total     40.250      11

a. R squared = .787 (adjusted R squared = .708).
b. Computed using alpha = .05.
Post Hoc Power for ANCOVA Using G*Power
Generating power analysis for ANCOVA models follows similarly to that for ANOVA and factorial ANOVA. In particular, if there is more than one independent variable, we must test for main effects and interactions separately. Because we only have one independent variable for our ANCOVA model, our illustration assumes only one main effect. If there were additional independent variables and/or interactions, we would have followed these steps for those as well.
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted an ANCOVA. To find ANCOVA, we will select “Tests” in the top pulldown menu, then “Means,” and then “Many groups: ANCOVA: Main effects and interactions.” Once that selection is made, the “Test family” automatically changes to “F tests.”
[Screenshot, Step 1: Selecting the ANCOVA test family in G*Power]
The “Type of Power Analysis” desired then needs to be selected. To compute post hoc power, we need to select “Post hoc: Compute achieved power—given α, sample size, and effect size.”
[Screenshots, Step 2: The default “Test family” selection is “t tests,” and the default “Statistical test” is “Correlation: Point biserial model.” Following the procedures presented in Step 1 will automatically change these to “F tests” and “ANCOVA: Fixed effects, main effects and interactions.” Clicking on “Determine” pops out the effect size calculator box, which allows you to compute f given partial eta squared. The “Input Parameters” for computing post hoc power must be specified (default values shown in the screenshot), including: (1) effect size f, (2) α level, (3) total sample size, (4) numerator df, (5) number of groups, and (6) number of covariates. Once the parameters are specified, click on “Calculate.”]
The “Input Parameters” must then be specified. We will compute the effect size f last, so we skip that for the moment. In our example, the alpha level we used was .05, and the total sample size was 12. The numerator degrees of freedom for group (our independent variable) are equal to the number of categories of this variable (i.e., 2) minus 1; thus, there is one degree of freedom for the numerator. The number of groups equals, in the case of an ANCOVA with multiple independent variables, the product of the number of levels or categories of the independent variables, or (J)(K). In this example, we have only one independent variable. Thus, the number of groups when there is only one independent variable is equal to the number of categories of this independent variable (i.e., 2). The last parameter that must be inputted is the number of covariates. In this example, we have only one covariate; thus, we enter 1 in this box.
We skipped filling in the first parameter, the effect size f, for a reason. SPSS only provides a partial eta squared measure of effect size. Thus, we will use the pop-out effect size calculator in G*Power to compute the effect size f (we saved this parameter for last as the calculation is based on the previous values just entered). To pop out the effect size
calculator, click on “Determine” which is displayed under “Input Parameters.” In the pop-out effect size calculator, click on the radio button for “Direct” and then enter the partial eta squared value for group that was calculated in SPSS (i.e., .558). Clicking on “Calculate” in the pop-out effect size calculator will calculate the effect size f. Then click on “Calculate and Transfer to Main window” to transfer the calculated effect size (i.e., 1.1235851) to the “Input Parameters.” Once the parameters are specified, click on “Calculate” to find the power statistics.
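The “Direct” option implements a simple conversion from partial eta squared to Cohen's f, f = sqrt(η²p / (1 − η²p)), which you can verify against the chapter's numbers in two lines:

```python
# Convert partial eta squared to Cohen's f, as G*Power's "Direct" option does:
# f = sqrt(eta_p^2 / (1 - eta_p^2))
import math

eta_p_sq = 0.558                      # partial eta squared for group, from SPSS
f = math.sqrt(eta_p_sq / (1 - eta_p_sq))
print(round(f, 7))                    # -> 1.1235851, matching G*Power
```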
[Screenshot: Post hoc power results]
The “Output Parameters” provide the relevant statistics given the input just specified. In this example, we were interested in determining post hoc power for an ANCOVA with a computed effect size f of 1.1235851, an alpha level of .05, total sample size of 12, numerator degrees of freedom of 1, two groups, and one covariate.
Based on those criteria, the post hoc power for the main effect of instructional method (i.e., our only independent variable) was .93. In other words, with an ANCOVA, computed effect size f of 1.124, alpha level of .05, total sample size of 12, numerator degrees of freedom of 1, two groups, and one covariate, the post hoc power of our main effect for this test was .93. That is, the probability of rejecting the null hypothesis when it is really false (in this case, the probability of detecting a difference in the adjusted means of the dependent variable across the levels of the
independent variable, controlling for the covariate) was about 93%, which would be considered more than sufficient power (sufficient power is often .80 or above). Note that this value differs slightly from that reported in SPSS. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters).
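The power G*Power reports can be reproduced (to within rounding) from the noncentral F distribution: the noncentrality parameter is λ = f²·N, the numerator df is 1, and the denominator df is N minus the number of groups minus the number of covariates. A sketch using scipy, with the inputs from the example above (the df convention is the one G*Power appears to use; other software may parameterize slightly differently):

```python
# Post hoc power for the ANCOVA main effect via the noncentral F distribution.
from scipy.stats import f as f_dist, ncf

effect_f = 1.1235851   # Cohen's f computed from partial eta squared
alpha    = 0.05
N        = 12          # total sample size
df1      = 1           # numerator df (2 groups - 1)
groups   = 2
covs     = 1
df2      = N - groups - covs          # denominator (error) df = 9
lam      = effect_f ** 2 * N          # noncentrality parameter

f_crit = f_dist.ppf(1 - alpha, df1, df2)     # critical F under the null
power  = 1 - ncf.cdf(f_crit, df1, df2, lam)  # P(reject | effect is real)
print(f"power = {power:.2f}")                # approximately .93, as in the text
```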
A Priori Power for ANCOVA Using G*Power
For a priori power, we can determine the total sample size needed for the main effects and/or interactions given an estimated effect size f, alpha level, desired power, numerator degrees of freedom (i.e., number of categories of our independent variable and/or interaction, depending on which a priori power we are interested in and depending on the number of independent variables), number of groups (i.e., the number of categories of the independent variable in the case of only one independent variable OR the product of the number of levels of the independent variables in the case of multiple independent variables), and the number of covariates. We follow Cohen's (1988) conventions for effect size (i.e., small, f = .10; moderate, f = .25; large, f = .40). In this example, had we estimated a moderate effect f of .25, alpha of .05, desired power of .80, numerator degrees of freedom of 1 (two categories in our independent variable, thus 2 − 1 = 1), number of groups of 2 (i.e., there is only one independent variable, and there were two categories), and one covariate, we would need a total sample size of 128.
[Screenshot: A priori power results]
14.13 Template and APA-Style Paragraph
Finally we come to an example paragraph of the results for the statistics instruction example. Recall that our graduate research assistant, Marie, was building on work that she had conducted as part of a research project for an independent study class and had now conducted a second experiment. She was looking to see if there was a mean difference in statistics quiz scores based on the instructional method of the class (two categories: traditional or innovative) while controlling for aptitude. Her research question was the following: Is there a mean difference in statistics quiz scores based on teaching method, controlling for aptitude? Marie then generated an ANCOVA as the test of inference. A template for writing a research question for ANCOVA is presented as follows:
Is there a mean difference in [dependent variable] based on [independent variable], controlling for [covariate]?
This is illustrated assuming a one-factor (i.e., one independent variable) model, but it can easily be extended to two or more factors. As we noted in previous chapters, it is important to be sure the reader understands the levels or groups of the independent variables. This may be done parenthetically in the actual research question, as an operational definition, or specified within the methods section. In this example, parenthetically we could have stated the following: Is there a mean difference in statistics quiz scores based on teaching method (traditional vs. innovative), controlling for aptitude?
It may be helpful to preface the results of the ANCOVA with information on an examination of the extent to which the assumptions were met (recall there are several assumptions that we tested: (a) independence of observations, (b) homogeneity of variance, (c) normality, (d) linearity, (e) independence of the covariate and the independent variable, and (f) homogeneity of regression slopes):
An ANCOVA was conducted to determine if the mean statistics quiz score differed based on the instructional method of the statistics course (traditional vs. innovative) while controlling for aptitude. Independence of observations was met by random assignment of students to instructional method. This assumption was also confirmed by review of a scatterplot of residuals against the levels of the independent variable. A random display of points around 0 provided further evidence that the assumption of independence was met. According to Levene's test, the homogeneity of variance assumption was not satisfied [F(1, 10) = 6.768, p = .026]. However, research suggests that the impact of violating homogeneity of variance is minimal when the groups of the independent variable are equal in size (Harwell, 2003), as is the case in this study. The assumption of normality was tested and met via examination of the residuals. Review of the S-W test for normality (SW = .965, df = 12, p = .854) and skewness (−.237) and kurtosis (−1.024) statistics suggested that normality was a reasonable assumption. The boxplot and histogram suggested a relatively normal distributional shape (with no outliers) of the residuals. The Q–Q plot suggested normality was reasonable. In general, there is evidence that normality has been met. Linearity of the dependent variable with the
covariate was examined with scatterplots, both overall and by group of the independent variable. Overall, the scatterplot of the dependent variable with the covariate suggested a positive linear relationship. This same pattern was present for the scatterplot of the dependent variable with the covariate when disaggregated by the categories of the independent variable. Independence of the covariate and independent variable was met by random assignment of students to instructional method. This assumption was also confirmed by an independent t test which examined the mean difference on the covariate (i.e., aptitude) by independent variable (i.e., teaching method). The results were not statistically significant, t(10) = 1.604, p = .140, which further confirms evidence of independence of the covariate and independent variable. There was not a mean difference in statistics aptitude based on teaching method. Homogeneity of regression slopes was suggested by similar regression lines evidenced in the scatterplots of the dependent variable and covariate by group (reported earlier as evidence for linearity). This assumption was confirmed by a nonstatistically significant interaction of aptitude by group, F(1, 8) = .000, p = 1.000.
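Several of the assumption checks described above (Levene's test on the dependent variable, the Shapiro-Wilk test on residuals, and the covariate-by-group t test) can be run directly with scipy. The sketch below uses hypothetical data, so its statistics will not match the chapter's values; residuals are simplified here to deviations from each group mean rather than residuals from the full ANCOVA model.

```python
# Assumption checks for a two-group ANCOVA, on hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
apt_trad,  apt_innov  = rng.normal(5, 1.5, 6), rng.normal(5, 1.5, 6)   # covariate
quiz_trad = apt_trad + rng.normal(0, 1, 6)                             # DV, group 1
quiz_innov = apt_innov + 2 + rng.normal(0, 1, 6)                       # DV, group 2

# Homogeneity of variance on the dependent variable (Levene's test).
lev_stat, lev_p = stats.levene(quiz_trad, quiz_innov)

# Independence of covariate and independent variable (t test on the covariate).
t_stat, t_p = stats.ttest_ind(apt_trad, apt_innov)

# Normality via Shapiro-Wilk on (simplified) residuals.
resid = np.concatenate([quiz_trad - quiz_trad.mean(),
                        quiz_innov - quiz_innov.mean()])
sw_stat, sw_p = stats.shapiro(resid)

print(lev_p, t_p, sw_p)  # for the assumptions to hold, want all three p > .05
```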
Here is an APA-style example paragraph of results for the ANCOVA (remember that this will be prefaced by the previous paragraph reporting the extent to which the ANCOVA assumptions were met):
The results of the ANCOVA suggest a statistically significant effect of the covariate, aptitude, on the dependent variable, statistics quiz score (F_aptitude = 21.961; df = 1, 9; p = .001). More importantly, there is a statistically significant effect for instructional method (F_group = 11.372; df = 1, 9; p = .008), with a large effect size and strong power (partial η²_group = .558, observed power = .850). The effect size suggests that about 56% of the variance in statistics quiz scores can be accounted for by teaching method when controlling for aptitude.
The unadjusted group statistics quiz score mean (i.e., prior to controlling for aptitude) was larger for the innovative instruction group (M = 4.00, SD = 2.10) as compared to the traditional lecture method (M = 3.50, SD = 1.87) by only .50. However, the adjusted mean for the innovative instruction group (M = 4.814, SE = .423) as compared to the traditional lecture method (M = 2.686, SE = .423) was larger by 2.128. Thus, the use of the covariate resulted in a large significant difference between the instructional groups. In summary, students assigned to the innovative teaching method outperformed students in the traditional lecture method on the statistics quiz score when controlling for statistics aptitude.
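The adjustment behind those adjusted means follows the standard ANCOVA formula, Y̅′_j = Y̅_j − b_w(X̅_j − X̅), where b_w is the pooled within-groups regression slope and X̅ is the grand covariate mean. A sketch with hypothetical values (the chapter's own b_w and covariate means are not reproduced here):

```python
def adjusted_mean(y_mean, x_mean, x_grand_mean, b_w):
    """ANCOVA-adjusted group mean: Y'_j = Y_j - b_w * (X_j - X_grand)."""
    return y_mean - b_w * (x_mean - x_grand_mean)

# Hypothetical values: a group above the grand covariate mean is adjusted
# downward, a group below it upward, widening or narrowing the gap.
print(adjusted_mean(y_mean=4.00, x_mean=10.0, x_grand_mean=9.0, b_w=0.8))  # -> 3.2
print(adjusted_mean(y_mean=3.50, x_mean=8.0,  x_grand_mean=9.0, b_w=0.8))  # -> 4.3
```

This illustrates how controlling for the covariate can reverse or enlarge an unadjusted difference, exactly the phenomenon described in the paragraph above.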
If our independent variable had more than two groups, we would have needed to evaluate and report the results of a post hoc MCP when generating the SPSS output (recall that we asked for Bonferroni post hoc results). The following provides a template for how these results may have been written, had our analyses required them:
Follow-up tests were conducted to evaluate the pairwise differences among the adjusted means of [dependent variable] based on [independent variable]. The [post hoc procedure selected, e.g., Bonferroni]
was applied to control for the risk of increased Type I error across all pairwise comparisons. Pairwise comparisons revealed [report specific results, including means and standard deviations here].
14.14 Summary
In this chapter, methods involving the comparison of adjusted group means for a single independent variable were considered. The chapter began with a look at the unique characteristics of the ANCOVA, including (a) statistical control through the use of a covariate, (b) the dependent variable means adjusted by the covariate, (c) the covariate used to reduce error variation, (d) the relationship between the covariate and the dependent variable taken into account in the adjustment, and (e) the covariate measured at least at the interval level. The layout of the data was shown, followed by an examination of the ANCOVA model and the ANCOVA summary table. Next, estimation of the adjusted means was considered along with several different MCPs. Some discussion was also devoted to the ANCOVA assumptions, their assessment, and how to deal with assumption violations. We illustrated the use of the ANCOVA by looking at an example. Finally, we finished off the chapter by briefly examining (a) some cautions about the use of ANCOVA in situations without randomization, (b) ANCOVA for models having multiple factors and/or multiple covariates, (c) nonparametric ANCOVA procedures, and (d) SPSS and G*Power. At this point, you should have met the following objectives: (a) be able to understand the characteristics and concepts underlying ANCOVA; (b) be able to determine and interpret the results of ANCOVA, including adjusted means and MCPs; and (c) be able to understand and evaluate the assumptions of ANCOVA. Chapter 15 takes us beyond the fixed-effects models we have discussed thus far and considers random- and mixed-effects models.
Problems
Conceptual problems
14.1 Malani wants to determine whether children whose preschool classroom has a window differ in their receptive vocabulary as compared to children whose classroom does not have a window. At the beginning of the school year, Malani randomly assigns 10 children at Rainbow Butterfly Preschool to one of two different classrooms: one classroom has a window that looks out onto a grassy area, and the other classroom has no windows. At the end of the school year, Malani measures children on their receptive vocabulary. Is ANCOVA appropriate given this scenario?
14.2 Joe wants to determine whether the time to run the Magic Mountain Marathon (ratio level variable) differs, on average, for nonprofessional athletes who complete a 12-week endurance training program as compared to those who complete a 4-week endurance training program. Joe randomly assigns nonprofessional athletes to one of the two training programs. In conducting this experiment, Joe also wants to control for the number of prior marathons in which the participant has run. Is ANCOVA appropriate given this scenario?
14.3 Tami has generated an ANCOVA. In testing the assumptions, she reviews a scatterplot of the residuals for each category of the independent variable. For which assumption is Tami likely reviewing evidence?
  a. Homogeneity of regression slopes
  b. Homogeneity of variance
  c. Independence of observations
  d. Independence of the covariate and the independent variable
  e. Linearity
14.4 Wesley has generated an ANCOVA. In his model, there is one independent variable which has three categories (type of phone: Blackberry, iPhone, and Droid) and one covariate (amount of time spent on desktop or laptop computer). In testing the assumptions, he reviews a one-way ANOVA, the dependent variable being amount of time spent on desktop or laptop computer and the independent variable being type of phone. For which assumption is Wesley likely reviewing evidence?
  a. Homogeneity of regression slopes
  b. Homogeneity of variance
  c. Independence of observations
  d. Independence of the covariate and the independent variable
  e. Linearity
14.5 If the correlation between the covariate X and the dependent variable Y differs markedly in the two treatment groups, it seems likely that
  a. The assumption of normality is suspect.
  b. The assumption of homogeneity of slopes is suspect.
  c. A nonlinear relation exists between X and Y.
  d. The adjusted means for Y differ significantly.
14.6 If for both the treatment and control groups the correlation between the covariate X and the dependent variable Y is substantial but negative, the error variation for ANCOVA as compared to that for ANOVA is
  a. Less
  b. About the same
  c. Greater
  d. Unpredictably different
14.7 An experiment was conducted to compare three different instructional strategies. Fifteen subjects were included in each group. The same test was administered prior to and after the treatments. If both pretest and IQ are used as covariates, what are the degrees of freedom for the error term?
  a. 2
  b. 40
  c. 41
  d. 42
14.8 The effect of a training program concerned with educating heart attack patients to the benefits of moderate exercise was examined. A group of recent heart attack patients was randomly divided into two groups; one group received the training program and the other did not. The dependent variable was the amount of time taken to jog three laps, with the weight of the patient after the program used as a covariate. Examination of the data after the study revealed that the covariate means of the two groups differed. Which of the following assumptions is most clearly violated?
  a. Linearity
  b. Homogeneity of slopes
  c. Independence of the treatment and the covariate
  d. Normality
14.9 In ANCOVA, the covariate is a variable which should have a
  a. Low, positive correlation with the dependent variable
  b. High, positive correlation with the independent variable
  c. High, positive correlation with the dependent variable
  d. Zero correlation with the dependent variable
14.10 In ANCOVA, how will a correlation of 0 between the covariate and the dependent variable appear?
  a. Unequal group means on the dependent variable
  b. Unequal group means on the covariate
  c. Regression of the dependent variable on the covariate with b_w = 0
  d. Regression of the dependent variable on the covariate with b_w = 1
14.11 Which of the following is not a necessary requirement for using ANCOVA?
  a. Covariate scores are not affected by the treatment.
  b. There is a linear relationship between the covariate and the dependent variable.
  c. The covariate variable is the same measure as the dependent variable.
  d. Regression slopes for the groups are similar.
14.12 Which of the following is the most desirable situation in which to use ANCOVA?
  a. The slope of the regression line equals 0.
  b. The variance of the dependent variable for a specific covariate score is relatively large.
  c. The correlation between the covariate and the dependent variable is −.95.
  d. The correlation between the covariate and the dependent variable is .60.
14.13 A group of students were randomly assigned to one of three instructional strategies. Data from the study indicated an interaction between slope and treatment group. It seems likely that
  a. The assumption of normality is suspect.
  b. The assumption of homogeneity of slopes is suspect.
  c. A nonlinear relation exists between X and Y.
  d. The covariate is not independent of the treatment.
14.14 If the mean on the dependent variable GPA (Y) for persons of middle social class (X) is higher than for persons of lower and higher social classes, one would expect that
  a. The relationship between X and Y is curvilinear.
  b. The covariate X contains substantial measurement error.
  c. GPA is not normally distributed.
  d. Social class is not related to GPA.
14.15 If both the covariate and the dependent variable are assessed after the treatment has been concluded, and if both are affected by the treatment, the use of ANCOVA for these data would likely result in
  a. An inflated F ratio for the treatment effect
  b. An exaggerated difference in the adjusted means
  c. An underestimate of the treatment effect
  d. An inflated value of the slope b_w
14.16 When the covariate correlates +.5 with the dependent variable, I assert that the adjusted MS_with from the ANCOVA will be less than the MS_with from the ANOVA. Am I correct?
14.17 For each of two groups, the correlation between the covariate and the dependent variable is substantial, but negative in direction. I assert that the error variance for ANCOVA, as compared to that for ANOVA, is greater. Am I correct?
14.18 In ANCOVA, X is known as a factor. True or false?
14.19 A study was conducted to compare six types of diets. Twelve subjects were included in each group. Their weights were taken prior to and after treatment. If pre-weight is used as a covariate, what are the degrees of freedom for the error term?
  a. 5
  b. 65
  c. 66
  d. 71
14.20 A researcher conducts both a one-factor ANOVA and a one-factor ANCOVA on the same data. In comparing the adjusted group means to the unadjusted group means, they find that for each group, the adjusted mean is equal to the unadjusted mean. I assert that the researcher must have made a computational error. Am I correct?
14.21 The correlation between the covariate and the dependent variable is 0. I assert that ANCOVA is still preferred over ANOVA. Am I correct?
14.22 If there is a nonlinear relationship between the covariate X and the dependent variable Y, then it is very likely that
  a. There will be less reduction in SS_with.
  b. The group effects will be biased.
  c. The correlation between X and Y will be smaller in magnitude.
  d. All of the above.
Computational problems
14.1 Consider the ANCOVA situation where the dependent variable Y is the posttest of an achievement test and the covariate X is the pretest of the same test. Given the data that follow, where there are three groups, (a) calculate the adjusted Y values assuming that b_w = 1.00, and (b) determine what effects the adjustment had on the posttest results.
Group     X      X̄      Y      Ȳ
          40            120
1         50     50     125    125
          60            130
          70            140
2         75     75     150    150
          80            160
          90            160
3         100    100    175    175
          110           190
14.2 Malani wants to determine whether children whose preschool classroom has a window differ in their receptive vocabulary as compared to children whose classroom does not have a window. At the beginning of the school year, Malani randomly assigns 10 children at Rainbow Butterfly Preschool to one of two different classrooms: one classroom which has a window that looks out onto a grassy area or another classroom that has no windows. At the end of the school year, Malani measures children on their receptive vocabulary. In the following are two independent random samples (classroom with and without window) of paired values on the covariate (X; receptive vocabulary measured at beginning of school year) and the dependent variable (Y; receptive vocabulary measured at the end of the school year). Conduct an ANOVA on Y, an ANCOVA on Y using X as a covariate, and compare the results (α = .05). Determine the unadjusted and adjusted means.
Classroom with Window Classroom Without Window
X Y X Y
80 105 80 95
75 100 85 100
85 105 90 105
70 100 85 100
90 110 95 105
14.3 In the following are four independent random samples (different methods of instruction) of paired values on the covariate IQ (X) and the dependent variable essay score (Y). Conduct an ANOVA on Y, an ANCOVA on Y using X as a covariate, and compare the results (α = .05). Determine the unadjusted and adjusted means.
Group 1 Group 2 Group 3 Group 4
X Y X Y X Y X Y
94 14 80 38 92 55 94 24
96 19 84 34 96 53 94 37
98 17 90 43 99 55 98 22
100 38 97 43 101 52 100 43
102 40 97 61 102 35 103 49
105 26 112 63 104 46 104 24
109 41 115 93 107 57 104 41
110 28 118 74 110 55 108 26
111 36 120 76 111 42 113 70
130 66 120 79 118 81 115 63
14.4 A communications researcher wants to know which of five versions of commercials for a new television show is most effective in terms of viewing likelihood. Each commercial is viewed by six students. A one-factor ANCOVA was used to analyze these data where the covariate was amount of television previously viewed per week. Complete the following ANCOVA summary table (α = .05):
Source SS df MS F Critical Value Decision
Between�adjusted 96 — — — — —
Within�adjusted 192 — —
Covariate — — — — — —
Total 328 —
Interpretive problems
14.1 The first interpretive problem in Chapter 11 requested the following: “Using the survey 1 dataset from the website, use SPSS to conduct a one-factor fixed-effects ANOVA, including effect size, where political view is the grouping variable (i.e., independent variable) (J = 5) and the dependent variable is a variable of interest to you [the following variables look interesting: books, TV, exercise, drinks, GPA, GRE-Quantitative (GRE-Q), CDs, hair appointment].” Using these same data, select an appropriate covariate and then generate a one-factor ANCOVA (including testing the assumptions of both the ANOVA and ANCOVA). Compare and contrast the results of the ANOVA and ANCOVA. Which method would you select and why?
14.2 The second interpretive problem in Chapter 11 requested the following: “Using the survey 1 dataset from the website, use SPSS to conduct a one-factor fixed-effects ANOVA, including effect size, where hair color is the grouping variable (i.e., independent variable) (J = 5) and the dependent variable is a variable of interest to you (the following variables look interesting: books, TV, exercise, drinks, GPA, GRE-Q, CDs, hair appointment).” Using these same data, select an appropriate covariate and then generate a one-factor ANCOVA (including testing the assumptions of both the ANOVA and ANCOVA). Compare and contrast the results of the ANOVA and ANCOVA. Which method would you select and why?
15
Random- and Mixed-Effects Analysis of Variance Models
Chapter Outline
15.1 The One-Factor Random-Effects Model
    15.1.1 Characteristics of the Model
    15.1.2 ANOVA Model
    15.1.3 ANOVA Summary Table and Expected Mean Squares
    15.1.4 Assumptions and Violation of Assumptions
    15.1.5 Multiple Comparison Procedures
15.2 Two-Factor Random-Effects Model
    15.2.1 Characteristics of the Model
    15.2.2 ANOVA Model
    15.2.3 ANOVA Summary Table and Expected Mean Squares
    15.2.4 Assumptions and Violation of Assumptions
    15.2.5 Multiple Comparison Procedures
15.3 Two-Factor Mixed-Effects Model
    15.3.1 Characteristics of the Model
    15.3.2 ANOVA Model
    15.3.3 ANOVA Summary Table and Expected Mean Squares
    15.3.4 Assumptions and Violation of Assumptions
    15.3.5 Multiple Comparison Procedures
15.4 One-Factor Repeated Measures Design
    15.4.1 Characteristics of the Model
    15.4.2 Layout of Data
    15.4.3 ANOVA Model
    15.4.4 Assumptions and Violation of Assumptions
    15.4.5 ANOVA Summary Table and Expected Mean Squares
    15.4.6 Multiple Comparison Procedures
    15.4.7 Alternative ANOVA Procedures
    15.4.8 Example
15.5 Two-Factor Split-Plot or Mixed Design
    15.5.1 Characteristics of the Model
    15.5.2 Layout of Data
    15.5.3 ANOVA Model
    15.5.4 Assumptions and Violation of Assumptions
    15.5.5 ANOVA Summary Table and Expected Mean Squares
    15.5.6 Multiple Comparison Procedures
    15.5.7 Example
15.6 SPSS and G*Power
15.7 Template and APA-Style Write-Up
Key Concepts
1. Fixed-, random-, and mixed-effects models
2. Repeated measures models
3. Compound symmetry/sphericity assumption
4. Friedman repeated measures test based on ranks
5. Split-plot or mixed designs (i.e., both between- and within-subjects factors)
In this chapter, we continue our discussion of the analysis of variance (ANOVA) by considering models in which there is a random-effects factor, previously introduced in Chapter 11. These models include the one-factor and factorial designs, as well as repeated measures designs. As becomes evident, repeated measures designs are used when there is at least one factor where each individual is exposed to all levels of that factor. This factor is referred to as a repeated factor, for obvious reasons. This chapter is mostly concerned with one- and two-factor random-effects models, the two-factor mixed-effects model, and one- and two-factor repeated measures designs.
It should be noted that effect size measures, power, and confidence intervals (CIs) can be determined in the same fashion for the models in this chapter as for previously described ANOVA models. The standard effect size measures already described are applicable (i.e., ω² and η²), although the intraclass correlation coefficient, ρI, can be utilized for random effects (similarly interpreted). For additional discussion of these issues in the context of this chapter, see Cohen (1988), Fidler and Thompson (2001), Keppel and Wickens (2004), Murphy, Myors, and Wolach (2008), and Wilcox (1996, 2003).
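The effect size measures just mentioned can all be computed from ordinary ANOVA summary-table quantities. The sketch below shows one common set of formulas (η², a less biased ω², and the ANOVA estimator of ρI for a random-effects factor); the numeric summary-table values are hypothetical, chosen only to illustrate the arithmetic.

```python
# Sketch: effect size measures for a one-factor random-effects ANOVA,
# computed from summary-table quantities. The numeric inputs below are
# hypothetical, not taken from any example in the text.

def eta_squared(ss_between, ss_total):
    # eta^2: proportion of total variation accounted for by the factor
    return ss_between / ss_total

def omega_squared(ss_between, df_between, ms_within, ss_total):
    # omega^2: a less biased analogue of eta^2
    return (ss_between - df_between * ms_within) / (ss_total + ms_within)

def intraclass_correlation(ms_between, ms_within, n):
    # rho_I for a random-effects factor with n observations per level:
    # estimated sigma^2_a / (sigma^2_a + sigma^2_error)
    var_a = (ms_between - ms_within) / n
    return var_a / (var_a + ms_within)

# Hypothetical summary-table values: J = 4 groups, n = 8 per group
ss_between, df_between = 120.0, 3
ss_within, df_within = 224.0, 28
ms_between = ss_between / df_between   # 40.0
ms_within = ss_within / df_within      # 8.0
ss_total = ss_between + ss_within      # 344.0

eta2 = eta_squared(ss_between, ss_total)
omega2 = omega_squared(ss_between, df_between, ms_within, ss_total)
rho_i = intraclass_correlation(ms_between, ms_within, 8)
```

With these hypothetical values, η² ≈ .35, ω² ≈ .27, and ρI ≈ .33, illustrating that the three measures are interpreted on the same proportion-of-variance scale.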
Many of the concepts used in this chapter are the same as those covered in Chapters 11 through 14. In addition, the following new concepts are addressed: random- and mixed-effects factors, repeated measures factors, the compound symmetry/sphericity assumption, and mixed designs. Our objectives are that by the end of this chapter, you will be able to (a) understand the characteristics and concepts underlying random- and mixed-effects ANOVA models, (b) determine and interpret the results of random- and mixed-effects ANOVA models, and (c) understand and evaluate the assumptions of random- and mixed-effects ANOVA models.
15.1 One-Factor Random-Effects Model
Through the previous chapters, we have learned about many statistical procedures as Marie has assisted others and conducted studies of her own. What is in store for Marie now?
479Random- and Mixed-Effects Analysis of Variance Models
For the past few chapters, we have followed Marie, a graduate student enrolled in an educational research program who, as part of her independent study course, examined various questions related to measures drawn from students enrolled in statistics courses. Knowing the success that Marie achieved in analysis of data from her independent study course, Marie's faculty advisor feels confident that Marie can assist another faculty member at the university. Marie is working with Mark, the coordinator of the English program. Mark has conducted an experiment in which eight students were randomly assigned to one of two instructors. Each student was then assessed on writing by four raters. Mark wants to know the following: if there is a mean difference in writing based on instructor, if there is a mean difference in writing based on rater, and if there is a mean difference in writing based on the rater by instructor interaction. The research questions Marie presents to Mark include the following:

• Is there a mean difference in writing based on instructor?
• Is there a mean difference in writing based on rater?
• Is there a mean difference in writing based on the rater by instructor interaction?

With one between-subjects independent variable (i.e., instructor) and one within-subjects factor (i.e., rating on the writing task), Marie determines that a two-factor split-plot ANOVA is the best statistical procedure to use to answer Mark's questions. Her next task is to assist Mark in analyzing the data.
This section describes the distinguishing characteristics of the one-factor random-effects ANOVA model, the linear model, the ANOVA summary table and expected mean squares, assumptions and their violation, and multiple comparison procedures (MCPs).
15.1.1 Characteristics of the Model
The characteristics of the one-factor fixed-effects ANOVA model have already been covered in Chapter 11. These characteristics include (a) one factor (or independent variable) with two or more levels, (b) all levels of the factor of interest are included in the design (i.e., a fixed-effects factor), (c) subjects are randomly assigned to one level of the factor, and (d) the dependent variable is measured at least at the interval level. Thus, the overall design is a fixed-effects model, where there is one factor and the individuals respond to only one level of the factor. If individuals respond to more than one level of the factor, then this is a repeated measures design, as shown later in this chapter.
The characteristics of the one-factor random-effects ANOVA model are the same with one obvious exception. This has to do with the selection of the levels of the factor. In the fixed-effects case, researchers select all of the levels of interest because they are only interested in making generalizations (or inferences) about those particular levels. Thus, in replications of this design, each replicate would use precisely the same levels. Considering analyses that are conducted on individuals, examples of factors that are typically fixed include SES, gender, specific types of drug treatment, age group, weight, or marital status.

In the random-effects case, researchers randomly select levels from the population of levels because they are interested in making generalizations (or inferences) about the entire population of levels, not merely those that have been sampled. Thus, in replications of this design, each replicate need not have the same levels included. The concept of random selection of factor levels from the population of levels is the same as the random selection of subjects from the population. Here the researcher is making an inference from the sampled levels to the population of levels, instead of making an inference from the sample of individuals to the population of individuals. In a random-effects design then, a random sample of factor levels is selected in the same way as a random sample of individuals is selected.
For instance, a researcher interested in teacher effectiveness may have randomly sampled history teachers (i.e., the independent variable) from the population of history teachers in a particular school district. Generalizations can then be made about all history teachers in that school district that could have been sampled. Other examples of factors that are typically random include randomly selected classrooms, types of medication, observers or raters, time (seconds, minutes, hours, days, weeks, etc.), animals, students, or schools. It should be noted that in educational settings, the random selection of schools, classes, teachers, and/or students is not often possible as that decision is not under the researcher's control. Here we would need to consider such factors as fixed rather than random effects.
15.1.2 ANOVA Model
The one-factor ANOVA random-effects model is written in terms of population parameters as

Y_ij = μ + a_j + ε_ij

where
Y_ij is the observed score on the dependent variable for individual i in level j of factor A
μ is the overall or grand population mean
a_j is the random effect for level j of factor A
ε_ij is the random residual error for individual i in level j

The residual error can be due to individual differences, measurement error, and/or other factors not under investigation. Note that we use a_j to designate the random effects to differentiate them from α_j in the fixed-effects model.
Because the random-effects model consists of only a sample of the effects from the population, the sum of the sampled effects is not necessarily 0. For instance, we may select a sample having only positive effects (e.g., all very effective teachers). If the entire population of effects were examined, then the sum of those effects would indeed be 0.
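This point can be made concrete with a small numeric sketch. The population of level means below is entirely hypothetical; the effects are defined as deviations from the grand mean, so the full population of effects sums to 0, while a sample of levels need not.

```python
# Sketch illustrating why sampled random effects need not sum to 0.
# The population of level means below is hypothetical.

# A small population of group effects, defined as deviations from the
# population grand mean, so by construction they sum to 0.
population_means = [52.0, 55.0, 58.0, 60.0, 63.0, 66.0, 66.0]
grand_mean = sum(population_means) / len(population_means)  # 60.0
population_effects = [m - grand_mean for m in population_means]

# The full population of effects sums to 0 (within rounding).
total_effect = sum(population_effects)

# But a sample of levels -- say we happen to draw the three most
# effective teachers -- has effects that do not sum to 0.
sampled_effects = sorted(population_effects)[-3:]   # [3.0, 6.0, 6.0]
sample_total = sum(sampled_effects)                 # 15.0, not 0
```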
For the one-factor random-effects ANOVA model, the hypotheses for testing the effect of factor A are written in terms of the variance among the means of the random levels, as follows (i.e., the means for each level are about the same, and, thus, the variability among those means is about 0). It should be noted that the sign for the alternative hypothesis is "greater than," reflecting the fact that the variance cannot be negative:

H0: σ²_a = 0
H1: σ²_a > 0
Recall for the one-factor fixed-effects ANOVA model that the hypotheses for testing the effect of factor A are written in terms of equality of the means of the groups (as presented here):

H0: μ.1 = μ.2 = … = μ.J
H1: not all the μ.j are equal
This reflects the difference in the inferences made in the random- and fixed-effects models. In the fixed-effects case, the null hypothesis is about specific population means; in the random-effects case, the null hypothesis is about variation among the entire population of means. As becomes evident, the difference in the models is reflected in the MCPs.
15.1.3 ANOVA Summary Table and Expected Mean Squares
Here there are very few differences between the one-factor random-effects and one-factor fixed-effects models. The sources of variation are still A (or between), within, and total. The sums of squares, degrees of freedom, mean squares, F test statistic, and critical value are determined in the same way as in the fixed-effects case. Obviously then, the ANOVA summary table looks the same as well. Using the example from Chapter 11, assuming the model is now a random-effects model, we obtain a test statistic F = 6.8177, which is again significant at the .05 level.
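Since the arithmetic is identical to the fixed-effects case, the F statistic can be sketched directly from the definitional sums of squares. The data below are hypothetical (they are not the Chapter 11 data); the point is only that F = MS_A/MS_with regardless of whether factor A is fixed or random.

```python
# Sketch: the F statistic for the one-factor random-effects model is
# computed exactly as in the fixed-effects case (F = MS_A / MS_with).
# The data below are hypothetical, not the Chapter 11 example data.

groups = [
    [3.0, 5.0, 4.0, 4.0],   # level 1 of factor A
    [6.0, 7.0, 5.0, 6.0],   # level 2
    [8.0, 9.0, 10.0, 9.0],  # level 3
]

J = len(groups)                       # number of sampled levels
N = sum(len(g) for g in groups)       # total number of observations
grand_mean = sum(sum(g) for g in groups) / N

# Between-levels (factor A) and within-levels sums of squares
ss_a = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_with = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

ms_a = ss_a / (J - 1)        # df_A = J - 1
ms_with = ss_with / (N - J)  # df_with = N - J
F = ms_a / ms_with
```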
As in Chapters 11 and 13, the formation of a proper F ratio is related to the expected mean squares. If H0 is actually true, then the expected mean squares are as follows:

E(MS_A) = σ²_ε
E(MS_with) = σ²_ε

and thus the ratio of expected mean squares is as follows:

E(MS_A)/E(MS_with) = 1

where
the expected value of F is E(F) = df_with/(df_with − 2)
σ²_ε is the population variance of the residual errors
If H0 is actually false, then the expected mean squares are as follows:

E(MS_A) = σ²_ε + nσ²_a
E(MS_with) = σ²_ε

and thus the ratio of the expected mean squares is as follows:

E(MS_A)/E(MS_with) > 1

where E(F) > df_with/(df_with − 2) and σ²_a is the population variance of the levels of factor A. Thus, the important part of E(MS_A) is the magnitude of the second term, nσ²_a.
482 An Introduction to Statistical Concepts
As in previous ANOVA models, the proper F ratio should be formed as follows:

F = (systematic variability + error variability)/(error variability)

For the one-factor random-effects model, the only appropriate F ratio is MS_A/MS_with because it does serve to isolate the systematic variability (i.e., the variability between the levels or groups in factor A, the independent variable). That is, the within term must be utilized as the error term in the F ratio.
15.1.4 Assumptions and Violation of Assumptions
In Chapter 11, we described the assumptions for the one-factor fixed-effects model. The assumptions are nearly the same for the one-factor random-effects model, and we need not devote much attention to them here. In short, the assumptions are again concerned with the distribution of the dependent variable scores, specifically that scores are random and independent, coming from normally distributed populations with equal population variances. The effect of assumption violations and how to deal with them have been thoroughly discussed in Chapter 11 (although see Wilcox, 1996, 2003, for alternative procedures when variances are unequal).
Additional assumptions must be made for the random-effects model. These assumptions deal with the effects for the levels of the independent variable, the a_j. First, here are a few words about the a_j. The random group effects a_j are computed, in the population, by the following:

a_j = μ.j − μ..

For example, a_3 represents the effect for being a member of group 3. If the overall mean μ.. is 60 and the mean of group 3 (i.e., μ.3) is 100, then the group effect would be

a_3 = μ.3 − μ.. = 100 − 60 = 40

In other words, the effect for being a member of group 3 is an increase of 40 points over the overall mean.
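The computation a_j = μ.j − μ.. can be sketched in a couple of lines. Group 3's mean (100) and the overall mean (60) come from the text; the other two group means are hypothetical values chosen so that the full set averages to 60.

```python
# Sketch: group effects a_j = mu_.j - mu_.. . The group 3 mean (100)
# and overall mean (60) are from the text; groups 1 and 2 are
# hypothetical means chosen to average to 60 with group 3.

group_means = {1: 40.0, 2: 40.0, 3: 100.0}
overall_mean = sum(group_means.values()) / len(group_means)  # 60.0

effects = {j: mean - overall_mean for j, mean in group_means.items()}
# effects[3] is +40: group 3 sits 40 points above the overall mean
```

Note that across the *full* set of groups the effects sum to 0, which is exactly the property that fails when only a sample of levels is drawn.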
The assumptions are that the a_j group effects are randomly and independently sampled from the normally distributed population of group effects, with a population mean of 0 and a population variance of σ²_a. Stated another way, there is a population of group effects out there from which we are taking a random sample. For example, with teacher as the factor of interest, we are interested in examining the effectiveness of teachers as measured by academic performance of students in their class. We take a random sample of teachers from the population of second-grade teachers. For these teachers, we measure their effectiveness in the classroom via student performance and generate an effect for each teacher (i.e., the a_j). These effects indicate the extent to which a particular teacher is more or less effective than the population average of teachers. These effects are known as random effects because the teachers are randomly selected. In selecting teachers, each teacher is selected independently of all other teachers to prevent a biased sample.
The effects of the violation of the assumptions about the a_j are the same as with the dependent variable scores. The F test is quite robust to nonnormality of the a_j terms and unequal variances of the a_j terms. However, the F test is quite sensitive to nonindependence among the a_j terms, with no known solutions. A summary of the assumptions and the effects of their violation for the one-factor random-effects model is presented in Table 15.1.
15.1.5 Multiple Comparison Procedures
Let us think for a moment about the use of MCPs for the random-effects model. In general, the researcher is not usually interested in making inferences about just the levels of A that were sampled. Thus, estimation of the a_j terms does not provide us with any information about the a_j terms that were not sampled. Also, the a_j terms cannot be summarized by their mean, as they do not necessarily sum to 0 for the levels sampled, only for the population of levels.
15.2 Two-Factor Random-Effects Model
In this section, we describe the distinguishing characteristics of the two-factor random-effects ANOVA model, the linear model, the ANOVA summary table and expected mean squares, assumptions of the model and their violation, and MCPs.
15.2.1 Characteristics of the Model
The characteristics of the one-factor random-effects ANOVA model have already been covered in this chapter, and of the two-factor fixed-effects model, in Chapter 13. Here we extend and combine these characteristics to form the two-factor random-effects model. These characteristics include (a) two factors (or independent variables) each with two or more levels, (b) the levels of each of the factors are randomly sampled from the population of levels (i.e., two random-effects factors), (c) subjects are randomly assigned to one combination of the levels of the two factors, and (d) the dependent variable is measured at least at the interval level. Thus, the overall design is a random-effects model, with two factors, and the individuals respond to only one combination of the levels of the two factors (note that this is not a popular model in education and the behavioral sciences; in factorial designs, we typically see a random-effects factor paired with a fixed-effects factor). If individuals respond to more than one combination of the levels of the two factors, then this is a repeated measures design (discussed later in this chapter).

Table 15.1
Assumptions and Effects of Violations: One-Factor Random-Effects Model

Assumption              | Effect of Assumption Violation
Independence            | • Increased likelihood of a Type I and/or Type II error in F
                        | • Affects standard errors of means and inferences about those means
Homogeneity of variance | • Bias in SS_with; increased likelihood of a Type I and/or Type II error
                        | • Small effect with equal or nearly equal n's; otherwise effect decreases as n increases
Normality               | • Minimal effect with equal or nearly equal n's
15.2.2 ANOVA Model
The two-factor ANOVA random-effects model is written in terms of population parameters as

Y_ijk = μ + a_j + b_k + (ab)_jk + ε_ijk

where
Y_ijk is the observed score on the dependent variable for individual i in level j of factor A and level k of factor B (or in the jk cell)
μ is the overall or grand population mean (i.e., regardless of cell designation)
a_j is the random effect for level j of factor A (row effect)
b_k is the random effect for level k of factor B (column effect)
(ab)_jk is the interaction random effect for the combination of level j of factor A and level k of factor B
ε_ijk is the random residual error for individual i in cell jk

The residual error can be due to individual differences, measurement error, and/or other factors not under investigation. Note that we use a_j, b_k, and (ab)_jk to designate the random effects to differentiate them from the α_j, β_k, and (αβ)_jk in the fixed-effects model. Finally, there is no requirement that the sum of the main or interaction effects is equal to 0, as only a sample of these effects is taken from the population of effects.
There are three sets of hypotheses, one for each of the two main effects and one for the interaction effect. The null and alternative hypotheses, respectively, for testing the main effect of factor A (i.e., independent variable A) follow. The null hypothesis tests whether the variance among the means for the random effect of independent variable A is equal to 0 (i.e., the means for each level of factor A are about the same; thus, the variability among those means is about 0). It should be noted that the sign for the alternative hypothesis is "greater than," reflecting the fact that the variance cannot be negative:

H0₁: σ²_a = 0
H1₁: σ²_a > 0
The hypotheses for testing the main effect of factor B (i.e., independent variable B) similarly test whether the variance among the means for the random effect of independent variable B is equal to 0 (i.e., the means for each level of factor B are about the same, and, thus, the variability among those means is about 0). It should be noted that the sign for the alternative hypothesis is "greater than," reflecting the fact that the variance cannot be negative:

H0₂: σ²_b = 0
H1₂: σ²_b > 0
Finally, the hypotheses for testing the interaction effect are presented next. In this case, the null hypothesis tests whether the variance among the means for the interaction of the random effects of factors A and B is equal to 0 (i.e., the means for each AB cell are about the same, and, thus, the variability among those means is about 0). It should be noted that the sign for the alternative hypothesis is "greater than," reflecting the fact that the variance cannot be negative:

H0₃: σ²_ab = 0
H1₃: σ²_ab > 0
These hypotheses again reflect the difference in the inferences made in the random- and fixed-effects models. In the fixed-effects case, the null hypotheses are about means, whereas in the random-effects case, the null hypotheses are about variation among the means.
15.2.3 ANOVA Summary Table and Expected Mean Squares
Here there are very few differences between the two-factor fixed-effects and random-effects models. The sources of variation are still A, B, AB, within, and total. The sums of squares, degrees of freedom, and mean squares are determined the same as in the fixed-effects case. However, the F test statistics are different due to the expected mean squares, as are the critical values used. The F test statistic is formed for the test of factor A (i.e., the main effect for independent variable A) as follows:

F = MS_A / MS_AB

for the test of factor B (i.e., the main effect for independent variable B) as presented here:

F = MS_B / MS_AB

and for the test of the AB interaction as indicated:

F = MS_AB / MS_with
Recall that in the fixed-effects model, the MS_with was used as the error term for all three hypotheses. However, in the random-effects model, the MS_with is used as the error term only for the test of the interaction. The MS_AB is used as the error term for the tests of both main effects. The critical values used are those based on the degrees of freedom for the numerator and denominator of each hypothesis tested. Thus, using the example from Chapter 13, assuming that the model is now a random-effects model, we obtain the following as our test statistic for the test of factor A (i.e., the main effect for independent variable A):

F_A = MS_A / MS_AB = 246.1979 / 7.2813 = 33.8124

for the test of factor B, the test statistic is computed as follows:

F_B = MS_B / MS_AB = 712.5313 / 7.2813 = 97.8577

and for the test of the AB interaction, we find the following:

F_AB = MS_AB / MS_with = 7.2813 / 11.5313 = 0.6314
The critical value for the test of factor A is found in the F table of Table A.4 as αF(J−1),(J−1)(K−1), which for the example is .05F3,3 = 9.28, and the test is significant at the .05 level. The critical value for the test of factor B is found in the F table as αF(K−1),(J−1)(K−1), which for the example is .05F1,3 = 10.13, and the test is significant at the .05 level. The critical value for the test of the interaction is found in the F table as αF(J−1)(K−1),N−JK, which for the example is .05F3,24 = 3.01, and the test is not significant at the .05 level. It just so happens for the example data that the results for the random- and fixed-effects models are the same. This will not always be the case.
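The three F ratios above follow mechanically from the mean squares and the error-term rule. A minimal sketch, using the mean squares reported in the text (with J = 4, K = 2, and the cell size n = 4 implied by the interaction error degrees of freedom N − JK = 24):

```python
# Sketch: forming the two-factor random-effects F ratios from the
# reported mean squares. J = 4 and K = 2 follow from the dfs in the
# text; n = 4 per cell is implied by N - JK = 24 (so N = 32).

J, K, n = 4, 2, 4
N = J * K * n

ms = {"A": 246.1979, "B": 712.5313, "AB": 7.2813, "with": 11.5313}

# Random-effects model: MS_AB is the error term for both main effects,
# MS_with is the error term for the interaction.
F_A = ms["A"] / ms["AB"]     # about 33.81
F_B = ms["B"] / ms["AB"]     # about 97.86
F_AB = ms["AB"] / ms["with"] # about 0.63

# Degrees of freedom for the three tests
df_A = (J - 1, (J - 1) * (K - 1))       # (3, 3)
df_B = (K - 1, (J - 1) * (K - 1))       # (1, 3)
df_AB = ((J - 1) * (K - 1), N - J * K)  # (3, 24)
```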
The formation of the proper F ratios is again related to the expected mean squares. Recall that our hypotheses for the two-factor random-effects model are based on variation among the means of the random effects (rather than the means as seen in the fixed-effects case). If H0 is actually true (i.e., there is no variation among the means of the random effects), then the expected mean squares are as follows:

E(MS_A) = σ²_ε
E(MS_B) = σ²_ε
E(MS_AB) = σ²_ε
E(MS_with) = σ²_ε

where σ²_ε is the population variance of the residual errors.
If H0 is actually false (i.e., there is variation among the means of the random effects), then the expected mean squares are as follows:

E(MS_A) = σ²_ε + nσ²_ab + Knσ²_a
E(MS_B) = σ²_ε + nσ²_ab + Jnσ²_b
E(MS_AB) = σ²_ε + nσ²_ab
E(MS_with) = σ²_ε

where σ²_a, σ²_b, and σ²_ab are the population variances of A, B, and AB, respectively.
As in previous ANOVA models, the proper F ratio should be formed as follows:

F = (systematic variability + error variability)/(error variability)

For the two-factor random-effects model, the appropriate error term for the main effects is MS_AB, and the appropriate error term for the interaction effect is MS_with.
15.2.4 Assumptions and Violation of Assumptions
Previously we described the assumptions for the one-factor random-effects model. The assumptions are nearly the same for the two-factor random-effects model, and we need not devote much attention to them here. As before, the assumptions are concerned with the distribution of the dependent variable scores and of the random effects (the sampled levels of the independent variables, the a_j and b_k, and their interaction, the (ab)_jk). However, there are a few new wrinkles. Little is known about the effect of unequal variances (i.e., heteroscedasticity) or dependence (i.e., violation of the assumption of independence) for this random-effects model, although we expect the effects to be the same as for the fixed-effects model. For violation of the normality assumption, effects are known to be substantial. A summary of the assumptions and the effects of their violation for the two-factor random-effects model is presented in Table 15.2.
15.2.5 Multiple Comparison Procedures
The story of multiple comparisons for the two-factor random-effects model is the same as that for the one-factor random-effects model. In general, the researcher is not usually interested in making inferences about just the levels of A, B, or AB that were sampled, and thus performing MCPs in a two-factor random-effects model is a moot point. Estimation of the a_j, b_k, or (ab)_jk terms does not provide us with any information about the a_j, b_k, or (ab)_jk terms that were not sampled. Also, the a_j, b_k, or (ab)_jk terms cannot be summarized by their means, as they will not necessarily sum to 0 for the levels sampled, only for the population of levels.
Table 15.2
Assumptions and Effects of Violations: Two-Factor Random-Effects Model

Assumption              | Effect of Assumption Violation
Independence            | Little is known about the effects of dependence; however, based on the fixed-effects model, we might expect the following:
                        | • Increased likelihood of a Type I and/or Type II error in F
                        | • Affects standard errors of means and inferences about those means
Homogeneity of variance | Little is known about the effects of heteroscedasticity; however, based on the fixed-effects model, we might expect the following:
                        | • Bias in SS_with
                        | • Increased likelihood of a Type I and/or Type II error
                        | • Small effect with equal or nearly equal n's
                        | • Otherwise effect decreases as n increases
Normality               | • Minimal effect with equal or nearly equal n's
                        | • Otherwise substantial effects
15.3 Two-Factor Mixed-Effects Model
This section describes the distinguishing characteristics of the two-factor mixed-effects ANOVA model, the linear model, the ANOVA summary table and expected mean squares, assumptions of the model and their violation, and MCPs.
15.3.1 Characteristics of the Model
The characteristics of the two-factor random-effects ANOVA model have already been covered in the preceding section, and of the two-factor fixed-effects model, in Chapter 13. Here we combine these characteristics to form the two-factor mixed-effects model. These characteristics include (a) two factors (or independent variables) each with two or more levels, (b) the levels for one of the factors are randomly sampled from the population of levels (i.e., the random-effects factor) and all of the levels of interest for the second factor are included in the design (i.e., the fixed-effects factor), (c) subjects are randomly selected and assigned to one combination of the levels of the two factors, and (d) the dependent variable is measured at least at the interval level. Thus, the overall design is a mixed-effects model, with one fixed-effects factor and one random-effects factor, and individuals respond to only one combination of the levels of the two factors. If individuals respond to more than one combination, then this is a repeated measures design.
15.3.2 ANOVA Model
There are actually two variations of the two-factor mixed-effects model, one where factor A is fixed and factor B is random and the other where factor A is random and factor B is fixed. The labeling of a factor as A or B is arbitrary, so we only consider the former variation where A is fixed and B is random. For the latter variation, merely switch the labels of the factors. The two-factor ANOVA mixed-effects model is written in terms of population parameters as

Y_ijk = μ + α_j + b_k + (αb)_jk + ε_ijk

where
Y_ijk is the observed score on the dependent variable for individual i in level j of factor A and level k of factor B (or in the jk cell)
μ is the overall or grand population mean (i.e., regardless of cell designation)
α_j is the fixed effect for level j of factor A (row effect)
b_k is the random effect for level k of factor B (column effect)
(αb)_jk is the interaction mixed effect for the combination of level j of factor A and level k of factor B
ε_ijk is the random residual error for individual i in cell jk

The residual error can be due to individual differences, measurement error, and/or other factors not under investigation. Note that we use b_k and (αb)_jk to designate the random and mixed effects, respectively, to differentiate them from β_k and (αβ)_jk in the fixed-effects model.
As shown in Figure 15.1, due to the nature of the mixed-effects model, only some of the columns are randomly selected for inclusion in the design. Each cell of the design will include row (α), column (b), and interaction (αb) effects. With an equal-n model, if we sum these effects for a given column, then the effects will sum to 0. However, if we sum these effects for a given row, then the effects will not sum to 0, as some columns were not sampled.
The null and alternative hypotheses, respectively, for testing the effect of factor A are presented as follows. These hypotheses reflect testing the equality of means of the levels of independent variable A (the fixed effect):

H0₁: μ.1. = μ.2. = … = μ.J.
H1₁: not all the μ.j. are equal
The hypotheses for testing the effect of factor B, the random effect, follow. The null hypothesis tests whether the variance among the means for the random effect of independent variable B is equal to 0 (i.e., the means for each level of factor B are about the same, and, thus, the variability among those means is about 0). It should be noted that the sign for the alternative hypothesis is "greater than," reflecting the fact that the variance cannot be negative:

H0₂: σ²_b = 0
H1₂: σ²_b > 0
Finally, the hypotheses for testing the interaction effect are presented next. In this case, the null hypothesis tests whether the variance among the means for the interaction of factors A and B is equal to 0 (i.e., the means for each AB cell are about the same, and, thus, the variability among those means is about 0). It should be noted that the sign for the alternative hypothesis is "greater than," reflecting the fact that the variance cannot be negative:

H0₃: σ²_αb = 0
H1₃: σ²_αb > 0
FIGURE 15.1
Conditions for the two-factor mixed-effects model: although all four levels of factor A (α1 through α4) are selected by the researcher (A is fixed), only three of the six levels of factor B (b1 through b6) are selected (B is random). If the levels of B selected are 1, 3, and 6, then the design will only consist of the shaded cells. In each cell of the design are row, column, and cell effects. If we sum these effects for a given column, then the effects will sum to 0. If we sum these effects for a given row, then the effects will not sum to 0 (due to missing cells).
These hypotheses reflect the difference in the inferences made in the mixed-effects model. Here we see that the hypotheses about the fixed effect A (i.e., the main effect for independent variable A) are about means, whereas the hypotheses involving the random effect B (i.e., the main effect of B and the interaction effect AB) are about variation among the means, as these involve a random effect.
15.3.3 ANOVA Summary Table and Expected Mean Squares
There are very few differences between the two-factor fixed-effects, random-effects, and mixed-effects models. The sources of variation for the mixed-effects model are again A (the fixed effect), B (the random effect), AB (the interaction effect), within, and total. The sums of squares, degrees of freedom, and mean squares are determined the same as in the fixed-effects case. However, the F test statistics are different in each of these models, as are the critical values used. The F test statistic is formed for the test of factor A, the fixed effect, as seen here:

F_A = MS_A / MS_AB

for the test of factor B, the random effect, it is computed as follows:

F_B = MS_B / MS_with

and for the test of the AB interaction, the mixed effect, as indicated here:

F_AB = MS_AB / MS_with
Recall that in the fixed-effects model, the MS_with is used as the error term for all three hypotheses. However, in the random-effects model, the MS_with is used as the error term only for the test of the interaction, and the MS_AB is used as the error term for the tests of both main effects. Finally, in the mixed-effects model, the MS_with is used as the error term for the test of factor B (the random effect) and the interaction (i.e., AB), whereas the MS_AB is used as the error term for the test of factor A (the fixed effect). The critical values used are those based on the degrees of freedom for the numerator and denominator of each hypothesis tested.
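To keep the bookkeeping straight, the error-term rules just described can be collected in a small lookup sketch. The function and model labels here are illustrative, not from the text; the mapping itself simply restates the rules above, with the mixed model taken as A fixed and B random.

```python
# F-ratio denominator (error term) for each effect in the two-factor models,
# restating the rules in the text. Model labels are illustrative.
ERROR_TERMS = {
    "fixed":  {"A": "MS_within", "B": "MS_within", "AB": "MS_within"},
    "random": {"A": "MS_AB",     "B": "MS_AB",     "AB": "MS_within"},
    "mixed":  {"A": "MS_AB",     "B": "MS_within", "AB": "MS_within"},
}

def error_term(model: str, effect: str) -> str:
    """Return the mean square used as the error term for a given effect."""
    return ERROR_TERMS[model][effect]
```

For instance, `error_term("mixed", "A")` returns `"MS_AB"`, matching the rule that the fixed effect in the mixed model is tested against the interaction mean square.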
Thus, using the example from Chapter 13, let us assume the model is now a mixed-effects model where factor A, the fixed effect, is the level of attractiveness (four categories). Factor B, the random effect, is time of day (two randomly selected categories). We obtain the test statistic for the test of factor A, the fixed effect of level of attractiveness, as follows:

$$F_A = \frac{MS_A}{MS_{AB}} = \frac{246.1979}{7.2813} = 33.8124$$
491Random- and Mixed-Effects Analysis of Variance Models
for the test of factor B, the random effect of time of day, the test statistic is computed as follows:

$$F_B = \frac{MS_B}{MS_{with}} = \frac{712.5313}{11.5313} = 61.7911$$
and for the test of the AB (fixed by random effect, levels of attractiveness by time of day) interaction, we find the test statistic as follows:

$$F_{AB} = \frac{MS_{AB}}{MS_{with}} = \frac{7.2813}{11.5313} = 0.6314$$
The critical value for the test of factor A (the fixed effect, level of attractiveness) is found in the F table as ${}_{\alpha}F_{J-1,(J-1)(K-1)}$, which for the example is ${}_{.05}F_{3,3} = 9.28$; thus, factor A is statistically significant at the .05 level. The critical value for the test of factor B (the random effect, time of day) is found in the F table as ${}_{\alpha}F_{K-1,N-JK}$, which for the example is ${}_{.05}F_{1,24} = 4.26$; thus, factor B is significant at the .05 level. The critical value for the test of the interaction between level of attractiveness and time of day is found in the F table as ${}_{\alpha}F_{(J-1)(K-1),N-JK}$, which for the example is ${}_{.05}F_{3,24} = 3.01$; thus, the interaction is not significant at the .05 level. It just so happens for the example data that the results for the mixed-, random-, and fixed-effects models are the same. This is not always the case.
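Because the three test statistics are simple ratios of the mean squares reported above, they are easy to verify; here is a minimal check (the variable names are ours):

```python
# Mean squares from the attractiveness-by-time-of-day example in the text.
MS_A, MS_B, MS_AB, MS_within = 246.1979, 712.5313, 7.2813, 11.5313

F_A = MS_A / MS_AB        # fixed effect A: error term is MS_AB
F_B = MS_B / MS_within    # random effect B: error term is MS_within
F_AB = MS_AB / MS_within  # AB interaction: error term is MS_within

# Critical values at alpha = .05, as read from the F table in the text.
reject = {
    "A": F_A > 9.28,    # 33.8124 > 9.28 -> significant
    "B": F_B > 4.26,    # 61.7911 > 4.26 -> significant
    "AB": F_AB > 3.01,  # 0.6314 < 3.01  -> not significant
}
```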
The formation of the proper F ratio is again related to the expected mean squares. If H0 is actually true (i.e., the variance among the means is 0), then the expected mean squares are as follows:

$$E(MS_A) = \sigma_{\varepsilon}^2$$

$$E(MS_B) = \sigma_{\varepsilon}^2$$

$$E(MS_{AB}) = \sigma_{\varepsilon}^2$$

$$E(MS_{with}) = \sigma_{\varepsilon}^2$$

where $\sigma_{\varepsilon}^2$ is the population variance of the residual errors.
If H0 is actually false (the variance among the means is not equal to 0), then the expected mean squares are as follows:

$$E(MS_A) = \sigma_{\varepsilon}^2 + n\sigma_{\alpha b}^2 + Kn\sum_{j=1}^{J}\alpha_j^2/(J-1)$$

$$E(MS_B) = \sigma_{\varepsilon}^2 + Jn\sigma_{b}^2$$

$$E(MS_{AB}) = \sigma_{\varepsilon}^2 + n\sigma_{\alpha b}^2$$

$$E(MS_{with}) = \sigma_{\varepsilon}^2$$

where all terms have been previously defined.
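One way to see why these expectations dictate the F ratios is to plug hypothetical variance components into them. The following sketch is ours (the numbers are invented; the formulas are those above for the mixed-effects model with A fixed and B random):

```python
# Expected mean squares for the two-factor mixed-effects model, evaluated
# for hypothetical variance components and fixed effects alpha_j.
def expected_ms(n, J, K, var_e, var_ab, var_b, alphas):
    theta_A = K * n * sum(a ** 2 for a in alphas) / (J - 1)  # fixed-effect term
    return {
        "A": var_e + n * var_ab + theta_A,
        "B": var_e + J * n * var_b,
        "AB": var_e + n * var_ab,
        "within": var_e,
    }

# Under H0 for A (all alpha_j = 0), E(MS_A) = E(MS_AB), so F_A = MS_A/MS_AB
# centers on 1; MS_within would be too small a denominator for factor A.
null = expected_ms(n=5, J=4, K=2, var_e=1.0, var_ab=0.5, var_b=0.8,
                   alphas=[0, 0, 0, 0])
```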
As in previous ANOVA models, the proper F ratio should be formed as follows:

$$F = (\text{systematic variability} + \text{error variability})/(\text{error variability})$$

For the two-factor mixed-effects model, MS_AB must be used as the error term for the test of A, and MS_with must be used as the error term for the test of B and for the interaction test.
15.3.4 Assumptions and Violation of Assumptions
Previously we described the assumptions for the two-factor random-effects model. The assumptions are nearly the same for the two-factor mixed-effects model, and we need not devote much attention to them here. As before, the assumptions are concerned with the distribution of the dependent variable scores and of the random effects. However, note that not much is known about the effects of dependence or heteroscedasticity for random effects, although we expect the effects are the same as for the fixed-effects case. A summary of the assumptions and the effects of their violation for the two-factor mixed-effects model is presented in Table 15.3.
15.3.5 Multiple Comparison Procedures
For multiple comparisons in the two-factor mixed-effects model, the researcher is not usually interested in making inferences about just the levels of the random-effects factor (i.e., B) or the interaction (i.e., AB) that were randomly sampled. Thus, estimation of the b_k or (αb)_jk terms does not provide us with any information about the b_k or (αb)_jk terms not sampled. Also, the b_k or (αb)_jk terms cannot be summarized by their means, as they will not necessarily sum to 0 for the levels sampled, only for the population of levels. However, inferences about the fixed factor A can be made in the same way they were made for the two-factor fixed-effects model. We have already used the example data to look at some MCPs in Chapter 13.
Table 15.3
Assumptions and Effects of Violations: Two-Factor Mixed-Effects Model

Independence: Little is known about the effects of dependence; however, based on the fixed-effects model, we might expect the following:
• Increased likelihood of a Type I and/or Type II error in F
• Affects standard errors of means and inferences about those means

Homogeneity of variance: Little is known about the effects of heteroscedasticity; however, based on the fixed-effects model, we might expect the following:
• Bias in SS_with
• Increased likelihood of a Type I and/or Type II error
• Small effect with equal or nearly equal n's
• Otherwise effect decreases as n increases

Normality:
• Minimal effect with equal or nearly equal n's
• Otherwise substantial effects
This concludes our discussion of random- and mixed-effects models for the one- and two-factor designs. For three-factor designs, see Keppel (1982) or Keppel and Wickens (2004). In the major statistical software, random effects can be treated as follows: in the SAS general linear model procedure (PROC GLM), use the RANDOM statement to designate random effects; in SPSS GLM, random effects can be designated either in point-and-click mode (by using the "Random Factor(s)" box) or in syntax mode.
15.4 One-Factor Repeated Measures Design
In this section, we describe the distinguishing characteristics of the one-factor repeated measures ANOVA model, the layout of the data, the linear model, assumptions of the model and their violation, the ANOVA summary table and expected mean squares, MCPs, alternative ANOVA procedures, and an example.
15.4.1 Characteristics of the Model
The one-factor repeated measures model is the logical extension of the dependent t test. Whereas in the dependent t test there are only two measurements for each subject (e.g., the same individuals measured prior to an intervention and then again after an intervention), in the one-factor repeated measures model, two or more measurements can be examined. The characteristics of the one-factor repeated measures ANOVA model are somewhat similar to those of the one-factor fixed-effects model, yet there are a number of obvious exceptions. The first unique characteristic is that each subject responds to each level of factor A. This is in contrast to the nonrepeated case, where each subject is exposed to only one level of factor A. This design is often referred to as a within-subjects design, as each subject responds to each level of factor A. Thus, subjects serve as their own controls such that individual differences are taken into account. This was not the case in any of the previously discussed ANOVA models. As a result, subjects' scores are not independent across the levels of factor A. Compare this design to the one-factor fixed-effects model, where total variation was decomposed into variation due to A (or between) and due to the residual (or within). In the one-factor repeated measures design, residual variation is further decomposed into variation due to subjects and variation due to the interaction between A and subjects. The reduction in the residual sum of squares yields a more powerful design as well as more precision in estimating the effects of A; the design is thus more economical in that fewer subjects are necessary than in previously discussed models (Murphy, Myors, & Wolach, 2008).
The one-factor repeated measures design is also a mixed model. The subjects factor is a random effect, whereas the A factor is almost always a fixed effect. For example, if time is the fixed effect, then the researcher can examine phenomena over time. Finally, the one-factor repeated measures design is similar in some ways to the two-factor mixed-effects design, except with one subject per cell. In other words, the one-factor repeated measures design is really a special case of the two-factor mixed-effects design with n = 1 per cell. Unequal n's can only happen when subjects miss the administration of one or more levels of factor A.
On the down side, the repeated measures design includes some risk of carryover effects from one level of A to another because each subject responds to all levels of A. As examples of the carryover effect, subjects' performance may be altered due to fatigue (decreased performance), practice (increased performance), or sensitization (increased performance) effects. These effects may be minimized by (a) counterbalancing the order of administration of the levels of A so that each subject does not receive the same order of the levels of A (this can also minimize problems with the compound symmetry assumption; see subsequent discussion), (b) allowing some time to pass between the administration of the levels of A, or (c) matching or blocking similar subjects, with the assumption that subjects within a block are randomly assigned to a level of A. This last method is a type of randomized block design (see Chapter 16).
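Counterbalancing (option a) can be sketched with a cyclic Latin square of condition orders, so that each level of A appears exactly once in every serial position. This is only a minimal illustration of the idea (a balanced Latin square, which also equates immediate sequences, is a common refinement):

```python
# Build a cyclic Latin square of presentation orders: row i presents the
# levels of A starting at level i and wrapping around, so every level
# occupies every serial position exactly once across the set of orders.
def latin_square_orders(levels):
    j = len(levels)
    return [[levels[(start + pos) % j] for pos in range(j)]
            for start in range(j)]

orders = latin_square_orders(["a1", "a2", "a3", "a4"])
# Subjects (or blocks of subjects) are then assigned randomly to the rows.
```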
15.4.2 Layout of Data
The layout of the data for the one-factor repeated measures model is shown in Table 15.4. Here we see the columns designated as the levels of factor A and the rows as the subjects. Thus, the columns or "levels" of factor A represent the different measurements. An example is measuring children on reading performance before, immediately after, and 6 months after they participate in a reading intervention. Row, column, and overall means are also shown in Table 15.4, although the subject means are seldom of any utility (and thus are not reported in research studies). Here you see that the layout of the data looks the same as in the two-factor model, although there is only one observation per cell.
15.4.3 ANOVA Model
The one-factor repeated measures ANOVA model is written in terms of population parameters as

$$Y_{ij} = \mu + \alpha_j + s_i + (s\alpha)_{ij} + \varepsilon_{ij}$$

where
Y_ij is the observed score on the dependent variable for individual i responding to level j of factor A
μ is the overall or grand population mean
α_j is the fixed effect for level j of factor A
s_i is the random effect for subject i of the subject factor
(sα)_ij is the interaction between subject i and level j
ε_ij is the random residual error for individual i in level j

The residual error can be due to measurement error and/or other factors not under investigation. From the model, you can see this is similar to the two-factor model, only with one observation per cell. Also, the fixed effect is denoted by α and the random effect by s; thus, we have a mixed-effects model. Lastly, for the equal n's model, the effects for α and sα sum to 0 for each subject (or row).

Table 15.4
Layout for the One-Factor Repeated Measures ANOVA

                      Level of Factor A (Repeated Factor)
Level of Factor S     1      2      …      J      Row Mean
1                     Y11    Y12    …      Y1J    Ȳ1.
2                     Y21    Y22    …      Y2J    Ȳ2.
…                     …      …      …      …      …
n                     Yn1    Yn2    …      YnJ    Ȳn.
Column mean           Ȳ.1    Ȳ.2    …      Ȳ.J    Ȳ..
The hypotheses for testing the effect of factor A are as follows. The null hypothesis indicates that the means for each measurement are the same:

$$H_{01}: \mu_{.1} = \mu_{.2} = \cdots = \mu_{.J}$$

$$H_{11}: \text{not all the } \mu_{.j} \text{ are equal}$$

The hypotheses are written in terms of means because factor A is a fixed effect (i.e., all sampled cases have been measured).
15.4.4 Assumptions and Violation of Assumptions
Previously we described the assumptions for the two-factor mixed-effects model. The assumptions are nearly the same for the one-factor repeated measures model (since it is similar to the two-factor mixed-effects model) and are again mainly concerned with the distribution of the dependent variable scores and of the random effects.
A new assumption, known as compound symmetry, states that the covariances between the scores of the subjects across the levels of the repeated factor A are constant. In other words, the covariances for all pairs of levels of the fixed factor are the same across the population of random effects (i.e., the subjects). The analysis of variance (ANOVA) is not particularly robust to a violation of this assumption. In particular, the assumption is often violated when factor A is time, as the relationship between adjacent levels of A is stronger than when the levels are farther apart. For example, consider the previous illustration of children measured on reading performance before, immediately after, and 6 months after intervention. The means of the pre- and immediate post-reading performance will likely be more similar than the means of the pre- and 6-months post-reading performance. If the assumption is violated, three alternative procedures are available. The first is to limit the levels of factor A (i.e., the repeated measures factor) either to those that meet the assumption, or to limit the number of repeated measures to 2 (in which case there would be only one covariance and thus nothing to assume). The second and more plausible alternative is to use adjusted F tests. These are reported shortly. The third is to use multivariate analysis of variance (MANOVA), which makes no compound symmetry assumption but is slightly less powerful. Readers interested in MANOVA can refer to a number of excellent multivariate textbooks (e.g., Hair, Black, Babin, Anderson, & Tatham, 2006; Tabachnick & Fidell, 2007).
Huynh and Feldt (1970) showed that the compound symmetry assumption is a sufficient but not necessary condition for the validity of the F test. Thus, the F test may also be valid under less stringent conditions. The necessary and sufficient condition for the validity of the F test is known as sphericity. This assumes that the variance of the difference scores for each pair of factor levels is the same (e.g., with J = 3 levels, the variance of the difference scores between levels 1 and 2 is the same as the variance of the difference scores between levels 1 and 3, which is the same as the variance of the difference scores between levels 2 and 3; thus, this is another type of homogeneity of variance assumption). Further discussion of sphericity is beyond the scope of this text (see Keppel, 1982; Kirk, 1982; or Myers & Well, 1995). A summary of the assumptions and the effects of their violation for the one-factor repeated measures design is presented in Table 15.5.
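The sphericity condition can be examined descriptively by computing the variance of the difference scores for each pair of levels and seeing whether they are comparable. A small sketch with hypothetical data for J = 3 levels (the scores are invented purely for illustration):

```python
from itertools import combinations
from statistics import variance

# Hypothetical scores: subject -> measurements at levels 1, 2, 3 of factor A.
scores = {1: (10, 12, 15), 2: (8, 9, 14), 3: (12, 16, 17), 4: (9, 10, 12)}

# Sample variance of the difference scores for each pair of levels;
# sphericity assumes these population variances are equal.
diff_vars = {
    (j + 1, k + 1): variance([s[j] - s[k] for s in scores.values()])
    for j, k in combinations(range(3), 2)
}
```

With real data, grossly unequal values here signal trouble for the unadjusted F test; formal assessment is left to the corrected tests described in the text.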
15.4.5 ANOVA Summary Table and Expected Mean Squares
The sources of variation for this model are similar to those for the two-factor model, except that there is no within-cell variation. The ANOVA summary table is shown in Table 15.6, where we see the following sources of variation: A (i.e., the repeated measure), subjects (denoted by S), the SA interaction, and total. The test of subject differences is of no real interest; quite naturally, we expect there to be variation among the subjects. From the table, we see that although three mean square terms can be computed, only one F ratio results, for the test of factor A; the subjects effect cannot be tested anyway, as there is no appropriate error term. This is subsequently shown through the expected mean squares.
Next we need to consider the sums of squares for the one-factor repeated measures model. If we take the total sum of squares and decompose it, we have

$$SS_{total} = SS_A + SS_S + SS_{SA}$$

These three terms can then be computed by statistical software. The degrees of freedom, mean squares, and F ratio are determined as shown in Table 15.6.
Table 15.5
Assumptions and Effects of Violations: One-Factor Repeated Measures Model

Independence: Little is known about the effects of dependence; however, based on the fixed-effects model, we might expect the following:
• Increased likelihood of a Type I and/or Type II error in F
• Affects standard errors of means and inferences about those means

Homogeneity of variance: Little is known about the effects of heteroscedasticity; however, based on the fixed-effects model, we might expect the following:
• Bias in SS_SA
• Increased likelihood of a Type I and/or Type II error
• Small effect with equal or nearly equal n's
• Otherwise effect decreases as n increases

Normality:
• Minimal effect with equal or nearly equal n's
• Otherwise substantial effects

Sphericity:
• F not particularly robust
• Consider the usual F test, the Geisser–Greenhouse conservative F test, and the adjusted (Huynh–Feldt) F test, if necessary
Table 15.6
One-Factor Repeated Measures ANOVA Summary Table

Source    SS          df                 MS       F
A         SS_A        J − 1              MS_A     MS_A/MS_SA
S         SS_S        n − 1              MS_S
SA        SS_SA       (J − 1)(n − 1)     MS_SA
Total     SS_total    N − 1
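The degrees of freedom in Table 15.6 are quick to compute. A small helper (ours, not the text's), evaluated at the dimensions of the writing assessment example presented later in this section (J = 4 raters, n = 8 subjects):

```python
# Degrees of freedom for the one-factor repeated measures summary table.
def rm_df(J, n):
    N = J * n  # total number of observations
    return {"A": J - 1, "S": n - 1, "SA": (J - 1) * (n - 1), "total": N - 1}

rm_df(4, 8)  # matches the df column of Table 15.8: 3, 7, 21, 31
```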
The formation of the proper F ratio is again related to the expected mean squares. If H0 is actually true (in other words, the means are the same for each of the measures), then the expected mean squares are as follows:

$$E(MS_A) = \sigma_{\varepsilon}^2$$

$$E(MS_S) = \sigma_{\varepsilon}^2$$

$$E(MS_{SA}) = \sigma_{\varepsilon}^2$$

where $\sigma_{\varepsilon}^2$ is the population variance of the residual errors.
If H0 is actually false (i.e., the means are not the same for each of the measures), then the expected mean squares are as follows:

$$E(MS_A) = \sigma_{\varepsilon}^2 + \sigma_{s\alpha}^2 + n\sum_{j=1}^{J}\alpha_j^2/(J-1)$$

$$E(MS_S) = \sigma_{\varepsilon}^2 + J\sigma_{s}^2$$

$$E(MS_{SA}) = \sigma_{\varepsilon}^2 + \sigma_{s\alpha}^2$$

where $\sigma_{s}^2$ and $\sigma_{s\alpha}^2$ represent variability due to subjects and to the interaction of factor A and subjects, respectively, and other terms are as before.
As in previous ANOVA models, the proper F ratio should be formed as follows:

$$F = (\text{systematic variability} + \text{error variability})/(\text{error variability})$$

For the one-factor repeated measures model, MS_SA must be used as the error term for the test of A, and there is no appropriate error term for the test of S or the test of SA (although that is fine, as we are not really interested in those tests anyway since they refer to the individual cases).
As noted earlier in the discussion of assumptions for this model, the F test is not very robust to violation of the compound symmetry assumption. This assumption is often violated in education and the behavioral sciences; consequently, statisticians have spent considerable time studying this problem. Research suggests that the following sequential procedure be used in the test of factor A. First, conduct the usual F test, which is quite liberal in terms of rejecting H0 too often. If H0 is not rejected, then stop. If H0 is rejected, then continue with step 2, which is to use the Geisser and Greenhouse (1958) conservative F test. For the model being considered here, the degrees of freedom for the F critical value are adjusted to be 1 and n − 1. If H0 is rejected, then stop; this would indicate that both the liberal and conservative tests reached the same conclusion, namely to reject H0. If H0 is not rejected, then the two tests did not reach the same conclusion, and a further test (a tiebreaker) should be undertaken. Thus, in step 3, an adjusted F test is conducted. The adjustment is known as Box's (1954b) correction (usually referred to as the Huynh and Feldt [1970] procedure). Here the numerator degrees of freedom are (J − 1)ε, and the denominator degrees of freedom are (J − 1)(n − 1)ε, where ε is a correction factor (not to be confused with the residual term ε). The correction factor is quite complex and is not shown here (see Keppel & Wickens, 2004; Myers, 1979; Myers & Well, 1995; or Wilcox, 1987). Most major statistical software conducts the Geisser–Greenhouse and Huynh–Feldt tests. The Huynh–Feldt test is recommended due to its greater power (Keppel & Wickens, 2004; Myers & Well, 1995); thus, when available, you can simply use the Huynh–Feldt procedure rather than the sequence just described.
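The sequential strategy can be expressed as a short decision routine. This is only a sketch of the logic above: the three critical values are inputs because they would be read from an F table, and the adjusted one depends on the ε correction that is not shown in the text.

```python
# Sequential test of the repeated factor: liberal test first, then the
# conservative Geisser-Greenhouse test, then the adjusted (Huynh-Feldt)
# tiebreaker if the first two disagree.
def sequential_f_decision(F, crit_usual, crit_conservative, crit_adjusted=None):
    if F <= crit_usual:        # step 1: usual (liberal) F test fails to reject
        return "retain H0"
    if F > crit_conservative:  # step 2: conservative test agrees with step 1
        return "reject H0"
    # step 3: tiebreaker with epsilon-adjusted degrees of freedom
    return "reject H0" if F > crit_adjusted else "retain H0"
```

For the writing assessment example that follows, F = 73.477 exceeds both 3.07 and 5.59, so the decision is reached at step 2.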
15.4.6 Multiple Comparison Procedures
If the null hypothesis for the repeated factor (i.e., factor A) is rejected and there are more than two levels of the factor, then the researcher may be interested in which means or combinations of means are different (in other words, which measurement means differ from one another). This could be assessed, as we have seen in previous chapters, by the use of some MCP. In general, most of the MCPs outlined in Chapter 12 can be used in the one-factor repeated measures model (see additional discussion in Keppel & Wickens, 2004; Mickey, Dunn, & Clark, 2004).
It has been shown that these MCPs are seriously affected by a violation of the compound symmetry assumption. In this situation, two alternatives are recommended. The first alternative is, rather than using the same error term for each contrast (i.e., MS_SA), to use a separate error term for each contrast tested. Many of the MCPs previously covered in Chapter 12 can then be used, although this complicates matters considerably (see Keppel, 1982; Keppel & Wickens, 2004; or Kirk, 1982). A second alternative, recommended by Maxwell (1980) and Wilcox (1987), involves the use of multiple dependent t tests where the α level is adjusted much like the Bonferroni procedure. Maxwell concluded that this procedure is better than many of the other MCPs. For other similar procedures, see Hochberg and Tamhane (1987).
15.4.7 Alternative ANOVA Procedures
There are several alternative procedures to the one-factor repeated measures ANOVA model. These include the Friedman (1937) test, as well as others, such as the Agresti and Pendergast (1986) test. The Friedman test, like the Kruskal–Wallis test, is a nonparametric procedure based on ranks. However, the Kruskal–Wallis test cannot be used in a repeated measures model, as it assumes that the individual scores are independent. This is obviously not the case in the one-factor repeated measures model, where each individual is exposed to all levels of factor A.
Let us outline how the Friedman test is conducted. First, scores are ranked within subject. For instance, if there are J = 4 levels of factor A, then the scores for each subject would be ranked from 1 to 4. From this, one can compute a mean ranking for each level of factor A. The null hypothesis essentially becomes a test of whether the mean rankings for the levels of A are equal. The test statistic is a χ² statistic. In the case of tied ranks, either the available ranks can be averaged, or a correction factor can be used as done with the Kruskal–Wallis test (see Chapter 11). The test statistic is compared to the critical value of ${}_{\alpha}\chi^2_{J-1}$ (see Table A.3). The null hypothesis that the mean rankings are the same for the levels of factor A will be rejected if the test statistic exceeds the critical value.
You may also recall from the Kruskal–Wallis test the problem with small n's, in terms of the test statistic not being precisely distributed as χ². The same problem exists with the Friedman test when J < 6 and n < 6, so we suggest you consult the table of critical values in Marascuilo and McSweeney (1977, Table A-22, p. 521). The Friedman test, like the Kruskal–Wallis test, assumes that the population distributions have the same shape (although not necessarily normal) and variability and that the dependent measure is continuous. For a discussion of other alternative nonparametric procedures, see Agresti and Pendergast (1986), Myers and Well (1995), and Wilcox (1987, 1996, 2003). For information on more advanced within-subjects ANOVA models, see Cotton (1998), Keppel and Wickens (2004), and Myers and Well (1995).
Various MCPs can be used for the Friedman test. For the most part, these MCPs are analogs of their parametric equivalents. In the case of planned (or a priori) pairwise comparisons, one may use multiple matched-pair Wilcoxon tests (i.e., a form of the Kruskal–Wallis test for two groups) in a Bonferroni form (i.e., taking the number of contrasts into account through an adjustment of the α level; for example, if there are six contrasts with an alpha of .05, the adjusted alpha would be .05/6, or .008). For post hoc comparisons, numerous parametric analogs are available. For additional discussion of MCPs for this model, see Marascuilo and McSweeney (1977).
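The Bonferroni-style adjustment mentioned above is simple arithmetic; a one-line helper (ours) reproduces the .05/6 ≈ .008 example:

```python
# Bonferroni-adjusted per-contrast alpha for all pairwise comparisons
# among J levels: there are J(J - 1)/2 pairwise contrasts.
def bonferroni_alpha(alpha, J):
    return alpha / (J * (J - 1) // 2)

bonferroni_alpha(0.05, 4)  # 6 contrasts -> roughly .008, as in the text
```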
15.4.8 Example
Let us consider an example to illustrate the procedures used for this model. The data are shown in Table 15.7, where there are eight subjects, each of whom has been evaluated by four raters on a writing assessment task. First, let us take a look at the results for the parametric ANOVA model, as shown in Table 15.8. The F test statistic is compared to the usual F test critical value of ${}_{.05}F_{3,21} = 3.07$ and is significant. For the Geisser–Greenhouse conservative procedure, the test statistic is compared to the critical value of ${}_{.05}F_{1,7} = 5.59$ and is also significant. The two procedures both yield a statistically significant result; thus, we need not be concerned with a violation of the compound symmetry assumption. As an example MCP, the Bonferroni procedure determined that all pairs of raters are significantly different from one another, except for rater 1 versus rater 2.
Finally, let us take a look at the Friedman test. The test statistic is χ² = 22.9500. This test statistic is compared to the critical value ${}_{.05}\chi^2_3 = 7.8147$ and is significant. Thus, the conclusions for the parametric ANOVA and nonparametric Friedman tests are the same here. This will not always be the case, particularly when ANOVA assumptions are violated.
Table 15.7
Data for the Writing Assessment Example One-Factor Design: Raw Scores and Rank Scores on the Writing Assessment Task by Subject and Rater

            Rater 1       Rater 2       Rater 3       Rater 4
Subject    Raw  Rank     Raw  Rank     Raw  Rank     Raw  Rank
1           3    1        4    2        7    3        8    4
2           6    2        5    1        8    3        9    4
3           3    1        4    2        7    3        9    4
4           3    1        4    2        6    3        8    4
5           1    1        2    2        5    3       10    4
6           2    1        3    2        6    3       10    4
7           2    1        4    2        5    3        9    4
8           2    1        3    2        6    3       10    4
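Both sets of results can be reproduced directly from the raw scores in Table 15.7. A sketch (variable names are ours) that recovers the sums of squares and F ratio of Table 15.8 along with the Friedman statistic:

```python
# Raw scores from Table 15.7: rows are subjects, columns are raters 1-4.
data = [
    [3, 4, 7, 8], [6, 5, 8, 9], [3, 4, 7, 9], [3, 4, 6, 8],
    [1, 2, 5, 10], [2, 3, 6, 10], [2, 4, 5, 9], [2, 3, 6, 10],
]
n, J = len(data), len(data[0])
N = n * J
cf = sum(sum(row) for row in data) ** 2 / N  # correction term T^2/N

SS_total = sum(y ** 2 for row in data for y in row) - cf
col_sums = [sum(row[j] for row in data) for j in range(J)]
SS_A = sum(c ** 2 for c in col_sums) / n - cf        # rater (repeated) effect
SS_S = sum(sum(row) ** 2 for row in data) / J - cf   # subjects
SS_SA = SS_total - SS_A - SS_S                       # residual (error) term
F_A = (SS_A / (J - 1)) / (SS_SA / ((J - 1) * (n - 1)))

# Friedman test: rank within each subject (no ties in these data), then
# apply chi2 = 12/(nJ(J+1)) * sum(R_j^2) - 3n(J+1) to the rank sums R_j.
rank_sums = [0] * J
for row in data:
    for rank, j in enumerate(sorted(range(J), key=row.__getitem__), start=1):
        rank_sums[j] += rank
chi2 = 12 / (n * J * (J + 1)) * sum(R ** 2 for R in rank_sums) - 3 * n * (J + 1)
```

Both F = 73.477 and χ² = 22.95 match the values reported above.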
15.5 Two-Factor Split-Plot or Mixed Design
In this section, we describe the distinguishing characteristics of the two-factor split-plot or mixed ANOVA design, the layout of the data, the linear model, assumptions and their violation, the ANOVA summary table and expected mean squares, MCPs, and an example.
15.5.1 Characteristics of the Model
The characteristics of the two-factor split-plot or mixed ANOVA design are a combination of the characteristics of the one-factor repeated measures and the two-factor fixed-effects models. It is unique because there are two factors, only one of which is repeated. For this reason, the design is often called a mixed design. Thus, one of the factors is a between-subjects factor, the other is a within-subjects factor, and the result is known as a split-plot design (from agricultural research). Each subject then responds to every level of the repeated factor but to only one level of the nonrepeated factor. Subjects then serve as their own controls for the repeated factor but not for the nonrepeated factor. The other characteristics carry over from the one-factor repeated measures model and the two-factor model.
15.5.2 Layout of Data
The layout of the data for the two-factor split-plot or mixed design is shown in Table 15.9. Here we see the rows designated as the levels of factor A, the between-subjects or nonrepeated factor, and the columns as the levels of factor B, the within-subjects or repeated factor. Within each factor level combination or cell are the subjects. Notice that the same subjects appear at all levels of factor B (the within-subjects factor, the repeated measure) but only at one level of factor A (the between-subjects factor). Row, column, cell, and overall means are also shown. Here you see that the layout of the data looks much the same as in the two-factor model.
Table 15.8
One-Factor Repeated Measures ANOVA Summary Table for the Writing Assessment Example

Source               SS        df    MS       F
Within subjects
  Rater (A)          198.125    3    66.042   73.477a
  Error (SA)          18.875   21      .899
Between subjects
  Error (S)           14.875    7    2.125
Total                231.875   31

a ${}_{.05}F_{3,21} = 3.07$.
15.5.3 ANOVA Model
The two-factor split-plot model can be written in terms of population parameters as

$$Y_{ijk} = \mu + \alpha_j + s_{i(j)} + \beta_k + (\alpha\beta)_{jk} + (\beta s)_{ki(j)} + \varepsilon_{ijk}$$

where
Y_ijk is the observed score on the dependent variable for individual i in level j of factor A (the between-subjects factor) and level k of factor B (i.e., the jk cell, the within-subjects factor or repeated measure)
μ is the overall or grand population mean (i.e., regardless of cell designation)
α_j is the effect for level j of factor A (row effect for the nonrepeated factor)
s_i(j) is the effect of subject i that is nested within level j of factor A (i.e., i(j) denotes that i is nested within j)
β_k is the effect for level k of factor B (column effect for the repeated factor)
(αβ)_jk is the interaction effect for the combination of level j of factor A and level k of factor B
(βs)_ki(j) is the interaction effect for the combination of level k of factor B (the within-subjects factor, the repeated measure) and subject i that is nested within level j of factor A (the between-subjects factor)
ε_ijk is the random residual error for individual i in cell jk

Table 15.9
Layout for the Two-Factor Split-Plot or Mixed ANOVA

Level of Factor A           Level of Factor B (Repeated Factor)
(Nonrepeated Factor)      1       2       …       K       Row Mean
1                       Y111    Y112     …      Y11K
                          ⋮       ⋮               ⋮
                        Yn11    Yn12     …      Yn1K
  Cell mean             Ȳ.11    Ȳ.12     …      Ȳ.1K      Ȳ.1.
2                       Y121    Y122     …      Y12K
                          ⋮       ⋮               ⋮
                        Yn21    Yn22     …      Yn2K
  Cell mean             Ȳ.21    Ȳ.22     …      Ȳ.2K      Ȳ.2.
⋮                         ⋮       ⋮               ⋮
J                       Y1J1    Y1J2     …      Y1JK
                          ⋮       ⋮               ⋮
                        YnJ1    YnJ2     …      YnJK
  Cell mean             Ȳ.J1    Ȳ.J2     …      Ȳ.JK      Ȳ.J.
Column mean             Ȳ..1    Ȳ..2     …      Ȳ..K      Ȳ...

Note: Each subject is measured at all levels of factor B, but at only one level of factor A.
We�use�the�terminology�“subjects�are�nested�within�factor�A”�to�indicate�that�a�particular�
subject�si�is�only�exposed�to�one�level�of�factor�A�(the�between-subjects�factor),�level�j��This�
observation� is� then� denoted� in� the� subjects� effect� by� si(j)� and� in� the� interaction� effect� by�
(βs)ki(j)��This�is�due�to�the�fact�that�not�all�possible�combinations�of�subject�with�the�levels�
of�factor�A�are�included�in�the�model��A�more�extended�discussion�of�designs�with�nested�
factors� is� given� in� Chapter� 16�� The� residual� error� can� be� due� to� individual� differences,�
�measurement�error,�and/or�other�factors�not�under�investigation��We�assume�for�now�that�
A�and�B�are�fixed-effects�factors�and�that�S�is�a�random-effects�factor�
It should be mentioned that for the equal-n model, the sum of the row effects, the sum of the column effects, and the sums of the interaction effects (both across rows and across columns) are all equal to 0. This implies, for example, that if there are any nonzero row effects, then the row effects will balance out around 0, with some positive and some negative effects.
The hypotheses to be tested here are exactly the same as in the nonrepeated two-factor ANOVA model (see Chapter 13). For the two-factor ANOVA model, there are three sets of hypotheses, one for each of the main effects and one for the interaction effect. The null and alternative hypotheses, respectively, for testing the main effect of factor A (the between-subjects factor) are as follows:
H_{01}: \mu_{.1.} = \mu_{.2.} = \cdots = \mu_{.J.}
H_{11}: not all the \mu_{.j.} are equal
The hypotheses for testing the main effect of factor B (the within-subjects factor, i.e., the repeated measure) are noted as follows:
H_{02}: \mu_{..1} = \mu_{..2} = \cdots = \mu_{..K}
H_{12}: not all the \mu_{..k} are equal
Finally, the hypotheses for testing the interaction effect (between by within factors) are as follows:
H_{03}: (\mu_{.jk} - \mu_{.j.} - \mu_{..k} + \mu_{...}) = 0 for all j and k
H_{13}: not all the (\mu_{.jk} - \mu_{.j.} - \mu_{..k} + \mu_{...}) = 0
If one of the null hypotheses is rejected, then the researcher may want to consider an MCP to determine which means or combinations of means are significantly different (discussed later in this chapter).
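As a quick numeric illustration of the interaction null hypothesis H_{03}, the tetrad quantities (\mu_{.jk} - \mu_{.j.} - \mu_{..k} + \mu_{...}) can be computed directly from a grid of cell means. This is our own sketch using hypothetical cell means, not values from the text; all zeros would satisfy H_{03}, while any nonzero entry signals an interaction:

```python
# Interaction effects (mu_.jk - mu_.j. - mu_..k + mu_...) computed from a
# small grid of hypothetical cell means (J = 2 rows, K = 2 columns).
cell = [[4.0, 6.0],
        [5.0, 9.0]]
J, K = 2, 2
row = [sum(cell[j]) / K for j in range(J)]                        # mu_.j.
col = [sum(cell[j][k] for j in range(J)) / J for k in range(K)]   # mu_..k
grand = sum(map(sum, cell)) / (J * K)                             # mu_...

inter = [[cell[j][k] - row[j] - col[k] + grand for k in range(K)]
         for j in range(J)]
print(inter)  # [[0.5, -0.5], [-0.5, 0.5]] -> H03 is false for these means
```

Note that the interaction effects balance out to 0 across each row and column, matching the constraint described above for the equal-n model.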
15.5.4 Assumptions and Violation of Assumptions
Previously we described the assumptions for the different two-factor models and the one-factor repeated measures model. The assumptions for the two-factor split-plot or mixed design are a combination of these two sets of assumptions.
The assumptions can be divided into two sets, one for the between-subjects factor and one for the within-subjects (or repeated measures) factor. For the between-subjects factor, we have the usual assumptions of population scores being random, independent, and normally distributed with equal variances. For the within-subjects factor (i.e., the repeated measure), the assumption is the already familiar compound symmetry assumption. For this design, the assumption involves the population covariances for all pairs of the levels of the within-subjects factor (i.e., k and k′) being equal at each level of the between-subjects factor (i.e., for all levels j). To deal with this assumption, we look at alternative F tests in the next section. A summary of the assumptions and the effects of their violation for the two-factor split-plot or mixed design is presented in Table 15.10.
15.5.5 ANOVA Summary Table and Expected Mean Squares
The ANOVA summary table is shown in Table 15.11, where we see the following sources of variation: A, S, B, AB, BS, and total. The table is divided into between-subjects sources and within-subjects sources. The between-subjects sources are A and S, where S will be used as the error term for the test of factor A. The within-subjects sources are B, AB, and BS, where BS will be used as the error term for the tests of factor B and of the AB interaction. This will become clear when we examine the expected mean squares shortly.
Next we need to consider the sums of squares for the two-factor mixed design. Decomposing the total sum of squares yields

SS_{total} = SS_A + SS_S + SS_B + SS_{AB} + SS_{BS}

We leave the computation of these five terms to statistical software. The degrees of freedom, mean squares, and F ratios are computed as shown in Table 15.11.
Table 15.10
Assumptions and Effects of Violations: Two-Factor Split-Plot or Mixed Model

Assumption                Effect of Assumption Violation
Independence              • Increased likelihood of a Type I and/or Type II error in F
                          • Affects standard errors of means and inferences about those means
Homogeneity of variance   • Bias in error terms
                          • Increased likelihood of a Type I and/or Type II error
                          • Small effect with equal or nearly equal n's
                          • Otherwise effect decreases as n increases
Normality                 • Minimal effect with equal or nearly equal n's
                          • Otherwise substantial effects
Sphericity                • F not particularly robust
                          • Consider usual F test, Geisser–Greenhouse conservative F test, and adjusted (Huynh–Feldt) F test, if necessary
The�formation�of�the�proper�F�ratio�is�again�related�to�the�expected�mean�squares��If�H0�is�
actually�true�(i�e�,�the�means�are�really�equal),�then�the�expected mean squares�are�as�follows:
E 2MSA( ) = σε
E 2MSS( ) = σε
E 2MSB( ) = σε
�
E 2MSAB( ) = σε
�
E 2MSBS( ) = σε
where�σε
2�is�the�population�variance�of�the�residual�errors�
If H0 is actually false (i.e., the means are really not equal), then the expected mean squares are as follows:

E(MS_A) = \sigma_\varepsilon^2 + K\sigma_s^2 + nK\sum_{j=1}^{J}\alpha_j^2/(J-1)

E(MS_S) = \sigma_\varepsilon^2 + K\sigma_s^2

E(MS_B) = \sigma_\varepsilon^2 + \sigma_{\beta s}^2 + nJ\sum_{k=1}^{K}\beta_k^2/(K-1)

E(MS_{AB}) = \sigma_\varepsilon^2 + \sigma_{\beta s}^2 + n\sum_{j=1}^{J}\sum_{k=1}^{K}(\alpha\beta)_{jk}^2/[(J-1)(K-1)]
Table 15.11
Two-Factor Split-Plot or Mixed Model ANOVA Summary Table

Source             SS         df               MS      F
Between subjects
A                  SS_A       J − 1            MS_A    MS_A/MS_S
S                  SS_S       J(n − 1)         MS_S
Within subjects
B                  SS_B       K − 1            MS_B    MS_B/MS_BS
AB                 SS_AB      (J − 1)(K − 1)   MS_AB   MS_AB/MS_BS
BS                 SS_BS      (K − 1)J(n − 1)  MS_BS
Total              SS_total   N − 1
E(MS_{BS}) = \sigma_\varepsilon^2 + \sigma_{\beta s}^2

where \sigma_{\beta s}^2 represents variability due to the interaction of factor B (the within-subjects or repeated measures factor) and subjects, and the other terms are as before.
As in previous ANOVA models, the proper F ratio should be formed as follows:

F = (systematic variability + error variability)/(error variability)

For the two-factor split-plot design, the error term for the proper test of factor A (the between-subjects factor) is the S term, whereas the error term for the proper tests of factor B (the within-subjects or repeated measures factor) and the AB interaction is the BS interaction. For models where factors A and B are not both fixed-effects factors, see Keppel (1982).
As the compound symmetry assumption is often violated, we again suggest the following sequential procedure to test for B (the repeated measure) and for AB (the within- by between-subjects factor interaction). First, do the usual F test, which is quite liberal in terms of rejecting H0 too often. If H0 is not rejected, then stop. If H0 is rejected, then continue with step 2, which is to use the Geisser and Greenhouse (1958) conservative F test. For the model under consideration here, the degrees of freedom for the F critical values are adjusted to be 1 and J(n − 1) for the test of B, and J − 1 and J(n − 1) for the test of the AB interaction. No conservative test is necessary for factor A, the between-subjects or nonrepeated factor, as the assumption does not apply; thus, the usual test is all that is necessary for the test of A. If H0 for B and/or AB is rejected, then stop: both the liberal and conservative tests reached the same conclusion to reject H0. If H0 is not rejected, then the two tests did not yield the same conclusion, and an adjusted F test is conducted. The adjustment is known as Box's (1954b) correction (or the Huynh and Feldt [1970] procedure). Most major statistical software conducts the Geisser–Greenhouse and Huynh–Feldt tests.
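The degrees-of-freedom adjustments described above can be collected in a small helper. This is a minimal sketch of our own (the function name and return format are not from the text); it simply applies the usual and Geisser–Greenhouse conservative df formulas for the split-plot design:

```python
def split_plot_dfs(J, K, n):
    """Usual vs. Geisser-Greenhouse conservative (numerator, denominator)
    df for the two-factor split-plot design: J between-subjects levels,
    K repeated measures, n subjects per between-subjects level."""
    usual = {
        "A":  (J - 1, J * (n - 1)),
        "B":  (K - 1, (K - 1) * J * (n - 1)),
        "AB": ((J - 1) * (K - 1), (K - 1) * J * (n - 1)),
    }
    # Conservative test: the within-subjects dfs are divided by (K - 1);
    # factor A is unaffected because sphericity does not apply to it.
    conservative = {
        "A":  usual["A"],
        "B":  (1, J * (n - 1)),
        "AB": (J - 1, J * (n - 1)),
    }
    return usual, conservative

# For the writing-assessment example: J = 2 instructors, K = 4 raters, n = 4
usual, conservative = split_plot_dfs(2, 4, 4)
print(usual["B"], conservative["B"])    # (3, 18) (1, 6)
print(usual["AB"], conservative["AB"])  # (3, 18) (1, 6)
```

These dfs reproduce the critical values used in the example later in the chapter: .05F3,18 = 3.16 for the usual tests of B and AB, and .05F1,6 = 5.99 for the conservative tests.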
15.5.6 Multiple Comparison Procedures
Consider the situation where the null hypothesis for any of the three hypotheses is rejected (i.e., for A, B, and/or AB). If there is more than one degree of freedom in the numerator for any of these hypotheses, then the researcher may be interested in which means or combinations of means are different. This could again be assessed by the use of some MCP. Thus, the procedures outlined in Chapter 13 (i.e., for main effects and for simple and complex interaction contrasts) for the regular two-factor ANOVA model can be adapted to this model.
However, it has been shown that MCPs involving the repeated factor are seriously affected by a violation of the compound symmetry assumption. In this situation, two alternatives are recommended. The first alternative is, rather than using the same error term for each contrast involving the repeated factor (i.e., MS_B or MS_AB), to use a separate error term for each contrast tested. Then many of the MCPs previously covered in Chapter 12 can be used, although this complicates matters considerably (see Keppel, 1982; Keppel & Wickens, 2004; or Kirk, 1982). The second and simpler alternative is
suggested by Shavelson (1988). He recommended that the appropriate error terms be used in MCPs involving the main effects, but that for interaction contrasts, both error terms be pooled (i.e., added together); this procedure is conservative yet simpler than the first alternative.
15.5.7 Example
Consider now an example problem to illustrate the two-factor mixed design. Here we expand on the example presented earlier in this chapter by adding a second factor to the model. The data are shown in Table 15.12, where there are eight subjects, each of whom has been evaluated by four raters on a writing assessment task (rater is the within-subjects factor, as each individual has been evaluated by all four raters). Ratings on the writing assessment can range from 1 (lowest rating) to 10 (highest rating). Each student was also randomly assigned to one of two instructors. Thus, factor A represents the instructors of English composition, where four subjects are randomly assigned to level 1 of factor A (i.e., instructor 1) and the remaining four to level 2 of factor A (i.e., instructor 2). Thus, factor B (i.e., rater) is repeated (the within-subjects factor), and factor A (i.e., instructor) is not repeated (the between-subjects factor). The ANOVA summary table is shown in Table 15.13.
The test statistics are compared to the following usual F test critical values: for factor A (the between-subjects factor that tests mean differences based on instructor), .05F1,6 = 5.99, which is not statistically significant; for factor B (the within-subjects factor that tests mean differences based on repeated ratings), .05F3,18 = 3.16, which is significant; and for AB, .05F3,18 = 3.16, which is also statistically significant. For the Geisser–Greenhouse conservative procedure, the test statistics are compared to the following critical values: for factor A (i.e., instructor), no conservative procedure is necessary; for factor B (i.e., the repeated measure, rater), .05F1,6 = 5.99, which is also significant; and for the interaction AB (instructor by rater), .05F1,6 = 5.99, which is also significant. The usual and Geisser–Greenhouse procedures both yield a statistically significant result for factor B (rater) and for the interaction AB (instructor by rater);
Table 15.12
Data for the Writing Assessment Example Two-Factor Design: Raw Scores on the Writing Assessment Task by Instructor and Rater

Factor A (Nonrepeated Factor)      Factor B (Repeated Factor)
Instructor    Subject    Rater 1   Rater 2   Rater 3   Rater 4
1             1          3         4         7         8
              2          6         5         8         9
              3          3         4         7         9
              4          3         4         6         8
2             5          1         2         5         10
              6          2         3         6         10
              7          2         4         5         9
              8          2         3         6         10
thus, we need not be concerned with a violation of the sphericity assumption. A profile plot of the interaction is shown in Figure 15.2.
There is a significant AB (i.e., instructor by rater) interaction, so we should follow this up with simple interaction contrasts, each involving only four cell means. As an example of an MCP, consider the contrast
\psi' = \frac{(\bar{Y}_{.11} - \bar{Y}_{.21}) - (\bar{Y}_{.14} - \bar{Y}_{.24})}{4} = \frac{(3.7500 - 1.7500) - (8.5000 - 9.7500)}{4} = .8125
Table 15.13
Two-Factor Split-Plot ANOVA Summary Table for the Writing Assessment Example

Source               SS        df   MS       F
Between subjects
Instructor (A)       6.125     1    6.125    4.200b
Error (S)            8.750     6    1.458
Within subjects
Rater (B)            198.125   3    66.042   190.200a
Instructor × rater   12.625    3    4.208    12.120a
Error (BS)           6.250     18   .347
Total                231.875   31

a .05F3,18 = 3.16.
b .05F1,6 = 5.99.
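The entries in Table 15.13 can be verified directly from the raw scores in Table 15.12 using the sums-of-squares decomposition given earlier. The following sketch is ours, not the book's (it assumes only NumPy is available):

```python
import numpy as np

# Raw scores from Table 15.12: rows = subjects, columns = raters;
# the first four subjects had instructor 1, the last four instructor 2.
Y = np.array([[3, 4, 7, 8],
              [6, 5, 8, 9],
              [3, 4, 7, 9],
              [3, 4, 6, 8],
              [1, 2, 5, 10],
              [2, 3, 6, 10],
              [2, 4, 5, 9],
              [2, 3, 6, 10]], dtype=float)
J, K, n = 2, 4, 4          # instructors, raters, subjects per instructor

grand = Y.mean()
inst_means = np.array([Y[:4].mean(), Y[4:].mean()])   # factor A means
rater_means = Y.mean(axis=0)                          # factor B means
subj_means = Y.mean(axis=1)                           # subject means
cell_means = np.array([Y[:4].mean(axis=0), Y[4:].mean(axis=0)])

SS_total = ((Y - grand) ** 2).sum()
SS_A = n * K * ((inst_means - grand) ** 2).sum()
SS_B = n * J * ((rater_means - grand) ** 2).sum()
SS_S = K * ((subj_means - np.repeat(inst_means, n)) ** 2).sum()
SS_cells = n * ((cell_means - grand) ** 2).sum()
SS_AB = SS_cells - SS_A - SS_B
SS_BS = SS_total - SS_A - SS_S - SS_B - SS_AB

MS_A, MS_S = SS_A / (J - 1), SS_S / (J * (n - 1))
MS_B = SS_B / (K - 1)
MS_AB = SS_AB / ((J - 1) * (K - 1))
MS_BS = SS_BS / ((K - 1) * J * (n - 1))

F_A = MS_A / MS_S     # tested against S:  4.200
F_B = MS_B / MS_BS    # tested against BS: 190.200
F_AB = MS_AB / MS_BS  # tested against BS: 12.120
print(SS_A, SS_S, SS_B, SS_AB, SS_BS)  # 6.125 8.75 198.125 12.625 6.25
```

Note that the error terms follow the expected mean squares: MS_S for the between-subjects factor, and MS_BS for the repeated factor and the interaction.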
FIGURE 15.2
Profile plot for example writing data. (Estimated marginal means of the writing scores, plotted by rater [1 through 4] on the horizontal axis, with separate lines for instructor 1 and instructor 2.)
with a standard error computed as follows:

se_{\psi'} = \sqrt{MS_{BS}\sum_{j=1}^{J}\sum_{k=1}^{K}\frac{c_{jk}^2}{n_{jk}}} = \sqrt{0.3472\left(\frac{1/16 + 1/16 + 1/16 + 1/16}{4}\right)} = 0.1473
Using the Scheffé procedure, we formulate the following test statistic:

t = \frac{\psi'}{se_{\psi'}} = \frac{0.8125}{0.1473} = 5.5160
This is compared with the following critical value:

\sqrt{(J-1)(K-1)\,{}_{\alpha}F_{(J-1)(K-1),(K-1)J(n-1)}} = \sqrt{(1)(3)\,({}_{.05}F_{3,18})} = \sqrt{3(3.16)} = 3.0790
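These contrast computations can be reproduced numerically. The sketch below is our own illustration using the cell means from Table 15.12, the MS_BS error term from Table 15.13, and the tabled value .05F3,18 = 3.16:

```python
import math

# Cell means for the writing example (instructor x rater)
mean_11, mean_21 = 3.75, 1.75     # rater 1, instructors 1 and 2
mean_14, mean_24 = 8.50, 9.75     # rater 4, instructors 1 and 2
MS_BS, n = 6.25 / 18, 4           # error term (Table 15.13), per-cell n

# Tetrad contrast with coefficients +1/4, -1/4, -1/4, +1/4
psi = ((mean_11 - mean_21) - (mean_14 - mean_24)) / 4
se = math.sqrt(MS_BS * 4 * (1 / 16) / n)   # sum of c^2/n over the 4 cells
t = psi / se

# Scheffe critical value: sqrt[(J - 1)(K - 1) * F_crit], with J = 2, K = 4
critical = math.sqrt(1 * 3 * 3.16)
print(psi, t, critical)   # psi = .8125, t is about 5.52, critical about 3.08
```

Because t exceeds the critical value, the contrast is significant, matching the conclusion drawn below. (The text's t of 5.5160 reflects rounding the standard error to 0.1473 before dividing.)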
Thus, we may conclude that the tetrad interaction difference between the first and second levels of factor A (instructor) and the first and fourth levels of factor B (rater, the repeated measure) is significant. In other words, rater 1 finds better writing among the students of instructor 1 than instructor 2, whereas rater 4 finds better writing among the students of instructor 2 than instructor 1.
Although we have only considered the basic repeated measures designs here, more complex repeated measures designs also exist. For further information, see Myers (1979), Keppel (1982), Kirk (1982), Myers and Well (1995), Glass and Hopkins (1996), Cotton (1998), and Keppel and Wickens (2004), as well as alternative ANOVA procedures described by Wilcox (2003) and McCulloch (2005). To analyze repeated measures designs in SAS, use the GLM procedure with the REPEATED statement. In SPSS GLM, use the repeated measures program.
15.6 SPSS and G*Power
Next we consider SPSS for the models presented in this chapter. Note that all of the designs in this chapter are discussed in the SPSS context by Page, Braver, and MacKinnon (2003). This is followed by an illustration of the use of G*Power for post hoc and a priori power analysis for the two-factor split-plot ANOVA.
One-Factor Random-Effects ANOVA
To conduct a one-factor random-effects ANOVA, there are only two differences from the one-factor fixed-effects ANOVA (Chapter 11); otherwise, the form of the data and the conduct of the analysis are exactly the same. In terms of the form of the data, one column or variable indicates the levels or categories of the independent variable (i.e., the random factor), and the second is for the dependent variable. Each row then represents one individual, indicating the level or group of which that individual is a member (1, 2, 3, or 4 in our example; recall that for the one-factor random-effects ANOVA, these categories are randomly selected from the population of categories) and that individual's score on the dependent variable. Thus, we wind up with two long columns of group values and scores, as shown in the following screenshot. We will use the data from Chapter 11 to illustrate, this time assuming the independent variable is a random factor rather than fixed.
The form of the data for the one-factor random-effects ANOVA follows that of the one-factor fixed-effects ANOVA. The "independent variable" (which is now a random rather than fixed effect) is labeled "Group," where each value represents the category to which the student was randomly assigned; the categories of the random factor were randomly selected from the population of categories. The "dependent variable" is "Labs" and represents the number of statistics labs the student attended.
Step 1: To conduct a one-factor random-effects ANOVA, go to “Analyze” in the top pulldown menu, then select “General Linear Model,” and then select “Univariate.” Following the screenshot (step 1) produces the “Univariate” dialog box.
One-factor random effects ANOVA: Step 1
Step 2: Click the dependent variable (e.g., number of statistics labs attended) and move it into the “Dependent Variable” box by clicking the arrow button. Click the independent variable (e.g., level of attractiveness; this is the random-effects factor) and move it into the “Random Factor(s)” box by clicking the arrow button. On this “Univariate” dialog screen, you will notice that while the “Post hoc” option button is active, clicking on “Post hoc” will produce a dialog box with no active options, as we are now dealing with a random factor rather than a fixed factor. Post hoc MCPs are only available from the “Options” screen, as we will see in the following screenshots.
One-factor random effects ANOVA: Step 2 (“Univariate” dialog box). Select the dependent variable from the list on the left and use the arrow to move it to the “Dependent Variable” box on the right. Select the random factor from the list on the left and use the arrow to move it to the “Random Factor(s)” box on the right. Clicking on “Plots” will allow you to generate profile plots. Clicking on “Save” will allow you to save various forms of residuals, among other variables. Clicking on “Options” will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests, and multiple comparison procedures).
Step 3: Clicking on “Options” provides the option to select such information as “Descriptive Statistics,” “Estimates of effect size,” “Observed power,” and “Homogeneity tests” (i.e., Levene’s test for equal variances). Click on “Continue” to return to the original dialog box. Note that if you are interested in an MCP, post hoc MCPs are only available from the “Options” screen. To select a post hoc procedure, click on “Compare main effects” and use the toggle menu to reveal the LSD, Bonferroni, and Sidak procedures. However, we have already mentioned that MCPs are not generally of interest for this model.
One-factor random effects ANOVA: Step 3. While post hoc MCPs are usually not of interest in random effects models, if you wish to conduct a post hoc test, that selection must be made from this screen using the “Compare main effects” option, then selecting one of the three MCPs that are available from the toggle menu under “Confidence interval adjustment” (i.e., LSD, Bonferroni, or Sidak). Select from the list on the left those variables that you wish to display means for and use the arrow to move them to the “Display Means for” box on the right.
Step 4: From the “Univariate” dialog box, click on “Plots” to obtain a profile plot of means. Click the random factor (e.g., level of attractiveness, labeled “Group”) and move it into the “Horizontal Axis” box by clicking the arrow button (see screenshot step 4a). Then click on “Add” to move the variable into the “Plots” box at the bottom of the dialog box (see screenshot step 4b). Click on “Continue” to return to the original dialog box.
One-factor random effects ANOVA: Step 4a. Select the random factor from the list on the left and use the arrow to move it to the “Horizontal Axis” box on the right.
One-factor random effects ANOVA: Step 4b. Then click “Add” to move the variable into the “Plots” box at the bottom.
Step 5: From the “Univariate” dialog box (see screenshot step 2), click on “Save” to select those elements that you want to save. In our case, we want to save the unstandardized residuals, which will be used later to examine the extent to which normality and independence are met. Thus, place a checkmark in the box next to “Unstandardized.” Click “Continue” to return to the main “Univariate” dialog box, and then click on “OK” to generate the output.
One-factor random effects ANOVA: Step 5
Two-Factor Random-Effects ANOVA
To run a two-factor random-effects ANOVA model, there are the same two differences from the two-factor fixed-effects ANOVA (covered in Chapter 13). First, on the GLM screen (shown in the following screenshot), move both factor names into the “Random Factor(s)” box rather than the “Fixed Factor(s)” box. Second, the same situation exists with MCPs: if you are interested in an MCP, post hoc MCPs are only available from the “Options” screen. However, we have already mentioned that MCPs are not generally of interest for this model. For brevity, the subsequent screenshots are not presented.
Two-factor random-effects ANOVA. Select the dependent variable from the list on the left and use the arrow to move it to the “Dependent Variable” box on the right. Select the random factors from the list on the left and use the arrow to move them to the “Random Factor(s)” box on the right. Clicking on “Plots” will allow you to generate profile plots. Clicking on “Save” will allow you to save various forms of residuals, among other variables. Clicking on “Options” will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests, and multiple comparison procedures).
Two-Factor Mixed-Effects ANOVA
To conduct a two-factor mixed-effects ANOVA, there are three differences from the two-factor fixed-effects ANOVA when using SPSS to analyze the model. The first is that both a random- and a fixed-effects factor must be defined (see screenshot step 2 that follows). The second difference is that post hoc MCPs for the fixed-effects factor are available from either the “Post Hoc” or “Options” screens, while for the random-effects factor, they are only available from the “Options” screen. The third difference is related to the output provided by SPSS. Unfortunately, the F statistic for any main effect that is random in a mixed-effects model is computed incorrectly in SPSS because the wrong error term is used when implementing the SPSS point-and-click mode. As described in Lomax and Surman (2007) and extended by Li and Lomax (2011), you need to (a) compute the F statistics by hand from the MS values (which are correct), (b) use SPSS syntax where the user indicates the proper error terms, or (c) use a different software package (e.g., SAS, where the user also provides the proper error terms). These options are not presented here; rather, readers are referred to the appropriate references. For the purpose of this illustration, we will use the statistics lab data. The dependent variable remains the same: the number of statistics labs attended. The level of attractiveness will be a fixed factor, and the time of day will be a random factor.
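Option (a), computing the F statistics by hand from the (correct) MS values, can be sketched as follows. This is our own minimal illustration, not the book's procedure, and the MS values are hypothetical placeholders; it follows one common textbook convention for a two-factor mixed model with A fixed and B random (always check the expected mean squares for your particular model before choosing error terms):

```python
# Forming proper F ratios by hand from the MS values of a two-factor
# mixed model with A fixed and B random.  Under the usual textbook EMS
# results, MS_AB is the error term for the fixed factor A, while the
# within-cells MS serves as the error term for B and AB.
# All MS values below are hypothetical placeholders, not from the text.
MS_A, MS_B, MS_AB, MS_within = 40.0, 25.0, 8.0, 2.0

F_A = MS_A / MS_AB        # fixed effect tested against the interaction
F_B = MS_B / MS_within    # random effect tested against within-cells error
F_AB = MS_AB / MS_within
print(F_A, F_B, F_AB)     # 5.0 12.5 4.0
```

The point is simply that the numerator and denominator are chosen from the EMS table rather than taken from SPSS's default output, which uses the wrong error term for the random main effect.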
Step 1: To conduct a two-factor mixed-effects ANOVA, go to “Analyze” in the top pulldown menu, then select “General Linear Model,” and then select “Univariate.” Following screenshot step 1 for the one-factor random-effects ANOVA presented previously produces the “Univariate” dialog box.
Step 2: Per screenshot step 2 that follows, click the dependent variable (e.g., number of statistics labs attended) and move it into the “Dependent Variable” box by clicking the arrow button. Click the fixed factor (e.g., level of attractiveness) and move it into the “Fixed Factor(s)” box by clicking the arrow button. Click the random factor (e.g., time of day) and move it into the “Random Factor(s)” box by clicking the arrow button. Next, click on “Options.” Please note that post hoc MCPs for the fixed-effects factor (in this case, level of attractiveness) are available from either the “Post Hoc” or “Options” screens, while for the random-effects factor, they are only available from the “Options” screen. Because these steps have been presented in previous screenshots (e.g., Chapter 12 for MCPs and the one-factor random-effects ANOVA previously shown in this chapter), they are not repeated here.
Two-factor mixed-effects ANOVA: Step 2. Select the dependent variable from the list on the left and use the arrow to move it to the “Dependent Variable” box on the right. Select the random factor (or fixed factor) from the list on the left and use the arrow to move it to the “Random Factor(s)” (or “Fixed Factor(s)”) box on the right. Clicking on “Plots” will allow you to generate profile plots. Clicking on “Save” will allow you to save various forms of residuals, among other variables. Clicking on “Options” will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests, and multiple comparison procedures).
One-Factor Repeated Measures ANOVA
In order to run a one-factor repeated measures ANOVA model, the data have to be in the form suggested by the following screenshot. Each row represents one person in our sample. All of the scores for each subject must be in one row of the dataset, and each level of the repeated factor is a separate variable (represented by the columns). For example, if there are four raters who assess each student’s essay, there will be a variable for each rater (e.g., rater 1 through rater 4; example dataset on the website). In this illustration, we have both raw scores and ranked data for each of the four raters. When using ANOVA for repeated measures, we will apply the raw scores. The ranked scores will only be of value when computing the nonparametric version of ANOVA (i.e., the Friedman test), which will be covered later in this chapter.
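The wide layout described above, and the per-subject ranking used later for the Friedman test, can be sketched in a few lines. This is our own illustration (the variable names mirror the screenshot; the simple ranking function ignores ties, which would need average ranks):

```python
# Wide-format repeated measures data: one row per subject, one column
# per level of the repeated factor (here, the four raters).
subjects = [
    {"Rater1_raw": 3, "Rater2_raw": 4, "Rater3_raw": 7, "Rater4_raw": 8},
    {"Rater1_raw": 6, "Rater2_raw": 5, "Rater3_raw": 8, "Rater4_raw": 9},
]

def within_subject_ranks(row):
    """Rank one subject's scores across the repeated measures
    (1 = lowest), as needed for the Friedman test; ties would
    require average ranks instead."""
    ordered = sorted(row, key=row.get)
    return {var: rank for rank, var in enumerate(ordered, start=1)}

print(within_subject_ranks(subjects[0]))
# {'Rater1_raw': 1, 'Rater2_raw': 2, 'Rater3_raw': 3, 'Rater4_raw': 4}
```

Ranking within each row (rather than down each column) is what distinguishes the Friedman test's treatment of repeated measures from a between-subjects rank procedure.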
For the repeated measures ANOVA, each row represents one person in our sample, and each column represents one level of the repeated measures factor. For this illustration, four raters assessed the writing essay of each person in the sample; thus, there are four columns that represent the raw scores from each of the raters (Rater1_raw, Rater2_raw, etc.) and four columns that represent the ranked scores from each of the raters (Rater1_rank, Rater2_rank, etc.).
Step 1: To conduct a one-factor repeated measures ANOVA, go to “Analyze” in the top pulldown menu, then select “General Linear Model,” and then select “Repeated Measures.” Following the screenshot (step 1) produces the “Repeated Measures” dialog box.
One-factor repeated measures ANOVA: Step 1
Step 2: The “Repeated Measures Define Factor(s)” dialog box will appear (see screenshot step 2). In the box under “Within-Subject Factor Name,” enter the name you wish to call the repeated factor. For this illustration, we will label the repeated measure “Rater.” It is necessary to define a name for the repeated factor because there is no single variable representing this factor (recall that the columns in the dataset represent the repeated measures); in the dataset, there is one variable for each level of the factor (in other words, one variable for each different rater or measurement). Again, in our example, there are four levels of raters (i.e., four raters) and thus four variables, so we name the within-subjects factor “Rater.” The “Number of Levels” indicates the number of measurements of the repeated measure. In this example, there were four raters, and thus the “Number of Levels” of the factor is 4.
One-factor repeated measures ANOVA: Step 2. Clicking on “Add” will move these choices into the middle area.
Step 3: After we have defined the “Within-Subject Factor Name” and the “Number of Levels,” click on “Add” to move this information into the middle box. In screenshot step 3, we see our newly defined repeated measures factor (i.e., Rater), with “4” indicating that there are four levels: Rater(4). Finally, click on “Define” to open the main “Repeated Measures” dialog box.
One-factor repeated measures ANOVA: Step 3. Now the choices are shown in the box.
Step 4a: From the “Repeated Measures” dialog box (see screenshot step 4a), we see a heading called “Within-Subjects Variables” with the newly defined factor, Rater, in parentheses. In this illustration, the values 1 through 4 represent each of the four raters that we just defined through screenshot step 3. Preceding each of the levels of the repeated factor are lines with question marks. This is the software’s way of asking us to define which variable from the list on the left represents the first measurement (or the first rater in our illustration).
One-factor repeated measures ANOVA: Step 4a. Clicking on “Plots” will allow you to generate profile plots. Clicking on “Save” will allow you to save various forms of residuals, among other variables. Clicking on “Options” will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests, and multiple comparison procedures).
Step 4b: Move the appropriate variables from the variable list on the left into the “Within-Subjects Variables” box on the right. It is important to make sure that the first measurement is matched with “1,” the second measurement is matched with “2,” and so forth, so that the correct order of repeated measures is defined. This is especially critical when there is some temporal order to the repeated measures (e.g., pre-, post-, 3 months after post-).
One-factor repeated measures ANOVA: Step 4b
Step 5: From the “Repeated Measures” dialog box (see screenshot step 4a), clicking on “Options” will provide the option to select such information as “Descriptive Statistics,” “Estimates of effect size,” “Observed power,” and “Homogeneity tests.” For the one-factor repeated measures ANOVA, the “Options” dialog box is the proper place to obtain post hoc MCPs, including the LSD, Bonferroni, and Sidak procedures. Click on “Continue” to return to the original dialog box.
One-factor repeated measures ANOVA: Step 5. Select from the list on the left those variables that you wish to display means for and use the arrow to move them to the “Display Means for” box on the right. If you wish to conduct a post hoc test to determine where there are mean differences between the repeated measures, that selection must be made from this screen using the “Compare main effects” option, then selecting one of the three MCPs that are available from the toggle menu under “Confidence interval adjustment” (i.e., LSD, Bonferroni, or Sidak).
Step 6: From the "Univariate" dialog box (see screenshot step 4a), click on "Plots" to obtain a profile plot of means. Click the repeated measure factor (e.g., "Rater") and move it into the "Horizontal Axis" box by clicking the arrow button (see screenshot step 6a). Then click on "Add" to move the variable into the "Plots" box at the bottom of the dialog box (see screenshot step 6b). Click on "Continue" to return to the original dialog box.
One-factor repeated measures ANOVA: Step 6a

Select the repeated measures factor from the list on the left and use the arrow to move it to the "Horizontal Axis" box on the right.
An Introduction to Statistical Concepts
One-factor repeated measures ANOVA: Step 6b

Then click "Add" to move the variable into the "Plots" box at the bottom.
Step 7: From the "Univariate" dialog box (see screenshot step 4a), click on "Save" to select those elements that you want to save (in our case, we want to save the unstandardized residuals, which will be used later to examine the extent to which normality and independence are met). To do this, place a checkmark next to "Unstandardized." Click "Continue" to return to the main "Univariate" dialog box and then click on "Ok" to generate the output.
One-factor repeated measures ANOVA: Step 7
Interpreting the output: Annotated results are presented in Table 15.14.
Table 15.14
One-Factor Repeated Measures ANOVA SPSS Results for the Writing Assessment Example
Descriptive Statistics

              Mean    Std. Deviation   N
Rater1_raw   2.7500      1.48805       8
Rater2_raw   3.6250       .91613       8
Rater3_raw   6.2500      1.03510       8
Rater4_raw   9.1250       .83452       8

The table labeled "Descriptive Statistics" provides basic descriptive statistics (means, standard deviations, and sample sizes) for each group of the repeated measure.
Multivariate Testsa

Effect  Test                 Value      F       Hypothesis df  Error df  Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powerb
Rater   Pillai's trace        .967    48.650c      3.000        5.000    .000         .967              145.949             1.000
        Wilks' lambda         .033    48.650c      3.000        5.000    .000         .967              145.949             1.000
        Hotelling's trace   29.190    48.650c      3.000        5.000    .000         .967              145.949             1.000
        Roy's largest root  29.190    48.650c      3.000        5.000    .000         .967              145.949             1.000

a Design: intercept. Within-subjects design: rater.
b Computed using alpha = .05.
c Exact statistic.

The table labeled "Multivariate Tests" provides results for the multivariate test of mean differences between the repeated measures. Multivariate tests are provided when there are three or more levels of the within-subjects factor. These results are generally more conservative than the univariate results (in other words, you may be less likely to find statistically significant multivariate results as compared to univariate results). Note that the multivariate tests do not require meeting the assumption of sphericity. Thus, if the assumption of sphericity is met, reporting univariate results is recommended.

If results for the multivariate tests are reported, of the four test results, Wilks' lambda is recommended. In this example, all four multivariate criteria produce the same results; specifically, there is a statistically significant multivariate mean difference (as noted by p less than α).
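The agreement among the four criteria is no accident: with a single within-subjects effect, the four multivariate statistics are simple transformations of one another and yield the same exact F. A minimal sketch in plain Python (values taken from the table above) shows how the exact F and noncentrality follow from Hotelling's trace:

```python
# Relationships among the multivariate criteria when the effect has a
# single dimension (s = 1), using values from the SPSS output above.
hotelling = 29.190       # Hotelling's trace
hyp_df, err_df = 3.0, 5.0

# With s = 1, the exact F is Hotelling's trace scaled by the df ratio.
F = hotelling * err_df / hyp_df
noncent = F * hyp_df     # noncentrality parameter = F x hypothesis df

pillai = 0.967
wilks = 1 - pillai       # with s = 1, Wilks' lambda = 1 - Pillai's trace

print(round(F, 2), round(noncent, 2), round(wilks, 3))
```

Running this reproduces the tabled F of 48.65, the noncentrality of 145.95, and Wilks' lambda of .033.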
Mauchly's Test of Sphericitya
Measure: MEASURE_1

Within-Subjects                Approx.                           Epsilonb
Effect           Mauchly's W  Chi-Square  df  Sig.  Greenhouse–Geisser  Huynh–Feldt  Lower Bound
Rater               .155        10.679     5  .062        .476              .564         .333

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a Design: intercept. Within-subjects design: rater.
b May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the tests of within-subjects effects table.

"Mauchly's Test of Sphericity" can be reviewed to determine if the assumption of sphericity is met. If the p value is larger than α (as in this illustration), we have met the assumption of sphericity.
"Epsilon" is a gauge of differences in the variances of the repeated measures and is used to adjust the degrees of freedom when sphericity is violated. The closer the epsilon value is to 1.0, the more homogeneous are the variances. Complete heterogeneity of variances is specified by the "Lower bound" and is computed as 1/(K − 1), where K is the number of levels of the within-subjects factor. For this example, with four raters, the lower bound is 1/(4 − 1), or .333.
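The lower-bound and df-adjustment arithmetic can be sketched in a few lines of plain Python (the Greenhouse–Geisser epsilon is taken from the SPSS output above):

```python
# Lower-bound epsilon: complete heterogeneity of variances.
K = 4                      # number of levels of the within-subjects factor
lower_bound = 1 / (K - 1)  # 1/3 = .333

# When sphericity is violated, epsilon rescales the degrees of
# freedom before the F test is evaluated.
gg_epsilon = 0.476         # Greenhouse-Geisser epsilon from Mauchly's table
effect_df = K - 1          # unadjusted df = 3
adjusted_effect_df = gg_epsilon * effect_df

print(round(lower_bound, 3), round(adjusted_effect_df, 3))
```

The adjusted df of 1.428 matches the Greenhouse–Geisser row of the within-subjects effects table that follows.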
Tests of within-Subjects Effects
Measure: MEASURE_1

Source                         Type III Sum of Squares    df      Mean Square     F       Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powera
Rater    Sphericity assumed          198.125              3         66.042       73.477   .000         .913              220.430             1.000
         Greenhouse–Geisser          198.125              1.428    138.760       73.477   .000         .913              104.912             1.000
         Huynh–Feldt                 198.125              1.691    117.163       73.477   .000         .913              124.250             1.000
         Lower-bound                 198.125              1.000    198.125       73.477   .000         .913               73.477             1.000
Error    Sphericity assumed           18.875             21           .899
(rater)  Greenhouse–Geisser           18.875              9.995      1.888
         Huynh–Feldt                  18.875             11.837      1.595
         Lower-bound                  18.875              7.000      2.696

a Computed using alpha = .05.

Since we met the assumption of sphericity, we use the results from the row labeled "sphericity assumed."

Rater df is computed as (J − 1) = 4 − 1 = 3. Error df is computed as (J − 1)(N − 1) = (4 − 1)(8 − 1) = 21. Error sum of squares indicates how much variability is unexplained across the conditions of the repeated measures.

Had we violated the assumption of sphericity, we would have wanted to use a different set of results (e.g., Geisser–Greenhouse, Huynh–Feldt, Lower-bound). Notice that in all four sets of results, the sum of squares is the same value; however, the degrees of freedom differ for each. The F ratio is computed the same for each (i.e., MSrater/MSerror). Of the three results that can be used when sphericity is violated, the Lower-bound is the most conservative, followed by Geisser–Greenhouse (use when epsilon is ≤ .75) and then Huynh–Feldt (use when .75 < epsilon < 1.0).

Comparing p to α, we find a statistically significant difference in the mean ratings. This is an omnibus test. We will look at our MCP to determine which mean ratings differ.

Partial eta squared is one measure of effect size:

η² = SSbetw / (SSbetw + SSerror) = 198.125 / (198.125 + 18.875) = .913

We can interpret this to say that approximately 91% of the variation in the ratings is accounted for by the differences in the raters.

Observed power tells whether our test is powerful enough to detect mean differences if they really exist. Power of 1.000 indicates maximum power; the probability of rejecting the null hypothesis if it is really false is 1.00.
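The F ratio, noncentrality, and partial eta squared reported in Table 15.14 can all be reproduced from the sums of squares and degrees of freedom. A quick check in plain Python:

```python
# Values from the one-factor repeated measures ANOVA (Table 15.14).
ss_rater, df_rater = 198.125, 3    # within-subjects effect (rater)
ss_error, df_error = 18.875, 21    # error for the repeated measure

ms_rater = ss_rater / df_rater     # mean square for rater, 66.042
ms_error = ss_error / df_error     # mean square error, .899
F = ms_rater / ms_error            # F ratio, 73.477
noncent = F * df_rater             # noncentrality parameter, 220.430

partial_eta_sq = ss_rater / (ss_rater + ss_error)  # .913

print(round(F, 3), round(noncent, 2), round(partial_eta_sq, 3))
```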
Tests of within-Subjects Contrasts
Measure: MEASURE_1

Source        Rater      Type III Sum of Squares  df  Mean Square     F      Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powera
Rater         Linear          189.225              1    189.225     103.685  .000        .937               103.685            1.000
              Quadratic         8.000              1      8.000      18.667  .003        .727                18.667             .957
              Cubic              .900              1       .900       2.032  .197        .225                 2.032             .235
Error(rater)  Linear           12.775              7      1.825
              Quadratic         3.000              7       .429
              Cubic             3.100              7       .443
a Computed using alpha = .05.
Tests of between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average

Source     Type III Sum of Squares  df  Mean Square     F      Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powera
Intercept         946.125           1    946.125     445.235  .000         .985              445.235            1.000
Error              14.875           7      2.125

a Computed using alpha = .05.

The output from the "Tests of within-Subjects Contrasts" will not be used. Polynomial contrasts do not make sense for the rater factor.

The output from the "Tests of between-Subjects Effects" will not be used as there is no between-subjects factor.
Estimated Marginal Means

1. Grand Mean
Measure: MEASURE_1

                          95% Confidence Interval
Mean    Std. Error   Lower Bound   Upper Bound
5.438      .258         4.828         6.047

2. Rater
Estimates
Measure: MEASURE_1

                               95% Confidence Interval
Rater   Mean   Std. Error  Lower Bound  Upper Bound
1      2.750      .526        1.506        3.994
2      3.625      .324        2.859        4.391
3      6.250      .366        5.385        7.115
4      9.125      .295        8.427        9.823

The "Grand Mean" (in this case, 5.438) represents the overall mean, regardless of the rater. The 95% CI represents the CI of the grand mean.

The table labeled "Rater" provides descriptive statistics for each of the four raters. In addition to means, the SE and 95% CI of the means are reported.
Pairwise Comparisons
Measure: MEASURE_1

                                                                95% Confidence Interval for Differencea
(I) Rater  (J) Rater  Mean Difference (I–J)  Std. Error  Sig.a  Lower Bound  Upper Bound
1          2                  –.875             .295      .126     –1.948         .198
           3                –3.500*             .267      .000     –4.472       –2.528
           4                –6.375*             .706      .000     –8.940       –3.810
2          1                   .875             .295      .126      –.198        1.948
           3                –2.625*             .263      .000     –3.581       –1.669
           4                –5.500*             .567      .000     –7.561       –3.439
3          1                 3.500*             .267      .000      2.528        4.472
           2                 2.625*             .263      .000      1.669        3.581
           4                –2.875*             .549      .007     –4.871        –.879
4          1                 6.375*             .706      .000      3.810        8.940
           2                 5.500*             .567      .000      3.439        7.561
           3                 2.875*             .549      .007       .879        4.871

Based on estimated marginal means.
a Adjustment for multiple comparisons: Bonferroni.
* The mean difference is significant at the .05 level.

"Mean Difference" is simply the difference between the means of the two raters being compared. For example, the mean difference of rater 1 and rater 2 is calculated as 2.750 − 3.625 = −.875.

"Sig." denotes the observed p value and provides the results of the Bonferroni post hoc procedure. There is a statistically significant mean difference between:
1. Rater 1 and rater 3
2. Rater 1 and rater 4
3. Rater 2 and rater 3
4. Rater 2 and rater 4
5. Rater 3 and rater 4

The only pair for which there is not a statistically significant mean difference is raters 1 and 2.

Note there are redundant results presented in the table. The comparison of raters 1 and 2 (presented in results for rater 1) is the same as the comparison of raters 2 and 1 (presented in results for rater 2), and so forth.
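The pairwise mean differences, and the idea behind the Bonferroni adjustment, can be sketched in plain Python. The rater means come from the descriptive statistics table; the per-comparison alpha shown is the classic Bonferroni split of the familywise alpha, used here only to illustrate the adjustment (SPSS instead adjusts the reported p values):

```python
from itertools import combinations

# Rater means from the descriptive statistics table.
means = {1: 2.750, 2: 3.625, 3: 6.250, 4: 9.125}

# All unique pairwise mean differences (the SPSS table repeats each
# pair twice with opposite signs).
diffs = {(i, j): round(means[i] - means[j], 3)
         for i, j in combinations(means, 2)}

# Bonferroni idea: divide the familywise alpha across the 6 comparisons.
alpha_per_test = 0.05 / len(diffs)

print(diffs[(1, 2)], len(diffs), round(alpha_per_test, 4))
```

The first printed value, −0.875, matches the rater 1 versus rater 2 mean difference in the table.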
Friedman Test: Nonparametric One-Factor Repeated Measures ANOVA

Step 1: The nonparametric version of the repeated measures ANOVA is the Friedman test. To compute the Friedman test, go to "Analyze" in the top pulldown menu and then select "Nonparametric Tests," then "Legacy Dialogs," and then finally "K Related Samples." Following the screenshot (step 1) as follows produces the "Tests for Several Related Samples" dialog box.
Friedman's test: Step 1
Step 2: Recall that the Friedman test operates using ranked data, not continuous raw scores as with the repeated measures ANOVA; thus, we will work with the ranked variables in our dataset for this test. From the "Tests for Several Related Samples" dialog box, click the variables representing the ranked levels of the repeated factor into the "Test Variables" box by using the arrow key in the middle of the dialog box. Under "Test Type" at the bottom left, check "Friedman." Then click on "Ok" to generate the output.
Select the ranked repeated measures from the list on the left and use the arrow to move them to the "Test Variables" box on the right.

Friedman's test: Step 2
Interpreting the output: Annotated results are presented in Table 15.15.
Table 15.15
Friedman's Test SPSS Results for the Writing Assessment Example
Ranks

              Mean Rank
Rater1_rank     1.13
Rater2_rank     1.88
Rater3_rank     3.00
Rater4_rank     4.00

Test Statisticsa

N                  8
Chi-Square    22.950
df                 3
Asymp. Sig.     .000

a Friedman test.

The table labeled "Ranks" provides the average rank for each of the repeated measures levels.

The table labeled "Test Statistics" provides the results for the hypothesis test of the difference in the mean ranks. Since p is less than α, this tells us there is a statistically significant difference in the mean ranks of the raters.
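The Friedman chi-square can be reproduced from the mean ranks with the standard formula χ² = [12n / (k(k + 1))] Σ R̄j² − 3n(k + 1). The sketch below uses mean ranks of 1.125 and 1.875 for raters 1 and 2; these unrounded values are our assumption (the output displays them as 1.13 and 1.88), but they reproduce the reported statistic:

```python
# Friedman chi-square from mean ranks (n subjects, k related measures).
n, k = 8, 4
mean_ranks = [1.125, 1.875, 3.00, 4.00]  # assumed unrounded mean ranks

chi_square = (12 * n / (k * (k + 1))) * sum(r ** 2 for r in mean_ranks) \
             - 3 * n * (k + 1)

print(round(chi_square, 2))
```

This gives 22.95, matching the "Test Statistics" table.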
Two-Factor Split-Plot ANOVA

To conduct the two-factor split-plot ANOVA, the dataset must include variables for each level of the repeated factor (as in the one-factor repeated measures ANOVA) and another variable for the nonrepeated factor. Here our repeated measures or within-subjects factor is reflected in the raw scores of the four raters, and the nonrepeated or between-subjects factor is the instructor.
The repeated measures or within-subjects factor is labeled "Rater," where there are four different raters, each reflected in the score they assigned to each of the eight participants. (We will use the raw scores of the raters for the two-factor split-plot ANOVA.)

The nonrepeated or between-subjects factor is labeled "Instructor," where each value represents the instructor to which the students were randomly assigned. Four students were randomly assigned to instructor 1 and four were randomly assigned to instructor 2.
Step 1: To conduct a two-factor split-plot ANOVA, go to "Analyze" in the top pulldown menu, then select "General Linear Model," and then select "Repeated Measures." This will produce the "Repeated Measures" dialog box. This step has been presented previously (see screenshot step 1 for the one-factor repeated measures design) and will not be reiterated here.

Step 2: The "Repeated Measures Define Factor(s)" dialog box will appear (see screenshot step 2 for the one-factor repeated measures design presented previously). In the box under "Within-Subjects Factor Name," enter the name you wish to call the repeated factor. For this example, we label the repeated factor "Rater." It is necessary to define a name for the repeated factor as there is no single variable representing this factor (recall that the columns in the dataset represent the repeated measures); in the dataset, there is one variable for each level of the factor (in other words, one variable for each different rater or measurement). Again, in our example, there are four levels of rater (i.e., four raters) and thus four variables. Let us name the within-subjects factor "Rater." The "Number of Levels" indicates the number of measurements of the repeated factor. Here there were four raters, and, thus, the "Number of Levels" of the factor is 4.

Step 3: After defining the "Within-Subjects Factor Name" and the "Number of Levels," then click on "Add" to move this information into the middle box. In screenshot step 3 for the one-factor repeated measures design presented previously, we see our newly defined repeated factor (i.e., Rater) with "4" indicating it was measured by four raters: Rater(4). Finally, click on "Define" to open the main "Repeated Measures" dialog box.

Step 4a: From the "Repeated Measures" dialog box (see screenshot steps 4a and b for the one-factor repeated measures design presented previously), we see a heading called "Within-Subjects Variables" with the newly defined factor rater in parentheses. Here the values of 1 through 4 represent each one of the four raters. Preceding each of the levels of the repeated factor are lines with question marks. This is the software's way of asking us to define which variable represents the first measurement (or the first rater in our illustration).

Step 4b: Move the appropriate variables from the variable list on the left into the "Within-Subjects Variables" box on the right. It is important to make sure that the first measurement is matched up with "1," the second measurement is matched with "2," and so forth so that the correct order of repeated measures is defined.

Step 5: Once the "Within-Subjects Variables" are defined, the next step is to define the between-subjects or nonrepeated factor, as we see in screenshot step 5 that follows. Move the appropriate variable from the variable list on the left into the "Between-Subjects Factors" box on the right. From this point, the options and selections work as we have seen when conducting other ANOVA models.
Clicking on "Plots" will allow you to generate profile plots. Clicking on "Save" will allow you to save various forms of residuals, among other variables. Clicking on "Options" will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests).

Select the nonrepeated factor from the list on the left and use the arrow to move it to the "Between-Subjects Factor(s)" box on the right.

Two-factor split-plot ANOVA: Step 5
Step 6: From the "Repeated Measures" dialog box, clicking on "Options" will provide the option to select such information as "Descriptive Statistics," "Estimates of effect size," "Observed power," and "Homogeneity tests" (see screenshot step 6). For the two-factor split-plot ANOVA, the "Options" dialog box is the proper place to obtain post hoc MCPs for the repeated measure. Post hoc procedures include the Tukey LSD, Bonferroni, and Sidak procedures. Click on "Continue" to return to the original dialog box.
If you wish to conduct a post hoc test to determine where there are mean differences between the repeated measures, that selection must be made from this screen using the "Compare main effects" option, then selecting one of the three MCPs that are available from the toggle menu under "Confidence interval adjustment" (i.e., LSD, Bonferroni, or Sidak).

Select from the list on the left those variables that you wish to display means for and use the arrow to move them to the "Display Means for" box on the right.

Two-factor split-plot ANOVA: Step 6
Step 7: Click on the name of the nonrepeated or between-subjects factor in the "Factor(s)" list box in the top left and move it to the "Post Hoc Tests for" box in the top right by clicking on the arrow key. Check an appropriate MCP for your situation by placing a checkmark in the box next to the desired MCP. In this example, we select Tukey (see screenshot step 7). Click on "Continue" to return to the original dialog box.
The upper list presents MCPs for instances when the homogeneity of variance assumption is met; the lower list presents MCPs for instances when the homogeneity of variance assumption is not met.

Select the fixed factor of interest from the list on the left and use the arrow to move it to the "Post Hoc Tests for" box on the right.

Two-factor split-plot ANOVA: Step 7
Step 8: From the "Repeated Measures" dialog box, click on "Plots" to obtain a profile plot of means. Click one independent variable (e.g., "Rater") and move it into the "Horizontal Axis" box by clicking the arrow button. Then click the other independent variable (e.g., instructor) and move it into the "Separate Lines" box by clicking the arrow button. Then click on "Add" to move this into the "Plots" box at the bottom of the dialog box (see screenshot steps 8a and b). Click on "Continue" to return to the original dialog box. (Tip: Placing the factor that has the most categories or levels on the horizontal axis of the profile plot will make for easier interpretation of the graph. In this case, there were four raters and two instructors; thus, we placed "rater" on the horizontal axis.)

Select the factor with the most levels from the list on the left and use the arrow to move it to the "Horizontal Axis" box on the right. Repeat these steps to move the other factor into the box for "Separate Lines."

Two-factor split-plot ANOVA: Step 8a
Then click "Add" to move the variables into the "Plots" box at the bottom.

Two-factor split-plot ANOVA: Step 8b
Step 9: From the "Repeated Measures" dialog box, click on "Save" to select those elements that you want to save (here we want to save the unstandardized residuals, which will be used later to examine the extent to which normality and independence are met). To do this, place a checkmark next to "Unstandardized." Click "Continue" to return to the main "Repeated Measures" dialog box. From the "Repeated Measures" dialog box, click on "Ok" to generate the output.

Two-factor split-plot ANOVA: Step 9
Interpreting the output: Annotated results are presented in Table 15.16.
Table 15.16
Two-Factor Split-Plot ANOVA SPSS Results for the Writing Assessment Example
Within-Subjects Factors
Measure: MEASURE_1

Rater   Dependent Variable
1       Rater1_raw
2       Rater2_raw
3       Rater3_raw
4       Rater4_raw

Between-Subjects Factors

                   Value Label    N
Instructor  1.00   Instructor 1   4
            2.00   Instructor 2   4

Descriptive Statistics

                    Instructor     Mean    Std. Deviation  N
Rater 1 raw score   Instructor 1  3.7500      1.50000      4
                    Instructor 2  1.7500       .50000      4
                    Total         2.7500      1.48805      8
Rater 2 raw score   Instructor 1  4.2500       .50000      4
                    Instructor 2  3.0000       .81650      4
                    Total         3.6250       .91613      8
Rater 3 raw score   Instructor 1  7.0000       .81650      4
                    Instructor 2  5.5000       .57735      4
                    Total         6.2500      1.03510      8
Rater 4 raw score   Instructor 1  8.5000       .57735      4
                    Instructor 2  9.7500       .50000      4
                    Total         9.1250       .83452      8

The table labeled "Within-Subjects Factors" lists the variable names for levels of the repeated factor.

The table labeled "Between-Subjects Factors" lists the names and sample sizes for the levels of the nonrepeated factor.

The table labeled "Descriptive Statistics" lists the means, standard deviations, and sample sizes for each of the between-subjects factors (i.e., instructors) by each of the repeated measures (i.e., raters).
Multivariate Testsa

Effect            Test                Value      F       Hypothesis df  Error df  Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powerb
Rater             Pillai's trace       .983    74.892c      3.000        4.000    .001        .983               224.677            1.000
                  Wilks' lambda        .017    74.892c      3.000        4.000    .001        .983               224.677            1.000
                  Hotelling's trace  56.169    74.892c      3.000        4.000    .001        .983               224.677            1.000
                  Roy's largest root 56.169    74.892c      3.000        4.000    .001        .983               224.677            1.000
Rater*instructor  Pillai's trace       .899    11.925c      3.000        4.000    .018        .899                35.774             .860
                  Wilks' lambda        .101    11.925c      3.000        4.000    .018        .899                35.774             .860
                  Hotelling's trace   8.944    11.925c      3.000        4.000    .018        .899                35.774             .860
                  Roy's largest root  8.944    11.925c      3.000        4.000    .018        .899                35.774             .860

a Design: intercept + instructor. Within-subjects design: rater.
b Computed using alpha = .05.
c Exact statistic.

The table labeled "Multivariate Tests" provides results for the multivariate test of mean differences for the repeated measures factor (i.e., "Rater") and for the between- by within-subjects interaction (i.e., "Rater*Instructor"). Multivariate tests are provided when there are three or more levels of the within-subjects factor. These results are generally more conservative than the univariate results (in other words, you may be less likely to find statistically significant multivariate results as compared to univariate results). Note that the multivariate tests do not require meeting the assumption of sphericity. Thus, if the assumption of sphericity is met, reporting univariate results is recommended.

If results for the multivariate tests are reported, of the four test criteria, Wilks' lambda is recommended. In this example, all four multivariate criteria produce the same results; specifically, there is a statistically significant multivariate mean difference for the repeated measures factor and a statistically significant between- by within-subjects interaction (as noted by p less than α).
Mauchly's Test of Sphericitya
Measure: MEASURE_1

Within-Subjects                Approx.                           Epsilonb
Effect           Mauchly's W  Chi-Square  df  Sig.  Greenhouse–Geisser  Huynh–Feldt  Lower Bound
Rater               .429        4.001      5  .557        .706             1.000         .333

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a Design: intercept + instructor. Within-subjects design: rater.
b May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of within-Subjects Effects table.

"Mauchly's Test of Sphericity" can be reviewed to determine if the assumption of sphericity is met. If the p value is larger than α (as in this illustration), we have met the assumption of sphericity.

"Epsilon" is a gauge of differences in the variances of the repeated measures. The closer the epsilon value is to 1.0, the more homogeneous are the variances. Complete heterogeneity of variances is specified by the "Lower bound" and is computed as 1/(K − 1), where K is the number of within-subjects levels. For this example, with four raters, the lower bound is 1/(4 − 1), or .333.
Tests of within-Subjects Effects
Measure: MEASURE_1

Source                               Type III Sum of Squares    df      Mean Square     F       Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powera
Rater             Sphericity assumed       198.125              3         66.042      190.200   .000        .969               570.600            1.000
                  Greenhouse–Geisser       198.125              2.119     93.515      190.200   .000        .969               402.966            1.000
                  Huynh–Feldt              198.125              3.000     66.042      190.200   .000        .969               570.600            1.000
                  Lower bound              198.125              1.000    198.125      190.200   .000        .969               190.200            1.000
Rater*instructor  Sphericity assumed        12.625              3          4.208       12.120   .000        .669                36.360             .998
                  Greenhouse–Geisser        12.625              2.119      5.959       12.120   .001        .669                25.678             .983
                  Huynh–Feldt               12.625              3.000      4.208       12.120   .000        .669                36.360             .998
                  Lower bound               12.625              1.000     12.625       12.120   .013        .669                12.120             .825
Error(rater)      Sphericity assumed         6.250             18           .347
                  Greenhouse–Geisser         6.250             12.712       .492
                  Huynh–Feldt                6.250             18.000       .347
                  Lower bound                6.250              6.000      1.042

a Computed using alpha = .05.

The table labeled "Tests of within-Subjects Effects" provides results for the univariate test of mean differences for the within-subjects factor (i.e., "rater") and the within-between subjects interaction (i.e., "rater*instructor").

Since we met the assumption of sphericity, we use the results from the row labeled "sphericity assumed."

Rater df is computed as (K − 1) = 4 − 1 = 3. The within*between interaction df is computed as (K − 1)(J − 1) = (4 − 1)(2 − 1) = 3. Error df is computed as (J)(K − 1)(n − 1) = 2(4 − 1)(4 − 1) = 18. Error sum of squares indicates how much variability is unexplained across the conditions of the repeated measures.

Had we violated the assumption of sphericity, we would have wanted to use a different set of results (e.g., Geisser–Greenhouse, Huynh–Feldt, Lower bound). Notice that in all four sets of results, the sum of squares is the same value; however, the degrees of freedom differ for each. The F ratio is computed the same for each. Of the three results that can be used when sphericity is violated, the Lower bound is the most conservative, followed by Geisser–Greenhouse and then Huynh–Feldt.

Comparing p to α, we find a statistically significant difference in the raters and a statistically significant rater by instructor interaction. These are omnibus tests. We will look at our MCPs to determine which raters differ and which differ by instructor.

Partial eta squared is one measure of effect size:

η² = SSbetw / (SSbetw + SSerror) = 198.125 / (198.125 + 6.250) = .969

We can interpret this to say that approximately 97% of the variation in the ratings is accounted for by the differences in the raters.

Observed power tells whether our test is powerful enough to detect mean differences if they really exist. Power of 1.000 indicates maximum power; the probability of rejecting the null hypothesis if it is really false is 1.00. Power of .998 is only slightly below maximum power of 1.00; this is extremely strong power.
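The degrees of freedom and the rater effect size in this split-plot design follow directly from the design sizes. A short check in plain Python, using K = 4 raters, J = 2 instructors, and n = 4 students per instructor:

```python
K = 4  # levels of the within-subjects factor (raters)
J = 2  # levels of the between-subjects factor (instructors)
n = 4  # subjects per instructor group

df_rater = K - 1                      # 3
df_interaction = (K - 1) * (J - 1)    # 3
df_error_rater = J * (K - 1) * (n - 1)  # 18

# Partial eta squared for the rater effect, from Table 15.16.
eta_sq_rater = 198.125 / (198.125 + 6.250)  # .969

print(df_rater, df_interaction, df_error_rater, round(eta_sq_rater, 3))
```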
Tests of within-Subjects Contrasts
Measure: MEASURE_1

Source            Rater      Type III Sum of Squares  df  Mean Square     F      Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powera
Rater             Linear          189.225              1    189.225     302.760  .000        .981               302.760           1.000
                  Quadratic         8.000              1      8.000      48.000  .000        .889                48.000           1.000
                  Cubic              .900              1       .900       3.600  .107        .375                 3.600            .359
Rater*instructor  Linear            9.025              1      9.025      14.440  .009        .706                14.440            .883
                  Quadratic         2.000              1      2.000      12.000  .013        .667                12.000            .821
                  Cubic             1.600              1      1.600       6.400  .045        .516                 6.400            .563
Error(rater)      Linear            3.750              6       .625
                  Quadratic         1.000              6       .167
                  Cubic             1.500              6       .250

a Computed using alpha = .05.
Levene's Test of Equality of Error Variancesa

                      F     df1  df2   Sig.
Rater 1 raw score   3.600    1    6    .107
Rater 2 raw score    .158    1    6    .705
Rater 3 raw score    .000    1    6   1.000
Rater 4 raw score   1.000    1    6    .356

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a Design: intercept + instructor. Within-subjects design: rater.

The F test (and associated p values) for Levene's Test for Equality of Error Variances is reviewed to determine if equal variances can be assumed. In this case, we meet the assumption (as p is greater than α).

Note that df1 is the degrees of freedom for the numerator (calculated as J − 1) and df2 is the degrees of freedom for the denominator (calculated as N − J).

The output from the "Tests of within-Subjects Contrasts" will not be used as polynomial contrasts do not make sense here.
Tests of between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average

Source       Type III Sum of Squares  df  Mean Square     F      Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powera
Intercept           946.125           1    946.125     648.771  .000        .991               648.771            1.000
Instructor            6.125           1      6.125       4.200  .086        .412                 4.200             .407
Error                 8.750           6      1.458

a Computed using alpha = .05.
Estimated Marginal Means
1. Grand Mean
Measure: MEASURE_1
95% Confidence Interval
Mean Std. Error Lower Bound Upper Bound
5.438 .213 4.915 5.960
2. Rater
Estimates
Measure: MEASURE_1
95% Confidence Interval
Rater Mean Std. Error Lower Bound Upper Bound
1 2.750 .395 1.783 3.717
2 3.625 .239 3.039 4.211
3 6.250 .250 5.638 6.862
4 9.125 .191 8.658 9.592
The “Grand Mean” (in this case, 5.438)
represents the overall mean, regardless of
the rater or instructor. The 95% CI
represents the CI of the grand mean.
The table labeled “Rater” provides
descriptive statistics for each of the
four raters. In addition to means,
the SE and 95% CI of the means
are reported.
The table labeled "Tests of between-Subjects Effects" provides results for the univariate test of mean differences for the between-subjects factor (i.e., "instructor"). Instructor df is computed as (J − 1) = 2 − 1 = 1.

Comparing p to α, we do not find a statistically significant difference in the mean ratings by instructor. This is an omnibus test. We look at MCPs to determine which mean ratings differ by instructor.
Partial eta squared is one measure of effect size:

η² = SSbetw / (SSbetw + SSerror) = 6.125 / (6.125 + 8.750) = .412

We can interpret this to say that approximately 41% of the variation in the ratings is accounted for by the differences in the instructors.

Observed power tells whether our test is powerful enough to detect mean differences if they really exist. Power of .407 indicates low power; the probability of rejecting the null hypothesis if it is really false is about .41.
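As a check, the instructor F ratio and partial eta squared can be recomputed from the sums of squares in the between-subjects table (plain Python):

```python
# Values from the between-subjects table of the split-plot ANOVA.
ss_instructor, df_instructor = 6.125, 1
ss_error, df_error = 8.750, 6

ms_instructor = ss_instructor / df_instructor  # 6.125
ms_error = ss_error / df_error                 # 1.458
F = ms_instructor / ms_error                   # 4.200

eta_sq = ss_instructor / (ss_instructor + ss_error)  # .412

print(round(F, 3), round(eta_sq, 3))
```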
Pairwise Comparisons
Measure: MEASURE_1

(I) Rater  (J) Rater  Mean Difference (I – J)  Std. Error  Sig.a  95% CI for Difference,a Lower Bound  Upper Bound
1          2          –.875                    .280        .122   –1.955                               .205
1          3          –3.500*                  .270        .000   –4.543                               –2.457
1          4          –6.375*                  .375        .000   –7.824                               –4.926
2          1          .875                     .280        .122   –.205                                1.955
2          3          –2.625*                  .280        .000   –3.705                               –1.545
2          4          –5.500*                  .339        .000   –6.808                               –4.192
3          1          3.500*                   .270        .000   2.457                                4.543
3          2          2.625*                   .280        .000   1.545                                3.705
3          4          –2.875*                  .191        .000   –3.613                               –2.137
4          1          6.375*                   .375        .000   4.926                                7.824
4          2          5.500*                   .339        .000   4.192                                6.808
4          3          2.875*                   .191        .000   2.137                                3.613
Based on estimated marginal means.
a Adjustment for multiple comparisons: Bonferroni.
*The mean difference is significant at the .05 level.
“Mean Difference” is simply the difference between the means of the
two raters being compared. For example, the mean difference of
rater 1 and rater 2 is calculated as 2.750 – 3.625 = –.875.
“Sig.” denotes the observed p value and provides the results of the
Bonferroni post hoc procedure. There is a statistically significant mean
difference in ratings of writing between:
1. Rater 1 and rater 3
2. Rater 1 and rater 4
3. Rater 2 and rater 3
4. Rater 2 and rater 4
5. Rater 3 and rater 4
The only pair for which there is not a statistically significant mean
difference is raters 1 and 2.
Note there are redundant results presented in the table. The comparison of
rater 1 and 2 (presented in results for rater 1) is the same as the
comparison of rater 2 and 1 (presented in results for rater 2), and so forth.
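The Bonferroni adjustment behind the Sig. column can be sketched in a few lines. The ratings below are hypothetical stand-ins (the actual scores live in the SPSS dataset); the mechanics shown, a paired t statistic whose raw p value is multiplied by the number of comparisons, are what the procedure does:

```python
from math import sqrt

# Hypothetical ratings for two raters across the same n = 8 essays
# (stand-ins; the real scores are in the SPSS dataset).
rater_a = [3, 2, 4, 3, 2, 3, 2, 3]
rater_b = [4, 3, 4, 4, 3, 4, 3, 4]

diffs = [a - b for a, b in zip(rater_a, rater_b)]
n = len(diffs)
mean_d = sum(diffs) / n
sd_d = sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
t = mean_d / (sd_d / sqrt(n))  # paired t statistic, df = n - 1

# Bonferroni adjustment: multiply the raw p value (from the t
# distribution with df = n - 1) by the number of pairwise
# comparisons (6 pairs among 4 raters), capping the result at 1.
n_comparisons = 6
```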
Random- and Mixed-Effects Analysis of Variance Models
3. Instructor

Estimates
Measure: MEASURE_1

Instructor    Mean   Std. Error  95% CI Lower Bound  Upper Bound
Instructor 1  5.875  .302        5.136               6.614
Instructor 2  5.000  .302        4.261               5.739

Pairwise Comparisons
Measure: MEASURE_1

(I) Instructor  (J) Instructor  Mean Difference (I – J)  Std. Error  Sig.a  95% CI for Difference,a Lower Bound  Upper Bound
Instructor 1    Instructor 2    .875                     .427        .086   –.170                                1.920
Instructor 2    Instructor 1    –.875                    .427        .086   –1.920                               .170

Based on estimated marginal means.
a Adjustment for multiple comparisons: Bonferroni.
The table for
“Instructor” provides
descriptive statistics for
each of the levels of our
between-subjects factor.
In addition to means, the
SE and 95% CI of the
means are reported.
“Mean difference” is simply the difference between the means of
the two categories of our between-subjects factor. For example,
the mean difference of instructor 1 and instructor 2 is calculated
as 5.875 – 5.000 = .875 (Sig. = .086).
“Sig.” denotes the observed p value and provides the results of the
Bonferroni post hoc procedure. There is not a statistically significant mean
difference in ratings between instructor 1 and 2.
Note there are redundant results presented in the table. The comparison
of instructor 1 and 2 (presented in the first row) is the same as the
comparison of instructor 2 and 1 (presented in the second row).
Univariate Tests
Measure: MEASURE_1

Source    Sum of Squares  df  Mean Square  F      Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powera
Contrast  1.531           1   1.531        4.200  .086  .412                 4.200               .407
Error     2.188           6   .365

The F tests the effect of instructor. This test is based on the linearly independent pairwise comparisons
among the estimated marginal means.
a Computed using alpha = .05.
The contrast output from the “Univariate Tests”
will not be used here.
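Although we will not use the contrast output, the F ratio it reports can be reproduced from its own mean squares; the small discrepancy below reflects rounding in the printed table:

```python
# F = MS(contrast) / MS(error), using the (rounded) mean squares
# shown in the Univariate Tests table.
ms_contrast = 1.531
ms_error = 0.365
F = ms_contrast / ms_error
print(round(F, 2))  # 4.19, which SPSS reports as 4.200 from unrounded values
```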
4. Instructor * Rater
Measure: MEASURE_1

Instructor    Rater  Mean   Std. Error  95% CI Lower Bound  Upper Bound
Instructor 1  1      3.750  .559        2.382               5.118
              2      4.250  .339        3.422               5.078
              3      7.000  .354        6.135               7.865
              4      8.500  .270        7.839               9.161
Instructor 2  1      1.750  .559        .382                3.118
              2      3.000  .339        2.172               3.828
              3      5.500  .354        4.635               6.365
              4      9.750  .270        9.089               10.411
The table for
“Instructor*Rater”
provides descriptive
statistics for each of
the combinations of
instructor by rater
(or cell). In addition to
means, the SE and
95% CI of the means
are reported.
The “Profile Plot” is a
graph of the means for each
combination of instructor by
rater (or cell). We see the
ratings follow a similar
pattern. Three of the four
raters provided a lower
mean rating for writing for
instructor 2 (as compared to
instructor 1).
[Profile plot: estimated marginal means of MEASURE_1 (Y axis) by rater (X axis, raters 1–4), with separate lines for instructor 1 and instructor 2.]
Examining Data for Assumptions for Two-Factor Split-Plot ANOVA
Normality
We use the residuals (which we requested and created through the “Save” option when
generating our two-factor split-plot ANOVA) to examine the extent to which normality
was met.
The residuals are computed by subtracting the cell mean
from each observation. For example, the mean rating on
writing for students assigned to instructor 1 and rated by
rater 1 was 3.75. Person 1 was rated a “3” on writing by
rater 1. Thus the residual for person 1 is 3.00 – 3.75 = –.75.

We see four new variables have been added to the dataset,
labeled RES_1, RES_2, and so forth. These are the
residuals used to review the normality assumption.
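The residual computation is just a subtraction; a minimal sketch using the values quoted above:

```python
# Residual = observation - cell mean (instructor 1, rater 1).
cell_mean = 3.75
observed = 3.00  # person 1's rating from rater 1
residual = observed - cell_mean
print(residual)  # -0.75
```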
Generating normality evidence: As mentioned in previous chapters, understanding
the distributional shape, specifically the extent to which normality is a reasonable
assumption, is important. For the two-factor mixed design ANOVA, the distributional
shape for the residuals should be a normal distribution. Because we have multiple residuals
to reflect the multiple measurements, we need to examine normality for each residual.
For brevity, we provide SPSS excerpts only for “RES_1,” which reflects the residual for rater 1;
however, we will narratively discuss all of the residuals.

As in previous chapters, we can again use “Explore” to examine the extent to which
the assumption of normality is met. The steps for accessing “Explore” have already been
presented, and, thus, we only provide a basic overview of the process. Click the residual
and move it into the “Dependent List” box by clicking on the arrow button. The procedures
for selecting normality statistics are as follows: Click on “Plots” in the upper right
corner. Place a checkmark in the boxes for “Normality plots with tests” and also
for “Histogram.” Then click “Continue” to return to the main “Explore” dialog box.
Finally click “Ok” to generate the output.
Generating normality evidence: Select the residuals from the list on the left and use the arrow to move them to the “Dependent List” box on the right. Then click on “Plots.”
Interpreting normality evidence: We have already developed a good understanding
of how to interpret some forms of evidence of normality including skewness and
kurtosis, histograms, and boxplots. Next we see the output for this evidence.
Descriptives
Residual for Rater1_raw

                                                 Statistic  Std. Error
Mean                                             .0000      .36596
95% Confidence interval for mean   Lower bound   –.8654
                                   Upper bound   .8654
5% Trimmed mean                                  –.0833
Median                                           –.2500
Variance                                         1.071
Std. deviation                                   1.03510
Minimum                                          –.75
Maximum                                          2.25
Range                                            3.00
Interquartile range                              1.00
Skewness                                         1.675      .752
Kurtosis                                         3.136      1.481
The skewness statistic of the residuals for rater 1 is 1.675 and kurtosis is 3.136; skewness
is within the range of an absolute value of 2.0, suggesting some evidence of normality.
However, kurtosis suggests some nonnormality. For the other three residuals, all skewness
and kurtosis statistics (not shown here) are within an absolute value of 2.0, suggesting evidence
of normality. As suggested by the skewness statistic, the histogram of residuals is positively
skewed, and the histogram also provides a visual display of the leptokurtic distribution.
[Histogram of residual for Rater1_raw: frequency by residual value. Mean = –5.55E–17, std. dev. = 1.035, N = 8.]
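For readers who want to reproduce the skewness and kurtosis statistics outside SPSS, the bias-corrected formulas SPSS reports (often labeled G1 and G2) can be coded directly. This is a sketch of those formulas, not SPSS itself:

```python
from math import sqrt

def skew_kurtosis(x):
    """Sample skewness (G1) and excess kurtosis (G2) with the
    small-sample corrections that SPSS reports."""
    n = len(x)
    m = sum(x) / n
    s = sqrt(sum((v - m) ** 2 for v in x) / (n - 1))  # sample SD
    z3 = sum(((v - m) / s) ** 3 for v in x)
    z4 = sum(((v - m) / s) ** 4 for v in x)
    g1 = n / ((n - 1) * (n - 2)) * z3
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * z4 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return g1, g2
```

A perfectly symmetric sample returns a skewness of exactly 0; values beyond an absolute value of 2.0 flag the kind of nonnormality seen here for rater 1.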
There are a few other statistics that can be used to gauge normality. The formal test of
normality, the Shapiro–Wilk (S–W) test (SW) (Shapiro & Wilk, 1965), provides evidence
of the extent to which the sample distribution is statistically different from a normal
distribution. The output for the S–W test is presented in the following and suggests that
our sample distributions for three of the four residuals (specifically residuals for raters 2, 3,
and 4) are not statistically significantly different than what would be expected from a
normal distribution, as those p values are greater than α. However, the distribution for the
residual for rater 1 is statistically significantly different than a normal distribution (SW = .745,
df = 8, p = .007).
Tests of Normality

                         Kolmogorov–Smirnova           Shapiro–Wilk
                         Statistic  df  Sig.      Statistic  df  Sig.
Residual for Rater1_raw  .316       8   .018      .745       8   .007
Residual for Rater2_raw  .152       8   .200*     .913       8   .374
Residual for Rater3_raw  .250       8   .150      .965       8   .857
Residual for Rater4_raw  .280       8   .065      .828       8   .057

a Lilliefors significance correction.
*This is a lower bound of the true significance.
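If SciPy is available, the same S–W test can be run outside SPSS; `scipy.stats.shapiro` returns the W statistic and its p value. The residual values below are hypothetical stand-ins for RES_1:

```python
from scipy import stats  # assumes SciPy is installed

# Hypothetical residuals standing in for "Residual for Rater1_raw".
residuals = [-0.75, -0.75, -0.25, 0.0, 0.25, 0.25, 0.5, 2.25]
sw_stat, p_value = stats.shapiro(residuals)
# A p value below alpha (.05) would suggest the distribution differs
# significantly from normal, as SPSS found for rater 1.
```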
Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality.
These graphs plot quantiles of the theoretical normal distribution against quantiles
of the sample distribution. Points that fall on or close to the diagonal line suggest
evidence of normality. The Q–Q plot of residuals shown in the following suggests some
nonnormality.
[Normal Q–Q plot of residual for Rater1_raw: expected normal value by observed value. One case falls far from the diagonal, suggesting some nonnormality.]
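The coordinates behind a Q–Q plot can be computed with the standard library alone: sort the sample and pair each value with the normal quantile expected at its rank. Blom plotting positions are used here as an assumption (SPSS's rank method is configurable), and the residuals are hypothetical stand-ins:

```python
from statistics import NormalDist

# Hypothetical residuals standing in for RES_1.
residuals = sorted([-0.75, -0.75, -0.25, 0.0, 0.25, 0.25, 0.5, 2.25])
n = len(residuals)
nd = NormalDist()

# Blom plotting positions: (i - 3/8) / (n + 1/4) for ranks i = 1..n.
theoretical = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

# Points (theoretical, observed) near a straight line suggest
# normality; a point far off the line (like 2.25 here) does not.
pairs = list(zip(theoretical, residuals))
```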
Examination of the following boxplot also suggests a nonnormal distributional shape of
residuals with one outlier.
[Boxplot of residual for Rater1_raw, showing one outlier (case 2).]
For three of the four residuals (residuals for raters 2, 3, and 4), the forms of evidence we
have examined—skewness and kurtosis statistics, the S–W test, the Q–Q plot, and the
boxplot—all suggest normality is a reasonable assumption. We can be reasonably assured
we have met the assumption of normality for residuals for raters 2, 3, and 4. However, all
forms of evidence suggest nonnormality for the residual for rater 1.
Independence
The only assumption we have not tested for yet is independence. As we discussed in
reference to the one-way ANOVA, if subjects have been randomly assigned to conditions
(in other words, the different levels of the between-subjects factor), the assumption
of independence has been met. In this illustration, students were randomly assigned
to instructor, and, thus, the assumption of independence was met. However, we often
use between-subjects factors that do not allow random assignment, such as preexisting
characteristics (e.g., gender or education level). We can plot residuals against levels of
our between-subjects factor using a scatterplot to get an idea of whether or not there
are patterns in the data and thereby provide an indication of whether we have met
this assumption. In this illustration, we only have one between-subjects factor. If there
were multiple between-subjects factors, we would split the scatterplot by levels of one
between-subjects factor and then generate a bivariate scatterplot for the other between-subjects
factor by residual (as we did with factorial ANOVA). Remember that the residual
was added to the dataset by saving it when we generated the two-factor split-plot
ANOVA model.
Please note that some researchers do not believe that the assumption of independence
can be tested. If there is not random assignment to groups, then these researchers
believe this assumption has been violated, period. The plot that we generate will give
us a general idea of patterns, however, in situations where random assignment was not
performed.
Generating the scatterplot: The general steps for generating a simple scatterplot
through “Scatter/dot” have been presented in a previous chapter (e.g., Chapter 10),
and will not be reiterated here. From the “Simple Scatterplot” dialog screen, click
the residual variable and move it into the “Y Axis” box by clicking on the arrow.
Click the between-subjects factor (e.g., “Instructor”) and move it into the “X Axis”
box by clicking on the arrow. Then click “Ok.” Repeat these steps for each of the four
residuals.
[Simple Scatterplot dialog: the variable list includes the raw scores, residuals, and ranked scores for the four raters.]
Interpreting independence evidence: In examining the scatterplots for evidence
of independence, the points should fall relatively randomly above and below a horizontal
line at 0. (You may recall in Chapter 11 that we added a reference line to the graph using
Chart Editor. To add a reference line, double click on the graph in the output to activate the
chart editor. Select “Options” in the top pulldown menu, then “Y axis reference
line.” This will bring up the “Properties” dialog box. Change the value of the position
to be “0.” Then click on “Apply” and “Close” to generate the graph with a horizontal
line at 0.)
Here our scatterplot for each residual generally suggests evidence of independence,
with a relatively random display of residuals above and below the horizontal line at 0 for
each category of the between-subjects factor (note that only the scatterplot of the residual
for rater 3 by instructor is presented). Had we not met the assumption of independence
through random assignment of cases to groups, this plot would still provide evidence of
whether independence was a reasonable assumption.
[Scatterplot of residual for Rater3_raw (Y axis) by instructor (X axis).]
Post Hoc Power for Two-Factor Split-Plot ANOVA Using G*Power
Generating power analyses for two-factor split-plot ANOVA models follows similarly to
that for ANOVA, factorial ANOVA, and ANCOVA. In particular, if there is more than
one independent variable, we must test for main effects and interactions separately. The
first thing that must be done when using G*Power for computing post hoc power is to
select the correct test family. In our case, we conducted a two-factor split-plot ANOVA.
Because we have between-subjects, within-subjects, and interaction terms, the type of statistical test
selected depends on which part of the model power is to be estimated. In this illustration,
let us first determine power for the within-between subjects interaction. To find
this design, we select “Tests” in the top pulldown menu, then “Means,” and then
“ANOVA: Repeated measures, within-between interactions.” Once that
selection is made, the “Test family” automatically changes to “F Tests.” (Note
that had we wanted to determine power for the between-subjects main effect, we would
have selected “ANOVA: Repeated measures, between factors.” For the within-subjects
main effect, we would have selected “ANOVA: Repeated measures, within
factors.”)
Step 1
The “Type of Power Analysis” desired needs to be selected. To compute post hoc
power, select “Post hoc: Compute achieved power – given α, sample size,
and effect size.”
Step 2

The default selection for “Test Family” is “t tests.” Following the procedures presented in Step 1 will automatically change the test family to “F tests.” The default selection for “Statistical Test” is “Correlation: Point biserial model.” Following the procedures presented in Step 1 will automatically change the statistical test to “ANOVA: Repeated measures, within-between interaction.”

The “Input Parameters” for computing post hoc power must be specified (the default values are shown here), including:

1. Effect size f
2. Alpha level
3. Total sample size
4. Number of groups
5. Number of measurements
6. Correlation among repeated measures
7. Nonsphericity correction

Click on “Determine” to pop out the effect size calculator box (shown below), which will allow you to compute f given partial eta squared. Once the parameters are specified, click on “Calculate.”
The “Input Parameters” must then be specified. We will compute the effect size
f last, so we skip that for the moment. In our example, the alpha level we used was .05,
and the total sample size was 8. The number of groups, in the case of a two-factor split-plot
ANOVA with one nonrepeated factor having two categories, equals 2. The next parameter
is the number of measurements. This refers to the number of levels of the repeated factor,
which in this illustration is 4. Next, we have to input the correlation among repeated measures.
We will estimate this parameter as the average correlation among all bivariate correlations
of the repeated measures. For our raters, the Pearson correlation coefficients were
as follows: r12 = .865, r13 = .881, r14 = −.431, r23 = .716, r24 = −.677, and r34 = −.372, and, thus,
the average correlation was .657 (in absolute value terms). The last parameter to define is
the nonsphericity correction, ε. Epsilon ranges from 0 to 1, with 0 indicating the
assumption is violated completely and 1 indicating perfect sphericity. Acceptable sphericity is
approximately .75 or higher. One option is to input an acceptable level of sphericity; thus,
we input .75 here. Alternatively, we could input the epsilon values obtained for the usual,
Geisser–Greenhouse, and Huynh–Feldt F tests.
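The average absolute correlation entered into G*Power can be checked directly; the six coefficients are those reported above:

```python
# Bivariate correlations among the four raters' scores, from the text.
correlations = [.865, .881, -.431, .716, -.677, -.372]
avg_abs_r = sum(abs(r) for r in correlations) / len(correlations)
print(round(avg_abs_r, 3))  # 0.657
```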
We skipped filling in the first parameter, the effect size f, until all of the previous values
were input. This is because SPSS only provides a partial eta squared effect size. We use the
pop-out effect size calculator in G*Power to compute the effect size f. To pop out the effect
size calculator, click on “Determine,” which is displayed under “Input Parameters.”
In the pop-out effect size calculator, click on the radio button for “Direct” and then
enter the partial eta squared value that was calculated in SPSS (i.e., .899). Clicking on
“Calculate” in the pop-out effect size calculator will calculate the effect size f. Then click
on “Calculate and Transfer to Main Window” to transfer the calculated effect size
(i.e., 2.9834527) to the “Input Parameters.” Once the parameters are specified, click on
“Calculate” to find the power statistics.
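The conversion G*Power performs in “Direct” mode is f = sqrt(η² / (1 − η²)); a quick check with the partial eta squared from SPSS reproduces the transferred value:

```python
import math

partial_eta_sq = 0.899  # partial eta squared from SPSS
f = math.sqrt(partial_eta_sq / (1 - partial_eta_sq))
# f is approximately 2.9834527, matching the value G*Power transfers.
```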
Step 3

Here are the post hoc power results.
The “Output Parameters” provide the relevant statistics given the input just specified.
In this example, we were interested in determining post hoc power for the within-between
interaction in a two-factor split-plot ANOVA with a computed effect size f of 2.9834527, an
alpha level of .05, total sample size of 8, two groups, four measurements, an average correlation
among repeated measures of .657, and epsilon sphericity correction of .75. Based on those
criteria, the post hoc power of our within-between interaction effect for this test was 1.000;
the probability of rejecting the null hypothesis when it is really false (in this case, the probability
of detecting that the means of the dependent variable are not equal across the levels of the independent
variable) was at the maximum (i.e., 100%) (sufficient power is often .80 or above). Note that this
is the same value as that reported in SPSS. Keep in mind that conducting power analysis a
priori is recommended so that you avoid a situation where, post hoc, you find that the sample
size was not sufficient to reach the desired level of power (given the observed parameters).
A Priori Power for Two-Factor Split-Plot ANOVA Using G*Power
For a priori power, we can determine the total sample size needed for the main effects
and/or interactions given an estimated effect size f, alpha level, desired power, number of
groups (i.e., the number of categories of the independent variable in the case of only one independent
variable OR the product of the number of levels of the independent variables in the case
of multiple independent variables), number of measurements, correlation among repeated measures,
and nonsphericity correction epsilon. We follow Cohen’s (1988) convention for effect size
(i.e., small f = .10; moderate f = .25; large f = .40). In this example, had we wanted to determine
a priori power for a within-between interaction and had estimated a moderate effect f of .25,
alpha of .05, desired power of .80, number of groups of 2 (i.e., we have only one independent
variable, and there were two categories), four measurements, a moderate correlation among
repeated measures of .50, and a nonsphericity correction epsilon of .75, we would need a total
sample size of 30 (i.e., 15 cases per group given two levels to our independent variable). Here
are the a priori power results for this example.
A priori power
15.7 Template and APA-Style Write-Up
Finally, here is an example paragraph just for the results of the two-factor split-plot design
(feel free to write similar paragraphs for the other models in this chapter). Recall that our
graduate research assistant, Marie, was assisting the coordinator of the English program,
Mark. Mark wanted to know the following: if there is a mean difference in writing based
on instructor, if there is a mean difference in writing based on rater, and if there is a mean
difference in writing based on rater by instructor. The research questions presented to
Mark from Marie’s work include the following:
• Is there a mean difference in writing based on instructor?
• Is there a mean difference in writing based on rater?
• Is there a mean difference in writing based on rater by instructor?
Marie then assisted Mark in generating a two-factor split-plot ANOVA as the test of inference,
and a template for writing the research questions for this design is presented as follows.
As we noted in previous chapters, it is important to ensure the reader understands
the levels or groups of the factor(s). This may be done parenthetically in the actual research
question, as an operational definition, or specified within the methods section:
• Is there a mean difference in [dependent variable] based on
[between-subjects factor]?
• Is there a mean difference in [dependent variable] based on
[within-subjects factor]?
• Is there a mean difference in [dependent variable] based on
[between-subjects factor] by [within-subjects factor]?
It may be helpful to preface the results of the two-factor split-plot ANOVA with information
on an examination of the extent to which the assumptions were met (recall there are
several assumptions that we tested). For the between-subjects factor (i.e., the nonrepeated
factor), assumptions include (a) independence of observations, (b) homogeneity of variance,
and (c) normality. For the within-subjects factor (i.e., the repeated factor), we examine
the assumption of sphericity.
A two-factor split-plot (one within-subjects factor and one between-
subjects factor) ANOVA was conducted. The within-subjects factor was
rater on a writing assessment task (four independent raters), and the
between-subjects factor was instructor (two instructors). The null
hypotheses tested include the following: (1) the mean writing scores
were equal for each of the four different raters, (2) the mean writ-
ing scores for each instructor were equal, and (3) the mean writing
scores by rater given instructor were equal.
There were no missing data and no univariate outliers. The assump-
tion of sphericity was met (χ2 = 4.001, Mauchly’s W = .429, df = 5,
p = .557); therefore, the results reported reflect univariate results.
The sphericity assumption was further upheld in that the same results
were obtained for the usual, Geisser–Greenhouse, and Huynh–Feldt
F tests. The assumption of homogeneity of variance was met for the
writing scores of all raters [rater 1, F(1, 6) = 3.600, p = .107; rater 2,
F(1, 6) = .158, p = .705; rater 3, F(1, 6) = .000, p = 1.000; and rater 4,
F(1, 6) = 1.000, p = .356].
The assumption of normality was tested via examination of the residu-
als. Review of the S–W test for normality (SWrater1 = .745, df = 8, p =
.007; SWrater2 = .913, df = 8, p = .374; SWrater3 = .965, df = 8, p = .857;
SWrater4 = .828, df = 8, p = .057), and skewness (rater 1 = 1.675; rater
2 = .290; rater 3 = .000; rater 4 = −.571) and kurtosis (rater 1 = 3.136;
rater 2 = .272; rater 3 = −.700; rater 4 = −1.729) statistics suggest
that normality was a reasonable assumption for raters 2, 3, and 4, but
nonnormality was suggested for rater 1. The boxplot suggested a rela-
tively normal distributional shape (with no outliers) of the residuals
for raters 2 through 4. The boxplot of the residuals for rater 1 sug-
gested nonnormality with one outlier. The Q–Q plots suggested normal-
ity was reasonable for the residuals of raters 2, 3, and 4, but
suggested nonnormality for rater 1. Thus, while there was nonnormality
suggested by the residuals for rater 1, the two-factor split-plot ANOVA
is robust to violations of normality with equal sample sizes of groups
as is evident in this design.
Random assignment of individuals to instructors helped ensure that
the assumption of independence was met. Additionally, a scatterplot
of residuals against the levels of the between-subjects factors was
reviewed. A relatively random display of points around 0 provided
further evidence that the assumption of independence was met.
Here is an APA-style example paragraph of results for the two-factor split-plot ANOVA
(remember that this will be prefaced by the previous paragraph reporting the extent to
which the assumptions of the test were met).
From Table 15.16, the results for the univariate ANOVA indicate the
following:
1. A statistically significant within-subjects main effect for
rater (Frater = 190.200, df = 3,18, p = .001) (rater 1, M = 2.750,
SE = .395; rater 2, M = 3.625, SE = .239; rater 3, M = 6.250,
SE = .250; rater 4, M = 9.125, SE = .191)
2. A statistically significant within-between subjects interac-
tion effect between rater and instructor (Frater × instructor = 12.120,
df = 3,18, p = .001) (for brevity, we have not included the
means and standard errors here; however, you may want to
include those in the narrative or in tabular form)
3. A nonstatistically significant between-subjects main effect for
instructor (Finstructor = 4.200, df = 1,6, p = .086) (instructor 1,
M = 5.875, SE = .302; instructor 2, M = 5.000, SE = .302)
Effect sizes were rather large for the significant effects (partial
η2rater = .969, power = 1.000; partial η2rater × instructor = .669, power = .998)
with more than sufficient observed power, but less so for the non-
significant effect (partial η2instructor = .412, power = .407) which had
less than desired power.
The statistically significant main effect for the within-subjects
factor suggests that there are mean differences in writing scores
by rater. The raters were quite inconsistent in that Bonferroni
MCPs revealed statistically significant differences among all pairs
of raters except for rater 1 versus rater 2. The nonstatistically
significant main effect for the between-subjects factor suggests
that there are not differences, on average, in writing scores per
instructor. In examining CIs of the interaction for the between-
within factor (i.e., instructor by rater), nonoverlapping CIs sug-
gest statistically significant differences. We see that the patterns
evident for the within-subjects factors echo here as well. For both
instructor 1 and instructor 2, there are statistically significant
differences among all pairs of raters except for rater 1 versus rater
2. From the profile plot in Figure 15.2, we see that while rater 4
found the students of instructor 2 to have better essays, the other
raters liked the essays written by the students of instructor 1.
It is suggested that a more detailed plan for evaluating essays,
including rater training, be implemented in the future.
15.8 Summary
In this chapter, methods involving the comparison of means for random- and mixed-effects
models were considered. Five different models were examined; these included the
one-factor random-effects model, the two-factor random- and mixed-effects models,
the one-factor repeated measures model, and the two-factor split-plot or mixed design.
Included for each design were the usual topics of model characteristics, the linear model,
assumptions of the model and the effects of their violation, the ANOVA summary table
and expected mean squares, and MCPs. Also included for particular designs was a discussion
of the compound symmetry assumption and alternative ANOVA procedures.
At this point, you should have met the following objectives: (a) be able to understand
the characteristics and concepts underlying random- and mixed-effects ANOVA models,
(b) be able to determine and interpret the results of random- and mixed-effects ANOVA
models, and (c) be able to understand and evaluate the assumptions of random- and
mixed-effects ANOVA models. In Chapter 16, we continue our extended tour of the
ANOVA by looking at hierarchical designs that involve one factor nested within another
factor (i.e., nested or hierarchical designs), and randomized block designs, which we
have very briefly introduced in this chapter.
Problems
Conceptual problems
15.1 When an ANOVA design includes a random factor that is crossed with a fixed factor,
the design illustrates which type of model?
 a. Fixed
 b. Mixed
 c. Random
 d. Crossed
15.2 The denominator of the F ratio used to test the interaction in a two-factor ANOVA is
MSwith in which one of the following?
 a. Fixed-effects model
 b. Random-effects model
 c. Mixed-effects model
 d. All of the above
15.3 A course consists of five units, the order of presentation of which is varied (counterbalanced).
A researcher used a 5 × 2 ANOVA design with order (five different
randomly selected orders) and gender serving as factors. Which ANOVA model is
illustrated by this design?
 a. Fixed-effects model
 b. Random-effects model
 c. Mixed-effects model
 d. Nested model
15.4 A researcher conducts a study where children are measured on frequency of sharing
at three different times over the course of the academic year. Which ANOVA model
is most appropriate for analysis of these data?
 a. One-factor random-effects model
 b. Two-factor random-effects model
 c. Two-factor mixed-effects model
 d. One-factor repeated measures design
 e. Two-factor split-plot design
15.5 A health-care researcher wants to make generalizations about the number of patients
served by after-hours clinics in her region. She randomly samples clinics and collects
data on the number of patients served. Which ANOVA model is most appropriate for
analysis of these data?
 a. One-factor random-effects model
 b. Two-factor random-effects model
 c. Two-factor mixed-effects model
 d. One-factor repeated measures design
 e. Two-factor split-plot design
15.6 A preschool teacher randomly assigns children to classrooms—some with windows
and some without windows. She wants to know if there is a mean difference
in receptive vocabulary based on type of classroom (with and without windows)
and whether this varies by classroom teacher. Which ANOVA model is most appropriate
for analysis of these data?
 a. One-factor random-effects model
 b. Two-factor random-effects model
 c. Two-factor mixed-effects model
 d. One-factor repeated measures design
 e. Two-factor split-plot design
15.7 If a given set of data were analyzed with both a one-factor fixed-effects model and a one-factor random-effects model, the F ratio for the random-effects model will be greater than the F ratio for the fixed-effects model. True or false?
15.8 A repeated measures design is necessarily an example of the random-effects model. True or false?
15.9 Suppose researchers A and B perform a two-factor ANOVA on the same data, but A assumes a fixed-effects model and B assumes a random-effects model. I assert that if A finds the interaction significant at the .05 level, B will also find the interaction significant at the .05 level. Am I correct?
15.10 I assert that MSwith should always be used as the denominator for all F ratios in any two-factor ANOVA. Am I correct?
15.11 I assert that in a one-factor repeated measures ANOVA and a two-factor split-plot ANOVA, the SStotal will be exactly the same when using the same data. Am I correct?
15.12 Football players are each exposed to all three different counterbalanced coaching strategies, one per month. This is an example of which type of model?
 a. One-factor fixed-effects ANOVA model
 b. One-factor repeated-measures ANOVA model
 c. One-factor random-effects ANOVA model
 d. One-factor fixed-effects ANCOVA model
15.13 A two-factor split-plot design involves which of the following?
 a. Two repeated factors
 b. Two nonrepeated factors
 c. One repeated factor and one nonrepeated factor
 d. Farmers splitting up their land into plots
15.14 The interaction between factors L and M can be assessed only if which one of the following occurs?
 a. Both factors are crossed.
 b. Both factors are random.
 c. Both factors are fixed.
 d. Factor L is a repeated factor.
15.15 A student factor is almost always random. True or false?
15.16 In a two-factor split-plot design, there are two interaction terms. Hypotheses can actually be tested for how many of those interactions?
 a. 0
 b. 1
 c. 2
 d. Cannot be determined
15.17 In a one-factor repeated measures ANOVA design, the F test is quite robust to violation of the sphericity assumption, and, thus, we never need to worry about it. True or false?
Computational problems
15.1 Complete the following ANOVA summary table for a two-factor model, where there are three levels of factor A (fixed method effect) and two levels of factor B (random teacher effect). Each cell of the design includes four students (α = .01).

Source   SS     df   MS   F   Critical Value   Decision
A        3.64   —    —    —   —                —
B        .57    —    —    —   —                —
AB       2.07   —    —    —   —                —
Within   —      —    —
Total    8.18   —
15.2 A researcher tested whether aerobics increased the fitness level of eight undergraduate students participating over a 4-month period. Students were measured at the end of each month using a 10-point fitness measure (10 being most fit). The data are shown here. Conduct an ANOVA to determine the effectiveness of the program, using α = .05. Use the Bonferroni method to detect exactly where the differences are among the time points (if they are different).
Subject Time 1 Time 2 Time 3 Time 4
1 3 4 6 9
2 4 7 5 10
3 5 7 7 8
4 1 3 5 7
5 3 4 7 9
6 2 5 6 7
7 1 4 6 9
8 2 4 5 6
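As a cross-check on the hand (or SPSS) calculations this problem calls for, the one-factor repeated measures partition can be sketched in Python. This is an illustrative computation only, not part of the text's SPSS workflow, and the variable names are our own:

```python
# Sketch: one-factor repeated measures ANOVA on the fitness data above.
data = [
    [3, 4, 6, 9],
    [4, 7, 5, 10],
    [5, 7, 7, 8],
    [1, 3, 5, 7],
    [3, 4, 7, 9],
    [2, 5, 6, 7],
    [1, 4, 6, 9],
    [2, 4, 5, 6],
]
n, k = len(data), len(data[0])        # 8 subjects, 4 time points
N = n * k
grand = sum(sum(row) for row in data)
C = grand ** 2 / N                    # correction term

ss_total = sum(x ** 2 for row in data for x in row) - C
ss_subjects = sum(sum(row) ** 2 for row in data) / k - C
time_sums = [sum(row[j] for row in data) for j in range(k)]
ss_time = sum(t ** 2 for t in time_sums) / n - C
ss_error = ss_total - ss_subjects - ss_time   # subject-by-time residual

df_time, df_error = k - 1, (n - 1) * (k - 1)  # 3 and 21
f_time = (ss_time / df_time) / (ss_error / df_error)
print(round(f_time, 2))               # 43.79
```

The resulting F would then be compared against the critical value of F with (3, 21) degrees of freedom at α = .05, with Bonferroni-adjusted pairwise comparisons following a significant omnibus test.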
15.3 Using the same data as in Computational Problem 2, conduct a two-factor split-plot ANOVA, where the first four subjects participate in a step aerobics program and the last four subjects participate in a spinning program (α = .05).
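The split-plot analysis this problem asks for adds a between-subjects layer to the repeated measures breakdown. A minimal Python sketch (illustrative only; the text performs this analysis in SPSS) shows how the total sum of squares partitions:

```python
# Sketch: sum-of-squares partition for the two-factor split-plot design
# (first four subjects = step aerobics group, last four = spinning group).
data = [
    [3, 4, 6, 9], [4, 7, 5, 10], [5, 7, 7, 8], [1, 3, 5, 7],   # step
    [3, 4, 7, 9], [2, 5, 6, 7], [1, 4, 6, 9], [2, 4, 5, 6],    # spinning
]
groups = [data[:4], data[4:]]
n, k = len(data), len(data[0])
N = n * k
grand = sum(sum(row) for row in data)
C = grand ** 2 / N

ss_total = sum(x ** 2 for row in data for x in row) - C
ss_between_subj = sum(sum(row) ** 2 for row in data) / k - C
ss_group = sum(sum(sum(r) for r in g) ** 2 / (len(g) * k) for g in groups) - C
ss_subj_within = ss_between_subj - ss_group              # error for group test
ss_time = sum(sum(row[j] for row in data) ** 2 for j in range(k)) / n - C
cell = [sum(r[j] for r in g) for g in groups for j in range(k)]
ss_cells = sum(c ** 2 for c in cell) / len(groups[0]) - C
ss_gt = ss_cells - ss_group - ss_time                    # group-by-time
ss_resid = ss_total - ss_between_subj - ss_time - ss_gt  # error for time tests

# The partition must be additive:
parts = ss_group + ss_subj_within + ss_time + ss_gt + ss_resid
print(abs(parts - ss_total) < 1e-9)   # True
```

The F ratios then use MS for subjects within groups as the error term for the group effect, and the residual MS as the error term for the time and group-by-time effects.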
15.4 To examine changes in teaching self-efficacy, 10 teachers were measured on their self-efficacy toward teaching at the beginning of their teaching career and at the end of their 1st and 3rd years of teaching. The teaching self-efficacy scale ranged from 0 to 100 with higher scores reflecting greater teaching self-efficacy. The data are shown here. Conduct a one-factor repeated measures ANOVA to determine mean differences across time, using α = .05. Use the Bonferroni method to detect if and/or where the differences are among the time points.
Subject Beginning Year 1 End Year 1 End Year 3
1 35 50 45
2 50 75 82
3 42 51 56
4 70 72 71
5 65 50 81
6 92 42 69
7 80 82 88
8 78 76 79
9 85 60 83
10 64 71 89
15.5 Using the same data as in Computational Problem 4, conduct a two-factor split-plot ANOVA, where the first five subjects participate in a mentoring program and the last five subjects do not participate in a mentoring program (α = .05).
15.6 As a statistical consultant, a researcher comes to you with the following partial SPSS output (sphericity assumed). In a two-factor split-plot ANOVA design, rater is the repeated (or within-subjects) factor, gender of the rater is the nonrepeated (or between-subjects) factor, and the dependent variable is history exam scores. (a) Are the effects significant (which you must determine, as significance is missing, using α = .05)? (b) What are the implications of these results in terms of rating the history exam?

Tests of Within-Subjects Effects
Source         Type III SS   df   MS      F
Rater          298.38        3    99.46   30.47
Rater*gender   184.38        3    61.46   18.83
Error (rater)  58.75         18   3.26

Tests of Between-Subjects Effects
Source   Type III SS   df   MS       F
Gender   153.13        1    153.13   20.76
Error    44.25         6    7.38
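One way to answer part (a) is to compare each observed F to its .05 critical value. The sketch below does this in Python; the critical values are hand-entered from a standard F table (such as Table A.4 in this text), so treat them as look-ups rather than computed quantities:

```python
# Significance decisions for the SPSS output above at alpha = .05.
tests = {
    # name: (observed F, df numerator, df denominator, .05 critical value)
    "rater":        (30.47, 3, 18, 3.16),
    "rater*gender": (18.83, 3, 18, 3.16),
    "gender":       (20.76, 1, 6, 5.99),
}
for name, (f_obs, df1, df2, f_crit) in tests.items():
    decision = "reject H0" if f_obs > f_crit else "fail to reject H0"
    print(f"{name}: F({df1},{df2}) = {f_obs} vs {f_crit} -> {decision}")
```

Here each observed F exceeds its critical value, so all three effects are statistically significant at the .05 level.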
Interpretive problems
15.1 In Chapter 13, you built on the interpretive problem from Chapter 11 utilizing the survey 1 dataset from the website. SPSS was used to conduct a two-factor fixed-effects ANOVA, including effect size, where political view was factor A (as in Chapter 11, J = 5), gender was factor B (a new factor, K = 2), and the dependent variable was the same one you used previously in Chapter 11. Now, in addition to the two-factor fixed-effects ANOVA, conduct random-effects and mixed-effects analyses. Determine whether the nature of the factors makes any difference in the results.
15.2 In Chapter 13, you built on the interpretive problem from Chapter 11 utilizing the survey 1 dataset from the website. SPSS was used to conduct a two-factor fixed-effects ANOVA, including effect size, where hair color was factor A (i.e., one independent variable) (J = 5), gender was factor B (a new factor, K = 2), and the dependent variable was a variable of interest to you (the following variables look interesting: books, TV, exercise, drinks, GPA, GRE-Q, CDs, hair appointment). Now, in addition to the two-factor fixed-effects ANOVA, conduct random-effects and mixed-effects analyses. Determine whether the nature of the factors makes any difference in the results.
16
Hierarchical and Randomized Block Analysis
of Variance Models
Chapter Outline
16.1 Two-Factor Hierarchical Model
 16.1.1 Characteristics of the Model
 16.1.2 Layout of Data
 16.1.3 ANOVA Model
 16.1.4 ANOVA Summary Table and Expected Mean Squares
 16.1.5 Multiple Comparison Procedures
 16.1.6 Example
16.2 Two-Factor Randomized Block Design for n = 1
 16.2.1 Characteristics of the Model
 16.2.2 Layout of Data
 16.2.3 ANOVA Model
 16.2.4 Assumptions and Violation of Assumptions
 16.2.5 ANOVA Summary Table and Expected Mean Squares
 16.2.6 Multiple Comparison Procedures
 16.2.7 Methods of Block Formation
 16.2.8 Example
16.3 Two-Factor Randomized Block Design for n > 1
16.4 Friedman Test
16.5 Comparison of Various ANOVA Models
16.6 SPSS
16.7 Template and APA-Style Write-Up
Key Concepts
 1. Crossed designs and nested designs
 2. Confounding
 3. Randomized block designs
 4. Methods of blocking
In the last several chapters, our discussion has dealt with different analysis of variance (ANOVA) models. In this chapter, we complete our discussion of ANOVA by considering models in which there are multiple factors, but where at least one of the factors is either a hierarchical (or nested) factor or a blocking factor. As we define these models, we shall see that this results in a hierarchical (or nested) design and a blocking design, respectively. In this chapter, we are mostly concerned with the two-factor hierarchical (or nested) model and the two-factor randomized block model, although these models can be generalized to designs with more than two factors. Most of the concepts used in this chapter are the same as those covered in previous chapters. In addition, new concepts include crossed and nested factors, confounding, blocking factors, and methods of blocking. Our objectives are that by the end of this chapter, you will be able to (a) understand the characteristics and concepts underlying hierarchical and randomized block ANOVA models, (b) determine and interpret the results of hierarchical and randomized block ANOVA models, (c) understand and evaluate the assumptions of hierarchical and randomized block ANOVA models, and (d) compare different ANOVA models and select an appropriate model.
16.1 Two-Factor Hierarchical Model
Throughout the text, we have followed Marie, a graduate student enrolled in an educational research program, on her statistical analysis adventures. In this chapter, we see her embarking on a new journey.
Seeing the success that Marie has had with more complex statistical analysis, Marie's faculty advisor has provided Marie with another challenging task. This time, Marie will be working with a reading faculty member (JoAnn) at their university. JoAnn has conducted an experiment in which children were randomly assigned to one of two reading approaches (basal or whole language) and one of four different teachers. There were 24 children who participated; thus, there were six children in each reading approach-teacher combination. Each student was assessed on reading comprehension at the conclusion of the study. JoAnn wants to know the following: if there is a mean difference in reading based on approach to reading and if there is a mean difference in reading between teachers. Marie suggests the following research questions to JoAnn:
• Is there a mean difference in reading based on approach to reading?
• Is there a mean difference in reading based on teacher?
With one between-subjects independent variable (i.e., approach to reading) and one hierarchical or nested factor (i.e., teacher), Marie determines that a two-factor hierarchical ANOVA is the best statistical procedure to use to answer JoAnn's question. Her next task is to assist JoAnn in analyzing the data.
In this section, we describe the distinguishing characteristics of the two-factor hierarchical ANOVA model, the layout of the data, the linear model, the ANOVA summary table and expected mean squares, and multiple comparison procedures (MCPs).
16.1.1 Characteristics of the Model
The characteristics of the two-factor fixed-, random-, and mixed-effects models have already been covered in Chapters 13 and 15. Here we consider a special form of the two-factor model where one factor is nested within another factor. The best introduction to this model is via an example. Suppose you are interested in which of several different major teaching pedagogies (e.g., worksheet, math manipulative, and computer-based approaches) results in the highest level of achievement in mathematics among second-grade students. Thus, math achievement is the dependent variable, and teaching pedagogy is one factor. A second factor is teacher. That is, you may also believe that some teachers are more effective than others, which results in different levels of student achievement. However, each teacher has only one class of students and thus only one major teaching pedagogy. In other words, all combinations of the pedagogy and teacher factors are not possible. This design is known as a nested design, hierarchical design, or multilevel model because the teacher factor is nested within the pedagogy factor. This is in contrast to a two-factor crossed design where all possible combinations of the two factors are included. The two-factor designs described in Chapters 13 and 15 were all crossed designs.
Let us give a more precise definition of crossed and nested designs. A two-factor completely crossed design (or complete factorial design) is one where every level of factor A occurs in combination with every level of factor B. A two-factor nested design (or incomplete factorial design) of factor B being nested within factor A is one where the levels of factor B occur for only one level of factor A. We denote this particular nested design as B(A), which is read as factor B being nested within factor A (in other references, you may see this written as B:A or as B|A). To return to our example, the teacher factor (factor B) is nested within the method factor (factor A), as each teacher utilizes only one major teaching pedagogy. The outcome measured is student performance. Thus, a researcher may select a nested design to examine the extent to which student performance in mathematics differs given that teachers are nested within teaching pedagogy. The researcher is likely most interested in the treatment (e.g., teaching pedagogy), but recognizes that the context (i.e., the classroom teacher) may contribute to differences in the outcome, and can model this statistically through a hierarchical ANOVA.
These models are shown graphically in Figure 16.1. In Figure 16.1a, a completely crossed or complete factorial design is shown where there are two levels of factor A and six levels of factor B. Thus, there are 12 possible factor combinations that would all be included in a completely crossed design. The shaded region indicates the combinations that might be included in a nested or incomplete factorial design where factor B (e.g., teacher) is nested within factor A (e.g., teaching pedagogy). Although the number of levels of each factor remains the same, factor B now has only three levels within each level of factor A. For A1, we see only B1, B2, and B3, whereas for A2, we see only B4, B5, and B6. Thus, only 6 of the possible 12 factor combinations are included in the nested design. For example, level 1 of factor B occurs only in combination with level 1 of factor A. In summary, Figure 16.1a shows that the nested or incomplete factorial design consists of only a portion of the completely crossed design (the shaded regions). In Figure 16.1b, we see the nested design depicted in its more traditional form. Here you see that the six factor combinations not included are not even shown (e.g., A1 with B4). Other examples of the two-factor nested design are as follows: (a) school is nested within school district, (b) faculty member is nested within department, (c) individual is nested within neighborhood, and (d) county is nested within state.
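The contrast between the crossed and nested designs of Figure 16.1 can be made concrete in a few lines of Python (the factor labels here follow the figure; the code itself is only an illustration):

```python
# Enumerating design cells: a completely crossed design contains every (A, B)
# combination, while the nested design B(A) contains only the cells where a
# given level of B appears under one level of A.
from itertools import product

A = ["A1", "A2"]
B = ["B1", "B2", "B3", "B4", "B5", "B6"]
crossed = list(product(A, B))                     # 12 cells

# B(A): B1-B3 occur only under A1, B4-B6 only under A2 (as in Figure 16.1)
nesting = {"A1": ["B1", "B2", "B3"], "A2": ["B4", "B5", "B6"]}
nested = [(a, b) for a in A for b in nesting[a]]  # 6 cells

print(len(crossed), len(nested))                  # 12 6
```

The six cells of the nested design are exactly the shaded portion of the crossed layout; combinations such as (A1, B4) simply never occur.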
Thus, with this design, one factor is nested within another factor, rather than the two factors being crossed. As is shown in more detail later in this chapter, the nesting characteristic has some interesting and distinct outcomes. For now, some brief mention should be made of these outcomes. Nesting is a particular type of confounding among the factors being investigated, where the AB interaction is part of the B effect (or is confounded with B) and therefore cannot be investigated. (Going back to the previous example, this means that the teacher by teaching pedagogy interaction effect is confounded with the teacher main effect, and thus teasing apart those effects is not possible.) In the ANOVA model and the ANOVA summary table, there will not be an interaction term or source of variation. This is due to the fact that each level of factor B (the nested factor, such as the teacher) occurs in combination with only one level of factor A (the nonnested factor, such as the teaching pedagogy). We cannot compare for a particular level of B (e.g., the classroom teacher) all levels of factor A (e.g., teaching pedagogy), as a certain level of B only occurs with one level of A.
Confounding may occur for two reasons. First, the confounding may be intentional due to practical reasons, such as a reduction in the number of individuals to be observed. Fewer individuals would be necessary in a nested design, as compared to a crossed design, due to the fact that there are fewer cells in the model. Second, the confounding may be absolutely necessary because crossing may not be possible. For example, school is nested within school district because a particular school can only be a member of one school district. The nested factor (here factor B) may be a nuisance variable that the researcher wants to take into account in terms of explaining or predicting the dependent variable Y. An error commonly made is to ignore the nuisance variable B and go ahead with a one-factor design using only factor A. This design may result in a biased test of factor A such that the F ratio is inflated. Thus, H0 would be rejected more often than it should be, serving to increase the actual α level over that specified by the researcher and thereby increase the likelihood of a Type I error. The F test is then too liberal.
Let us make two further points about this first characteristic. First, in the one-factor design discussed in Chapter 11, we have already seen nesting going on in a different way. Here subjects were nested within factor A because each subject only responded to one level of factor A. It was only when we got to repeated measures designs in Chapter 15 that individuals were allowed to respond to more than one level of a factor. For the repeated measures design, we actually had a completely crossed design of subjects by factor A. Second, Glass and Hopkins (1996) give a nice conceptual example of a nested design with teachers being nested within schools, where each school is like a nest having multiple eggs or teachers.
[Figure 16.1 appears here: (a) a 2 × 6 grid of factor A (rows A1, A2) by factor B (columns B1–B6); (b) the nested design in traditional form, with B1–B3 under A1 and B4–B6 under A2.]
Figure 16.1. Two-factor completely crossed versus nested designs. (a) The completely crossed design: the shaded region indicates the cells that would be included in a nested design where factor B is nested within factor A. In the nested design, factor A has two levels, and factor B has three levels within each level of factor A. You see that only 6 of the 12 possible cells are filled in the nested design. (b) The same nested design in traditional form: the shaded region indicates the cells included in the nested design (i.e., the same six as shown in the first part).
The remaining characteristics should be familiar. These include the following: (a) two factors (or independent variables) that are nominal or ordinal in scale, each with two or more levels; (b) the levels of each of the factors may be either randomly sampled from the population of levels or fixed by the researcher (i.e., the model may be fixed, mixed, or random); (c) subjects are randomly assigned to only one combination of the levels of the two factors; and (d) the dependent variable is measured at least at the interval level. If individuals respond to more than one combination of the levels of the two factors, then this is a repeated measures design (see Chapter 15).
For simplicity, we again assume the design is balanced. For the two-factor nested design, a design is balanced if (a) the number of observations within each factor combination (or cell) is the same (in other words, the sample size for each cell of the design is the same), and (b) the number of levels of the nested factor within each level of the other factor is the same. The first portion of this statement should be quite familiar from factorial designs, so no further explanation is necessary. The second portion of this statement is unique to this design and requires a brief explanation. As an example, say factor B is nested within factor A and factor A has two levels. On the one hand, factor B may have the same number of levels for each level of factor A. This occurs if there are three levels of factor B under level 1 of factor A (i.e., A1) and also three levels of factor B under level 2 of factor A (i.e., A2). On the other hand, factor B may not have the same number of levels for each level of factor A. This occurs if there are three levels of factor B under A1 and only two levels of factor B under A2. If the design is unbalanced, see the discussion in Kirk (1982) and Dunn and Clark (1987), although most statistical software can seamlessly deal with this type of unbalanced design.
16.1.2 Layout of Data
The layout of the data for the two-factor nested design is shown in Table 16.1. To simplify matters, we have limited the number of levels of the factors to two levels of factor A (e.g., teaching pedagogy) and three levels of factor B (e.g., teacher). This only serves as an example layout because many other possibilities obviously exist. Here we see the major set of columns designated as the levels of factor A, the nonnested factor (e.g., teaching pedagogy), and for each level of A, the minor set of columns are the levels of factor B, the nested factor (e.g., teacher). Within each factor level combination or cell are the subjects. Means are shown for each cell, for the levels of factor A, and overall. Note that the means for the levels of factor B need not be shown, as they are the same as the cell means. For instance, Ȳ.11 is the same as Ȳ..1 (not shown) as B1 only occurs once. This is another result of the nesting.
Table 16.1
Layout for the Two-Factor Nested Design

                       A1                        A2
               B1      B2      B3        B4      B5      B6
               Y111    Y112    Y113      Y124    Y125    Y126
               ...     ...     ...       ...     ...     ...
               Yn11    Yn12    Yn13      Yn24    Yn25    Yn26
Cell means     Ȳ.11    Ȳ.12    Ȳ.13      Ȳ.24    Ȳ.25    Ȳ.26
A means                Ȳ.1.                      Ȳ.2.
Overall mean                      Ȳ...
16.1.3 ANOVA Model
The nested factor is almost always random (Glass & Hopkins, 1996; Keppel & Wickens, 2004; Mickey, Dunn, & Clark, 2004; Page, Braver, & MacKinnon, 2003). In other words, the levels of the nested factor are a random sample of the population of levels. For example, in the case of teachers nested within teaching pedagogy, it is often the case that a random sample of the teachers is selected rather than specific teachers (which would be a fixed-effects factor). Thus, the nested factor (i.e., the teacher factor) is a random factor. As a result, the two-factor nested ANOVA is often a mixed-effects model where the nonnested factor is fixed (i.e., all the levels of interest for the nonnested factor are included in the model) and the nested factor is random. The two-factor mixed-effects nested ANOVA model is written in terms of population parameters as
Yijk = μ + αj + bk(j) + εijk
where
Yijk is the observed score on the dependent variable for individual i in level j of factor A and level k of factor B (or in the jk cell)
μ is the overall or grand population mean (i.e., regardless of cell designation)
αj is the fixed effect for level j of factor A
bk(j) is the random effect for level k of factor B
εijk is the random residual error for individual i in cell jk
Notice that there is no interaction term in the model and also that the effect for factor B is denoted by bk(j). This tells us that factor B is nested within factor A. The residual error can be due to individual differences, measurement error, and/or other factors not under investigation. We consider the fixed-, mixed-, and random-effects cases later in this chapter.
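One way to see how the pieces of this model fit together is to simulate data from it. The sketch below is purely illustrative: the effect sizes, standard deviations, and the choice of three teachers per pedagogy are our own assumptions, not values from the text:

```python
# Simulating from Yijk = mu + alpha_j + b_k(j) + e_ijk
# (A fixed, B random, B nested within A). All numeric values are hypothetical.
import random

random.seed(0)
mu = 50.0
alpha = {"basal": -3.0, "whole": 3.0}   # fixed pedagogy effects (sum to zero)
sigma_b, sigma_e = 2.0, 4.0             # random teacher SD and residual SD
n = 6                                   # students per teacher

scores = {}
for j, a_j in alpha.items():
    for k in range(1, 4):               # teachers nested within pedagogy j
        b_kj = random.gauss(0, sigma_b)           # b_k(j): random teacher effect
        scores[(j, k)] = [mu + a_j + b_kj + random.gauss(0, sigma_e)
                          for _ in range(n)]

# Each teacher (cell) appears under exactly one pedagogy, so no interaction
# cell exists to estimate.
print(len(scores))   # 6 cells = 2 pedagogies x 3 teachers each
```

Because every teacher effect b_k(j) is drawn anew within its own pedagogy, the simulation mirrors why the AB interaction is confounded with the teacher effect.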
For the two-factor mixed-effects nested ANOVA model, there are only two sets of hypotheses, one for each of the main effects, because there is no interaction effect. The null and alternative hypotheses, respectively, for testing the effect of factor A are as follows. The null hypothesis is similar to what we have seen in previous chapters for fixed-effects factors and written as the means of the levels of factor A are the same:

H01: μ.1. = μ.2. = … = μ.J.
H11: not all the μ.j. are equal

The hypotheses for testing the effect of factor B, because this is a random-effects factor, are written as the variation among the means, and are presented as follows:

H02: σb² = 0
H12: σb² > 0
These hypotheses reflect the inferences made in the fixed-, mixed-, and random-effects models (as fully described in Chapter 15). For fixed main effects, the null hypotheses are about means, whereas for random main effects, the null hypotheses are about variation among the means. As we already know, the difference in the models is also reflected in the MCPs. As before, we do need to pay particular attention to whether the model is fixed, mixed, or random. The assumptions about the two-factor nested model are exactly the same as with the two-factor crossed model (discussed in Chapters 13 and 15), and, thus, we need not provide any additional discussion other than to remind you of the assumptions regarding normality, homogeneity of variance, and independence (of observations within cells). In addition, procedures for determining power, confidence intervals (CIs), and effect size are the same as with the two-factor crossed model.
16.1.4 ANOVA Summary Table and Expected Mean Squares
The computations of the two-factor mixed-effects nested model are somewhat similar to those of the two-factor mixed-effects crossed model. The main difference lies in the fact that there is no interaction term. The ANOVA summary table is shown in Table 16.2, where we see the following sources of variation: A, B(A), within cells, and total. There we see that only two F ratios can be formed, one for each of the two main effects, because no interaction term is estimated (recall that this is because not all possible combinations of A and B occur).
If we take the total sum of squares and decompose it, we have the following:

SStotal = SSA + SSB(A) + SSwith
We leave the computations involving these terms to the statistical software. The degrees of freedom, mean squares, and F ratios are determined as shown in Table 16.2, assuming a mixed-effects model. The critical value for the test of factor A is αF(J−1, J(K(j)−1)) and for the test of factor B is αF(J(K(j)−1), JK(j)(n−1)). Let us explain something about the degrees of freedom. The degrees of freedom for B(A) are equal to J(K(j) − 1). This means that for a design with two levels of factor A (e.g., teaching pedagogy) and three levels of factor B (e.g., teacher) within each level of A (for a total of six levels of B), the degrees of freedom are equal to 2(3 − 1) = 4. This is not the same as the degrees of freedom for a completely crossed design where dfB would be 5 (i.e., 6 − 1 = 5). The degrees of freedom for within are equal to JK(j)(n − 1). For this same design with n = 10, then the degrees of freedom within are equal to (2)(3)(10 − 1) = 54 (i.e., six cells with nine degrees of freedom per cell).
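This degrees-of-freedom bookkeeping is easy to verify in a couple of lines, using the values from the paragraph above (a sketch only):

```python
# Degrees of freedom for the nested design described in the text:
# J = 2 levels of A, K(j) = 3 levels of B within each level of A, n = 10.
J, K, n = 2, 3, 10
df_A = J - 1                  # 1
df_B_within_A = J * (K - 1)   # 2(3 - 1) = 4, not 6 - 1 = 5 as in a crossed design
df_within = J * K * (n - 1)   # (2)(3)(10 - 1) = 54
df_total = J * K * n - 1      # 59
print(df_A, df_B_within_A, df_within, df_total)   # 1 4 54 59
```

Note that the three component degrees of freedom sum to the total, just as the sums of squares do.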
The appropriate error terms for each of the fixed-, random-, and mixed-effects models are described in the following two paragraphs. For the fixed-effects model, both F ratios use the within source as the error term. For the random-effects model, the appropriate error term for the test of A is MSB(A) and for the test of B is MSwith. For the mixed-effects model where A is fixed and B is random, the appropriate error term for the test of A is MSB(A), and for the test of B, is MSwith. As already mentioned, this is the predominant model in education and the behavioral sciences. Finally, for the mixed-effects model where A is random and B is fixed, both F ratios use the within source as the error term. These are now described by the expected mean squares.

Table 16.2
Two-Factor Nested Design ANOVA Summary Table: Mixed-Effects Model

Source   SS        df            MS       F
A        SSA       J − 1         MSA      MSA/MSB(A)
B(A)     SSB(A)    J(K(j) − 1)   MSB(A)   MSB(A)/MSwith
Within   SSwith    JK(j)(n − 1)  MSwith
Total    SStotal   N − 1
The formation of the proper F ratios is again related to the expected mean squares. If H0 is actually true, then the expected mean squares are as follows:

E(MSA) = σε²
E(MSB(A)) = σε²
E(MSwith) = σε²
If H0 is actually false, then the expected mean squares for the fixed-effects case are as follows:

E(MSA) = σε² + nK(j) Σj αj² / (J − 1)    (sum over j = 1, …, J)
E(MSB(A)) = σε² + n Σj Σk βk(j)² / [J(K(j) − 1)]    (sums over j = 1, …, J and k = 1, …, K(j))
E(MSwith) = σε²

Thus, the appropriate F ratios both involve using the within source as the error term.
If H0 is actually false, then the expected mean squares for the random-effects case are as follows:

E(MSA) = σε² + nσb(a)² + nK(j)σa²
E(MSB(A)) = σε² + nσb(a)²
E(MSwith) = σε²

Thus, the appropriate error term for the test of A is MSB(A), and the appropriate error term for the test of B is MSwith.
If H0 is actually false, then the expected mean squares for the mixed-effects case where A is fixed and B is random are as follows:

E(MSA) = σε² + nσb(a)² + nK(j) Σj αj² / (J − 1)    (sum over j = 1, …, J)
E(MSB(A)) = σε² + nσb(a)²
E(MSwith) = σε²

Thus, the appropriate error term for the test of A is MSB(A), and the appropriate error term for the test of B is MSwith.
Finally, if H0 is actually false, then the expected mean squares for the mixed-effects case where A is random and B is fixed are as follows:

E(MSA) = σε² + nK(j)σa²
E(MSB(A)) = σε² + n Σj Σk βk(j)² / [J(K(j) − 1)]    (sums over j = 1, …, J and k = 1, …, K(j))
E(MSwith) = σε²

Thus, the appropriate F ratios both involve using the within source as the error term.
16.1.5 Multiple Comparison Procedures
This section considers MCPs for the two-factor nested design. First of all, the researcher is usually not interested in making inferences about random effects. Second, for MCPs based on the levels of factor A (the nonnested factor), there is nothing new to report. Third, for MCPs based on the levels of factor B (the nested factor), this is a different situation. The researcher is not usually as interested in MCPs about the nested factor as compared to the nonnested factor because inferences about the levels of factor B are not even generalizable across the levels of factor A, due to the nesting. If you are nonetheless interested in MCPs for factor B, by necessity you have to look within a level of A to formulate a contrast. Otherwise MCPs are conducted as before. For more complex nested designs, see Myers (1979), Kirk (1982), Dunn and Clark (1987), Myers and Well (1995), or Keppel and Wickens (2004).
16.1.6 Example
Let us consider an example to illustrate the procedures in this section. The data are shown in Table 16.3. Factor A is approach to the teaching of reading (basal vs. whole language approaches), and factor B is teacher. Thus, there are two teachers using the basal approach and two different teachers using the whole language approach. The researcher is interested in the effects these factors have on student's reading comprehension in the first grade. Thus, the dependent variable is a measure of reading comprehension. Six students are randomly assigned to each approach-teacher combination for small-group instruction. This particular example is a mixed model, where factor A (teaching method) is a fixed effect and factor B (teacher) is a random effect. The results are shown in the ANOVA summary table of Table 16.4.
An Introduction to Statistical Concepts
From Table A.4, the critical value for the test of factor A is αF_{J−1, J(K(j)−1)} = .05F1,2 = 18.51, and the critical value for the test of factor B is αF_{J(K(j)−1), JK(j)(n−1)} = .05F2,20 = 3.49. Thus, there is a statistically significant difference between the two approaches to reading instruction at the .05 level of significance, and there is no significant difference between the teachers. When we look at the means for the levels of factor A, we see that the mean comprehension score for the whole language approach (Ȳ.2. = 10.8333) is greater than the mean for the basal approach (Ȳ.1. = 3.3333). Because there were only two levels of the reading approach tested (whole language and basal), no post hoc multiple comparisons are really necessary. Rather, the mean reading comprehension scores for each approach can be merely examined to determine which mean was statistically significantly larger.
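The entries of Table 16.4 can be reproduced directly from the raw scores in Table 16.3. A minimal sketch (NumPy assumed; variable names are ours):

```python
import numpy as np

# Reading comprehension scores from Table 16.3: two teachers (factor B)
# nested within each reading approach (factor A), n = 6 students per cell.
cells = {
    ("basal", "B1"): [1, 1, 2, 4, 4, 5],
    ("basal", "B2"): [1, 3, 3, 4, 6, 6],
    ("whole", "B3"): [7, 8, 8, 10, 12, 15],
    ("whole", "B4"): [8, 9, 11, 13, 14, 15],
}
y = np.array(list(cells.values()), dtype=float)   # shape (4 cells, 6)
n = y.shape[1]
grand = y.mean()
cell_means = y.mean(axis=1)
a_means = np.array([cell_means[:2].mean(), cell_means[2:].mean()])

# Sums of squares for the nested design (K(j) = 2 teachers per approach)
ss_a = n * 2 * np.sum((a_means - grand) ** 2)
ss_b_a = n * np.sum((cell_means - np.repeat(a_means, 2)) ** 2)
ss_within = np.sum((y - cell_means[:, None]) ** 2)

ms_a, ms_b_a, ms_with = ss_a / 1, ss_b_a / 2, ss_within / 20
# Mixed model (A fixed, B random): test A against MS_B(A), B(A) against MS_with
f_a = ms_a / ms_b_a     # ~59.56
f_b = ms_b_a / ms_with  # ~0.95
print(ss_a, ss_b_a, ss_within, f_a, f_b)
```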
16.2 Two-Factor Randomized Block Design for n = 1
In this section, we describe the distinguishing characteristics of the two-factor randomized block ANOVA model for one observation per cell, the layout of the data, the linear model, assumptions and their violation, the ANOVA summary table and expected mean squares, MCPs, and methods of block formation.
Table 16.4
Two-Factor Nested Design ANOVA Summary Table: Teaching Reading Example

Source   SS        df  MS        F
A        337.5000   1  337.5000  59.5585 a
B(A)      11.3333   2    5.6667   0.9524 b
Within   119.0000  20    5.9500
Total    467.8333  23

a .05F1,2 = 18.51.
b .05F2,20 = 3.49.
Table 16.3
Data for the Teaching Reading Example: Two-Factor Nested Design

              Reading Approaches
              A1 (Basal)                A2 (Whole Language)
              Teacher B1   Teacher B2   Teacher B3   Teacher B4
              1            1            7            8
              1            3            8            9
              2            3            8            11
              4            4            10           13
              4            6            12           14
              5            6            15           15
Cell means    2.8333       3.8333       10.0000      11.6667
A means       3.3333                    10.8333
Overall mean  7.0833
16.2.1 Characteristics of the Model
The characteristics of the two-factor randomized block ANOVA model are quite similar to those of the regular two-factor ANOVA model, as well as sharing a few characteristics with the one-factor repeated measures ANOVA design. There is one obvious exception, which has to do with the nature of the factors being used. Here there will be two factors, each with at least two levels. One factor is known as the treatment factor and is referred to here as factor A (a treatment factor is technically what we have been considering in Chapters 11 through 15). The second factor is known as the blocking factor and is referred to here as factor B. A blocking factor is a new concept and requires some discussion.
Take an ordinary one-factor ANOVA design, where the single factor is a treatment factor (e.g., method of exercising) and the researcher is interested in its effect on some dependent variable (e.g., percentage of body fat). Despite individuals being randomly assigned to a treatment group, the groups may be different due to a nuisance variable operating in a nonrandom way. For instance, group 1 may consist of mostly older adults and group 2 may consist of mostly younger adults. Thus, it is likely that group 2 will be favored over group 1 because age, the nuisance variable, has not been properly balanced out across the groups by the randomization process.
One way to deal with this problem is to control the effect of the nuisance variable by incorporating it into the design of the study. Including the blocking or nuisance variable as a factor in the design should result in a reduction in residual variation (due to some additional portion of individual differences being explained) and an increase in power (Glass & Hopkins, 1996; Keppel & Wickens, 2004). The blocking factor is selected based on the strength of its relationship to the dependent variable, where an unrelated blocking variable would not reduce residual variation. It would be reasonable to expect, then, that variability among individuals within a block (e.g., within younger adults) should be less than variability among individuals between blocks (e.g., between younger and older adults). Thus, each block represents the formation of a matched set of individuals, that is, matched on the blocking variable, but not necessarily matched on any other nuisance variable. Using our example, we expect that in general, adults within a particular age block (i.e., the older or younger blocks) will be more similar in terms of variables related to body fat than adults across blocks.
Let us consider several examples of blocking factors. Some blocking factors are naturally occurring blocks such as siblings, friends, neighbors, plots of land, and time. Other blocking factors are not naturally occurring but can be formulated by the researcher. Examples of this type include grade point average (GPA), age, weight, aptitude test scores, intelligence test scores, socioeconomic status, and school or district size. Note that the examples of blocking factors here represent a variety of measurement scales (categorical as well as continuous). Later we will discuss how to deal with the blocking factor based on its measurement scale.
Let us make some summary statements about characteristics of blocking designs. First, designs that include one or more blocking factors are known as randomized block designs, also known as matching designs or treatment by block designs. The researcher's main interest is in the treatment factor. The purpose of the blocking factor is to reduce residual variation. Thus, the researcher is not as much interested in the test of the blocking factor (possibly not at all) as compared to the treatment factor. There is at least one blocking factor and one treatment factor, each with two or more levels. Second, each subject falls into only one block in the design and is subsequently randomly assigned to one level of the treatment factor within that block. Thus, subjects within a block serve as their own controls such that some portion of their individual differences is taken into account. As a result, the scores of subjects are not independent within a particular block. Third, for purposes of this section, we assume there is only one subject for each treatment-block level combination. As a result, the model does not include an interaction term. Later in this chapter, we consider the multiple observations case, where there is an interaction term in the model. Finally, the dependent variable is measured at least at the interval level.
16.2.2 Layout of Data
The layout of the data for the two-factor randomized block model is shown in Table 16.5. Here we see the columns designated as the levels of the blocking factor B and the rows as the levels of the treatment factor A. Row, block, and overall means are also shown. Here you see that the layout of the data looks the same as the two-factor model, but with a single observation per cell.
16.2.3 ANOVA Model
The two-factor fixed-effects randomized block ANOVA model is written in terms of population parameters as

    Y_jk = μ + α_j + β_k + ε_jk

where
Y_jk is the observed score on the dependent variable for the individual responding to level j of factor A and level k of block B
μ is the overall or grand population mean
α_j is the fixed effect for level j of factor A
β_k is the fixed effect for level k of the block B
ε_jk is the random residual error for the individual in cell jk

The residual error can be due to measurement error, individual differences, and/or other factors not under investigation. You can see this is similar to the two-factor fully crossed model with one observation per cell (i.e., i = 1, making the i subscript unnecessary) and with no interaction term included. Also, the effects are denoted by α and β given we have a fixed-effects model. Note that the row and column effects both sum to 0 in the fixed-effects model.
Table 16.5
Layout for the Two-Factor Randomized Block Design

                        Level of Factor B
Level of Factor A    1      2      …     K       Row Mean
1                    Y11    Y12    …     Y1K     Ȳ1.
2                    Y21    Y22    …     Y2K     Ȳ2.
⋮                    ⋮      ⋮      …     ⋮       ⋮
J                    YJ1    YJ2    …     YJK     ȲJ.
Block mean           Ȳ.1    Ȳ.2    …     Ȳ.K     Ȳ.. (overall mean)
The hypotheses for testing the effect of factor A are as follows, where the null indicates that the means of the levels of factor A are equal:

    H01: μ1. = μ2. = … = μJ.
    H11: not all the μj. are equal

For testing the effect of factor B (the blocking factor), the hypotheses are presented here, where the null hypothesis is that the means of the levels of the blocking factor are equal:

    H02: μ.1 = μ.2 = … = μ.K
    H12: not all the μ.k are equal

The factors are both fixed, so the hypotheses are written in terms of means.
16.2.4 Assumptions and Violation of Assumptions
In Chapter 15, we described the assumptions for the one-factor repeated measures ANOVA model. The assumptions are nearly the same for the two-factor randomized block model, and we need not devote much attention to them here. As before, the assumptions are mainly concerned with independence, normality, and homogeneity of variance of the population scores on the dependent variable.
Another assumption is compound symmetry, which is necessary because the observations within a block are not independent. The assumption states that the population covariances for all pairs of the levels of the treatment factor A (i.e., j and j′) are equal. ANOVA is not particularly robust to a violation of this assumption. If the assumption is violated, three alternative procedures are available. The first is to limit the levels of factor A, either to those that meet the assumption or to two levels (in which case, there is only one covariance). The second, and more plausible, alternative is to use adjusted F tests. These are reported shortly. The third is to use multivariate ANOVA, which has no compound symmetry assumption but is slightly less powerful. This method is beyond the scope of this text.
Huynh and Feldt (1970) showed that the compound symmetry assumption is a sufficient but not necessary condition for the test of treatment factor A to be F distributed. Thus, the F test may also be valid under less stringent conditions. The necessary and sufficient condition for the validity of the F test of A is known as sphericity. This assumes that the variance of the difference scores for each pair of factor levels is the same. Further discussion of sphericity is beyond the scope of this text (see Keppel, 1982; or Kirk, 1982), although we have previously discussed sphericity for repeated measures designs in Chapter 15.
A final assumption purports that there is no interaction between the treatment and blocking factors. This is obviously an assumption of the model because no interaction term is included. Such a model is often referred to as an additive model. As was mentioned previously, in this model, the interaction is confounded with the error term. Violation of the additivity assumption results in the test of factor A being negatively biased; thus, there is an increased probability of committing a Type II error. As a result, if H0 is rejected, then we are confident that H0 is really false. If H0 is not rejected, then our interpretation is ambiguous as H0 may or may not be really true (due to an increased probability of a Type II error). Here you would not know whether H0 was true or not, as there might really be a difference, but the test may not be powerful enough to detect it. Also, the power of the test of factor A is reduced by a violation of the additivity assumption. The assumption may be tested by Tukey's (1949) test of additivity (see Hays, 1988; Kirk, 1982; Timm, 2002), which generates an F test statistic that is compared to the critical value of αF1,[(J−1)(K−1)−1]. If the test is not statistically significant, then the model is additive and the assumption has been met. If the test is significant, then the model is not additive and the assumption has not been met. A summary of the assumptions and the effects of their violation for this model is presented in Table 16.6.
16.2.5 ANOVA Summary Table and Expected Mean Squares
The sources of variation for this model are similar to those of the regular two-factor model, except that there is no interaction term. The ANOVA summary table is shown in Table 16.7, where we see the following sources of variation: A (treatments), B (blocks), residual, and total. The test of block differences is usually of no real interest. In general, we expect there to be differences between the blocks. From the table, we see that two F ratios can be formed. If we take the total sum of squares and decompose it, we have

    SS_total = SS_A + SS_B + SS_res

The remaining computations are determined by the statistical software. The degrees of freedom, mean squares, and F ratios are also shown in Table 16.7.
Table 16.6
Assumptions and Effects of Violations: Two-Factor Randomized Block ANOVA

Assumption                                    Effect of Assumption Violation
Independence                                  • Increased likelihood of a Type I and/or Type II error in F
                                              • Affects standard errors of means and inferences about those means
Homogeneity of variance                       • Small effect with equal or nearly equal n's
                                              • Otherwise effect decreases as n increases
Normality                                     • Minimal effect with equal or nearly equal n's
Sphericity                                    • Fairly serious effect
No interaction between treatment and blocks   • Increased likelihood of a Type II error for the test of factor A and thus reduced power
Table 16.7
Two-Factor Randomized Block Design ANOVA Summary Table

Source     SS        df               MS       F
A          SS_A      J − 1            MS_A     MS_A/MS_res
B          SS_B      K − 1            MS_B     MS_B/MS_res
Residual   SS_res    (J − 1)(K − 1)   MS_res
Total      SS_total  N − 1
Earlier in our discussion of the two-factor randomized block design, we mentioned that the F test is not very robust to violation of the sphericity assumption. We again recommend the following sequential procedure be used in the test of factor A. First, perform the usual F test, which is quite liberal in terms of rejecting H0 too often, where the degrees of freedom are J − 1 and (J − 1)(K − 1). If H0 is not rejected, then stop. If H0 is rejected, then continue with step 2, which is to use the Geisser and Greenhouse (1958) conservative F test. For the model we are considering here, the degrees of freedom for the F critical value are adjusted to be 1 and K − 1. If H0 is rejected, then stop. This would indicate that both the liberal and conservative tests reached the same conclusion, that is, to reject H0. If H0 is not rejected, then the two tests did not reach the same conclusion, and a further test should be undertaken. Thus, in step 3, an adjusted F test is conducted. The adjustment is known as Box's (1954b) correction [the Huynh and Feldt (1970) procedure]. Here the degrees of freedom are equal to (J − 1)ε and (J − 1)(K − 1)ε, where ε is the correction factor (see Kirk, 1982). It is now fairly standard for the major statistical software to conduct the Geisser-Greenhouse and Huynh-Feldt tests.
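The three-step sequence can be sketched as a small decision routine. The critical values would be read from an F table for the stated degrees of freedom; the observed F below is hypothetical, and passing the liberal critical value as the adjusted one corresponds to ε = 1:

```python
def sequential_f_test(F_obs, crit_liberal, crit_conservative, crit_adjusted):
    """Sequential test of factor A under a possible sphericity violation.

    crit_liberal:      usual critical value, df = J - 1, (J - 1)(K - 1)
    crit_conservative: Geisser-Greenhouse critical value, df = 1, K - 1
    crit_adjusted:     Box-corrected value, df = (J - 1)e, (J - 1)(K - 1)e
    """
    if F_obs <= crit_liberal:
        return "fail to reject H0"                  # step 1: stop
    if F_obs > crit_conservative:                   # step 2: both tests agree
        return "reject H0 (liberal and conservative tests agree)"
    if F_obs > crit_adjusted:                       # step 3: adjusted test
        return "reject H0 (adjusted test)"
    return "fail to reject H0 (adjusted test)"

# Hypothetical F with J = K = 4: .05F(3,9) = 3.86 and .05F(1,3) = 10.13
print(sequential_f_test(12.0, 3.86, 10.13, 3.86))
```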
Based on the expected mean squares (not shown here for simplicity), the residual is the proper error term for the fixed-, random-, and mixed-effects models. Thus, MS_res is the proper error term for every version of this model. One may also be interested in an assessment of the effect size for the treatment factor A; note that the effect size of the blocking factor B is usually not of interest. As in previously presented ANOVA models, effect size measures such as ω² and η² should be considered. Finally, the procedures for determining CIs and power are the same as in previous models.
16.2.6 Multiple Comparison Procedures
If the null hypothesis for either the A (treatment) or B (blocking) factor is rejected and there are more than two levels of the factor for which statistical significance was found, then the researcher may be interested in which means or combinations of means are different. This could be assessed, as put forth in previous chapters, by the use of some MCP. In general, the use of MCPs outlined in Chapter 12 is unchanged as long as the sphericity assumption is met. If the assumption is not met, then MS_res is not the appropriate error term, and the alternatives recommended in Chapter 15 should be considered (see Boik, 1981; Kirk, 1982; or Maxwell, 1980).
16.2.7 Methods of block Formation
There�are�different�methods�available�for�the�formation�of�blocks�depending�on�the�nature�
of�the�blocking�variable��As�we�see,�the�methods�have�to�do�with�whether�the�blocking�fac-
tor�is�an�ordinal�or�an�interval/ratio�variable�and�whether�the�blocking�factor�is�a�fixed�or�
random�effect��This�discussion�borrows�heavily�from�the�work�of�Pingel�(1969)�in�defining�
five�such�methods��The�first�method�is�the�predefined value blocking method,�where�the�
blocking�factor�is�an�ordinal�variable��Here�the�researcher�specifies�K�different�population�
values�of�the�blocking�variable��For�each�of�these�values�(i�e�,�a�fixed�effect),�individuals�are�
randomly�assigned�to�the�levels�of�the�treatment�factor��Thus,�individuals�within�a�block�
have� the� same� value� on� the� blocking� variable�� For� example,� if� class� rank� is� the� blocking�
variable,�the�levels�might�be�the�top�third,�middle�third,�and�bottom�third�of�the�class�
The second method is the predefined range blocking method, where the blocking factor is an interval or ratio variable. Here the researcher specifies K mutually exclusive ranges in the population distribution of the blocking variable, where the probability of obtaining a value of the blocking variable in each range may be specified as 1/K. For each of these ranges (i.e., a fixed effect), individuals are randomly assigned to the levels of the treatment factor. Thus, individuals within a block are in the same range on the blocking variable. For example, if the Graduate Record Exam-Verbal (GRE-V) score is the blocking variable, the levels might be 200–400, 401–600, and 601–800.
The third method is the sampled value blocking method, where the blocking variable is an ordinal variable. Here the researcher randomly samples K population values of the blocking variable (i.e., a random effect). For each of these values, individuals are randomly assigned to the levels of the treatment factor. Thus, individuals within a block have the same value on the blocking variable. For example, if class rank is again the blocking variable, only this time measured in 10ths, the researcher might randomly select 3 levels from the population of 10 levels.
The fourth method is the sampled range blocking method, where the blocking variable is an interval or ratio variable. Here the researcher randomly samples N individuals from the population, such that N = JK, where K is the number of blocks desired (i.e., a fixed effect) and J is the number of treatment groups. These individuals are ranked according to their values on the blocking variable from 1 to N. The first block consists of those individuals ranked from 1 to J, the second block of those ranked from J + 1 to 2J, and so on. Finally, individuals within a block are randomly assigned to the J treatment groups. For example, consider the GRE-V score again as the blocking variable, where there are J = 4 treatment groups, K = 10 blocks, and thus N = JK = 40 individuals. The top four ranked individuals on the GRE-V exam would constitute the first block, and they would be randomly assigned to the four groups. The next four ranked individuals would constitute the second block, and so on.
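The sampled range procedure can be sketched as follows; the individual labels and GRE-V scores are hypothetical:

```python
import random

def sampled_range_blocks(scores, J, seed=0):
    """Sampled range blocking: rank the N = J*K individuals on the
    blocking variable, slice the ranking into K blocks of J adjacent
    ranks, then randomly assign the J members of each block to the
    J treatment groups (numbered 1..J)."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)  # rank 1 = highest
    blocks = []
    for start in range(0, len(ranked), J):
        members = ranked[start:start + J]
        groups = list(range(1, J + 1))
        rng.shuffle(groups)                 # random assignment within block
        blocks.append(list(zip(members, groups)))
    return blocks

# Hypothetical GRE-V scores for N = 8 individuals, J = 2 treatment groups
gre_v = {"s0": 720, "s1": 410, "s2": 650, "s3": 300,
         "s4": 560, "s5": 480, "s6": 690, "s7": 350}
for block in sampled_range_blocks(gre_v, J=2):
    print(block)
```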
The fifth method is the post hoc blocking method. Here the researcher has already designed the study and collected the data, without the benefit of a blocking variable. After the fact, a blocking variable is identified and incorporated into the analysis. It is possible to implement any of the four preceding procedures on a post hoc basis.
Based on the research of Pingel (1969), some statements can be made about the precision of these blocking methods in terms of a reduction in residual variability as well as better estimation of the treatment effect. In general, for an ordinal blocking variable, the predefined value blocking method is more precise than the sampled value blocking method. Likewise, for an interval or ratio blocking variable, the predefined range blocking method is more precise than the sampled range blocking method. Finally, the post hoc blocking method is the least precise of the methods discussed. For discussion of selecting an optimal number of blocks, we suggest you consider Feldt (1958; highly recommended), as well as Myers (1979), Myers and Well (1995), and Keppel and Wickens (2004). These researchers make the following recommendations about the optimal number of blocks (where rxy is the correlation between the blocking factor X, in a randomized block design, and the dependent variable Y): if rxy = .2, then use five blocks; if rxy = .4, then use four blocks; if rxy = .6, then use three blocks; and if rxy = .8, then use two blocks.
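These recommendations can be encoded as a small lookup. Mapping an intermediate correlation to the nearest tabled value is our own choice, not part of the cited guidance:

```python
def optimal_blocks(r_xy):
    """Recommended number of blocks given r_xy, per the guidance above
    (.2 -> 5, .4 -> 4, .6 -> 3, .8 -> 2); intermediate correlations are
    mapped to the nearest tabled value (our own interpolation choice)."""
    table = {0.2: 5, 0.4: 4, 0.6: 3, 0.8: 2}
    nearest = min(table, key=lambda r: abs(r - r_xy))
    return table[nearest]

print(optimal_blocks(0.45))  # -> 4
```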
16.2.8 Example
Let us consider an example to illustrate the procedures in this section. The data are shown in Table 16.8. The blocking factor is age (i.e., 20, 30, 40, and 50 years of age), the treatment factor is number of workouts per week (i.e., 1, 2, 3, and 4), and the dependent variable is amount of weight lost during the 1st month. Presume we have a fixed-effects model. Table 16.9 contains the resultant ANOVA summary table.
The test statistics are both compared to the usual F test critical value of .05F3,9 = 3.86 (from Table A.4), so that both main effects tests are statistically significant. The Geisser-Greenhouse conservative procedure is necessary for the test of factor A; here the test statistic is compared to the critical value of .05F1,3 = 10.13, which is also significant. The two procedures both yield a statistically significant result, so we need not be concerned with a violation of the sphericity assumption for the test of A. In summary, the effects of amount of exercise undertaken and age on amount of weight lost are both statistically significant at the .05 level of significance.
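The sums of squares and F ratios for this example can be computed directly from the raw scores of Table 16.8. A minimal sketch (NumPy assumed; small differences from tabled F values come from rounding the mean squares):

```python
import numpy as np

# Weight lost (Table 16.8): rows = exercise program (treatment factor),
# columns = age block (blocking factor), one observation per cell.
y = np.array([[3, 2, 1, 0],
              [6, 5, 4, 2],
              [10, 8, 7, 6],
              [9, 7, 8, 7]], dtype=float)
J, K = y.shape
grand = y.mean()
row_means = y.mean(axis=1)   # treatment means: 1.50, 4.25, 7.75, 7.75
col_means = y.mean(axis=0)   # block means: 7.00, 5.50, 5.00, 3.75

ss_treat = K * np.sum((row_means - grand) ** 2)   # 110.1875
ss_block = J * np.sum((col_means - grand) ** 2)   # 21.6875
ss_total = np.sum((y - grand) ** 2)               # 135.4375
ss_res = ss_total - ss_treat - ss_block           # 3.5625

ms_res = ss_res / ((J - 1) * (K - 1))
f_treat = (ss_treat / (J - 1)) / ms_res   # ~92.79
f_block = (ss_block / (K - 1)) / ms_res   # ~18.26
print(f_treat, f_block)                   # both exceed .05F(3,9) = 3.86
```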
Next we need to test the additivity assumption using Tukey's (1949) test of additivity. The F test statistic is equal to 0.1010, which is compared to the critical value of .05F1,8 = 5.32 from Table A.4. The test is nonsignificant, so the model is additive and the assumption has been met.
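Tukey's single-df statistic can also be computed directly. This sketch uses the standard formulation (nonadditivity SS = squared cross-product of the scores with the row and column effects, divided by the product of the effect sums of squares), which we supply since the text does not spell out the formula:

```python
import numpy as np

# Tukey's (1949) single-df test of additivity for the data of Table 16.8.
y = np.array([[3, 2, 1, 0],
              [6, 5, 4, 2],
              [10, 8, 7, 6],
              [9, 7, 8, 7]], dtype=float)
J, K = y.shape
a = y.mean(axis=1) - y.mean()   # row (treatment) effects
b = y.mean(axis=0) - y.mean()   # column (block) effects

# Nonadditivity sum of squares (1 df)
ss_nonadd = (a @ y @ b) ** 2 / (np.sum(a**2) * np.sum(b**2))
ss_res = np.sum((y - y.mean() - a[:, None] - b[None, :]) ** 2)  # 3.5625
ss_rem = ss_res - ss_nonadd     # remainder, df = (J-1)(K-1) - 1
F = ss_nonadd / (ss_rem / ((J - 1) * (K - 1) - 1))
print(F)   # ~0.10, well below .05F(1,8) = 5.32, so additivity is tenable
```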
As an example of a MCP, the Tukey HSD procedure is used to test for the equivalence of exercising once a week (j = 1) and four times a week (j = 4), where the contrast is written as Ȳ4. − Ȳ1.. The mean amounts of weight lost for these groups are 1.5000 for the once a week program and 7.7500 for the four times a week program. The standard error is computed as follows:

    s_ψ′ = √(MS_res/J) = √(0.3958/4) = 0.3146
and the studentized range statistic is as follows:

    q = (Ȳ4. − Ȳ1.)/s_ψ′ = (7.75 − 1.50)/0.3146 = 19.8665

The critical value is αq9,4 = 4.415 (from Table A.9). The test statistic exceeds the critical value; thus, we conclude that the mean amounts of weight lost for groups 1 (exercise once per week) and 4 (exercise four times per week) are statistically significantly different at the .05 level (i.e., more frequent exercise helps one to lose more weight).
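The standard error and q statistic can be checked in a few lines:

```python
import math

# Tukey HSD contrast between exercise programs 4 and 1 (Table 16.8).
ms_res, J = 0.3958, 4            # residual mean square; number of groups
se = math.sqrt(ms_res / J)       # standard error, ~0.3146
q = (7.75 - 1.50) / se           # studentized range statistic, ~19.87
print(q > 4.415)                 # exceeds .05q(9,4) = 4.415 -> True
```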
Table 16.9
Two-Factor Randomized Block Design ANOVA Summary Table: Exercise Example

Source     SS        df  MS       F
A          110.1875   3  36.7292  92.7974 a
B           21.6875   3   7.2292  18.2648 a
Residual     3.5625   9   0.3958
Total      135.4375  15

a .05F3,9 = 3.86.
Table 16.8
Data for the Exercise Example: Two-Factor Randomized Block Design

                            Age
Exercise Program   20      30      40      50      Row Means
1/week             3       2       1       0       1.5000
2/week             6       5       4       2       4.2500
3/week             10      8       7       6       7.7500
4/week             9       7       8       7       7.7500
Block means        7.0000  5.5000  5.0000  3.7500  5.3125 (overall mean)
16.3 Two-Factor Randomized Block Design for n > 1
For two-factor randomized block designs with more than one observation per cell, there is little that we have not already covered. First, the characteristics are exactly the same as with the n = 1 model, with the obvious exception that when n > 1, an interaction term exists. Second, the layout of the data, the model, the ANOVA summary table, and the MCPs are the same as in the regular two-factor model. Third, the assumptions are the same as with the n = 1 model, except the assumption of additivity is not necessary because an interaction term exists. The sphericity assumption is required for those tests using MS_AB as the error term. We do not mean to minimize the importance of this popular model; however, there really is no additional information to provide beyond what we have already presented. For a discussion of other randomized block designs, see Kirk (1982).
16.4 Friedman Test
There is a nonparametric equivalent to the two-factor randomized block ANOVA model. The test was developed by Friedman (1937) and is based on mean ranks. For the case of n = 1, the procedure is precisely the same as the Friedman test for the one-factor repeated measures model (see Chapter 15). For the case of n > 1, the procedure is slightly different. First, all of the scores within each block are ranked for that block. For instance, if there are J = 4 levels of factor A and n = 10 individuals per cell, then each block's scores would be ranked from 1 to 40. From this, a mean ranking can be determined for each level of factor A. The null hypothesis tests whether the mean rankings for each of the levels of A are equal. The test statistic is a χ², which is compared to the critical value of αχ²_{J−1} (see Table A.3), where the null hypothesis is rejected if the test statistic exceeds the critical value.
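For the n = 1 case, the statistic can be computed from within-block ranks. A sketch (applied, for illustration, to the exercise data of Table 16.8, which happens to have no tied ranks; the tie correction from Chapter 15 is not implemented):

```python
import numpy as np

def friedman_chi2(y):
    """Friedman statistic for a J x K table (treatments x blocks), n = 1:
    rank scores within each block (column), then
    chi2 = 12/(K*J*(J+1)) * sum_j R_j^2 - 3*K*(J+1), with df = J - 1."""
    y = np.asarray(y, dtype=float)
    J, K = y.shape
    ranks = y.argsort(axis=0).argsort(axis=0) + 1   # ties not handled here
    R = ranks.sum(axis=1)                           # rank sum per treatment
    return 12.0 / (K * J * (J + 1)) * np.sum(R**2) - 3 * K * (J + 1)

chi2 = friedman_chi2([[3, 2, 1, 0],
                      [6, 5, 4, 2],
                      [10, 8, 7, 6],
                      [9, 7, 8, 7]])
print(chi2)   # ~10.8, exceeds the .05 chi-square critical value with 3 df (7.81)
```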
In the case of tied ranks, either the available ranks can be averaged, or a correction factor can be used (see Chapter 15). You may also recall the problem with small n's in terms of the test statistic not being precisely distributed as a χ². For situations where J < 6 and n < 6, consult the table of critical values in Marascuilo and McSweeney (1977, Table A-22, p. 521). The Friedman test assumes that the population distributions have the same shape (although not necessarily normal) and the same variability and that the dependent measure is continuous. For alternative nonparametric procedures, see the discussion in Chapter 15.
Various MCPs can be used for the nonparametric two-factor randomized block model. For the most part, these MCPs are analogs to their parametric equivalents. In the case of planned pairwise comparisons, one may use multiple matched-pair Wilcoxon tests in a Bonferroni form (i.e., taking the number of contrasts into account by splitting up the α level). Due to the nature of planned comparisons, these are more powerful than the Friedman test. For post hoc comparisons, two example MCPs are the Tukey HSD analog for pairwise contrasts and the Scheffé analog for complex contrasts. For additional discussion about the use of MCPs for this model, see Marascuilo and McSweeney (1977). For an example of the Friedman test, return to Chapter 15. Finally, note that MCPs are not usually conducted on the blocking factor as they are rarely of interest to the applied researcher.
16.5 Comparison of Various ANOVA Models
How do some of the ANOVA models we have considered compare in terms of power and precision? Recall again that power is defined as the probability of rejecting H0 when H0 is false, and precision is defined as a measure of our ability to obtain good estimates of the treatment effects. The classic literature on this topic revolves around the correlation between the dependent variable Y and the concomitant variable X (i.e., rxy), where the concomitant variable can be either a covariate or a blocking factor. First let us compare the one-factor ANOVA and one-factor ANCOVA models. If rxy, the correlation between the covariate X and the dependent variable Y, is not statistically significantly different from 0, then the amount of unexplained variation will be the same in the two models. Thus, no statistical adjustment will be made on the group means. In this situation, the ANOVA model is more powerful, as we lose one degree of freedom for each covariate used in the ANCOVA model. If rxy is significantly different from 0, then the amount of unexplained variation will be smaller in the ANCOVA model as compared to the ANOVA model. Here the ANCOVA model is more powerful and is more precise as compared to the ANOVA model. Second, compare the one-factor ANOVA and two-factor randomized block designs. If rxy, the correlation between the blocking factor X and the dependent variable Y, is not statistically significantly different from 0, then the blocking factor will not account for much variability in the dependent variable. One rule of thumb states that if rxy < .2, then ignore the concomitant variable (whether it is a covariate or a blocking factor), and use the one-factor ANOVA. Otherwise, take the concomitant variable into account somehow, either as a covariate or blocking factor.
How should we take the concomitant variable into account if it correlates with the dependent variable at greater than .20 (i.e., rxy > .2)? The two best possibilities are the analysis of covariance design (ANCOVA, Chapter 14) and the randomized block ANOVA design (discussed in this chapter). That is, the concomitant variable can be used either as a covariate through a statistical form of control (i.e., ANCOVA) or as a blocking factor through an experimental design form of control (i.e., randomized block ANOVA). As suggested by the classic work of Feldt (1958), if .2 < rxy < .4, then use the concomitant variable as a blocking factor in a randomized block design, as it is the most powerful and precise design. If rxy > .6, then use the concomitant variable as a covariate in an ANCOVA design, as it is the most powerful and precise design. If .4 < rxy < .6, then the randomized block and ANCOVA designs are about equal in terms of power and precision.
However, Maxwell, Delaney, and Dill (1984) showed that the correlation between the covariate and dependent variable should not be the ultimate criterion in deciding whether to use an ANCOVA or a randomized block design. These designs differ in the following two ways: (a) whether the concomitant variable is treated as continuous (ANCOVA) or categorical (randomized block) and (b) whether individuals are assigned to groups based on the concomitant variable (randomized blocks) or without regard to the concomitant variable (ANCOVA). Thus, the Feldt (1958) comparison of these particular models is not a fair one in that the models differ in these two ways. The ANCOVA model makes full use of the information contained in the concomitant variable, whereas in the randomized block model, some information is lost due to the categorization. In examining nine different models, Maxwell and colleagues suggest that rxy should not be the sole factor in the choice of a design (given that rxy is at least .3), but that two other factors be considered. The first factor is whether scores on the concomitant variable are available prior to the assignment of individuals to groups. If so, power will be increased by assigning individuals to groups based on the concomitant variable (i.e., blocking). The second factor is whether X (the concomitant variable) and Y (the dependent variable) are linearly related. If so, the use of ANCOVA with a continuous concomitant variable is more powerful because linearity is an assumption of the model (Keppel & Wickens, 2004; Myers & Well, 1995). If not, either the concomitant variable should be used as a blocking variable, or some sort of nonlinear ANCOVA model should be used.
There are a few other decision criteria you may want to consider in choosing between the randomized block and ANCOVA designs. First, in some situations, blocking may be difficult to carry out. For instance, we may not be able to find enough homogeneous individuals to constitute a block. If the blocks formed are not very homogeneous, this defeats the whole purpose of blocking. Second, the interaction of the independent variable and the concomitant variable may be an important effect to study. In this case, use the randomized block design with multiple individuals per cell. If the interaction is significant, this violates the assumption of homogeneity of regression slopes in the analysis of covariance design, but does not violate any assumption in the randomized block design with n > 1. Third, it should be obvious by now that the assumptions of the ANCOVA design are much more restrictive than those of the randomized block design. Thus, when important assumptions are likely to be seriously violated, the randomized block design is preferable.
There are other alternative designs for incorporating the concomitant variable as a pretest, such as an ANOVA on gain scores (the difference between posttest and pretest), or a mixed (split-plot) design where the pretest and posttest measures are treated as the levels of a repeated factor. Based on the research of Huck and McLean (1975) and Jennings (1988), the ANCOVA model is generally preferred over these other two models. For further discussion, see Reichardt (1979), Huitema (1980), or Kirk (1982).
16.6 SPSS
In this section, we examine SPSS for the models presented in this chapter. We begin with the two-factor hierarchical ANOVA and then follow with the two-factor randomized block ANOVA.
Two-Factor Hierarchical ANOVA
To conduct a two-factor hierarchical (or nested) ANOVA, there are a few differences from other ANOVA models we have considered in this text. We will illustrate computation of the model using the point-and-click method as we have done in previous chapters. It is important to note, however, that while SPSS offers limited capability for estimating hierarchical ANOVA models, the most recent versions of SPSS offer increasing ability to generate multilevel regression models, and readers interested in more complex regression models are referred to Heck, Thomas, and Tabata (2010).
In terms of the form of the data, one column or variable indicates the levels or categories of the independent variable (i.e., the fixed factor), one column indicates the levels of the nested factor, and one variable represents the outcome or dependent variable. Each row represents one individual, indicating the level or group of the nonnested factor (basal or whole language, in our example), the level or group of the nested factor (teachers 1, 2, 3, or 4), and their score on the dependent variable. Thus, we have three columns which represent the nonnested factor, the nested factor, and the scores, as shown in the following screenshot.
[Screenshot: data layout for the two-factor hierarchical ANOVA, which follows similarly to previous ANOVA models. The nonnested factor is labeled “Approach,” where each value represents the reading approach to which the child was assigned. The nested factor is labeled “Teacher,” where each value represents the child's classroom teacher. The dependent variable is “Score” and represents the reading score.]
Step 1: To conduct a two-factor hierarchical ANOVA, go to “Analyze” in the top pulldown menu, then select “General Linear Model,” and then select “Univariate.” Following the screenshot (step 1) as follows produces the “Univariate” dialog box.
[Screenshot: Two-factor hierarchical ANOVA, Step 1 (the “Analyze” > “General Linear Model” > “Univariate” menu path).]
Step 2: Click the dependent variable (e.g., reading score) and move it into the “Dependent Variable” box by clicking the arrow button. Click the nonnested factor (e.g., reading approach; this is a fixed-effects factor) and move it into the “Fixed Factor(s)” box by clicking the arrow button. Click the nested variable (e.g., teacher; this is a random-effects factor) and move it into the “Random Factor(s)” box by clicking the arrow button.
[Screenshot: Two-factor hierarchical ANOVA, Step 2 (the “Univariate” dialog box). Select the dependent variable, the nonnested factor, and the nested factor from the list on the left and use the arrows to move them to the “Dependent Variable,” “Fixed Factor(s),” and “Random Factor(s)” boxes, respectively. Clicking on “Model” will allow you to define the nested factor; clicking on “Plots” will allow you to generate profile plots; clicking on “Save” will allow you to save various forms of residuals, among other variables; and clicking on “Options” will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, homogeneity tests, and multiple comparison procedures).]
Step 3a: From the main “Univariate” dialog box (see screenshot step 2), click on “Model” to enact the “Univariate Model” dialog box. From the “Univariate Model” dialog box, click the “Custom” radio button located in the top left (see screenshot step 3a). We will now define a main effect for reading approach (see screenshot step 3a). To do this, click the “Build Terms” toggle menu in the center of the page and select “Main Effects.” Click the nonnested factor (in this illustration, “Approach”) from the “Factors & Covariates” list on the left and move it to the “Model” box on the right by clicking the arrow.
[Screenshot: Two-factor hierarchical ANOVA, Step 3a. Click the toggle menu for “Build Terms” to select “Main Effects,” then select the nonnested variable from the list on the left and use the arrow to move it to the “Model” box on the right.]
Step 3b: We will now define an interaction effect for reading approach by teacher (see screenshot step 3b). To do this, click the “Build Terms” toggle menu in the center of the page and select “Interaction.” Click both the nonnested factor (e.g., “Approach”) and nested factor (e.g., “Teacher”) from the “Factors & Covariates” list on the left and move them to the “Model” box on the right by clicking the arrow. The interaction term is necessary to trick SPSS into computing the main effect of B(A) for the nested factor (which SPSS calls “approach*teacher,” but is actually “teacher”) and thus generate the proper ANOVA summary table. Thus, the model should not include a main effect term for “Teacher.”
[Screenshot: Two-factor hierarchical ANOVA, Step 3b. Click the toggle menu for “Build Terms” to select “Interaction,” then select both the nonnested and nested factors from the list on the left and use the arrow to move them to the “Model” box on the right.]
Step 4: From the “Univariate” dialog box (see screenshot step 2), clicking on “Post Hoc” will provide the option to select post hoc MCPs for the nonnested factor. From the “Post Hoc Multiple Comparisons for Observed Means” dialog box, click on the name of the nonnested factor in the “Factor(s)” list box in the top left and move it to the “Post Hoc Tests for” box in the top right by clicking on the arrow key. Check an appropriate MCP for your situation by placing a checkmark in the box next to the desired MCP. In this example, we select “Tukey.” Click on “Continue” to return to the original dialog box.
[Screenshot: Two-factor hierarchical ANOVA, Step 4. Select the nonnested factor of interest from the list on the left and use the arrow to move it to the “Post Hoc Tests for” box on the right. The dialog groups MCPs for instances when the homogeneity of variance assumption is met and MCPs for instances when it is not met.]
Step 5: Clicking on “Options” from the main “Univariate” dialog box (see screenshot step 2) will provide the option to select such information as “Descriptive Statistics,” “Estimates of effect size,” “Observed power,” and “Homogeneity tests” (i.e., Levene's test). Click on “Continue” to return to the original dialog box. Note that if you are interested in an MCP for the nested factor (although generally not of interest for this model), post hoc MCPs are only available from the “Options” screen. To select a post hoc procedure, click on “Compare main effects” and use the toggle menu to reveal the LSD, Bonferroni, and Sidak procedures. However, we have already mentioned that MCPs are not generally of interest for the nested factor.
It is important to note that Li and Lomax (2011) found that the standard errors of the MCPs for the nonnested factor in SPSS point-and-click (PAC) mode are not correct. More specifically, SPSS PAC uses MSwithin as the error term in computing the MCP standard error rather than MSB(A). There is no way to generate the correct results solely with SPSS PAC; either hand computations using the correct error term or other software (e.g., SPSS syntax) must be used.
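Using the mean squares reported later in Table 16.10, the size of the discrepancy can be sketched by hand. This is a rough check only, assuming the usual standard-error formula for a difference between two marginal means:

```python
import math

# Mean squares from Table 16.10 (approaches to reading example).
ms_within = 5.950       # MS(Error): the term SPSS PAC uses (incorrect here)
ms_b_within_a = 5.667   # MS(Approach*Teacher), i.e., MS_B(A): the correct term
n_per_level = 12        # observations per level of the nonnested factor

# SE of the difference between the two approach means: sqrt(MS * (1/n + 1/n)).
se_spss = math.sqrt(ms_within * (2 / n_per_level))         # ~ .996, as SPSS reports
se_correct = math.sqrt(ms_b_within_a * (2 / n_per_level))  # ~ .972, using MS_B(A)
```

Note that the degrees of freedom also differ (20 for MSwithin versus 2 for MSB(A)), so the critical value changes along with the standard error.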
[Screenshot: Two-factor hierarchical ANOVA, Step 5. Select from the list on the left those variables for which you wish to display means and use the arrow to move them to the “Display Means for” box on the right. While post hoc MCPs are usually not of interest in random-effects models, if you wish to conduct a post hoc test, that selection must be made from this screen using the “Compare main effects” option; then select one of the three MCPs available from the toggle menu under “Confidence interval adjustment” (i.e., LSD, Bonferroni, or Sidak).]
Step 6: From the “Univariate” dialog box (screenshot step 2), click on “Save” to select those elements you want to save. Here we want to save the unstandardized residuals to be used to examine the extent to which normality and independence are met. Thus, place a checkmark in the box next to “Unstandardized.” Click “Continue” to return to the main “Univariate” dialog box. From the “Univariate” dialog box, click on “OK” to generate the output.
[Screenshot: Two-factor hierarchical ANOVA, Step 6 (the “Save” dialog box).]
Interpreting the output: Annotated results are presented in Table 16.10.
Table 16.10
Two-Factor Hierarchical ANOVA SPSS Results for the Approaches to Reading Example

Between-Subjects Factors
                      Value   Label            N
Approach to reading   1.00    Basal            12
                      2.00    Whole language   12
Teacher               1.00    Teacher B1       6
                      2.00    Teacher B2       6
                      3.00    Teacher B3       6
                      4.00    Teacher B4       6
Descriptive Statistics
Dependent Variable: Reading Score
Approach to Reading   Teacher      Mean      Std. Deviation   N
Basal                 Teacher B1   2.8333    1.72240          6
                      Teacher B2   3.8333    1.94079          6
                      Total        3.3333    1.82574          12
Whole language        Teacher B3   10.0000   3.03315          6
                      Teacher B4   11.6667   2.80476          6
                      Total        10.8333   2.91807          12
Total                 Teacher B1   2.8333    1.72240          6
                      Teacher B2   3.8333    1.94079          6
                      Teacher B3   10.0000   3.03315          6
                      Teacher B4   11.6667   2.80476          6
                      Total        7.0833    4.51005          24
Levene's Test of Equality of Error Variances^a
Dependent Variable: Reading Score
F       df1   df2   Sig.
1.042   3     20    .396
a Tests the null hypothesis that the error variance of the dependent variable is equal across groups.

The table labeled “Between-Subjects Factors” lists the variable names and sample sizes for the nonnested factor (i.e., “Approach to reading”) and the nested factor (i.e., “Teacher”). The table labeled “Descriptive Statistics” provides basic descriptive statistics (means, standard deviations, and sample sizes) for each nonnested factor and nested factor combination (or cell). The F test (and associated p value) for Levene's Test of Equality of Error Variances is reviewed to determine if equal variances can be assumed. In this case, we meet the assumption (as p is greater than α).
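The same homogeneity check can be sketched outside SPSS with SciPy. The cell scores below are illustrative stand-ins, not the textbook's raw data; `center="mean"` requests the classic Levene statistic, which is the form SPSS reports:

```python
from scipy import stats

# Hypothetical reading scores for the four approach-teacher cells
# (illustrative values only, not the textbook's raw data).
cells = [
    [1, 2, 2, 3, 4, 5],
    [2, 3, 3, 4, 5, 6],
    [6, 8, 9, 11, 13, 13],
    [8, 10, 11, 12, 14, 15],
]
# Levene's test of equal error variances across the four cells.
f_stat, p_value = stats.levene(*cells, center="mean")
```

As in the SPSS output, p is compared with α; a p value greater than α means equal variances can be assumed.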
Table 16.10 (continued)
Two-Factor Hierarchical ANOVA SPSS Results for the Approaches to Reading Example

Tests of Between-Subjects Effects
Dependent Variable: Reading Score
Source                          Type III SS   df   Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power^a
Intercept          Hypothesis   1204.167      1    1204.167      212.500   .005   .991                  212.500              1.000
                   Error        11.333        2    5.667^b
Approach           Hypothesis   337.500       1    337.500       59.559    .016   .968                  59.559               .948
                   Error        11.333        2    5.667^b
Approach*Teacher   Hypothesis   11.333        2    5.667         .952      .403   .087                  1.905                .192
                   Error        119.000       20   5.950^c
a Computed using alpha = .05.
b MS(Approach * Teacher).
c MS(Error).
Estimated Marginal Means
1. Grand Mean
Dependent Variable: Reading Score
                               95% Confidence Interval
Mean      Std. Error   Lower Bound   Upper Bound
7.083^a   .498         6.045         8.122
a Based on modified population marginal mean.
Observed power tells whether our test is powerful enough to detect mean differences if they really exist. Power of .948 is strong: the probability of rejecting the null hypothesis, if it is really false, is about 95%. Comparing p to α, we find a statistically significant difference in approach to reading. This is an omnibus test; we will look at our MCPs to determine which means differ. Partial eta squared is one measure of effect size:

η2p = SSapproach / (SSapproach + SSapproach_error) = 337.500 / (337.500 + 11.333) = .968

We can interpret this to say that approximately 97% of the variation in reading score is accounted for by the differences in reading approach. The “Grand Mean” (in this case, 7.083) represents the overall reading score mean, regardless of the reading approach or teacher; the 95% CI represents the CI of the grand mean.
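The arithmetic behind this effect size can be verified directly from the sums of squares in the summary table:

```python
# Partial eta squared for the Approach effect, using the sums of squares
# from Table 16.10 (the error SS for the Approach test is the B(A) source).
ss_approach = 337.500
ss_error = 11.333
partial_eta_sq = ss_approach / (ss_approach + ss_error)
print(round(partial_eta_sq, 3))  # prints 0.968
```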
Table 16.10 (continued)
Two-Factor Hierarchical ANOVA SPSS Results for the Approaches to Reading Example

2. Approach to Reading
Estimates
Dependent Variable: Reading Score
                                               95% Confidence Interval
Approach to Reading   Mean       Std. Error   Lower Bound   Upper Bound
Basal                 3.333^a    .704         1.864         4.802
Whole language        10.833^a   .704         9.364         12.302
a Based on modified population marginal mean.
Pairwise Comparisons
Dependent Variable: Reading Score
                                                                                95% Confidence Interval for Difference^c
(I) Approach to Reading   (J) Approach to Reading   Mean Difference (I-J)   Std. Error   Sig.^c   Lower Bound   Upper Bound
Basal                     Whole language            -7.500*,a,b             .996         .000     -9.577        -5.423
Whole language            Basal                     7.500*,a,b              .996         .000     5.423         9.577
Based on estimated marginal means.
* The mean difference is significant at the .05 level.
a An estimate of the modified population marginal mean (I).
b An estimate of the modified population marginal mean (J).
c Adjustment for multiple comparisons: Bonferroni.
Univariate Tests
Dependent Variable: Reading Score
           Sum of Squares   df   Mean Square   F        Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power^a
Contrast   337.500          1    337.500       56.723   .000   .739                  56.723               1.000
Error      119.000          20   5.950
a Computed using alpha = .05.
The table for “Approach to Reading” provides descriptive statistics for each of the reading approaches. In addition to means, the SE and 95% CI of the means are reported. In the “Pairwise Comparisons” table, “Mean Difference” is simply the difference between the means of the two categories of our reading approach factor. For example, the mean difference of basal reading and whole language is calculated as 3.333 − 10.833 = −7.500. “Sig.” is the observed p value for the results of the Bonferroni post hoc MCP. There is a statistically significant mean difference in reading scores between basal reading and whole language (p < .001). Note the redundant results in the table: the comparison of basal and whole language (row 1) is the same as the comparison of whole language and basal (row 2). In the “Univariate Tests” table, the error term represents the within cells source of variation, and the F tests the effect of approach to reading; this test is based on the linearly independent pairwise comparisons among the estimated marginal means.
Table 16.10 (continued)
Two-Factor Hierarchical ANOVA SPSS Results for the Approaches to Reading Example

3. Approach to Reading * Teacher
Dependent Variable: Reading Score
                                                             95% Confidence Interval
Approach to Reading   Teacher      Mean     Std. Error   Lower Bound   Upper Bound
Basal                 Teacher B1   2.833    .996         .756          4.911
                      Teacher B2   3.833    .996         1.756         5.911
                      Teacher B3   .^a      .            .             .
                      Teacher B4   .^a      .            .             .
Whole language        Teacher B1   .^a      .            .             .
                      Teacher B2   .^a      .            .             .
                      Teacher B3   10.000   .996         7.923         12.077
                      Teacher B4   11.667   .996         9.589         13.744
a This level combination of factors is not observed; thus the corresponding population marginal mean is not estimable.

The table for “Approach to Reading * Teacher” provides descriptive statistics for each of the approach-teacher combinations. In addition to means, the SE and 95% CI of the means are reported. Note the footnote in reference to the missing mean values: this is not a completely crossed design (i.e., the teachers taught only one reading approach).
Examining Assumptions for Two-Factor Hierarchical ANOVA
Normality
We will use the residuals (which were requested and created through the “Save” option mentioned earlier) to examine the extent to which normality was met.
The residuals are computed by subtracting the cell mean from each observation. For example, the mean reading score for students assigned to teacher 1 who received the basal approach to reading was 2.833. The first student scored 1 on reading comprehension; thus, the residual for the first person is 1.00 − 2.833 ≈ −1.83. As we look at the raw data, we see one new variable has been added to our dataset, labeled RES_1. These are the residuals and will be used to review the assumption of normality.
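The same computation can be sketched in a few lines of Python. The six scores below are illustrative stand-ins for one approach-by-teacher cell, chosen so the cell mean matches the 2.833 reported above:

```python
from statistics import mean

scores = [1, 2, 2, 3, 4, 5]   # hypothetical scores for one cell
cell_mean = mean(scores)      # 2.833..., the cell mean
# Residual = observed score minus its cell mean (SPSS saves these as RES_1).
residuals = [round(s - cell_mean, 2) for s in scores]
```

The first residual is 1 − 2.833 ≈ −1.83, matching the worked example, and the residuals within a cell always sum to zero.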
Generating normality evidence: As described in earlier ANOVA chapters, understanding the distributional shape, specifically whether normality is a reasonable assumption, is important. For the two-factor hierarchical ANOVA, the residuals should be normally distributed.
As in previous chapters, we use “Explore” to examine whether the assumption of normality is met. The general steps for accessing “Explore” have been presented in previous chapters and will not be repeated here. Click the residual and move it into the “Dependent List” box by clicking on the arrow button. The procedures for selecting normality statistics were presented in Chapter 6 and remain the same here: Click on “Plots” in the upper right corner. Place a checkmark in the boxes for “Normality plots with tests” and also for “Histogram.” Then click “Continue” to return to the main “Explore” dialog box and click “OK” to generate the output.
[Screenshot: generating normality evidence. Select the residuals from the list on the left and use the arrow to move them to the “Dependent List” box on the right; then click on “Plots.”]
Interpreting normality evidence: By this point, we have had a substantial amount of practice in interpreting quite a range of normality statistics, and we interpret them again in reference to the hierarchical ANOVA model assumption of normality.
Descriptives
Residual for Score
                                          Statistic   Std. Error
Mean                                      .0000       .46431
95% Confidence       Lower Bound          -.9605
Interval for Mean    Upper Bound          .9605
5% Trimmed Mean                           -.0648
Median                                    -.3333
Variance                                  5.174
Std. Deviation                            2.27462
Minimum                                   -3.67
Maximum                                   5.00
Range                                     8.67
Interquartile Range                       4.08
Skewness                                  .284        .472
Kurtosis                                  -.693       .918
The skewness statistic of the residuals is .284 and kurtosis is −.693, both being within the range of an absolute value of 2.0, suggesting some evidence of normality.
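This screening rule is easy to encode. The sketch below is illustrative only; the ±2.0 bound is the rule of thumb used in this text:

```python
def within_normal_bounds(skewness, kurtosis, bound=2.0):
    """Screen for approximate normality: both statistics within +/-bound."""
    return abs(skewness) <= bound and abs(kurtosis) <= bound

# The residuals above: skewness = .284, kurtosis = -.693.
print(within_normal_bounds(0.284, -0.693))  # prints True
```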
As suggested by the skewness statistic, the histogram of residuals is slightly positively skewed, and the histogram also provides a visual display of the slightly platykurtic distribution.
[Histogram of residual for score (Mean = 8.33E−17, Std. dev. = 2.275, N = 24).]
There are a few other statistics that can be used to gauge normality. The formal test of normality, the Shapiro–Wilk (S–W) test (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented as follows and suggests that our sample distribution for the residual is not statistically significantly different than what would be expected from a normal distribution, as the p value is greater than α.
Tests of Normality
                     Kolmogorov-Smirnov^a           Shapiro-Wilk
                     Statistic   df   Sig.          Statistic   df   Sig.
Residual for score   .123        24   .200*         .960        24   .442
a Lilliefors significance correction.
* This is a lower bound of the true significance.
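For data available outside SPSS, the same test can be run with SciPy. This is a sketch: the residuals here are randomly generated stand-ins for the 24 saved residuals (the SPSS output above reports W = .960, p = .442 for the real ones):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(size=24)  # stand-in for the 24 saved residuals

# Shapiro-Wilk test of normality; compare p with alpha.
w, p = stats.shapiro(residuals)
```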
Quantile–quantile (Q–Q) plots are also often examined to determine evidence of normality, where quantiles of the theoretical normal distribution are plotted against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of residuals shown in the following suggests relative normality.
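The quantile pairs behind such a plot can be computed with SciPy's `probplot`. This sketch uses simulated residuals; the fitted correlation `r` summarizes how closely the points hug the diagonal line:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(size=24)  # stand-in for the 24 saved residuals

# probplot returns the theoretical and ordered sample quantiles plus the
# least-squares line through them; r near 1 suggests normality.
(theoretical, ordered), (slope, intercept, r) = stats.probplot(residuals)
```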
[Normal Q–Q plot of residual for score: expected normal values plotted against observed values.]
Examination of the following boxplot also suggests a relatively normal distributional shape of residuals with no outliers.
[Boxplot of residual for score.]
Considering the forms of evidence we have examined (skewness and kurtosis statistics, the S–W test, histogram, Q–Q plot, and boxplot), all suggest normality is a reasonable assumption. We can be reasonably assured we have met the assumption of normality.
Independence
The last assumption to test is independence. As we have seen this assumption tested in other designs, we do not consider it further here.
Two-Factor Fixed-Effects Randomized Block ANOVA for n = 1
To run a two-factor fixed-effects randomized block ANOVA for n = 1, there are a few differences from the regular two-factor fixed-effects ANOVA that we will see as we build the model in SPSS. Additionally, the test of additivity is not available in SPSS, nor are the adjusted F tests (i.e., the Geisser–Greenhouse and Huynh–Feldt procedures). All other ANOVA procedures that you are familiar with will operate as before.
In terms of the form of the data, it looks just as we saw with the two-factor fixed-effects ANOVA, with the exception that now we have one treatment factor and one blocking variable. The dataset must therefore consist of three variables or columns: one for the level of the treatment factor, a second for the level of the blocking factor, and a third for the dependent variable. Each row still represents one individual, indicating the levels of the treatment and blocking factors to which the individual belongs and their score on the dependent variable. As seen in the following screenshot, for a two-factor fixed-effects randomized block ANOVA, the SPSS data are in the form of two columns that represent the group values (i.e., the treatment and blocking factors) and one column that represents the scores on the dependent variable.
[Screenshot: data layout for the two-factor randomized block ANOVA. The treatment factor is labeled “Program,” where each value represents the exercise program in which the individual participated (e.g., 1 represents “1/week”); thus there were four people assigned to exercise once per week. The blocking factor is labeled “Age,” where 1 represents 20 years of age, 2 represents 30, 3 represents 40, and 4 represents 50. The dependent variable is “Weightloss” and represents the amount of weight lost. One person from each of the four age groups was assigned to each exercise program; the other exercise programs (2, 3, and 4) follow this pattern as well.]
Step 1: To conduct a two-factor randomized block ANOVA for n = 1, go to “Analyze” in the top pulldown menu, then select “General Linear Model,” and then select “Univariate.” Following the screenshot (step 1) as follows produces the “Univariate” dialog box.
[Screenshot: Two-factor randomized block ANOVA, Step 1 (the “Analyze” > “General Linear Model” > “Univariate” menu path).]
Step 2: Click the dependent variable (e.g., weight loss) and move it into the “Dependent Variable” box by clicking the arrow button. Click the treatment factor and the blocking factor and move them into the “Fixed Factor(s)” box by clicking the arrow button.
[Screenshot: Two-factor randomized block ANOVA, Step 2 (the “Univariate” dialog box). Select the dependent variable from the list on the left and use the arrow to move it to the “Dependent Variable” box; select the treatment and blocking factors and move them to the “Fixed Factor(s)” box. Clicking on “Model” will allow you to define the blocking factor; clicking on “Plots” will allow you to generate profile plots; clicking on “Save” will allow you to save various forms of residuals, among other variables; and clicking on “Options” will allow you to obtain a number of other statistics (e.g., descriptive statistics, effect size, power, and multiple comparison procedures).]
Step 3: From the main “Univariate” dialog box (see screenshot step 2), click on “Model” to enact the “Univariate Model” dialog box. From the “Univariate Model” dialog box, click the “Custom” radio button (see screenshot step 3). We will now define the effects necessary for this model: a main effect for exercise program and a main effect for age. We will not define an interaction. To do this, click the “Build Terms” toggle menu in the center of the page and select “Main Effects.” Click the treatment factor (i.e., “Program”) and the blocking factor (i.e., “Age”) from the “Factors & Covariates” list on the left and move them to the “Model” box on the right by clicking the arrow. Thus, the model should not include an interaction effect for “Program * Age.”
[Screenshot: Two-factor randomized block ANOVA, Step 3. Click the toggle menu for “Build Terms” to select “Main Effects,” then select the treatment and blocking factors from the list on the left and use the arrow to move them to the “Model” box on the right.]
Step 4: From the “Univariate” dialog box (see screenshot step 2), clicking on “Post Hoc” will provide the option to select post hoc MCPs for both factors. From the “Post Hoc Multiple Comparisons for Observed Means” dialog box, click on the names of the factors (i.e., “Program” and “Age”) in the “Factor(s)” list box in the top left and move them to the “Post Hoc Tests for” box in the top right by clicking on the arrow key. Check an appropriate MCP for your situation by placing a checkmark in the box next to the desired MCP. In this example, we select “Tukey.” Click on “Continue” to return to the original dialog box.
[Screenshot: Two-factor randomized block ANOVA, Step 4. Select the treatment and blocking factors from the list on the left and use the arrow to move them to the “Post Hoc Tests for” box on the right. The dialog groups MCPs for instances when the homogeneity of variance assumption is met and MCPs for instances when it is not met.]
Step 5: Clicking on “Options” from the main “Univariate” dialog box (see screenshot step 2) will provide the option to select such information as “Descriptive Statistics,” “Estimates of effect size,” and “Observed power.” Click on “Continue” to return to the original dialog box.
[Screenshot: Two-factor randomized block ANOVA, Step 5. Select from the list on the left those variables for which you wish to display means and use the arrow to move them to the “Display Means for” box on the right.]
Step 6: From the “Univariate” dialog box, click on “Plots” to obtain a profile plot of means. Click the treatment factor (e.g., “Program”) and move it into the “Horizontal Axis” box by clicking the arrow button. Click the blocking factor (e.g., “Age”) and move it into the “Separate Lines” box by clicking the arrow button (see screenshot step 6a). Then click on “Add” to move this arrangement into the “Plots” box at the bottom of the dialog box (see screenshot step 6b). Click on “Continue” to return to the original dialog box.
[Screenshot: Two-factor randomized block ANOVA, Step 6a. Select the treatment factor from the list on the left and use the arrow to move it to the “Horizontal Axis” box on the right; select the blocking factor and move it to the “Separate Lines” box on the right.]
[Screenshot: Two-factor randomized block ANOVA, Step 6b. Then click “Add” to move the arrangement into the “Plots” box at the bottom.]
Step 7: From the “Univariate” dialog box (see screenshot step 2), click on “Save” to select those elements you want to save. Here we save the unstandardized residuals to use later to examine the extent to which normality and independence are met. Thus, place a checkmark in the box next to “Unstandardized.” Click “Continue” to return to the main “Univariate” dialog box. From the “Univariate” dialog box, click on “OK” to generate the output.
[Screenshot: Two-factor randomized block ANOVA, Step 7 (the “Save” dialog box).]
Interpreting the output: Annotated results are presented in Table 16.11.
Table 16.11
Two-Factor Randomized Block ANOVA SPSS Results for the Exercise Program Example
Between-Subjects Factors
                   Value   Label          N
Exercise program   1.00    1/week         4
                   2.00    2/week         4
                   3.00    3/week         4
                   4.00    4/week         4
Age                1.00    20 years old   4
                   2.00    30 years old   4
                   3.00    40 years old   4
                   4.00    50 years old   4
Descriptive Statistics
Dependent Variable: Weight Loss
Exercise Program Age Mean Std. Deviation N
20 years old
30 years old
40 years old
50 years old
1/week
Total
20 years old
30 years old
40 years old
50 years old
2/week
Total
20 years old
30 years old
40 years old
50 years old
3/week
Total
20 years old
30 years old
40 years old
50 years old
4/week
Total
1
1
1
1
1
4
1
1
1
1
4
1
1
1
4
1
1
1
1
4
�e table labeled “Between-
Subjects Factors” lists the
variable names and sample sizes
for the levels of treatment factor
(i.e., “Exercise program”) and
the blocking factor (i.e., “Age”).
�e table labeled
“Descriptive
Statistics”
provides basic
descriptive statistics
(means, standard
deviations, and sample
sizes) for each
treatment factor-
blocking factor
combination. Because
there was only one
individual per age
group in each exercise
program, there is no
within cells variation to
calculate (and thus
missing values for the
standard deviation).
20 years old
30 years old
40 years old
50 years old
Total
Total
3.0000
2.0000
1.0000
.0000
1.5000
6.0000
5.0000
4.0000
2.0000
4.2500
10.0000
8.0000
7.0000
6.0000
7.7500
9.0000
7.0000
8.0000
7.0000
7.7500
7.0000
5.5000
5.0000
3.7500
5.3125
.
.
.
.
1.29099
.
.
.
.
1.70783
.
.
.
.
1.70783
.
.
.
.
.95743
3.16228
2.64575
3.16228
3.30404
3.00486
4
4
4
4
16
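The marginal means and standard deviations in the “Descriptive Statistics” table can be reproduced outside SPSS from the 16 raw scores. A minimal sketch with NumPy (the array layout is ours; with n = 1 per cell, the cell standard deviations are undefined, just as SPSS reports):

```python
import numpy as np

# Rows: exercise program (1/week ... 4/week); columns: age block (20 ... 50).
y = np.array([[3, 2, 1, 0],
              [6, 5, 4, 2],
              [10, 8, 7, 6],
              [9, 7, 8, 7]], dtype=float)

print(y.mean(axis=1))           # program marginal means: 1.5, 4.25, 7.75, 7.75
print(y.std(axis=1, ddof=1))    # program marginal SDs, e.g., 1.29099 for 1/week
print(y.mean(axis=0))           # age marginal means: 7.0, 5.5, 5.0, 3.75
print(y.mean(), y.std(ddof=1))  # grand mean 5.3125, overall SD about 3.00486
```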
Tests of Between-Subjects Effects
Dependent Variable: Weight Loss

                  Type III Sum         Mean                          Partial Eta   Noncent.    Observed
Source            of Squares     df    Square      F         Sig.    Squared       Parameter   Power(b)
Corrected Model   131.875(a)      6     21.979      55.526   .000    .974           333.158    1.000
Intercept         451.563         1    451.563    1140.789   .000    .992          1140.789    1.000
Program           110.187         3     36.729      92.789   .000    .969           278.368    1.000
Age                21.688         3      7.229      18.263   .000    .859            54.789     .999
Error               3.563         9       .396
Total             587.000        16
Corrected Total   135.438        15

a. R Squared = .974 (Adjusted R Squared = .956).
b. Computed using alpha = .05.
Observed power tells whether our test is powerful enough to detect mean differences if they really exist. Power of 1.000 indicates maximum power; that is, the probability of rejecting the null hypothesis if it is really false is approximately 1.

Comparing p to α, we find a statistically significant difference in weight loss based on both exercise program and age group. These are omnibus tests. We will look at post hoc tests to determine which exercise programs and age groups statistically differ on weight loss.

Partial eta squared is one measure of effect size:

η² = SSprogram / (SSprogram + SSerror) = 110.187 / (110.187 + 3.563) = .969

We can interpret this to say that approximately 97% of the variation in weight loss is accounted for by the exercise program.
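The sums of squares, F ratios, and partial η² in the “Tests of Between-Subjects Effects” table can likewise be reproduced from the 16 scores; with n = 1 per cell, the residual (interaction) term serves as the error term. A sketch of the computation (ours, not SPSS output):

```python
import numpy as np

# Rows: exercise program (treatment); columns: age (blocking factor); n = 1 per cell.
y = np.array([[3, 2, 1, 0],
              [6, 5, 4, 2],
              [10, 8, 7, 6],
              [9, 7, 8, 7]], dtype=float)

J, K = y.shape
grand = y.mean()
ss_program = K * ((y.mean(axis=1) - grand) ** 2).sum()   # 110.1875
ss_age = J * ((y.mean(axis=0) - grand) ** 2).sum()       # 21.6875
ss_total = ((y - grand) ** 2).sum()                      # 135.4375 (corrected total)
ss_error = ss_total - ss_program - ss_age                # 3.5625 (residual term)

df_program, df_age, df_error = J - 1, K - 1, (J - 1) * (K - 1)
ms_error = ss_error / df_error                           # about .396
F_program = (ss_program / df_program) / ms_error         # about 92.789
F_age = (ss_age / df_age) / ms_error                     # about 18.263
eta2_program = ss_program / (ss_program + ss_error)      # about .969
```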
Estimated Marginal Means

1. Grand Mean
Dependent Variable: Weight Loss

                      95% Confidence Interval
Mean    Std. Error    Lower Bound   Upper Bound
5.313   .157          4.957         5.668

The “Grand Mean” (in this case, 5.313) represents the overall mean, regardless of the exercise program or age. The 95% CI represents the CI of the grand mean.
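The grand mean row can be verified by hand: the standard error is the square root of MS(error)/N, and the interval uses the critical t for the error degrees of freedom (9). A sketch with SciPy, plugging in the values reported above:

```python
from scipy import stats

ms_error, n_total, df_error = 0.396, 16, 9   # from the ANOVA table
grand_mean = 5.3125                          # mean of all 16 weight-loss scores

se = (ms_error / n_total) ** 0.5             # about .157
t_crit = stats.t.ppf(0.975, df_error)        # two-tailed critical t, about 2.262
lower = grand_mean - t_crit * se             # about 4.957
upper = grand_mean + t_crit * se             # about 5.668
```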
Post Hoc Tests
Exercise Program

Multiple Comparisons
Weight Loss
Tukey HSD

                                                                     95% Confidence Interval
(I) Exercise   (J) Exercise   Mean Difference
Program        Program        (I – J)           Std. Error   Sig.    Lower Bound   Upper Bound
1/week         2/week         –2.7500*          .44488        .001   –4.1388       –1.3612
               3/week         –6.2500*          .44488        .000   –7.6388       –4.8612
               4/week         –6.2500*          .44488        .000   –7.6388       –4.8612
2/week         1/week          2.7500*          .44488        .001    1.3612        4.1388
               3/week         –3.5000*          .44488        .000   –4.8888       –2.1112
               4/week         –3.5000*          .44488        .000   –4.8888       –2.1112
3/week         1/week          6.2500*          .44488        .000    4.8612        7.6388
               2/week          3.5000*          .44488        .000    2.1112        4.8888
               4/week           .0000           .44488       1.000   –1.3888        1.3888
4/week         1/week          6.2500*          .44488        .000    4.8612        7.6388
               2/week          3.5000*          .44488        .000    2.1112        4.8888
               3/week           .0000           .44488       1.000   –1.3888        1.3888

Based on observed means.
The error term is mean square(error) = .396.
* The mean difference is significant at the .05 level.
“Mean Difference” is simply the difference between the means of the categories of our
program factor. For example, the mean difference of exercising once per week and
exercising twice per week is calculated as 1.500 – 4.250 = –2.750.
“Sig.” denotes the observed p value and provides the results of the Tukey post hoc procedure. There is a
statistically significant mean difference in weight loss for all exercise programs except for exercising 3 vs. 4
times per week ( p = 1.000). Note there are redundant results presented in the table. The comparison of
exercising 1/week vs. 2/week (row 1) is the same as the comparison of 2/week vs. 1/week (row 4).
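These adjusted p values come from the studentized range distribution: q = |mean difference| / sqrt(MS(error)/n), referred to k = 4 means and the error df of 9. A sketch of how they might be checked with SciPy (`scipy.stats.studentized_range`; the helper function is ours, not SPSS's):

```python
from scipy.stats import studentized_range

ms_error, n_per_mean, k, df_error = 0.396, 4, 4, 9   # from the ANOVA table

def tukey_p(diff):
    """Tukey HSD adjusted p value for one pairwise mean difference."""
    q = abs(diff) / (ms_error / n_per_mean) ** 0.5
    return studentized_range.sf(q, k, df_error)

p_12 = tukey_p(1.500 - 4.250)   # 1/week vs. 2/week: small (SPSS reports .001)
p_34 = tukey_p(7.750 - 7.750)   # 3/week vs. 4/week: 1.000
```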
2. Exercise Program
Dependent Variable: Weight Loss

                                        95% Confidence Interval
Exercise Program   Mean    Std. Error   Lower Bound   Upper Bound
1/week             1.500   .315          .788         2.212
2/week             4.250   .315         3.538         4.962
3/week             7.750   .315         7.038         8.462
4/week             7.750   .315         7.038         8.462

The table for “Exercise Program” provides descriptive statistics for each of the programs. In addition to means, the SE and 95% CI of the means are reported.

3. Age
Dependent Variable: Weight Loss

                                     95% Confidence Interval
Age            Mean    Std. Error    Lower Bound   Upper Bound
20 years old   7.000   .315          6.288         7.712
30 years old   5.500   .315          4.788         6.212
40 years old   5.000   .315          4.288         5.712
50 years old   3.750   .315          3.038         4.462

The table for “Age” provides descriptive statistics for each of the age groups. In addition to means, the SE and 95% CI of the means are reported.
Homogeneous Subsets

Weight Loss
Tukey HSD(a,b)

                       Subset
Exercise Program   N   1        2        3
1/week             4   1.5000
2/week             4            4.2500
3/week             4                     7.7500
4/week             4                     7.7500
Sig.                   1.000    1.000    1.000

Means for groups in homogeneous subsets are displayed.
Based on observed means.
The error term is mean square(error) = .396.
a. Uses harmonic mean sample size = 4.000.
b. Alpha = .05.

“Homogeneous Subsets” provides a visual representation of the MCP. For each subset, the means that are printed are homogeneous, or not significantly different. For example, in subset 1 the mean weight loss for exercising once per week (regardless of age group) is 1.50. This is statistically significantly different than the mean weight loss for exercising two, three, or four times per week (as reflected by empty cells in row 1). Similar interpretations are made for contrasts involving exercising two, three, and four times per week.
Age

Multiple Comparisons
Weight Loss
Tukey HSD

                              Mean Difference                        95% Confidence Interval
(I) Age        (J) Age        (I – J)           Std. Error   Sig.    Lower Bound   Upper Bound
20 years old   30 years old    1.5000*          .44488       .034     .1112        2.8888
               40 years old    2.0000*          .44488       .007     .6112        3.3888
               50 years old    3.2500*          .44488       .000    1.8612        4.6388
30 years old   20 years old   –1.5000*          .44488       .034   –2.8888        –.1112
               40 years old     .5000           .44488       .685    –.8888        1.8888
               50 years old    1.7500*          .44488       .015     .3612        3.1388
40 years old   20 years old   –2.0000*          .44488       .007   –3.3888        –.6112
               30 years old    –.5000           .44488       .685   –1.8888         .8888
               50 years old    1.2500           .44488       .080    –.1388        2.6388
50 years old   20 years old   –3.2500*          .44488       .000   –4.6388       –1.8612
               30 years old   –1.7500*          .44488       .015   –3.1388        –.3612
               40 years old   –1.2500           .44488       .080   –2.6388         .1388

Based on observed means.
The error term is mean square(error) = .396.
* The mean difference is significant at the .05 level.
“Mean difference” is simply the difference between the means of the age groups (i.e., the blocking factor). For example, the mean weight loss difference of 20 vs. 30 year olds is calculated as 7.000 – 5.500 = 1.500.

“Sig.” denotes the observed p value and provides the results of the Tukey post hoc procedure. There is a statistically significant mean difference in weight loss for:
• 20 and 30 year olds (p = .034)
• 20 and 40 year olds (p = .007)
• 20 and 50 year olds (p < .001)
• 30 and 50 year olds (p = .015)
Note there are redundant results presented in the table. The comparison of 20–30 year olds is the same as the comparison of 30–20 year olds, and so forth.
Homogeneous Subsets

Weight Loss
Tukey HSD(a,b)

                   Subset
Age            N   1        2        3
50 years old   4   3.7500
40 years old   4   5.0000   5.0000
30 years old   4            5.5000
20 years old   4                     7.0000
Sig.               .080     .685     1.000

Means for groups in homogeneous subsets are displayed.
Based on observed means.
The error term is mean square(error) = .396.
a. Uses harmonic mean sample size = 4.000.
b. Alpha = .05.

“Homogeneous Subsets” provides a visual representation of the MCP. For each subset, the means that are printed are homogeneous, or not significantly different. For example, in subset 1 the mean weight loss for 50 year olds (regardless of exercise program) is 3.750. This is statistically significantly different than the mean weight loss for individuals in the 30 and 20 year old age groups (as they are not printed in subset 1).
[Profile plot: estimated marginal means of weight loss (vertical axis, –2.00 to 10.00) by exercise program (horizontal axis, 1/week to 4/week), with separate lines for the 20, 30, 40, and 50 year old age groups.]

The “profile plot” is a graph of the mean weight loss by exercise program and age. We see that, across all age groups, the greatest weight loss was for individuals who exercised either three or four times per week.
Examining Assumptions for Two-Factor Randomized Block ANOVA
Normality
We use the residuals (which were requested and created through the “Save” option when generating our model) to examine the extent to which normality was met.
Generating normality evidence: As shown in previous ANOVA chapters, understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important. For the two-factor randomized block ANOVA, the residuals should be normally distributed. Because the steps for generating normality evidence were presented previously in the chapter for the two-factor hierarchical ANOVA model, they will not be reiterated here.

Interpreting normality evidence: By this point, we have had a substantial amount of practice in interpreting quite a range of normality statistics. Here we interpret them again, only now in reference to the two-factor randomized block ANOVA model.
Descriptives

Residual for Weight Loss                              Statistic   Std. Error
Mean                                                   .0000      .12183
95% Confidence interval for mean   Lower bound        –.2597
                                   Upper bound         .2597
5% Trimmed mean                                        .0069
Median                                                 .0625
Variance                                               .238
Std. deviation                                         .48734
Minimum                                               –.94
Maximum                                                .81
Range                                                 1.75
Interquartile range                                    .87
Skewness                                              –.154       .564
Kurtosis                                              –.496       1.091
The skewness statistic of the residuals is −.154 and kurtosis is −.496, both being within the range of an absolute value of 2.0, suggesting some evidence of normality.

As suggested by the skewness statistic, the histogram of residuals is slightly negatively skewed, and the histogram also provides a visual display of the slightly platykurtic distribution.
[Histogram of the residuals for weight loss (frequency by residual, –1.00 to 1.00): Mean = –2.36E–16, Std. dev. = .487, N = 16.]
There are a few other statistics that can be used to gauge normality. The formal test of normality, the S–W test (SW) (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented as follows and suggests that our sample distribution for the residuals is not statistically significantly different than what would be expected from a normal distribution as the p value is greater than α.
Tests of Normality

                           Kolmogorov–Smirnov(a)          Shapiro–Wilk
                           Statistic   df   Sig.          Statistic   df   Sig.
Residual for weight loss   .136        16   .200*         .965        16   .757

a. Lilliefors significance correction.
* This is a lower bound of the true significance.
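The same statistics can be recomputed outside SPSS. For this n = 1 design, the saved unstandardized residuals equal each cell value minus its row mean, minus its column mean, plus the grand mean; SciPy's bias-corrected skewness and kurtosis then correspond to the SPSS statistics. A sketch, assuming the data layout used earlier:

```python
import numpy as np
from scipy import stats

# Rows: exercise program; columns: age block.
y = np.array([[3, 2, 1, 0],
              [6, 5, 4, 2],
              [10, 8, 7, 6],
              [9, 7, 8, 7]], dtype=float)

# Residuals from the additive (no-interaction) model.
resid = (y - y.mean(axis=1, keepdims=True)
           - y.mean(axis=0, keepdims=True) + y.mean()).ravel()

skew = stats.skew(resid, bias=False)        # about -.154
kurt = stats.kurtosis(resid, bias=False)    # about -.496
w, p = stats.shapiro(resid)                 # W about .965, p about .757
```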
Q–Q plots are also often examined to determine evidence of normality where quantiles of the theoretical normal distribution are plotted against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of residuals shown in the following suggests relative normality.
[Normal Q–Q plot of residual for weight loss: expected normal values plotted against observed values, with points falling close to the diagonal line.]
Examination� of� the� following� boxplot� also� suggests� a� relatively� normal� distributional�
shape�of�residuals�with�no�outliers�
[Boxplot of the residuals for weight loss (–1.00 to 1.00), showing no outliers.]
Considering the forms of evidence we have examined (skewness and kurtosis statistics, the S–W test, the histogram, the Q–Q plot, and the boxplot), all suggest that normality is a reasonable assumption. We can be reasonably assured we have met the assumption of normality.
Independence
The only assumption we have not tested for yet is independence. As we discussed in reference to the one-way ANOVA, if subjects have been randomly assigned to conditions (in other words, the different levels of the treatment factor in a two-factor randomized block ANOVA), the assumption of independence has likely been met. In our example, individuals were randomly assigned to exercise program, and, thus, the assumption of independence was met. However, we often use independent variables that do not allow random assignment. We can plot residuals against levels of our treatment factor using a scatterplot to see whether or not there are patterns in the data and thereby provide an indication of whether we have met this assumption.

Please note that some researchers do not believe that the assumption of independence can be tested. If there is not random assignment to groups, then these researchers believe this assumption has been violated—period. The plot that we generate will give us a general idea of patterns, however, in situations where random assignment was not performed.

Generating the scatterplot: The general steps for generating a simple scatterplot through “Scatter/Dot” have been presented in previous chapters (e.g., Chapter 10), and they will not be reiterated here. From the “Simple Scatterplot” dialog screen, click the residual variable and move it into the “Y Axis” box by clicking on the arrow. Click the independent variable that we wish to display (e.g., “Exercise Program”) and move it into the “X Axis” box by clicking on the arrow. Then click “OK.”
Interpreting independence evidence: In examining the scatterplot for evidence of independence, the points should fall relatively randomly above and below a horizontal line at 0. (You may recall in Chapter 11 that we added a reference line to the graph using Chart Editor. To add a reference line, double click on the graph in the output to activate the chart editor. Select “Options” in the top pulldown menu, then “Y axis reference line.” This will bring up the “Properties” dialog box. Change the value of the position to be “0.” Then click on “Apply” and “Close” to generate the graph with a horizontal line at 0.)

In this example, our scatterplot for exercise program by residual generally suggests evidence of independence with a relatively random display of residuals above and below the horizontal line at 0. Thus, had we not met the assumption of independence through random assignment of cases to groups, this would have provided evidence that independence was a reasonable assumption.
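Outside SPSS, the same diagnostic plot can be sketched with matplotlib (residuals computed from the additive model as before; the output file name is arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # render to file without a display
import matplotlib.pyplot as plt

# Rows: exercise program; columns: age block.
y = np.array([[3, 2, 1, 0],
              [6, 5, 4, 2],
              [10, 8, 7, 6],
              [9, 7, 8, 7]], dtype=float)
resid = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()

program = np.repeat([1, 2, 3, 4], 4)        # 1/week ... 4/week for each residual
plt.scatter(program, resid.ravel())
plt.axhline(0)                              # horizontal reference line at 0
plt.xlabel("Exercise program")
plt.ylabel("Residual for weight loss")
plt.savefig("residuals_by_program.png")
```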
[Scatterplot of residual for weight loss (vertical axis, –1.00 to 1.00) by exercise program (horizontal axis, 1.00 to 4.00).]
Two-Factor Fixed-Effects Randomized Block ANOVA, n > 1
To run a two-factor randomized block ANOVA for n > 1, the procedures are exactly the same as with the regular two-factor ANOVA. However, the adjusted F tests are not available.
Friedman Test
Lastly, the Friedman test can be run as previously described in Chapter 15.
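For reference, the Friedman test for this n = 1 layout can also be computed with SciPy, passing one array per exercise program across the four age blocks (values from Table 16.11); this is our computation, not SPSS output:

```python
from scipy.stats import friedmanchisquare

# Weight loss for each exercise program across the four age blocks (20-50 years old).
one_per_week = [3, 2, 1, 0]
two_per_week = [6, 5, 4, 2]
three_per_week = [10, 8, 7, 6]
four_per_week = [9, 7, 8, 7]

stat, p = friedmanchisquare(one_per_week, two_per_week,
                            three_per_week, four_per_week)
# With no ties in any block, stat = 10.8 on 3 df and p < .05,
# so the programs differ in weight loss after ranking within blocks.
```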
Post Hoc Power for Two-Factor Randomized Block ANOVA Using G*Power
G*Power provides power calculations for the two-factor randomized block ANOVA model. In G*Power, just treat this design as if it were a regular two-factor ANOVA model.
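SPSS's observed-power column (and a G*Power post hoc calculation) can be checked directly from the noncentral F distribution: power = P(F′ > F_crit), where F′ has the effect and error degrees of freedom and the noncentrality parameter from the ANOVA table. A sketch with SciPy using the values reported above:

```python
from scipy.stats import f, ncf

alpha, df_effect, df_error = 0.05, 3, 9
f_crit = f.ppf(1 - alpha, df_effect, df_error)   # critical F at alpha = .05

power_program = ncf.sf(f_crit, df_effect, df_error, 278.368)  # about 1.000
power_age = ncf.sf(f_crit, df_effect, df_error, 54.789)       # about .999
```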
16.7 Template and APA-Style Write-Up
Finally,� here� is� an� example� paragraph� just� for� the� results� of� the� two-factor� hierarchical�
ANOVA�design�(feel�free�to�write�a�similar�paragraph�for�the�two-factor�randomized�block�
ANOVA�example)��Recall�that�our�graduate�research�assistant,�Marie,�was�assisting�a�read-
ing�faculty�member,�JoAnn��JoAnn�wanted�to�know�the�following:�if�there�is�a�mean�dif-
ference�in�reading�based�on�the�approach�to�reading�and�if�there�is�a�mean�difference�in�
604 An Introduction to Statistical Concepts
reading�based�on�teacher��The�research�questions�presented�to�JoAnn�from�Marie�include�
the�following:
• Is there a mean difference in reading based on approach to reading?
• Is there a mean difference in reading based on teacher?
Marie then assisted JoAnn in generating a two-factor hierarchical ANOVA as the test of inference, and a template for writing the research questions for this design is presented as follows. As we noted in previous chapters, it is important to ensure the reader understands the levels of the factor(s). This may be done parenthetically in the actual research question, as an operational definition, or specified within the methods section:

• Is there a mean difference in [dependent variable] based on [nonnested factor]?
• Is there a mean difference in [dependent variable] based on [nested factor]?

It may be helpful to preface the results of the two-factor hierarchical ANOVA with information on an examination of the extent to which the assumptions were met. The assumptions include (a) homogeneity of variance and (b) normality.
A two-factor hierarchical ANOVA was conducted. The nonrepeated factor
was approach to reading (basal or whole language) and the nested factor
was teacher (four teachers). The null hypotheses tested included the
following: (1) the mean reading score was equal for each of the reading
approaches, and (2) the mean reading score for each teacher was equal.
The data were screened for missingness and violation of assump-
tions prior to analysis. There were no missing data. The assumption
of homogeneity of variance was met (F(3, 20) = 1.042, p = .396). The
assumption of normality was tested via examination of the residuals.
Review of the S–W test (SW = .960, df = 24, p = .442) and skewness
(.284) and kurtosis (−.693) statistics suggested that normality was a
reasonable assumption. The boxplot displayed a relatively normal dis-
tributional shape (with no outliers) of the residuals. The Q–Q plot
and histogram suggested normality was tenable.
Here is an APA-style example paragraph of results for the two-factor hierarchical ANOVA (remember that this will be prefaced by the previous paragraph reporting the extent to which the assumptions of the test were met).
From Table 15.10, the results for the two-factor hierarchical ANOVA
indicate the following:
1. A statistically significant main effect for approach to reading
(Fapproach = 59.559, df = 1, 2, p = .016)
2. A nonstatistically significant main effect for teacher (Fteacher =
.952, df = 2, 20, p = .403)
Effect size was rather large for the effect of approach to read-
ing (partial η2approach = .968), with high observed power (.948), but
expectedly less so for the nonsignificant teacher effect (par-
tial η2teacher = .087, power = .192). The results of this study pro-
vide evidence to suggest that reading comprehension scores are
significantly higher for students taught by the whole language
method (M = 10.833, SE = .704) as compared to the basal method
(M = 3.333, SE = .704). The results also suggest that mean scores
for reading are comparable for children regardless of the teacher
who instructed them.
16.8 Summary
In this chapter, models involving nested and blocking factors for the two-factor case were considered. Three different models were examined; these included the two-factor hierarchical design, the two-factor randomized block design with one observation per cell, and the two-factor randomized block design with multiple observations per cell. Included for each design were the usual topics of model characteristics, the layout of the data, the linear model, assumptions of the model and dealing with their violation, the ANOVA summary table and expected mean squares, and MCPs. Also included for particular designs was a discussion of the compound symmetry/sphericity assumption and the Friedman test based on ranks. We concluded with a comparison of various ANOVA models on precision and power. At this point, you should have met the following objectives: (a) be able to understand the characteristics and concepts underlying hierarchical and randomized block ANOVA models, (b) be able to determine and interpret the results of hierarchical and randomized block ANOVA models, (c) be able to understand and evaluate the assumptions of hierarchical and randomized block ANOVA models, and (d) be able to compare different ANOVA models and select an appropriate model. This chapter concludes our extended discussion of ANOVA models. In the remaining three chapters of the text, we discuss regression models where the dependent variable is predicted by one or more independent variables or predictors.
Problems
Conceptual problems
16.1 A researcher wants to know if the number of professional development courses that a teacher completes differs based on the format in which the professional development is offered (online, mixed mode, face-to-face). The researcher randomly samples 100 teachers employed in the district. Believing that years of teaching experience may be a concomitant variable, the researcher ranks the teachers on years of experience and places them in categories that represent 5-year intervals. The researcher then randomly selects 4 of the years-of-experience blocks. The teachers within those blocks are then randomly assigned to professional development format. Which of the following methods of blocking is employed here?
a. Predefined value blocking
b. Predefined range blocking
c. Sampled value blocking
d. Sampled range blocking
16.2 To study the effectiveness of three spelling methods, 45 subjects are randomly selected from the fourth graders in a particular elementary school. Based on the order of their IQ scores, subjects are grouped into IQ groups (low = 75–99, average = 100–115, high = 116–130), 15 in each group. Subjects in each group are randomly assigned to one of the three methods of spelling, five each. Which of the following methods of blocking is employed here?
a. Predefined value blocking
b. Predefined range blocking
c. Sampled value blocking
d. Sampled range blocking
16.3 A researcher is examining preschoolers’ knowledge of number identification. Fifty preschoolers are grouped based on socioeconomic status (low, moderate, high). Within each SES group, students are randomly assigned to one of two treatment groups: one which incorporates numbers through individual, small group, and whole group work with manipulatives, music, and art; and a second which incorporates numbers through whole group study only. Which of the following methods of blocking is employed here?
a. Predefined value blocking
b. Predefined range blocking
c. Sampled value blocking
d. Sampled range blocking
16.4 If three teachers employ method A and three other teachers employ method B, then which one of the following is suggested?
a. Teachers are nested within method.
b. Teachers are crossed with methods.
c. Methods are nested within teacher.
d. Cannot be determined.
16.5 The interaction of factors A and B can be assessed only if which one of the following occurs?
a. Both factors are fixed.
b. Both factors are random.
c. Factor A is nested within factor B.
d. Factors A and B are crossed.
16.6 In a two-factor design, factor A is nested within factor B for which one of the following?
a. At each level of A, each level of B appears.
b. At each level of A, unique levels of B appear.
c. At each level of B, unique levels of A appear.
d. Cannot be determined.
16.7 Five teachers use an experimental method of teaching statistics, and five other teachers use the traditional method. If factor M is method of teaching, and factor T is teacher, this design can be denoted by which one of the following?
a. T(M)
b. T × M
c. M × T
d. M(T)
16.8 If factor C is nested within factors A and B, this is denoted as AB(C). True or false?
16.9 A design in which all levels of each factor are found in combination with each level of every other factor is necessarily a nested design. True or false?
16.10 To determine if counseling method E is uniformly superior to method C for the population of counselors, from which random samples are taken to conduct a study, one needs a nested design with a mixed model. True or false?
16.11 I assert that the predefined value method of block formation is more effective than the sampled value method in reducing unexplained variability. Am I correct?
16.12 For the interaction to be tested in a two-factor randomized block design, it is required that which one of the following occurs?
a. Both factors be fixed
b. Both factors be random
c. n = 1
d. n > 1
16.13 Five medical professors use a computer-based method of teaching and five other medical professors use a lecture-based method of teaching. A researcher is interested in student outcomes for those enrolled in classes taught by these instructional methods. This is an example of which type of design?
a. Completely crossed design
b. Repeated measures design
c. Hierarchical design
d. Randomized block design
16.14 In a randomized block study, the correlation between the blocking factor and the dependent variable is .35. I assert that the residual variation will be smaller when using the blocking variable than without. Am I correct?
16.15 A researcher is interested in examining the number of suspensions of high school students based on random assignment participation in a series of self-awareness workshops. The researcher believes that age may be a concomitant variable. Applying a two-factor randomized block ANOVA design to the data, is age an appropriate blocking factor?
16.16 In a two-factor hierarchical design with two levels of factor A and three levels of factor B nested within each level of A, how many F ratios can be tested?
a. 1
b. 2
c. 3
d. Cannot be determined
16.17 If the correlation between the concomitant variable and dependent variable is −.80, which of the following designs is recommended?
a. ANCOVA
b. One-factor ANOVA
c. Randomized block ANOVA
d. All of the above
16.18 IQ must be used as a treatment factor. True or false?
16.19 Which of the following blocking methods best estimates the treatment effects?
a. Predefined value blocking
b. Post hoc predefined value blocking
c. Sampled value blocking
d. Sampled range blocking
Computational problems
16.1 An experiment was conducted to compare three types of behavior modification (1, 2, and 3) using age as a blocking variable (4-, 6-, and 8-year-old children). The mean scores on the dependent variable, number of instances of disruptive behavior, are listed here for each cell. The intention of the treatments is to minimize the number of disruptions.

Type of Behavior          Age
Modification      4 Years   6 Years   8 Years
1                 20        40        40
2                 50        30        20
3                 50        40        30

Use these cell means to graph the interaction between type of behavior modification and age.
a. Is there an interaction between type of behavior modification and age?
b. What kind of recommendation would you make to teachers?
16.2 An experiment was conducted to compare four different preschool curricula that were adopted in four different classrooms. Reading readiness proficiency was used as a blocking variable (below proficient, at proficient, above proficient). The mean scores on the dependent variable, letter recognition, are listed here for each cell. The intention of the treatment (i.e., the curriculum) is to increase letter recognition.

             Reading Readiness Proficiency
Curriculum   Below   At   Above
1            12      20   22
2            20      24   18
3            16      16   20
4            15      18   25

Use these cell means to graph the interaction between curriculum and reading readiness proficiency.
a. Is there an interaction between type of curriculum and reading readiness proficiency?
b. What kind of recommendation would you make to teachers?
16.3 An experimenter tested three types of perfume (or aftershave) (tame, sexy, and musk) when worn by light-haired and dark-haired women (or men). Thus, hair color is a blocking variable. The dependent measure was attractiveness, defined as the number of times during a 2-week period that other persons complimented a subject on their perfume (or aftershave). There were five subjects in each cell. Complete the ANOVA summary table below, assuming a fixed-effects model, where α = .05.

Source             SS    df   MS   F    Critical Value   Decision
Perfume (A)        200   —    —    —    —                —
Hair color (B)     100   —    —    —    —                —
Interaction (AB)   20    —    —    —    —                —
Within             240   —    —
Total              —     —
16.4 An experiment was conducted to determine if there was a mean difference in weight for women based on the type of aerobics exercise program participated in (low impact vs. high impact). Body mass index (BMI) was used as a blocking variable to represent below, at, or above recommended BMI. The data are shown as follows. Conduct a two-factor randomized block ANOVA (α = .05) and Bonferroni MCPs using SPSS to determine the results of the study.

Subject   Exercise Program   BMI   Weight
1         1                  1     100
2         1                  2     135
3         1                  3     200
4         1                  1     95
5         1                  2     140
6         1                  3     180
7         2                  1     120
8         2                  2     152
9         2                  3     176
10        2                  1     128
11        2                  2     142
12        2                  3     220
16.5 A mathematics professor wants to know which of three approaches to teaching calculus resulted in the best test performance (Sections 16.1, 16.2, or 16.3). Scores on the GRE-Quantitative (GRE-Q) portion were used as a blocking variable (block 1: 200–400; block 2: 401–600; block 3: 601–800). The data are shown as follows. Conduct a two-factor randomized block ANOVA (α = .05) and Bonferroni MCPs using SPSS to determine the results of the study.

Subject   Section   GRE-Q   Test Score
1         1         1       90
2         1         2       93
3         1         3       100
4         2         1       88
5         2         2       90
6         2         3       97
7         3         1       79
8         3         2       85
9         3         3       92
Interpretive problems
16.1 The following is the first one-factor ANOVA interpretive problem you developed in Chapter 11: Using the survey 1 dataset from the website, use SPSS to conduct a one-factor fixed-effects ANOVA, including effect size, where political view is the grouping variable (i.e., independent variable) (J = 5) and the dependent variable is a variable of interest to you (the following variables look interesting: books, TV, exercise, drinks, GPA, GRE-Q, CDs, hair appointment). Then write an APA-style paragraph describing the results.
Take the one-factor ANOVA interpretive problem you developed in Chapter 11. What are some reasonable blocking variables to consider? Which type of blocking would be best in your situation? Select this blocking variable from the same dataset and conduct a two-factor randomized block ANOVA. Compare these results with the one-factor ANOVA results (without the blocking factor) to determine how useful the blocking variable was in terms of reducing residual variability.
16.2 The following is the second one-factor ANOVA interpretive problem you developed in Chapter 11: Using the survey 1 dataset from the website, use SPSS to conduct a one-factor fixed-effects ANOVA, including effect size, where hair color is the grouping variable (i.e., independent variable) (J = 5) and the dependent variable is a variable of interest to you (the following variables look interesting: books, TV, exercise, drinks, GPA, GRE-Q, CDs, hair appointment). Then write an APA-style paragraph describing the results.
Take this one-factor ANOVA interpretive problem you developed in Chapter 11. What are some reasonable blocking variables to consider? Which type of blocking would be best in your situation? Select this blocking variable from the same dataset and conduct a two-factor randomized block ANOVA. Compare these results with the one-factor ANOVA results (without the blocking factor) to determine how useful the blocking variable was in terms of reducing residual variability.
17
Simple Linear Regression
Chapter Outline
17.1 Concepts of Simple Linear Regression
17.2 Population Simple Linear Regression Model
17.3 Sample Simple Linear Regression Model
17.3.1 Unstandardized Regression Model
17.3.2 Standardized Regression Model
17.3.3 Prediction Errors
17.3.4 Least Squares Criterion
17.3.5 Proportion of Predictable Variation (Coefficient of Determination)
17.3.6 Significance Tests and Confidence Intervals
17.3.7 Assumptions and Violation of Assumptions
17.4 SPSS
17.5 G*Power
17.6 Template and APA-Style Write-Up
Key Concepts
1. Slope and intercept of a straight line
2. Regression model
3. Prediction errors/residuals
4. Standardized and unstandardized regression coefficients
5. Proportion of variation accounted for; coefficient of determination
In Chapter 10, we considered various bivariate measures of association. Specifically, the chapter dealt with the topics of scatterplots, covariance, types of correlation coefficients, and their resulting inferential tests. Thus, the chapter was concerned with addressing the question of the extent to which two variables are associated or related. In this chapter, we extend our discussion of two variables to address the question of the extent to which one variable can be used to predict or explain another variable.
Beginning in Chapter 11, we examined various analysis of variance (ANOVA) models. It should be mentioned again that ANOVA and regression are both forms of the same general linear model (GLM), where the relationship between one or more independent variables
and one dependent variable is evaluated. The major difference between the two procedures is that in ANOVA, the independent variables are discrete variables (i.e., nominal or ordinal), while in regression, the independent variables are continuous variables (i.e., interval or ratio; however, we will see later how we can apply dichotomous variables in regression models). Otherwise there is considerable overlap between these two procedures in terms of concepts and their implementation. Note that a continuous variable can be transformed into a discrete variable. For example, the Graduate Record Exam-Quantitative (GRE_Q) exam is a continuous variable scaled from 200 to 800 (albeit in 10-point score increments). It could be made into a discrete variable, such as low (200–400), average (401–600), and high (601–800).
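This continuous-to-discrete transformation can be sketched in a few lines. The helper below is hypothetical (its name and labels are not from the text); the cutoffs are the three bands given above:

```python
def gre_band(score):
    """Bin a 200-800 GRE_Q score into the three discrete blocks used above."""
    if 200 <= score <= 400:
        return "low"
    elif score <= 600:
        return "average"
    elif score <= 800:
        return "high"
    raise ValueError("GRE_Q scores range from 200 to 800")
```

Any score in the continuous range then maps to one of three categories, e.g. `gre_band(350)` returns `"low"`.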
When considering the relationship between two variables (say X and Y), the researcher usually determines some measure of relationship between those variables, such as a correlation coefficient (e.g., rXY, the Pearson product–moment correlation coefficient), as we did in Chapter 10. Another way of looking at how two variables may be related is through regression analysis, in terms of prediction or explanation. That is, we evaluate the ability of one variable to predict or explain a second variable. Here we adopt the usual notation where X is defined as the independent or predictor variable, and Y as the dependent or criterion variable.
For example, an admissions officer might want to use GRE scores to predict graduate-level grade point averages (GPAs) to make admission decisions for a sample of applicants to a university or college. The research question of interest is: how well does the GRE (the independent or predictor variable) predict or explain performance in graduate school (the dependent or criterion variable)? This is an example of simple linear regression, where only a single predictor variable is included in the analysis. The utility of the GRE in predicting GPA requires that these variables have a correlation different from 0. Otherwise, the GRE will not be very useful in predicting GPA. For education and the behavioral sciences, the use of a single predictor does not usually result in reasonable prediction or explanation. Thus, Chapter 18 considers the case of multiple predictor variables through multiple linear regression analysis.
In this chapter, we consider the concepts of slope, intercept, regression model, unstandardized and standardized regression coefficients, residuals, proportion of variation accounted for, tests of significance, and statistical assumptions. Our objectives are that by the end of this chapter, you will be able to (a) understand the concepts underlying simple linear regression, (b) determine and interpret the results of simple linear regression, and (c) understand and evaluate the assumptions of simple linear regression.
17.1 Concepts of Simple Linear Regression
In this chapter, we continue to follow Marie on yet another statistical analysis adventure.
Marie has developed excellent rapport with the faculty at her institution as she has assisted them in statistical analysis. Marie will now be working with Randall, an associate dean in the Graduate Student Services office. Randall wants to know if the required entrance exam for graduate school (specifically the GRE_Q) can be used to predict midterm grades. Marie suggests the following research question to Randall: Can midterm exam scores be predicted from the GRE_Q? Marie determines that a simple linear regression is the best statistical procedure to use to answer Randall's question. Her next task is to assist Randall in analyzing the data.
Let us consider the basic concepts involved in simple linear regression. Many years ago when you had algebra, you learned about an equation used to describe a straight line,

Y = bX + a

Here the predictor variable X is used to predict the criterion variable Y. The slope of the line is denoted by b and indicates the number of Y units the line changes for a one-unit change in X. You may find it easier to think about the slope as measuring tilt or steepness. The Y-intercept is denoted by a and is the point at which the line intersects or crosses the Y axis. To be more specific, a is the value of Y when X is equal to 0. Hereafter we use the term intercept rather than Y-intercept to keep it simple.
Consider the plot of the straight line Y = 0.5X + 1.0 as shown in Figure 17.1. Here we see that the line clearly intersects the Y axis at Y = 1.0; thus, the intercept is equal to 1. The slope of a line is defined, more specifically, as the change in Y (numerator) divided by the change in X (denominator):

b = ΔY/ΔX = (Y2 − Y1)/(X2 − X1)
For instance, take two points shown in Figure 17.1, (X1, Y1) and (X2, Y2), that fall on the straight line with coordinates (0, 1) and (4, 3), respectively. We compute the slope for those two points to be (3 − 1)/(4 − 0) = 0.5. If we were to select any other two points that fall on the straight line, then the slope for those two points would also be equal to 0.5. That is, regardless of the two points on the line that we select, the slope will always be the same, constant value of 0.5. This is true because we only need two points to define a particular straight line. That is, with the points (0, 1) and (4, 3), we can draw only one straight line that passes through both of those points, and that line has a slope of 0.5 and an intercept of 1.0.
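As a quick numerical check, the slope and intercept can be computed directly from those two coordinates; a minimal sketch:

```python
# Two points from Figure 17.1 that fall on the line Y = 0.5X + 1.0.
x1, y1 = 0.0, 1.0
x2, y2 = 4.0, 3.0

# Slope: change in Y divided by change in X.
b = (y2 - y1) / (x2 - x1)

# Intercept: the value of Y when X = 0; rearrange Y = bX + a using either point.
a = y1 - b * x1

print(b, a)  # 0.5 1.0
```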
Let us take the concepts of slope, intercept, and straight line and apply them in the context of correlation so that we can study the relationship between the variables X and Y.
Figure 17.1 Plot of line: Y = 0.5X + 1.0.
If the slope of the line is a positive value (e.g., Figure 17.1), such that as X increases Y also increases, then the correlation will be positive. If the slope of the line is 0, such that the line is parallel or horizontal to the X axis and Y remains constant as X increases, then the correlation will be 0. If the slope of the line is a negative value, such that as X increases Y decreases (i.e., the line decreases from left to right), then the correlation will be negative. Thus, the sign of the slope corresponds to the sign of the correlation.
17.2 Population Simple Linear Regression Model
Let us take these concepts and apply them to simple linear regression. Consider the situation where we have the entire population of individuals' scores on both variables X (the independent variable, such as GRE) and Y (the dependent variable, such as GPA). We define the linear regression model as the equation for a straight line. This yields an equation for the regression of Y, the criterion, given X, the predictor, often stated as the regression of Y on X, although more easily understood as Y being predicted by X.
The population regression model for Y being predicted by X is

Yi = βYX Xi + αYX + εi

where
Y is the criterion variable
X is the predictor variable
βYX is the population slope for Y predicted by X
αYX is the population intercept for Y predicted by X
εi are the population residuals or errors of prediction (the part of Yi not predicted from Xi)
i represents an index for a particular case (an individual or object; in other words, the unit of analysis that has been measured)

The index i can take on values from 1 to N, where N is the size of the population, written as i = 1,…, N.
The population prediction model is

Y′i = βYX Xi + αYX

where Y′i is the predicted value of Y for a specific value of X. That is, Yi is the actual or observed score obtained by individual i, while Y′i is the predicted score based on the X score for that same individual (in other words, you are using the value of X to predict what Y will be). Thus, we see that the population prediction error is defined as follows:

εi = Yi − Y′i
There is only one difference between the regression and prediction models. The regression model explicitly includes prediction error as εi, whereas the prediction model includes prediction error implicitly as part of the predicted score Y′i (i.e., there is some error in the predicted values).
Consider for a moment a practical application of the difference between the regression and prediction models. Frequently a researcher will develop a regression model for a population where X and Y are both known, and then use the prediction model to actually predict Y when only X is known (i.e., Y will not be known until later). Using the GRE example, the admissions officer first develops a regression model for a population of students currently attending the university so as to have a current measure of GPA. This yields the slope and intercept. Then the prediction model is used to predict future GPA and to help make admission decisions for next year's population of applicants based on their GRE scores.
The population slope (βYX) and intercept (αYX) can be determined simply as

βYX = ρXY (σY / σX)

and

αYX = μY − βYX μX

where
σY and σX are the population standard deviations for Y and X, respectively
ρXY is the population correlation between X and Y (simply the Pearson correlation coefficient, rho)
μY and μX are the population means for Y and X, respectively

Note that the previously used mathematical method for determining the slope and intercept of a straight line is not appropriate in regression analysis with real data.
17.3 Sample Simple Linear Regression Model
17.3.1 Unstandardized Regression Model
Let us return to the real world of sample statistics and consider the sample simple linear regression model. As usual, Greek letters refer to population parameters, and English letters refer to sample statistics. The sample regression model for predicting Y from X is computed as follows:

Yi = bYX Xi + aYX + ei

where
Y and X are as before (i.e., the dependent and independent variables, respectively)
bYX is the sample slope for Y predicted by X
aYX is the sample intercept for Y predicted by X
ei are sample residuals or errors of prediction (the part of Yi not predictable from Xi)
i represents an index for a case (an individual or object)

The index i can take on values from 1 to n, where n is the size of the sample, written as i = 1,…, n.
The sample prediction model is computed as follows:

Y′i = bYX Xi + aYX

where Y′i is the predicted value of Y for a specific value of X. We define the sample prediction error as the difference between the actual score obtained by individual i (i.e., Yi) and the predicted score based on the X score for that individual (i.e., Y′i). In other words, the residual is that part of Y that is not predicted by X. The goal of the prediction model is to include an independent variable X that minimizes the residual; this means that the independent variable does a nice job of predicting the outcome. Computationally, the residual (or error) is computed as follows:

ei = Yi − Y′i
The difference between the regression and prediction models is the same as previously discussed, except now we are dealing with a sample rather than a population.
The sample slope (bYX) and intercept (aYX) can be determined by

bYX = rXY (sY / sX)

and

aYX = Ȳ − bYX X̄

where
sY and sX are the sample standard deviations for Y and X, respectively
rXY is the sample correlation between X and Y (again, the Pearson correlation coefficient)
Ȳ and X̄ are the sample means for Y and X, respectively

The sample slope (bYX) is referred to alternately as (a) the expected or predicted change in Y for a one-unit change in X and (b) the unstandardized or raw regression coefficient. The sample intercept (aYX) is referred to alternately as (a) the point at which the regression line intersects (or crosses) the Y axis and (b) the value of Y when X is 0.
Consider now the analysis of a realistic example to be followed throughout this chapter. Let us use the GRE_Q subtest to predict midterm scores of an introductory statistics course. The GRE_Q has a possible range of 20–80 points (if we remove the unnecessary last digit of zero), and the statistics midterm has a possible range of 0–50 points. Given the sample of 10 statistics students shown in Table 17.1, let us work through a simple linear regression analysis. The observation numbers (i = 1,…, 10) and values for the GRE_Q (the independent variable, X) and midterm (the dependent variable, Y) variables are given in the first three columns of the table, respectively. The other columns are discussed as we go along.
The sample statistics for the GRE_Q (the independent variable) are X̄ = 55.5 and sX = 13.1339, for the statistics midterm (the dependent variable) are Ȳ = 38 and sY = 7.5130, and the correlation rXY is 0.9177. The sample slope (bYX) and intercept (aYX) are computed as follows:

bYX = rXY (sY / sX) = 0.9177 (7.5130 / 13.1339) = 0.5250

and

aYX = Ȳ − bYX X̄ = 38 − 0.5250(55.5) = 8.8625

Let us interpret the slope and intercept values. A slope of 0.5250 means that if your score on the GRE_Q is increased by one point, then your predicted score on the statistics midterm (i.e., the dependent variable) will be increased by 0.5250 points, or about half a point. An intercept of 8.8625 means that if your score on the GRE_Q is 0 (although not possible, as you receive 200 points just for showing up), then your score on the statistics midterm is 8.8625. The sample simple linear regression model, given these values, becomes

Yi = bYX Xi + aYX + ei = 0.5250 Xi + 8.8625 + ei

If your score on the GRE_Q is 63, then your predicted score on the statistics midterm is the following:

Y′i = 0.5250(63) + 8.8625 = 41.9375

Thus, based on the prediction model developed, your predicted score on the midterm is approximately 42; however, as becomes evident, predictions are generally not perfect.
Table 17.1
Statistics Midterm Example Regression Data

Student   GRE_Q (X)   Midterm (Y)   Residual (e)   Predicted Midterm (Y′)
1         37          32             3.7125        28.2875
2         45          36             3.5125        32.4875
3         43          27            −4.4375        31.4375
4         50          34            −1.1125        35.1125
5         65          45             2.0125        42.9875
6         72          49             2.3375        46.6625
7         61          42             1.1125        40.8875
8         57          38            −0.7875        38.7875
9         48          30            −4.0625        34.0625
10        77          47            −2.2875        49.2875
17.3.2 Standardized Regression Model
Up until now, the computations in simple linear regression have involved the use of raw scores. For this reason, we call this the unstandardized regression model. The slope estimate is an unstandardized or raw regression slope because it is the predicted change in Y raw score units for a one raw score unit change in X. We can also express regression in standard z score units for both X and Y as

z(Xi) = (Xi − X̄) / sX

and

z(Yi) = (Yi − Ȳ) / sY

In both cases, the numerator is the difference between the observed score and the mean, and the denominator is the standard deviation (and dividing by the standard deviation standardizes the value). The means and variances of both standardized variables (i.e., zX and zY) are 0 and 1, respectively.
The sample standardized linear prediction model becomes the following, where z(Y′i) is the standardized predicted value of Y:

z(Y′i) = b*YX z(Xi) = rXY z(Xi)

Thus, the standardized regression slope, b*YX, sometimes referred to as a beta weight, is equal to rXY. No intercept term is necessary in the prediction model, as the mean of the z scores for both X and Y is 0 (i.e., a*YX = z̄Y − b*YX z̄X = 0). In summary, the standardized slope is equal to the correlation coefficient, and the standardized intercept is equal to 0.

For our statistics midterm example, the sample standardized linear prediction model is

z(Y′i) = .9177 z(Xi)

The slope of .9177 would be interpreted as the expected increase in the statistics midterm in z score (i.e., standardized score) units for a one z score (i.e., standardized score) unit increase in the GRE_Q. A one z score unit increase is also the same as a one standard deviation increase, because the standard deviation of z is equal to 1 (recall from Chapter 4 that the mean of a standardized z score is 0 with a standard deviation of 1).
When should you consider use of the standardized versus unstandardized regression analyses? According to Pedhazur (1997), the standardized regression slope b* is not very stable from sample to sample. For example, at Ivy-Covered University, the standardized regression slope b* would vary across different graduating classes (or samples), whereas the unstandardized regression slope b would be much more consistent across classes. Thus, in simple regression, most researchers prefer the use of b. We see later that the standardized regression slope b* has some utility in multiple regression analysis.
17.3.3 Prediction Errors
Previously we mentioned that perfect prediction of Y from X is extremely unlikely, only occurring with a perfect correlation between X and Y (i.e., rXY = ±1.0). When developing the regression model, the values of the outcome, Y, are known. Once the slope and intercept have been estimated, we can then use the prediction model to predict the outcome (Y) from the independent variable (X) when the values of Y are unknown. We have already defined the predicted values of Y as Y′. In other words, a predicted value Y′ can be computed by plugging the obtained value for X into the prediction model. It can be shown that Y′i = Yi for all i only when there is perfect prediction. However, this is extremely unlikely in reality, particularly in simple linear regression using a single predictor.
We can determine a value of Y′ for each of the i cases (individuals or objects) from the prediction model. In comparing the actual Y values to the predicted Y values, we obtain the residuals as the difference between the observed (Yi) and predicted (Y′i) values, computed as follows:

ei = Yi − Y′i

for all i = 1,…, n individuals or objects in the sample. The residuals, ei, are also known as errors of estimate, or prediction errors, and are that portion of Yi that is not predictable from Xi. The residual terms are random values that are unique to each individual or object.
The residuals and predicted values for the statistics midterm example are shown in the last two columns of Table 17.1, respectively. Consider observation 2, where the observed GRE_Q score is 45 and the observed midterm score is 36. The predicted midterm score is 32.4875 and the residual is +3.5125. This indicates that person 2 had a higher observed midterm score than was predicted using the GRE_Q as a predictor. We see that a positive residual indicates the observed criterion score is larger than the predicted criterion score, whereas a negative residual (such as in observation 3) indicates the observed criterion score is smaller than the predicted criterion score. For observation 3, the observed GRE_Q score is 43, the observed midterm score is 27, the predicted midterm score is 31.4375, and, thus, the residual is −4.4375. Person 2 scored higher on the midterm than we predicted, and person 3 scored lower on the midterm than we predicted.
The regression example is shown graphically in the scatterplot of Figure 17.2, where the straight diagonal line represents the regression line. Individuals falling above the regression line have positive residuals (e.g., observation 1) (in other words, the difference between the observed score, represented as open circle 1 on the graph, is greater in value than the predicted value, which is represented by the regression line), and individuals falling below the regression line have negative residuals (e.g., observation 3) (in other words, the difference between the observed score, represented as open circle 3 on the graph, is less in value than the predicted value, which is represented by the regression line). The residual is, very simply, the vertical distance between the observed score [represented by the open circles or "dots" in the scatterplot (Figure 17.2)] and the regression line. In the residual column of Table 17.1, we see that half of the residuals are positive and half negative, and in Figure 17.2, that half of the points fall above the regression line and half below the regression line. It can be shown that the mean of the residuals is always 0 (i.e., ē = 0), as the sum of the residuals is always 0. This results from the fact that the mean of the observed criterion scores is equal to the mean of the predicted criterion scores (i.e., Ȳ = Ȳ′ = 38 for the example data).
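These two properties, that the residuals sum to 0 and that the mean of the predicted scores equals Ȳ, can be checked numerically; a sketch using the chapter's slope and intercept:

```python
gre = [37, 45, 43, 50, 65, 72, 61, 57, 48, 77]
mid = [32, 36, 27, 34, 45, 49, 42, 38, 30, 47]

b, a = 0.5250, 8.8625  # slope and intercept from the chapter

pred = [b * x + a for x in gre]               # predicted midterm scores
resid = [y - yp for y, yp in zip(mid, pred)]  # e_i = Y_i - Y'_i

sum_resid = sum(resid)             # 0 (up to floating point error)
mean_pred = sum(pred) / len(pred)  # 38.0, equal to the mean of Y
```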
17.3.4 Least Squares Criterion
How was one particular method selected for determining the slope and intercept? Obviously, some standard procedure has to be used. Thus, there are statistical criteria that help us decide which method to use in determining the slope and intercept. The criterion usually used in linear regression analysis (and in all GLMs, for that matter) is the least squares criterion. According to the least squares criterion, the chosen line is the one for which the sum of the squared prediction errors or residuals is smallest. That is, we want to find the regression line, defined by a particular slope and intercept, that results in the smallest sum of the squared residuals (recall that the residual is the difference between the observed and predicted values for the outcome). Since the residual is the vertical difference between the observed and predicted value, the regression line is simply the line that minimizes that vertical distance. Given the value that we place on the accuracy of prediction, this is the most logical choice of a method for estimating the slope and intercept.
In summary, then, the least squares criterion gives us a particular slope and intercept, and thus a particular regression line, such that the sum of the squared residuals is smallest. We often refer to this particular method for determining the slope and intercept as least squares estimation, because b and a represent sample estimates of the population parameters β and α obtained using the least squares criterion.
17.3.5 Proportion of Predictable Variation (Coefficient of Determination)
How well is the criterion variable Y predicted by the predictor variable X? For our example, we want to know how well the statistics midterm scores are predicted by the GRE_Q. Let us consider two possible situations with respect to this example. First, if the GRE_Q is found to be a really good predictor of statistics midterm scores, then instructors could use the GRE_Q information to individualize their instruction to the skill level of each student or class. They could, for example, provide special instruction to those students with low GRE_Q scores, or in general, adjust the level of instruction to fit the quantitative skills of their students.
Figure 17.2 Scatterplot for midterm example. [Figure annotations: "Imagine a point on the regression line directly below (or above) each open dot in the scatterplot. The vertical distance from the observed score (i.e., the open dot) and the regression line is the residual." "This closed dot represents the predicted value for the dependent variable. Although not shown, each observed value (i.e., each open dot) has a predicted value just like this closed dot on the regression line."]
Second, if the GRE_Q is not found to be a very good predictor of statistics midterm scores, then instructors would not find very much use for the GRE_Q in terms of their preparation for the statistics course. They could search for some other more useful predictor, such as prior grades in quantitatively oriented courses or the number of years since the student had taken algebra. In other words, if a predictor is not found to be particularly useful in predicting the criterion variable, then other relevant predictors should be considered.
How do we determine the utility of a predictor variable? The simplest method involves partitioning the total sum of squares in Y, which we denote as SStotal (sometimes written as SSY). This process is much like partitioning the sum of squares in ANOVA. In simple linear regression, we can partition SStotal into

SStotal = SSreg + SSres

Σ(Yi − Ȳ)² = Σ(Y′i − Ȳ)² + Σ(Yi − Y′i)²

where
SStotal is the total sum of squares in Y
SSreg is the sum of squares of the regression of Y predicted by X (sometimes written as SSY′) (and represented in the equation as Σ(Y′i − Ȳ)²)
SSres is the sum of squares of the residuals (and represented in the equation as Σ(Yi − Y′i)²), and the sums are taken over all observations from i = 1,…, n
Thus, SStotal represents the total variation in the observed Y scores, SSreg the variation in Y predicted by X, and SSres the variation in Y not predicted by X.
The equation for SSreg uses information about the difference between the predicted value of Y and the mean of Y: Σ(Y′i − Ȳ)². Thus, SSreg is essentially examining how much better the line of best fit (i.e., the predicted value of Y) is as compared to the mean of Y (recall that a slope of 0 is a horizontal line, which is the mean of Y). The equation for SSres uses information about the difference between the observed value of Y and the predicted value of Y: Σ(Yi − Y′i)². Thus, SSres provides an indication of how "off" or inaccurate the model is. The closer SSres is to 0, the better the model fit (as more variability of the dependent variable is being explained by the model; in other words, the independent variable is doing a good job of prediction when SSres is smaller). Since r²XY = SSreg/SStotal, we can write SStotal, SSreg, and SSres as follows:

SStotal = [n ΣY²i − (ΣYi)²] / n

SSreg = r²XY SStotal

SSres = (1 − r²XY) SStotal
where r²XY is the squared sample correlation between X and Y, commonly referred to as the coefficient of determination. The coefficient of determination in simple linear regression is not only the squared simple bivariate Pearson correlation between X and Y but also r²XY = SSreg/SStotal, which tells us that it is the proportion of the total variation of the dependent variable (i.e., the denominator) that has been explained by the regression model (i.e., the numerator).
There is no objective gold standard as to how large the coefficient of determination needs to be in order to say that a meaningful proportion of variation has been predicted. The coefficient is determined not just by the quality of the one predictor variable included in the model, but also by the quality of relevant predictor variables not included in the model and by the amount of total variation in Y. However, the coefficient of determination can be used both as a measure of effect size and as a test of significance (described in the next section). According to the subjective standards of Cohen (1988), a small effect size is defined as r = .10 or r² = .01, a medium effect size as r = .30 or r² = .09, and a large effect size as r = .50 or r² = .25. For additional information on effect size measures in regression, we suggest you consider Steiger and Fouladi (1992), Mendoza and Stafford (2001), and Smithson (2001; which also includes some discussion of power).
With the sample data of predicting midterm statistics scores from the GRE_Q, let us determine the sums of squares. We can write SStotal as follows:

SStotal = [n ΣY²i − (ΣYi)²] / n = [10(14,948) − (380)²] / 10 = 508.0000

We already know that rXY = .9177, so squaring it, we obtain r²XY = .8422. Next we can determine SSreg and SSres as follows:

SSreg = r²XY SStotal = .8422(508.0000) = 427.8376

SSres = (1 − r²XY) SStotal = (1 − .8422)(508.0000) = 80.1624
Given the squared correlation between X and Y (r²_XY = .8422), the GRE_Q predicts approximately 84% of the variation in the midterm statistics exam, which is clearly a large effect size. Significance tests are discussed in the next section.
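The partition of the sums of squares above can be sketched in a few lines of code. This is an illustrative Python translation of the chapter's hand calculation, using its summary values (ΣY = 380, ΣY² = 14,948, n = 10, r_XY = .9177); it is not SPSS output.

```python
# Illustrative recomputation of the sums-of-squares partition, using the
# chapter's summary values (sum of Y = 380, sum of Y^2 = 14,948, n = 10).
sum_y, sum_y2, n = 380, 14_948, 10
r_xy = 0.9177

ss_total = sum_y2 - sum_y ** 2 / n        # 14,948 - 380^2/10 = 508.0
r2 = round(r_xy ** 2, 4)                  # coefficient of determination, .8422
ss_reg = r2 * ss_total                    # explained variation, 427.8376
ss_res = (1 - r2) * ss_total              # unexplained variation, 80.1624

print(ss_total, r2, round(ss_reg, 4), round(ss_res, 4))
```

Note that r² is rounded to four places before multiplying, as the text does, so the results match the chapter's figures exactly.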
17.3.6 Significance Tests and Confidence Intervals
This section describes four procedures used in the simple linear regression context. The first two are tests of statistical significance that generally involve testing whether or not X is a significant predictor of Y. Then we consider two confidence interval (CI) techniques.
623 Simple Linear Regression
17.3.6.1 Test of Significance of r²_XY
The first test is the test of the significance of r²_XY (alternatively known as the test of the proportion of variation in Y predicted or explained by X). It is important that r²_XY be different from 0 in order to have reasonable prediction. The null and alternative hypotheses, respectively, are as follows, where the null indicates that the correlation between X and Y will be 0:

H0: ρ²_XY = 0

H1: ρ²_XY > 0
This test is based on the following test statistic:

F = (r²/m) / [(1 − r²)/(n − m − 1)]

where
F indicates that this is an F statistic
r² is the coefficient of determination
1 − r² is the proportion of variation in Y that is not predicted by X
m is the number of predictors (which in the case of simple linear regression is always 1)
n is the sample size
The F test statistic is compared to the F critical value, always a one-tailed test (given that a squared value cannot be negative), at the designated level of significance α, with degrees of freedom equal to m (i.e., the number of independent variables) and (n − m − 1), as taken from the F table in Table A.4. That is, the tabled critical value is αF_m,(n − m − 1).
For the statistics midterm example, we determine the test statistic to be the following:

F = (r²/m) / [(1 − r²)/(n − m − 1)] = (.8422/1) / [(1 − .8422)/(10 − 1 − 1)] = 42.6971
From Table A.4, the critical value, at the .05 level of significance, with degrees of freedom of 1 (i.e., one predictor) and 8 (i.e., n − m − 1 = 10 − 1 − 1 = 8), is .05F1,8 = 5.32. The test statistic exceeds the critical value; thus, we reject H0 and conclude that ρ²_XY is not equal to 0 at the .05 level of significance (i.e., GRE_Q does predict a significant proportion of the variation on the midterm exam).
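As a check on the arithmetic, the F test above can be sketched as follows. This is a hypothetical Python snippet, not SPSS; the critical value 5.32 is the chapter's tabled .05 F(1,8) value.

```python
# F test of the coefficient of determination for the midterm example.
r2, m, n = 0.8422, 1, 10          # chapter values: r^2, number of predictors, sample size

f = (r2 / m) / ((1 - r2) / (n - m - 1))   # (.8422/1) / (.1578/8), approx. 42.697
df1, df2 = m, n - m - 1                   # 1 and 8 degrees of freedom
f_crit = 5.32                             # .05 F(1,8), from Table A.4

print(round(f, 4), f > f_crit)            # test statistic; True means reject H0
```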
17.3.6.2 Test of Significance of b_YX
The second test is the test of the significance of the slope or regression coefficient, b_YX. In other words, is the unstandardized regression coefficient statistically significantly different from 0? This is actually the same as the test of b*, the standardized regression coefficient, so we need not develop a separate test for b*. The null and alternative hypotheses, respectively, are as follows:

H0: β_YX = 0

H1: β_YX ≠ 0
To test whether the regression coefficient is equal to 0, we need a standard error for the slope b. However, first we need to develop some new concepts. The first new concept is the variance error of estimate. Although this is the correct term, it is easier to consider this as the variance of the residuals. The variance error of estimate, or variance of the residuals, is defined as

s²_res = Σ e_i²/df_res = SS_res/df_res = MS_res

where the summation is taken over i = 1,…, n and df_res = (n − m − 1) (or n − 2 if there is only a single predictor). Two degrees of freedom are lost because we have to estimate the population slope and intercept, β and α, from the sample data. The variance error of estimate indicates the amount of variation among the residuals. If there are some extremely large residuals, this will result in a relatively large value of s²_res, indicating poor prediction overall. If the residuals are generally small, this will result in a comparatively small value of s²_res, indicating good prediction overall.
The next new concept is the standard error of estimate (sometimes known as the root mean square error). The standard error of estimate is simply the positive square root of the variance error of estimate and thus is the standard deviation of the residuals or errors of estimate. We denote the standard error of estimate as s_res.
The final new concept is the standard error of b. We denote the standard error of b as s_b and define it as

s_b = s_res / √[Σ X_i² − (Σ X_i)²/n] = s_res / √SS_X

where the summation is taken over i = 1,…, n. We want s_b to be small to reject H0, so we need s_res to be small and SS_X to be large. In other words, we want there to be a large spread of scores in X. If the variability in X is small, it is difficult for X to be a significant predictor of Y.
Now we can put these concepts together into a test statistic to test the significance of the slope b. As in many significance tests, the test statistic is formed by the ratio of a parameter estimate divided by its respective standard error. A ratio of the parameter estimate of the slope b to its standard error s_b is formed as follows:

t = b / s_b
The test statistic t is compared to the critical values of t (in Table A.2), a two-tailed test for a nondirectional H1, at the designated level of significance α, and with degrees of freedom of (n − m − 1). That is, the tabled critical values are ±(α/2)t(n − m − 1) for a two-tailed test.
In addition, all other things being equal (i.e., same data, same degrees of freedom, same level of significance), both of these significance tests (i.e., the test of significance of the squared bivariate correlation between X and Y and the test of significance of the slope) will yield the exact same result. That is, if X is a significant predictor of Y, then H0 will be rejected in both tests. If X is not a significant predictor of Y, then H0 will not be rejected for either test. In simple linear regression, each of these tests is a method for testing the same general hypothesis and logically should lead the researcher to the exact same conclusion. Thus, there is no need to implement both tests.
We can also form a CI around the slope b. As in most CI procedures, it follows the form of the sample estimate plus or minus the tabled critical value multiplied by the standard error. The CI around b is formed as follows:

CI(b) = b ± (α/2)t(n − m − 1) (s_b)

Recall that the null hypothesis was written as H0: β = 0. Therefore, if the CI contains 0, then β is not significantly different from 0 at the specified α level. This is interpreted to mean that in (1 − α)% of the sample CIs that would be formed from multiple samples, β will be included. This procedure assumes homogeneity of variance (discussed later in this chapter); for alternative procedures, see Wilcox (1996, 2003).
Now we can determine the second test statistic for the midterm statistics example. We specify H0: β = 0 (i.e., the null hypothesis is that the slope is equal to 0; visually a slope of 0 is a horizontal line) and conduct a two-tailed test. First the variance error of estimate is

s²_res = Σ e_i²/df_res = SS_res/df_res = MS_res = 80.1578/8 = 10.0197
The standard error of estimate, s_res, is √10.0197 = 3.1654. Next the standard error of b is computed as follows:

s_b = s_res / √SS_X = 3.1654 / √1552.5000 = .0803
Finally, we determine the test statistic to be as follows:

t = b / s_b = .5250/.0803 = 6.5380
To evaluate the null hypothesis, we compare this test statistic to its critical values ±.025t8 = ±2.306. The test statistic exceeds the critical value, so H0 is rejected in favor of H1. We conclude that the slope is indeed significantly different from 0, at the .05 level of significance.
Finally let us determine the CI for the slope b as follows:

CI(b) = b ± (α/2)t(n − m − 1)(s_b) = b ± .025t8(s_b) = .5250 ± 2.306(.0803) = (.3398, .7102)

The interval does not contain 0, the value specified in H0; thus, we conclude that the slope β is significantly different from 0, at the .05 level of significance.
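The slope test and CI can be sketched the same way. This is hypothetical Python using the chapter's rounded inputs; the recomputed t differs from the text's 6.5380 in the third decimal only because the text rounds s_b to .0803 before dividing.

```python
import math

# t test and 95% CI for the slope, midterm example.
b, s_res, ss_x = 0.5250, 3.1654, 1552.5      # chapter values
t_crit = 2.306                               # two-tailed .025 t(8), from Table A.2

s_b = s_res / math.sqrt(ss_x)                # standard error of b, approx. .0803
t = b / s_b                                  # test statistic, approx. 6.54
lo, hi = b - t_crit * s_b, b + t_crit * s_b  # 95% CI, approx. (.3397, .7103)

print(round(s_b, 4), (round(lo, 4), round(hi, 4)))
```

Because the interval excludes 0, the same reject-H0 decision follows from either the test statistic or the CI.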
17.3.6.3 Confidence Interval for the Predicted Mean Value of Y
The third procedure is to develop a CI for the predicted mean value of Y, denoted by Y′0, for a specific value of X0. Alternatively, Y′0 is referred to as the conditional mean of Y given X0 (more about conditional distributions in the next section). In other words, for a particular predictor score X0, how confident can we be in the predicted mean for Y?
The standard error of Y′0 is

s(Y′0) = s_res √[(1/n) + (X0 − X̄)²/SS_X]

In looking at this equation, the further X0 is from X̄, the larger the standard error. Thus, the standard error depends on the particular value of X0 selected. In other words, we expect to make our best predictions at the center of the distribution of X scores and to make our poorest predictions for extreme values of X. Thus, the closer the value of the predictor is to the center of the distribution of the X scores, the better the prediction will be.
A CI around Y′0 is formed as follows:

CI(Y′0) = Y′0 ± (α/2)t(n − 2) s(Y′0)

Our interpretation is that in (1 − α)% of the sample CIs that would be formed from multiple samples, the population mean value of Y for a given value of X will be included.
Let us consider an example of this CI procedure with the midterm statistics data. If we take a GRE_Q score of 50, the predicted score on the statistics midterm is 35.1125. A CI for the predicted mean value of 35.1125 is as follows:

s(Y′0) = s_res √[(1/n) + (X0 − X̄)²/SS_X] = 3.1654 √[(1/10) + (50 − 55.5000)²/1552.5000] = 1.0786

CI(Y′0) = Y′0 ± (α/2)t(n − 2) s(Y′0) = 35.1125 ± .025t8 s(Y′0) = 35.1125 ± 2.306(1.0786) = (32.6252, 37.5998)

In Figure 17.3, the CI around Y′0 given X0 is plotted as the pair of curved lines closest to the regression line. Here we see graphically that the width of the CI increases the further we move from X̄ (where X̄ = 55.5000).
17.3.6.4 Prediction Interval for Individual Values of Y
The fourth and final procedure is to develop a prediction interval (PI) for an individual predicted value of Y′0 at a specific individual value of X0. That is, the predictor score for a particular individual is known, but the criterion score for that individual has not yet been observed. This is in contrast to the CI just discussed where the individual Y scores have already been observed. Thus, the CI deals with the mean of the predicted values, while the PI deals with an individual predicted value not yet observed.
The standard error of Y′0 for an individual value is

s(Y′0) = s_res √[1 + (1/n) + (X0 − X̄)²/SS_X]

This standard error is similar to the standard error for the predicted mean value with the addition of 1 to the equation. Thus, the standard error for an individual value will always be greater than the standard error for the mean value, as there is more uncertainty about individual values than about the mean. The further X0 is from X̄, the larger the standard error. Thus, the standard error again depends on the particular value of X, where we have more confidence in predictions for values of X close to X̄.
The PI around Y′0 is formed as follows:

PI(Y′0) = Y′0 ± (α/2)t(n − 2) s(Y′0)

Our interpretation is that in (1 − α)% of the sample PIs that would be formed from multiple samples, the new observation Y0 for a given value of X will be included.
Consider an example of this PI procedure with the midterm statistics data. If we take a GRE_Q score of 50, the predicted score on the statistics midterm is 35.1125. A PI for the predicted individual value of 35.1125 is as follows:

s(Y′0) = s_res √[1 + (1/n) + (X0 − X̄)²/SS_X] = 3.1654 √[1 + (1/10) + (50 − 55.5000)²/1552.5000] = 3.3441

PI(Y′0) = Y′0 ± (α/2)t(n − 2) s(Y′0) = 35.1125 ± 2.306(3.3441) = (27.4010, 42.8240)

In Figure 17.3, the PI around Y′0 given X0 is plotted as the pair of curved lines furthest from the regression line. Here we see graphically that the PI is always wider than its corresponding CI.
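The CI and PI can be sketched together. This is hypothetical Python using the chapter's rounded inputs; recomputing from these inputs gives standard errors of roughly 1.09 and 3.35, slightly different from the text's printed 1.0786 and 3.3441 because of rounding at intermediate steps in the original calculation.

```python
import math

# 95% CI for the mean of Y and 95% PI for an individual Y at X0 = 50.
y_hat, s_res, n = 35.1125, 3.1654, 10      # chapter values
x0, x_bar, ss_x = 50, 55.5, 1552.5
t_crit = 2.306                             # .025 t(8), from Table A.2

leverage = 1 / n + (x0 - x_bar) ** 2 / ss_x
se_mean = s_res * math.sqrt(leverage)      # standard error for the conditional mean
se_ind = s_res * math.sqrt(1 + leverage)   # PI standard error: note the added 1

ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_ind, y_hat + t_crit * se_ind)
print(tuple(round(v, 2) for v in ci), tuple(round(v, 2) for v in pi))
```

Because se_ind adds 1 under the square root, the PI is always wider than the corresponding CI, matching the pattern in Figure 17.3.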
17.3.7 Assumptions and Violation of Assumptions
In this section, we consider the following assumptions involved in simple linear regression: (a) independence, (b) homogeneity, (c) normality, (d) linearity, and (e) fixed X. Some discussion is also devoted to the effects of assumption violations and how to detect them.
[Figure 17.3: scatterplot of midterm exam score (y-axis) against GRE_Q (x-axis) with the regression line and two pairs of curved interval bands.]
FIGURE 17.3 CIs for midterm example: the curved lines closest to the regression line are for the 95% CI; the curved lines furthest from the regression line are for the 95% PI.
17.3.7.1 Independence
The first assumption is concerned with independence of the observations. We should be familiar with this assumption from previous chapters (e.g., ANOVA). In regression analysis, another way to think about this assumption is that the errors in prediction or the residuals (i.e., e_i) are assumed to be random and independent. That is, there is no systematic pattern about the errors, and each error is independent of the other errors. An example of a systematic pattern would be where for small values of X the residuals tended to be small, whereas for large values of X, the residuals tended to be large. Thus, there would be a relationship between the independent variable X and the residual e. Dependent errors occur when the error for one individual depends on or is related to the error for another individual as a result of some predictor not being included in the model. For our midterm statistics example, students similar in age might have similar residuals because age was not included as a predictor in the model.
Note that there are several different types of residuals. The e_i are known as raw residuals for the same reason that X_i and Y_i are called raw scores, all being in their original scale. The raw residuals are on the same raw score scale as Y but with a mean of 0 and a variance of s²_res. Some researchers dislike raw residuals as their scale depends on the scale of Y, and, therefore, they must temper their interpretation of the residual values. Several different types of standardized residuals have been developed, including the original form of standardized residual, e_i/s_res. These values are measured along the z score scale with a mean of 0 and a variance of 1, and approximately 95% of the values are within ±2 units of 0. Later in our illustration of SPSS, we will use studentized residuals for diagnostic checks. Studentized residuals are a type of standardized residual that are more sensitive to detecting outliers. Some researchers prefer these or other variants of standardized residuals over raw residuals because they find it easier to detect large residuals. However, one can just as easily examine the middle 95% of the raw residuals by considering the range of ±2 standard errors (i.e., ±2s_res) around 0. Readers interested in learning more about other types of standardized residuals are referred to a number of excellent resources (see Atkinson, 1985; Cook & Weisberg, 1982; Dunn & Clark, 1987; Kleinbaum, Kupper, Muller, & Nizam, 1998; Weisberg, 1985).
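A minimal sketch of the raw-to-standardized conversion described above, in hypothetical Python: the residual values here are invented for illustration, and only s_res comes from the chapter's example.

```python
# Convert raw residuals to the original form of standardized residuals, e_i / s_res.
s_res = 3.1654                         # chapter's standard error of estimate
raw = [-4.2, -1.1, 0.3, 1.8, 3.5]      # hypothetical raw residuals

standardized = [e / s_res for e in raw]            # z-score scale
flagged = [z for z in standardized if abs(z) > 2]  # ~95% should lie within +/-2

print([round(z, 2) for z in standardized], flagged)
```

Equivalently, one can flag raw residuals falling outside ±2 s_res of 0, as the paragraph above notes.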
The simplest procedure for assessing this assumption is to examine a scatterplot (Y vs. X) or a residual plot (e.g., e vs. X). If the independence assumption is satisfied, there should be a random display of points. If the assumption is violated, the plot will display some type of pattern; for example, the negative residuals tend to cluster together, and the positive residuals tend to cluster together. As we know from ANOVA, violation of the independence assumption generally occurs in the following three situations: (a) when the observations are collected over time (the independent variable is a measure of time; consider using the Durbin and Watson test [1950, 1951, 1971]); (b) when observations are made within blocks, such that the observations within a particular block are more similar than observations in different blocks; or (c) when observation involves replication. Lack of independence affects the estimated standard errors, which may be under- or overestimated. For serious violations, one could consider using generalized or weighted least squares as the method of estimation.
17.3.7.2 Homogeneity
The second assumption is homogeneity of variance, which should also be a familiar assumption (e.g., ANOVA). This assumption must be reframed a bit in the regression context by examining the concept of a conditional distribution. In regression analysis, a conditional distribution is defined as the distribution of Y for a particular value of X. For instance, in the midterm statistics example, we could consider the conditional distribution of midterm scores when GRE_Q = 50; in other words, what the distribution of Y looks like for X = 50. We call this a conditional distribution because it represents the distribution of Y conditional on a particular value of X (sometimes denoted as Y|X, read as Y given X). Alternatively we could examine the conditional distribution of the prediction errors, that is, the distribution of the prediction errors conditional on a particular value of X (i.e., e|X, read as e given X). Thus, the homogeneity assumption is that the conditional distributions have a constant variance for all values of X.
In a plot of the Y scores or the residuals versus X, the consistency of the variance of the conditional distributions can be examined. A common violation of this assumption occurs when the conditional residual variance increases as X increases. Here the residual plot is cone- or fan-shaped, where the cone opens toward the right. An example of this violation would be where weight is predicted by age, as weight is more easily predicted for young children than it is for adults. Thus, residuals would tend to be larger for adults than for children.
If the homogeneity assumption is violated, estimates of the standard errors are larger, and although the regression coefficients remain unbiased, the validity of the significance tests is affected. In fact, with larger standard errors, it is more difficult to reject H0, therefore resulting in a larger number of Type II errors. Minor violations of this assumption will have a small net effect; more serious violations occur when the variances are greatly different. In addition, nonconstant variances may also result in the conditional distributions being nonnormal in shape.
If the homogeneity assumption is seriously violated, the simplest solution is to use some sort of transformation, known as variance stabilizing transformations (e.g., Weisberg, 1985). Commonly used transformations are the log or square root of Y (e.g., Kleinbaum et al., 1998). These transformations can also often improve on the nonnormality of the conditional distributions. However, this complicates things in terms of dealing with transformed variables rather than the original variables. A better solution is to use generalized or weighted least squares (e.g., Weisberg, 1985). A third solution is to use a form of robust estimation (e.g., Carroll & Ruppert, 1982; Kleinbaum et al., 1998; Wilcox, 1996, 2003).
17.3.7.3 Normality
The third assumption of normality should also be a familiar one. In regression, the normality assumption is that the conditional distributions of either Y or the prediction errors (i.e., residuals) are normal in shape. That is, for all values of X, the scores on Y or the prediction errors are normally distributed. Oftentimes nonnormal distributions are largely a function of one or a few extreme observations, known as outliers. Extreme values may cause nonnormality and seriously affect the regression results. The regression estimates are quite sensitive to outlying observations such that the precision of the estimates is affected, particularly the slope. Also the coefficient of determination can be affected. In general, the regression line will be pulled toward the outlier, because the least squares principle always attempts to find the line that best fits all of the points.
Various rules of thumb are used to crudely detect outliers from a residual plot or scatterplot. A commonly used rule is to define an outlier as an observation more than two or three standard errors from the mean (i.e., a large distance from the mean). The outlier observation may be a result of (a) a simple recording or data entry error, (b) an error in observation, (c) an improperly functioning instrument, (d) inappropriate use of administration instructions, or (e) a true outlier. If the outlier is the result of an error, correct the error if possible and redo the regression analysis. If the error cannot be corrected, then the observation could be deleted. If the outlier represents an accurate observation, then this observation may contain important theoretical information, and one would be more hesitant to delete it (or perhaps seek out similar observations).
A simple procedure to use for single case outliers (i.e., just one outlier) is to perform two regression analyses, both with and without the outlier being included. A comparison of the regression results will provide some indication of the effects of the outlier. Other methods for detecting and dealing with outliers are available, but are not described here (e.g., Andrews & Pregibon, 1978; Barnett & Lewis, 1978; Beckman & Cook, 1983; Cook, 1977; Hawkins, 1980; Kleinbaum et al., 1998; Mickey, Dunn, & Clark, 2004; Pedhazur, 1997; Rousseeuw & Leroy, 1987; Wilcox, 1996, 2003).
How does one go about detecting violation of the normality assumption? There are two commonly used procedures. The simplest procedure involves checking for symmetry in a histogram, frequency distribution, boxplot, or skewness and kurtosis statistics. Although nonzero kurtosis (i.e., a distribution that is either flat, platykurtic, or has a sharp peak, leptokurtic) will have minimal effect on the regression estimates, nonzero skewness (i.e., a distribution that is not symmetrical, with either a positive or negative skew) will have much more impact on these estimates. Thus, detecting asymmetrical distributions is essential. One rule of thumb is to be concerned if the skewness value is larger than 1.5 or 2.0 in magnitude. For the midterm statistics example, the skewness value for the raw residuals is −0.2692. Thus, there is evidence of normality in this illustration.
Another useful graphical technique is the normal probability plot [or quantile–quantile (Q–Q) plot]. With normally distributed data or residuals, the points on the normal probability plot will fall along a straight diagonal line, whereas nonnormal data will not. One difficulty with this plot is that there is no objective criterion with which to judge deviation from linearity. A normal probability plot of the raw residuals for the midterm statistics example is shown in Figure 17.4. Together the skewness and normal probability plot results indicate that the normality assumption is satisfied. It is recommended that skewness and/or the normal probability plot be considered at a minimum.
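A skewness check of this kind can be sketched directly from its definition. This is hypothetical Python; the residuals below are invented, while the 1.5–2.0 rule-of-thumb cutoff and the chapter's reported value of −0.2692 come from the text.

```python
# Sample skewness of a set of residuals, compared to the rule-of-thumb cutoff.
def skewness(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5   # population SD
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

residuals = [-4.2, -1.1, 0.3, 1.8, 3.5, -0.6, 0.9, -1.4, 2.2, -1.3]  # hypothetical
g1 = skewness(residuals)
print(round(g1, 4), abs(g1) < 1.5)     # nearly symmetric: magnitude well under 1.5
```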
There are also several statistical procedures available for the detection of nonnormality (e.g., Andrews, 1971; Belsley, Kuh, & Welsch, 1980; Ruppert & Carroll, 1980; Wu, 1985). In addition, various transformations are available to transform a nonnormal distribution into a normal distribution. The most commonly used transformations to correct for nonnormality in regression analysis are to transform the dependent variable using the log (to correct for positive skew) or the square root (to correct for positive or negative skew). However, again there is the problem of dealing with transformed variables measured along some other scale than that of the original variables.

[Figure 17.4: normal probability plot of expected cumulative probability against observed cumulative probability for the dependent variable, midterm exam score.]
FIGURE 17.4 Normal probability plot for midterm example.
17.3.7.4 Linearity
The fourth assumption is linearity. This assumption simply indicates that there is a linear relationship between X and Y, which is also assumed for most types of correlations. Consider the scatterplot and regression line in Figure 17.5 where X and Y are not linearly related. Here X and Y form a perfect curvilinear relationship, as all of the points fall precisely on a curve. However, fitting a straight line to these points will result in a slope of 0, not useful at all for predicting Y from X (as the predicted score for all cases will be the mean of Y). For example, age and performance are not linearly related.
If the relationship between X and Y is linear, then the sample slope and intercept will be unbiased estimators of the population slope and intercept, respectively. The linearity assumption is important because, regardless of the value of X_i, we always expect Y_i to increase by b_YX units for a one-unit increase in X_i. If a nonlinear relationship exists, this means that the expected increase in Y_i depends on the value of X_i. Strictly speaking, linearity in a model refers to there being linearity in the parameters of the model (i.e., slope β and intercept α).
Detecting violation of the linearity assumption can often be done by looking at the scatterplot of Y versus X. If the linearity assumption is met, we expect to see no systematic pattern of points. While this plot is often satisfactory in simple linear regression, less obvious violations are more easily detected in a residual plot. If the linearity assumption is met, we expect to see a horizontal band of residuals mainly contained within ±2 or ±3 s_res (or standard errors) across the values of X. If the assumption is violated, we expect to see a systematic pattern between e and X. Therefore, we recommend you examine both the scatterplot and the residual plot. A residual plot for the midterm statistics example is shown in Figure 17.6. Even with a very small sample, we see a fairly random display of residuals and therefore feel fairly confident that the linearity assumption has been satisfied.
[Figure 17.5: scatterplot of Y against X in which the points fall exactly on a curve rather than a straight line.]
FIGURE 17.5 Nonlinear regression example.
If a serious violation of the linearity assumption has been detected, how should we deal with it? There are two alternative procedures that the researcher can utilize, transformations or nonlinear models. The first option is to transform either one or both of the variables to achieve linearity. That is, the researcher selects a transformation that subsequently results in a linear relationship between the transformed variables. Then the method of least squares can be used to perform a linear regression analysis on the transformed variables. However, when dealing with transformed variables measured along a different scale, results need to be described in terms of the transformed rather than the original variables. A better option is to use a nonlinear model to examine the relationship between the variables in their original scale (see Wilcox, 1996, 2003; also discussed in Chapter 18).
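The transformation route can be sketched as follows, in hypothetical Python with invented data: Y is constructed to grow exponentially in X, so regressing log Y on X by ordinary least squares recovers an exactly linear fit.

```python
import math

# Linearize a curvilinear relationship by transforming Y, then fit by least squares.
xs = [1, 2, 3, 4, 5]
ys = [2 * 1.5 ** x for x in xs]        # hypothetical exponential data
log_ys = [math.log(y) for y in ys]     # transformed dependent variable

n = len(xs)
mx, my = sum(xs) / n, sum(log_ys) / n
b = sum((x - mx) * (ly - my) for x, ly in zip(xs, log_ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx                        # intercept on the log scale

print(round(b, 4), round(math.exp(b), 4))   # slope = log(1.5), so exp(slope) = 1.5
```

As the paragraph above cautions, the fitted slope and intercept are now on the log scale, so results must be interpreted in terms of log Y rather than Y.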
17.3.7.5 Fixed X
The fifth and final assumption is that the values of X are fixed. That is, X is a fixed variable rather than a random variable. This results in the regression model being valid only for those particular values of X that were actually observed and used in the analysis. Thus, the same values of X would be used in replications or repeated samples. You may recall a similar concept in the fixed-effects ANOVA models previously considered.
Strictly speaking, the regression model and its parameter estimates are only valid for those values of X actually sampled. The use of a prediction model, based on one sample of individuals, to predict Y for another sample of individuals may also be suspect. Depending on the circumstances, the new sample of individuals may actually call for a different set of parameter estimates. Two obvious situations that come to mind are the extrapolation and interpolation of values of X. In general, we may not want to make predictions about individuals having X scores (i.e., scores on the independent variable) that are outside of the range of values used in developing the prediction model; this is defined as extrapolating beyond the sample predictor data. We cannot assume that the function defined by the prediction model is the same outside of the values of X that were initially sampled. The prediction errors for the new nonsampled X values would be expected to be larger than those for the sampled X values because there are no supportive prediction data for the former.
[Figure 17.6: plot of unstandardized residuals against GRE_Q for the midterm example.]
FIGURE 17.6 Residual plot for midterm example.
On the other hand, we are not quite as concerned in making predictions about individuals having X scores within the range of values used in developing the prediction model; this is defined as interpolating within the range of the sample predictor data. We would feel somewhat more comfortable in assuming that the function defined by the prediction model is the same for other new values of X within the range of those initially sampled. For the most part, the fixed X assumption is satisfied if the new observations behave like those in the prediction sample. In the interpolation situation, we expect the prediction errors to be somewhat smaller as compared to the extrapolation situation because there are at least some similar supportive prediction data for the former. It has been shown that when other assumptions are met, regression analysis performs just as well when X is a random variable (e.g., Glass & Hopkins, 1996; Myers & Well, 1995; Pedhazur, 1997). There is no corresponding assumption about the nature of Y.
In our midterm statistics example, we have more confidence in our prediction for a GRE_Q value of 52 (which did not occur in the sample, but falls within the range of sampled values) than in a value of 20 (which also did not occur, but is much smaller than the smallest value sampled, 37). In fact, this is precisely the rationale underlying the PI previously developed, where the width of the interval increased as an individual's score on the predictor (X_i) moved away from the predictor mean (X̄).
A summary of the assumptions and the effects of their violation for simple linear regression is presented in Table 17.2.
17.3.7.6 Summary
The simplest procedure for assessing assumptions is to plot the residuals and see what the plot tells you. Take the midterm statistics problem as an example. Although the sample size is quite small in terms of looking at conditional distributions, it would appear that all of our assumptions have been satisfied. All of the residuals are within two standard errors of 0, and there does not seem to be any systematic pattern in the residuals. The distribution of the residuals is nearly symmetrical, and the normal probability plot looks good. The scatterplot also strongly suggests a linear relationship.
Table 17.2
Assumptions and Violation of Assumptions: Simple Linear Regression

Assumption          Effect of Assumption Violation
Independence        • Influences standard errors of the model
Homogeneity         • Bias in s²res
                    • May inflate standard errors and thus increase likelihood of a Type II error
                    • May result in nonnormal conditional distributions
Normality           • Less precise slope, intercept, and R²
Linearity           • Bias in slope and intercept
                    • Expected change in Y is not a constant and depends on value of X
                    • Reduced magnitude of coefficient of determination
Values of X fixed   • Extrapolating beyond the range of X: prediction errors larger, may also bias slope and intercept
                    • Interpolating within the range of X: smaller effects than when extrapolating; if other assumptions met, negligible effect
An Introduction to Statistical Concepts
17.4 SPSS
Next we consider SPSS for the simple linear regression model. Before we conduct the analysis, let us review the data. With one independent variable and one dependent variable, the dataset must consist of two variables or columns, one for the independent variable and one for the dependent variable. Each row still represents one individual, with the value of the independent variable for that particular case and their score on the dependent variable. In the following screenshot, we see the SPSS dataset is in the form of two columns representing one independent variable (GRE_Q) and one dependent variable (midterm exam score).
The independent variable is labeled “GRE_Q” where each value represents the student’s score on the GRE_Q.

The dependent variable is “Midterm” and represents the score on the midterm exam.
Step 1: To conduct a simple linear regression, go to “Analyze” in the top pulldown menu, then select “Regression,” and then select “Linear.” Following the screenshot (step 1) as follows produces the “Linear Regression” dialog box.
Simple linear regression:
Step 1
Step 2: Click the dependent variable (e.g., “Midterm”) and move it into the “Dependent” box by clicking the arrow button. Click the independent variable and move it into the “Independent(s)” box by clicking the arrow button (see screenshot step 2).
Clicking on “Statistics” will allow you to select various regression coefficients and residuals.

Clicking on “Plots” will allow you to select various residual plots.

Clicking on “Save” will allow you to save various predicted values, residuals, and other statistics useful for diagnostics.

Select the dependent variable from the list on the left and use the arrow to move it to the “Dependent” box on the right.

Select the independent variable from the list on the left and use the arrow to move it to the “Independent(s)” box on the right.
Simple linear regression:
Step 2
Step 3: From the “Linear Regression” dialog box (see screenshot step 2), clicking on “Statistics” will provide the option to select various regression coefficients and residuals. From the “Statistics” dialog box (see screenshot step 3), place a checkmark in the box next to the following: (1) estimates, (2) confidence intervals, (3) model fit, (4) descriptives, (5) Durbin–Watson, and (6) casewise diagnostics. Click on “Continue” to return to the original dialog box.
Simple linear regression:
Step 3
Step 4: From the “Linear Regression” dialog box (see screenshot step 2), clicking on “Plots” will provide the option to select various residual plots. From the “Plots” dialog box, place a checkmark in the box next to the following: (1) histogram and (2) normal probability plot. Click on “Continue” to return to the original dialog box.
Simple linear regression:
Step 4
Step 5: From the “Linear Regression” dialog box (see screenshot step 2), clicking on “Save” will provide the option to save various predicted values, residuals, and statistics that can be used for diagnostic examination. From the “Save” dialog box, under the heading of Predicted Values, place a checkmark in the box next to the following: unstandardized. Under the heading of Residuals, place a checkmark in the box next to the following: (1) unstandardized and (2) studentized. Under the heading of Distances, place a checkmark in the box next to the following: (1) Mahalanobis and (2) Cook’s. Under the heading of Influence Statistics, place a checkmark in the box next to the following: (1) DFBETA(s) and (2) Standardized DFBETA(s). Click on “Continue” to return to the original dialog box. From the “Linear Regression” dialog box, click on “OK” to generate the output.
Simple linear regression:
Step 5
Interpreting the output: Annotated results are presented in Table 17.3. In Chapters 18 and 19, we see other regression modules in SPSS which allow you to consider, for example, generalized or weighted least squares regression, nonlinear regression, and logistic regression. Additional information on regression analysis in SPSS is provided in texts such as Morgan and Griego (1998) and Meyers, Gamst, and Guarino (2006).
Table 17.3
Selected SPSS Results for the Midterm Example
Descriptive Statistics
Mean Std. Deviation N
Midterm exam score 38.0000 7.51295 10
GRE_Q 55.5000 13.13393 10
Correlations

                                          Midterm Exam Score   GRE_Q
Pearson correlation   Midterm exam score  1.000                .918
                      GRE_Q               .918                 1.000
Sig. (one-tailed)     Midterm exam score  .                    .000
                      GRE_Q               .000                 .
N                     Midterm exam score  10                   10
                      GRE_Q               10                   10

Variables Entered/Removed a

Model   Variables Entered   Variables Removed   Method
1       GRE_Qb                                  Enter

a Dependent variable: midterm exam score.
b All requested variables entered.

The table labeled “Descriptive Statistics” provides basic descriptive statistics (means, standard deviations, and sample sizes) for the independent and dependent variables.

The table labeled “Correlations” provides the correlation coefficient value (r = .918), p value (<.001), and sample size (N = 10) for the simple bivariate Pearson correlation between the independent and dependent variables. There is a statistically significant bivariate correlation between GRE_Q and midterm exam score.

“Variables Entered/Removed” lists the independent variables included in the model and the method by which they were entered (i.e., “Enter”).
Model Summary a

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin–Watson
1       .918b   .842       .822                3.16540                      1.287

a Dependent variable: midterm exam score.
b Predictors: (constant), GRE_Q.

R in simple linear regression is the simple bivariate Pearson correlation between X and Y.

R² in simple linear regression is the squared simple bivariate Pearson correlation between X and Y. It represents the proportion of variance in the dependent variable that is explained by the independent variable.

“Adjusted R Square” is an estimate of how well the model would fit other data from the same population and is calculated as:

R²adj = 1 − (1 − R²)[(n − 1)/(n − m − 1)]

If an additional independent variable were entered in the model, an increase in adjusted R² indicates the new variable is adding value to the model. Negative adjusted R² values can occur and indicate the model fits the data VERY poorly.

Durbin–Watson is a test of independence of the residuals. Ranging from 0 to 4, values of 2 indicate uncorrelated errors. Values less than 1 or greater than 3 indicate a likely assumption violation.
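The adjusted R² shown in the Model Summary table can be reproduced directly from the formula above (here m is the number of predictors). A quick check in a few lines of Python, keeping in mind that small discrepancies arise because the reported r = .918 is itself rounded:

```python
# Reproduce "R Square" and "Adjusted R Square" from the Model Summary table
r = 0.918          # bivariate correlation between GRE_Q and midterm exam score
n = 10             # sample size
m = 1              # number of predictors

r2 = r ** 2                                    # R Square (about .842)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - m - 1)  # Adjusted R Square (about .822)
print(r2, r2_adj)
```

Both values agree with the SPSS output to rounding.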
Table 17.3 (continued)
Selected SPSS Results for the Midterm Example
ANOVA a

Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression  427.842          1    427.842       42.700   .000b
   Residual    80.158           8    10.020
   Total       508.000          9

a Dependent variable: midterm exam score.
b Predictors: (constant), GRE_Q.

Total sum of squares is partitioned into SS regression and SS residual. When the regression SS equals 0, this indicates that the independent variable has provided no information in terms of explaining the dependent variable.

The F statistic is computed as F = MSreg/MSres = 427.842/10.020 = 42.700.

The p value (.000) indicates we reject the null hypothesis. The prediction equation provides a better fit to the data than estimating the predicted value of Y to be equal to the mean of Y.
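The ANOVA partition and the F statistic can be verified by hand (or in a few lines of Python) from the two sums of squares reported in the table:

```python
# Verify the ANOVA partition and F statistic for the midterm example
ss_reg = 427.842
ss_res = 80.158
ss_total = ss_reg + ss_res      # should equal 508.000
df_reg, df_res = 1, 8

ms_reg = ss_reg / df_reg        # mean square regression
ms_res = ss_res / df_res        # mean square residual, about 10.020
f_stat = ms_reg / ms_res        # about 42.70
print(ss_total, round(ms_res, 3), round(f_stat, 2))
```

The recovered values match the SPSS ANOVA table.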
Coefficients a

                 Unstandardized Coefficients   Standardized Coefficients                   95.0% Confidence Interval for B
Model            B        Std. Error           Beta                        t       Sig.    Lower Bound   Upper Bound
1  (Constant)    8.865    4.570                                            1.940   .088    –1.673        19.402
   GRE_Q         .525     .080                 .918                        6.535   .000    .340          .710

a Dependent variable: midterm exam score.
Residuals Statistics a

                                    Minimum    Maximum   Mean      Std. Deviation   N
Predicted value                     28.2882    49.2866   38.0000   6.89478          10
Std. predicted value                –1.409     1.637     .000      1.000            10
Standard error of predicted value   1.008      1.996     1.380     .333             10
Adjusted predicted value            26.5379    50.7968   37.9612   7.24166          10
Residual                            –4.43800   3.71176   .00000    2.98436          10
Std. residual                       –1.402     1.173     .000      .943             10
Stud. residual                      –1.568     1.422     .006      1.071            10
Deleted residual                    –5.55197   5.46209   .03876    3.87616          10
Stud. deleted residual              –1.763     1.539     –.009     1.135            10
Mahal. distance                     .013       2.680     .900      .893             10
Cook's distance                     .004       .477      .159      .157             10
Centered leverage value             .001       .298      .100      .099             10

a Dependent variable: midterm exam score.
The “constant” is the intercept and tells us that if GRE_Q (the independent variable) was zero, the midterm exam score (the dependent variable) would be 8.865. The “GRE_Q” is the slope and tells us that for a one point increase in GRE_Q, the midterm exam score will increase by about one half of one point.

The test statistic, t, is calculated as the unstandardized coefficient divided by its standard error. Thus for the slope, the test statistic is t = b/SEb = .525/.080 = 6.535.

The p value for the intercept (the “constant”) (p = .088) indicates that the intercept is not statistically significantly different from 0 (this finding is usually of less interest than the slope). The p value for GRE_Q (the independent variable) (p = .000) indicates that the slope is statistically significantly different from 0.

“Residuals statistics” and related graphs (histogram and Q–Q plot, not shown here) will be examined in our discussion of assumptions.
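The slope test and its 95% confidence interval can be rebuilt from the reported coefficient and standard error. The sketch below uses the rounded table values, so the t value differs slightly from the 6.535 that SPSS computes from unrounded quantities; the critical t value of 2.306 is the two-tailed .05 value for 8 degrees of freedom:

```python
# Rebuild the slope test and 95% CI from the Coefficients table
b = 0.525        # unstandardized slope for GRE_Q (rounded)
se_b = 0.080     # standard error of the slope (rounded)
t_crit = 2.306   # critical t, df = 8, two-tailed alpha = .05

t_stat = b / se_b            # close to the reported 6.535
lower = b - t_crit * se_b    # close to the reported .340
upper = b + t_crit * se_b    # close to the reported .710
print(round(t_stat, 2), round(lower, 3), round(upper, 3))
```

The interval excludes 0, consistent with the significant slope.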
Examining Data for Assumptions in Simple Linear Regression
As� you� may� recall,� there� were� a� number� of� assumptions� associated� with� simple� linear�
regression��These�included�the�following:�(a)�independence,�(b)�homogeneity�of�variance,�
(c)� linearity,� and� (d)� normality�� Although� fixed� values� of� X� are� assumed,� this� is� not� an�
assumption�that�can�be�tested�but�is�instead�related�to�the�use�of�the�results�(i�e�,�extrapola-
tion�and�interpolation)�
Before�we�begin�to�examine�assumptions,�let�us�review�the�values�that�we�requested�to�
be�saved�to�our�data�file�(see�dataset�screenshot�that�follows)�
1. PRE_1 are the unstandardized predicted values (i.e., Y′i).
2. RES_1 are the unstandardized residuals, simply the difference between the observed and predicted values. For student 1, for example, the observed value for the midterm (i.e., the dependent variable) was 32, and the predicted value was 28.28824. Thus, the unstandardized residual is simply 32 − 28.28824, or 3.71176.
3. SRE_1 are the studentized residuals, a type of standardized residual that is more sensitive to outliers as compared to standardized residuals. Studentized residuals are computed as the unstandardized residual divided by an estimate of the standard deviation with that case removed. As a rule of thumb, studentized residuals with an absolute value greater than 3 are considered outliers (Stevens, 1984).
4. MAH_1 are Mahalanobis distance values that can be helpful in detecting outliers. These values can be reviewed to determine cases that are exerting leverage. Barnett and Lewis (1994) produced a table of critical values for evaluating Mahalanobis distance. Squared Mahalanobis distances divided by the number of variables (D²/df) which are greater than 2.5 (for small samples) or 3–4 (for large samples) are suggestive of outliers (Hair, Black, Babin, Anderson, & Tatham, 2006). Later, we will follow another convention for examining these values using the chi-square distribution.
5. COO_1 are Cook’s distance values and provide an indication of influence of individual cases. As a rule of thumb, Cook’s values greater than 1.0 suggest that the case is potentially problematic.
6. DFB0_1 and DFB1_1 are unstandardized DFBETA values for the intercept and slope, respectively. These values provide estimates of the intercept and slope when the case is removed.
7. SDB0_1 and SDB1_1 are standardized DFBETA values for the intercept and slope, respectively, and are easier to interpret as compared to their unstandardized counterparts. Standardized DFBETA values greater than an absolute value of 2 suggest that the case may be exerting undue influence on the parameters of the model (i.e., the slope and intercept).
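The predicted value and residual for student 1 can be reproduced from the rounded regression coefficients. Note one assumption in the sketch below: student 1’s GRE_Q score is taken to be 37, the smallest sampled value, which is consistent with the predicted value of 28.28824 but is not stated explicitly in the output shown:

```python
# Student 1: predicted midterm score (PRE_1) and unstandardized residual (RES_1),
# using the rounded coefficients from the Coefficients table.
intercept = 8.865
slope = 0.525
gre_q = 37       # assumed GRE_Q for student 1 (smallest sampled value)
observed = 32    # observed midterm score for student 1

predicted = intercept + slope * gre_q   # close to SPSS's 28.28824
residual = observed - predicted         # close to SPSS's 3.71176
print(round(predicted, 2), round(residual, 2))
```

The small differences from the SPSS values come from the rounding of the coefficients.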
As we look at our raw data, we see nine new variables have
been added to our dataset. These are our predicted values,
residuals, and other diagnostic statistics. The residuals will
be used as diagnostics to review the extent to which our
data meet the assumptions of simple linear regression.
Independence
We now plot the studentized residuals (which were requested and created through the “Save” option mentioned earlier) against the values of X to examine the extent to which independence was met. The general steps for generating a simple scatterplot through “Scatter/dot” have been presented in a previous chapter (e.g., Chapter 10), and they will not be reiterated here. From the “Simple Scatterplot” dialog screen, click the studentized residual variable and move it into the “Y Axis” box by clicking on the arrow. Click the independent variable X and move it into the “X Axis” box by clicking on the arrow. Then click “OK.”
Interpreting independence evidence: If the assumption of independence is met, the points should fall randomly within a band of −2.0 to +2.0. Here we have evidence of independence, especially given the small sample size, as all points are within an absolute value of 2.0 and fall relatively randomly.
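Recall also the Durbin–Watson statistic (reported as 1.287 in the Model Summary), which summarizes the dependence among successive residuals. A minimal sketch of its computation, using hypothetical residuals rather than the actual saved values, which are not listed individually in the output:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: ranges from 0 to 4; values near 2
    indicate uncorrelated errors."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical, perfectly alternating residuals are strongly negatively
# correlated, which pushes the statistic well above 2 toward its maximum of 4.
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))
```

Positively correlated residuals would instead push the statistic toward 0, which is why values below 1 or above 3 flag a likely violation.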
[Figure: scatterplot of studentized residuals (y-axis, −2.00000 to 2.00000) against GRE_Q (x-axis, 30.00 to 80.00)]
Homogeneity of Variance
We can use the same plot of studentized residuals against X values (used earlier for independence) to examine the extent to which homogeneity was met. Recall that homogeneity is when the dependent variable has the same variance for all values of the independent variable. Evidence of meeting the assumption of homogeneity is a plot where the spread of residuals appears fairly constant over the range of X values (i.e., a random display of points). If the spread of the residuals increases or decreases across the plot from left to right, this may indicate that the assumption of homogeneity has been violated. Here we have evidence of homogeneity.
Linearity
Since we have only one independent variable, a simple bivariate scatterplot of the dependent variable (on the Y axis) and the independent variable (on the X axis) will provide a visual indication of the extent to which linearity is reasonable. As those steps have been presented previously in the discussion of independence, they will not be repeated here. For this scatterplot, there is a general positive linear relationship between the variables.
[Figure: scatterplot of midterm exam score (y-axis, 25.00 to 50.00) against GRE_Q (x-axis, 30.00 to 80.00)]
Additionally, the plot of studentized residuals against X values (used earlier for independence) can be used to examine the extent to which linearity was met. We highly recommend examining this residual plot as it is more sensitive to detecting violations of linearity. Here a random display of points within an absolute value of 2 or 3 suggests further evidence of linearity.
Normality
Generating normality evidence: Understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important in simple linear regression just as it was in ANOVA models. We again examine residuals for normality, following the same steps as with the previous ANOVA designs. We also use various diagnostics to examine our data for influential cases. Let us begin by examining the unstandardized residuals for normality. For simple linear regression, the distributional shape of the unstandardized residuals should be a normal distribution. Because the steps for generating normality evidence were presented previously in the chapters for ANOVA models, they will not be provided here.
Interpreting normality evidence: By now, we have had a substantial amount of practice in interpreting quite a range of normality statistics. We interpret them again in reference to the assumption of normality for the unstandardized residuals in simple linear regression.
Descriptives

Unstandardized residual                               Statistic     Std. Error
  Mean                                                .0000000      .94373849
  95% Confidence interval for mean    Lower bound     –2.1348848
                                      Upper bound     2.1348848
  5% Trimmed mean                                     .0403471
  Median                                              .1626409
  Variance                                            8.906
  Std. deviation                                      2.98436314
  Minimum                                             –4.43800
  Maximum                                             3.71176
  Range                                               8.14976
  Interquartile range                                 5.36232
  Skewness                                            –.269         .687
  Kurtosis                                            –1.369        1.334
The skewness statistic of the residuals is −.269 and kurtosis is −1.369, both being within the range of an absolute value of 2.0, suggesting some evidence of normality.
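The standard errors reported alongside the skewness (.687) and kurtosis (1.334) statistics in the Descriptives table depend only on the sample size, and can be reproduced with the usual formulas:

```python
import math

# Standard errors of skewness and kurtosis as functions of n alone
n = 10
se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
print(round(se_skew, 3), round(se_kurt, 3))
```

Both match the values in the Descriptives table, confirming that these standard errors carry no information about the data beyond n.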
While we have a very small sample size, the histogram reflects the skewness and kurtosis statistics.
[Figure: histogram of unstandardized residuals; mean = 1.11E−15, std. dev. = 2.98436, N = 10]
There are a few other statistics that can be used to gauge normality. The formal test of normality, the Shapiro–Wilk (S–W) test (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented as follows and suggests that our sample distribution for the residual is not statistically significantly different than what would be expected from a normal distribution, as the p value is greater than α (p = .416).
Tests of Normality

                          Kolmogorov–Smirnov a          Shapiro–Wilk
                          Statistic   df   Sig.         Statistic   df   Sig.
Unstandardized residual   .150        10   .200*        .927        10   .416

a Lilliefors significance correction.
* This is a lower bound of the true significance.
Q–Q plots are also often examined to determine evidence of normality. Q–Q plots graph quantiles of the theoretical normal distribution against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of residuals shown as follows suggests relative normality.
[Figure: normal Q–Q plot of unstandardized residual; expected normal (y-axis, −2 to 2) against observed value (x-axis, −5.0 to 5.0)]
Examination of the following boxplot also suggests a relatively normal distributional shape of residuals with no outliers.
[Figure: boxplot of unstandardized residual (scale −6.00000 to 4.00000)]
Considering the forms of evidence we have examined (skewness and kurtosis statistics, the S–W test, the histogram, the Q–Q plot, and the boxplot), all suggest normality is a reasonable assumption. We can be reasonably assured we have met the assumption of normality of the residuals.
Screening Data for Influential Points
Casewise diagnostics: Recall that we requested a number of statistics to help us in diagnostics and screening our data. One that we requested was for “Casewise diagnostics.” If there were any cases with large values for the standardized residual (more than three standard deviations), there would have been information in our output to indicate the case number and values of the standardized residual, predicted value, and unstandardized residual. This information is useful for more closely examining case(s) with extreme standardized residuals.
Cook’s distance: Cook’s distance provides an overall measure for the influence of individual cases. Values greater than one suggest that the case may be problematic in terms of undue influence on the model. In examining the residual statistics provided in the following output, we see that the maximum value for Cook’s distance is .477, well under the point at which we should be concerned.
Residuals Statistics a

                                    Minimum    Maximum   Mean      Std. Deviation   N
Predicted value                     28.2882    49.2866   38.0000   6.89478          10
Std. predicted value                –1.409     1.637     .000      1.000            10
Standard error of predicted value   1.008      1.996     1.380     .333             10
Adjusted predicted value            26.5379    50.7968   37.9612   7.24166          10
Residual                            –4.43800   3.71176   .00000    2.98436          10
Std. residual                       –1.402     1.173     .000      .943             10
Stud. residual                      –1.568     1.422     .006      1.071            10
Deleted residual                    –5.55197   5.46209   .03876    3.87616          10
Stud. deleted residual              –1.763     1.539     –.009     1.135            10
Mahal. distance                     .013       2.680     .900      .893             10
Cook's distance                     .004       .477      .159      .157             10
Centered leverage value             .001       .298      .100      .099             10

a Dependent variable: midterm exam score.
Mahalanobis distances: Mahalanobis distances are measures of the distance from each case to the mean of the independent variable for the remaining cases. We can use the value of Mahalanobis distance as a test statistic value using the chi-square distribution. With only one independent variable and one dependent variable, we have two degrees of freedom. Given an alpha level of .05, the chi-square critical value is 5.99. Thus, any Mahalanobis distance greater than 5.99 suggests that the case is an outlier. With a maximum distance of 2.680 (see previous table), there is no evidence to suggest there are outliers in our data.
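With 2 degrees of freedom the chi-square distribution has a closed-form inverse CDF, so the 5.99 critical value can be checked without a table. A quick sketch:

```python
import math

# For df = 2, the chi-square CDF is 1 - exp(-x/2), so the upper-tail
# critical value at alpha = .05 is simply -2 * ln(alpha) (about 5.99).
alpha = 0.05
critical = -2 * math.log(alpha)
print(round(critical, 2))

max_mahalanobis = 2.680   # largest Mahalanobis distance in the output
print(max_mahalanobis < critical)  # no evidence of outliers
```

For other degrees of freedom there is no such closed form, and a table or statistical software would be needed.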
DFBETA: We also asked to save DFBETA values. These values provide another indication of the influence of cases. A DFBETA value reflects the change in a regression coefficient when the case is deleted from the model. For standardized DFBETA values, values greater than an absolute value of 2.0 should be examined more closely. Looking at the minimum (−.87682) and maximum (.62542) standardized DFBETA values for the slope (i.e., GRE_Q), we do not have any cases that suggest undue influence.
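A DFBETA value is simply the difference between a coefficient fit on all cases and the same coefficient fit with one case dropped. A minimal sketch using a small hypothetical dataset (not the midterm data) in which the last case is clearly influential:

```python
def ols_slope(xs, ys):
    """Least squares slope for simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: the last case pulls the slope upward
x = [1.0, 2.0, 3.0, 10.0]
y = [1.0, 2.0, 3.0, 20.0]

b_full = ols_slope(x, y)              # slope with all cases
b_drop = ols_slope(x[:-1], y[:-1])    # slope with the last case removed
dfbeta = b_full - b_drop              # unstandardized DFBETA for that case
print(b_full, b_drop, dfbeta)
```

The large change in the slope when one case is dropped is exactly the kind of influence the DFBETA diagnostic is designed to flag.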
Descriptive Statistics

                             N    Minimum    Maximum   Mean        Std. Deviation
DFBETA GRE_Q                 10   –.06509    .04470    –.0021866   .03608593
Standardized DFBETA GRE_Q    10   –.87682    .62542    –.0275752   .47302980
Valid N (listwise)           10
17.5 G*Power
A priori and post hoc power could again be determined using the specialized software described previously in this text (e.g., G*Power); alternatively, you can consult a priori power tables (e.g., Cohen, 1988). As an illustration, we use G*Power to compute the post hoc power of our test.
Post Hoc Power for Simple Linear Regression Using G*Power
The first thing that must be done when using G*Power to compute post hoc power is to select the correct test family. Here we conducted simple linear regression. To find regression, select “Tests” in the top pulldown menu, then “Correlation and regression,” and then “Linear bivariate regression: One group, size of slope.” Once that selection is made, the “Test family” automatically changes to “t tests.”
Step 1
The “Type of Power Analysis” desired then needs to be selected. To compute post hoc power, select “Post hoc: Compute achieved power—given α, sample size, and effect size.”
Click on “Determine” to pop out the effect size calculator box (shown below). This will allow you to compute the effect size, “Slope H1.” Once the parameters are specified, click on “Calculate.”

The default selection for “Test Family” is “t tests” and this is the appropriate test family for linear regression.

Change the statistical test to “Linear bivariate regression: One group, size of slope.”

The “Input Parameters” for computing post hoc power must be specified including:
1. number of tails (i.e., directionality of the test)
2. effect size, slope H1
3. α level
4. total sample size
5. slope H0 (i.e., null)
6. standard deviation of X (estimated from sample)
7. standard deviation of Y (estimated from sample)
Step 2
The “Input Parameters” must then be specified. In our example, we conducted a two-tailed test. We will compute the effect size, Slope H1, last, so we skip that for the moment. The alpha level we used was .05, and the total sample size was 10. The Slope H0 is the slope specified in the null hypothesis, thus a value of 0. The last two parameters to be specified are for the standard deviation of X, the independent variable, and the standard deviation of Y, the dependent variable.

We skipped filling in the second parameter, the effect size, Slope H1, for a reason. We will use the pop-out effect size calculator in G*Power to compute the effect size Slope H1. To pop out the effect size calculator, click on “Determine” displayed under “Input Parameters.”

In the pop-out effect size calculator, click the toggle menu to select ρ, σ_x, σ_y => slope. Input the values for the correlation coefficient of X and Y, the standard deviation of X, and the standard deviation of Y. Click on “Calculate” in the pop-out effect size calculator to compute the effect size Slope H1. Then click on “Calculate and Transfer to Main Window” to transfer the calculated effect size (i.e., 1.604822) to the “Input Parameters.” Once the parameters are specified, click on “Calculate” to find the power statistics.
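The Slope H1 value that G*Power transfers (1.604822) follows from the three calculator inputs, since slope = ρ · σY/σX. A quick check using the standard deviations as they were entered in G*Power:

```python
# Reproduce G*Power's "Slope H1" from correlation and the two standard deviations
rho = 0.918
sd_x = 7.51295      # value entered as the standard deviation of X
sd_y = 13.13393     # value entered as the standard deviation of Y

slope_h1 = rho * sd_y / sd_x
print(round(slope_h1, 6))  # about 1.604822
```

This matches the effect size shown in the G*Power input parameters.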
Post hoc power
Here are the post hoc power results.
The “Output Parameters” provide the relevant statistics given the input just specified. Here we were interested in determining post hoc power for simple linear regression with a two-tailed test, a computed effect size Slope H1 of 1.6048220, an alpha level of .05, total sample size of 10, a hypothesized null slope of 0, a standard deviation of X of 7.51295, and a standard deviation of Y of 13.13393. Based on those criteria, the post hoc power for the simple linear regression was .9999926. In other words, for these conditions, the post hoc power of our simple linear regression was nearly 1.00: the probability of rejecting the null hypothesis when it is really false (in this case, the probability that the slope is 0) was around the maximum (i.e., 1.00) (sufficient power is often .80 or above). Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters).
A Priori Power for Simple Linear Regression Using G*Power
For a priori power, we can determine the total sample size needed for simple linear regression given the directionality of the test, an estimated effect size Slope H1, α level, desired power, slope for the null hypothesis (i.e., 0), and the standard deviations of X and Y. We follow Cohen’s (1988) conventions for effect size (i.e., small r = .10; moderate r = .30; large r = .50). In this example, had we wanted to determine a priori power and had estimated a moderate effect r of .30, α of .05, desired power of .80, null slope of 0, and standard deviation of 5 for both the X and Y, we would need a total sample size of 82.
A Priori power
Here are the a priori power results.
17.6 Template and APA-Style Write-Up
Finally, here is an example paragraph for the results of the simple linear regression analysis. Recall that our graduate research assistant, Marie, was assisting the associate dean in Graduate Student Services, Randall. Randall wanted to know if midterm exam scores could be predicted by the quantitative subtest of the required graduate entrance exam, the GRE_Q. The research question presented to Randall from Marie included the following:

Can midterm exam scores be predicted from the GRE_Q?

Marie then assisted Randall in generating a simple linear regression model as the test of inference. A template for writing the research question for this design is presented as follows:
• Can [dependent variable] be predicted from [independent variable]?
It may be helpful to preface the results of the simple linear regression with information on an examination of the extent to which the assumptions were met. The assumptions include (a) independence, (b) homogeneity of variance, (c) normality, (d) linearity, and (e) fixed values of X.
A simple linear regression analysis was conducted to determine if
midterm exam scores (dependent variable) could be predicted from
GRE_Q scores (independent variable). The null hypothesis tested was
that the regression coefficient (i.e., the slope) was equal to 0. The
data were screened for missingness and violation of assumptions prior
to analysis. There were no missing data.
Linearity: The scatterplot of the independent variable (GRE_Q) and the
dependent variable (midterm exam scores) indicates that the assumption
of linearity is reasonable: as GRE_Q increases, midterm exam scores
generally increase as well. With a random display of points falling
within an absolute value of 2, a scatterplot of unstandardized
residuals against values of the independent variable provided further
evidence of linearity.

Normality: The assumption of normality was tested via examination of
the unstandardized residuals. Review of the S–W test for normality
(SW = .927, df = 10, p = .416) and skewness (−.269) and kurtosis
(−1.369) statistics suggested that normality was a reasonable
assumption. The boxplot suggested a relatively normal distributional
shape (with no outliers) of the residuals. The Q–Q plot and histogram
suggested normality was reasonable.

Independence: A relatively random display of points in the scatterplot
of studentized residuals against values of the independent variable
provided evidence of independence. The Durbin–Watson statistic was
computed to evaluate independence of errors and was 1.287, which is
considered acceptable. This suggests that the assumption of independent
errors has been met.

Homogeneity of variance: A relatively random display of points, where
the spread of residuals appears fairly constant over the range of
values of the independent variable (in the scatterplot of studentized
residuals against values of the independent variable), provided
evidence of homogeneity of variance.
Here is an APA-style example paragraph of results for the simple linear regression analysis (remember that this will be prefaced by the previous paragraph reporting the extent to which the assumptions of the test were met).
The results of the simple linear regression suggest that a signifi-
cant proportion of the total variation in midterm scores was pre-
dicted by GRE _ Q. In other words, a student’s score on the GRE _ Q
is a good predictor of their midterm exam grade, F(1, 8) = 42.700, p
< .001. Additionally, we find the following: (a) the unstandardized
slope (.525) and standardized slope (.918) are statistically signifi-
cantly different from 0 (t = 6.535, df = 8, p < .001); with every one
point increase in the GRE _ Q, midterm exam scores will increase by
approximately one half of one point; (b) the CI around the unstan-
dardized slope does not include 0 (.340, .710), further confirm-
ing that GRE _ Q is a statistically significant predictor of midterm
scores; and (c) the intercept (or average midterm exam score when
652 An Introduction to Statistical Concepts
GRE _ Q is 0) was 8.865. Multiple R squared indicates that approxi-
mately 84% of the variation in midterm scores was predicted by GRE _ Q
scores. According to Cohen (1988), this suggests a large effect.
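Because t = b/SE(b), the confidence interval reported in the paragraph can be reconstructed from the reported slope and t statistic alone. Here is a minimal Python sketch (not SPSS output; the 2.306 critical value is the two-tailed t value for df = 8 at α = .05, taken from a t table):

```python
# Reconstruct the 95% CI for the unstandardized slope from the reported
# values b = .525 and t = 6.535 with df = 8.
b = 0.525
t_observed = 6.535
t_critical = 2.306          # two-tailed t(8) critical value at alpha = .05

se_b = b / t_observed       # since t = b / SE(b)
lower = b - t_critical * se_b
upper = b + t_critical * se_b
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (0.340, 0.710)
```

This recovers the (.340, .710) interval reported above.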
17.7 Summary
In this chapter, the method of simple linear regression was described. First we discussed the basic concepts of regression, such as the slope and intercept. Next, a formal introduction to the population simple linear regression model was given. These concepts were then extended to the sample situation, where a more detailed discussion was given. In the sample context, we considered unstandardized and standardized regression coefficients, errors in prediction, the least squares criterion, the coefficient of determination, tests of significance, and a discussion of statistical assumptions. At this point, you should have met the following objectives: (a) be able to understand the concepts underlying simple linear regression, (b) be able to determine and interpret the results of simple linear regression, and (c) be able to understand and evaluate the assumptions of simple linear regression. Chapter 18 follows up with a description of multiple regression analysis, where regression models are developed based on two or more predictors.
Problems
Conceptual problems
17.1 A regression intercept represents which one of the following?
 a. The slope of the line
 b. The amount of change in Y given a one-unit change in X
 c. The value of Y when X is equal to 0
 d. The strength of the relationship between X and Y
17.2 The regression line for predicting final exam grades in history from midterm scores in the same course is found to be Y′ = .61X + 3.12. If the value of X increases from 74 to 75, the value of Y will do which one of the following?
 a. Increase by .61 points
 b. Increase by 1.00 points
 c. Increase by 3.12 points
 d. Decrease by .61 points
17.3 The regression line for predicting salary of principals from cumulative GPA in graduate school is found to be Y′ = 35,000X + 37,000. What does the value of 37,000 represent?
 a. Average cumulative GPA
 b. The criterion value
 c. The mean salary of principals when cumulative GPA is 0
 d. The standardized regression coefficient given an intercept of 0
17.4 The regression line for predicting salary of principals from cumulative GPA in graduate school is found to be Y′ = 35,000X + 37,000. What does the value of 35,000 represent?
 a. The amount of change in Y given a one-unit change in X
 b. The correlation between X and Y
 c. The intercept value
 d. The value of Y when X is equal to 0
17.5 You are given that μX = 14, σ²X = 36, μY = 14, σ²Y = 49, and Y′ = 14 is the prediction equation for predicting Y from X. Which of the following is the variance of the predicted values of Y′?
 a. 0
 b. 14
 c. 36
 d. 49
17.6 In regression analysis, the prediction of Y is most accurate for which of the following correlations between X and Y?
 a. −.90
 b. −.30
 c. +.20
 d. +.80
17.7 If the relationship between two variables is linear, then which one of the following is correct?
 a. All of the points must fall on a curved line.
 b. The relationship is best represented by a curved line.
 c. All of the points must fall on a straight line.
 d. The relationship is best represented by a straight line.
17.8 If both X and Y are measured on a z score scale, the regression line will have a slope of which one of the following?
 a. 0.00
 b. +1 or −1
 c. rXY
 d. sY/sX
17.9 If the simple linear regression equation for predicting Y from X is Y′ = 25, then the correlation between X and Y is which one of the following?
 a. 0.00
 b. 0.25
 c. 0.50
 d. 1.00
17.10 Which one of the following is correct for the unstandardized regression slope?
 a. It may never be negative.
 b. It may never be greater than +1.00.
 c. It may never be greater than the correlation coefficient rXY.
 d. None of the above.
17.11 If two individuals have the same score on the predictor, their residual scores will do which one of the following?
 a. Be necessarily equal
 b. Depend only on their observed scores on Y
 c. Depend only on their predicted scores on Y
 d. Depend only on the number of individuals that have the same predicted score
17.12 If rXY = .6, the proportion of variation in Y that is not predictable from X is which one of the following?
 a. .36
 b. .40
 c. .60
 d. .64
17.13 Homogeneity assumes which one of the following?
 a. The range of Y is the same as the range of X.
 b. The X and Y distributions have the same mean values.
 c. The variability of the X and the Y distributions is the same.
 d. The conditional variability of Y is the same for all values of X.
17.14 Which one of the following is suggested to examine the extent to which homogeneity of variance has been met?
 a. Scatterplot of Mahalanobis distances against standardized residuals
 b. Scatterplot of studentized residuals against unstandardized predicted values
 c. Simple bivariate correlation between X and Y
 d. S–W test results for the unstandardized residuals
17.15 Which one of the following is suggested to examine the extent to which normality has been met?
 a. Scatterplot of Mahalanobis distances against standardized residuals
 b. Scatterplot of studentized residuals against unstandardized predicted values
 c. Simple bivariate correlation between X and Y
 d. S–W test results for the unstandardized residuals
17.16 The linear regression slope bYX represents which one of the following?
 a. Amount of change in X expected from a one-unit change in Y
 b. Amount of change in Y expected from a one-unit change in X
 c. Correlation between X and Y
 d. Error of estimate of Y from X
17.17 If the correlation between X and Y is 0, then the best prediction of Y that can be made is the mean of Y. True or false?
17.18 If X and Y are highly nonlinear, linear regression is more useful than the situation where X and Y are highly linear. True or false?
17.19 If the pretest (X) and the posttest (Y) are positively correlated, and your friend receives a pretest score below the mean, then the regression equation would predict that your friend would have a posttest score that is above the mean. True or false?
17.20 Two variables are linearly related so that given X, Y can be predicted without error. I assert that rXY must be equal to either +1.0 or −1.0. Am I correct?
17.21 I assert that the simple regression model is structured so that at least two of the actual data points will necessarily fall on the regression line. Am I correct?
Computational problems
17.1 You are given the following pairs of scores on X (number of hours studied) and Y (quiz score):
X Y
4 5
4 6
3 4
7 8
2 4
 a. Find the linear regression model for predicting Y from X.
 b. Use the prediction model obtained to predict the value of Y for a new person who has a value of 6 for X.
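Hand computation is the point of the exercise, but a short script is a handy way to verify your slope, intercept, and prediction afterward. A sketch using the definitional formulas b = SSxy/SSx and a = Ȳ − bX̄:

```python
# Check of Computational Problem 17.1: least squares slope and intercept
# from the definitional formulas, then the prediction for X = 6.
x = [4, 4, 3, 7, 2]   # hours studied (X)
y = [5, 6, 4, 8, 4]   # quiz scores (Y)
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
ss_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
ss_x = sum((xi - x_mean) ** 2 for xi in x)
slope = ss_xy / ss_x
intercept = y_mean - slope * x_mean
prediction = slope * 6 + intercept   # part (b)
print(slope, intercept, prediction)
```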
17.2 You are given the following pairs of scores on X (preschool social skills) and Y (receptive vocabulary at the end of kindergarten):
X Y
25 60
30 45
42 56
45 58
36 42
50 38
38 35
47 45
32 47
28 57
31 56
 a. Find the linear regression model for predicting Y from X.
 b. Use the prediction model obtained to predict the value of Y for a new child who has a value of 48 for X.
17.3 The prediction equation for predicting Y (pain indicator) from X (drug dosage) is Y′ = 2.5X + 18. What is the observed mean for Y if μX = 40 and σ²X = 81?
17.4 You are given the following pairs of scores on X (number of years working) and Y (number of raises):
X Y
2 2
2 1
1 1
1 1
3 5
4 4
5 7
5 6
7 7
6 8
4 3
3 3
6 6
6 6
8 10
9 9
10 6
9 6
4 9
4 10
Perform the following computations using α = .05.
 a. The regression equation of Y predicted by X.
 b. Test of the significance of X as a predictor.
 c. Plot Y versus X.
 d. Compute the residuals.
 e. Plot residuals versus X.
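For parts (a) and (d), a script can serve as a check on the hand work. The sketch below only fits the line and confirms a defining property of least squares, namely that the residuals sum to 0; the significance test in part (b) is left to you:

```python
# Least squares fit for Computational Problem 17.4 (years working vs. raises)
x = [2, 2, 1, 1, 3, 4, 5, 5, 7, 6, 4, 3, 6, 6, 8, 9, 10, 9, 4, 4]
y = [2, 1, 1, 1, 5, 4, 7, 6, 7, 8, 3, 3, 6, 6, 10, 9, 6, 6, 9, 10]
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
ss_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
ss_x = sum((xi - x_mean) ** 2 for xi in x)
slope = ss_xy / ss_x
intercept = y_mean - slope * x_mean
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
print(slope, intercept, sum(residuals))  # residuals sum to ~0 by construction
```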
Interpretive problems
17.1 With the class survey 1 dataset on the website, your task is to use SPSS to find a suitable single predictor of current GPA. In other words, select several potential predictors that seem reasonable, and conduct a simple linear regression analysis for each of those predictors individually. Which of those is the best predictor of current GPA? What is the interpretation of the effect size? Write up the results following APA.
17.2 With the class survey 1 dataset on the website, your task is to use SPSS to find a suitable single predictor of the number of hours exercised per week. In other words, select several potential predictors that seem reasonable, and conduct a simple linear regression analysis for each of those predictors individually. Which of those is the best predictor of the number of hours of exercise? What is the interpretation of the effect size? Write up the results following APA.
18
Multiple Regression
Chapter Outline
18.1 Partial and Semipartial Correlations
 18.1.1 Partial Correlation
 18.1.2 Semipartial (Part) Correlation
18.2 Multiple Linear Regression
 18.2.1 Unstandardized Regression Model
 18.2.2 Standardized Regression Model
 18.2.3 Coefficient of Multiple Determination and Multiple Correlation
 18.2.4 Significance Tests
 18.2.5 Assumptions
18.3 Methods of Entering Predictors
 18.3.1 Backward Elimination
 18.3.2 Forward Selection
 18.3.3 Stepwise Selection
 18.3.4 All Possible Subsets Regression
 18.3.5 Hierarchical Regression
 18.3.6 Commentary on Sequential Regression Procedures
18.4 Nonlinear Relationships
18.5 Interactions
18.6 Categorical Predictors
18.7 SPSS
18.8 G*Power
18.9 Template and APA-Style Write-Up
Key Concepts
 1. Partial and semipartial (part) correlations
 2. Standardized and unstandardized regression coefficients
 3. Coefficient of multiple determination and multiple correlation
In Chapter 17, our concern was with the prediction or explanation of a dependent or criterion variable (Y) by a single independent or predictor variable (X). However, given the types of phenomena we typically deal with in education and the behavioral sciences, the use of a single predictor variable is quite restrictive. In other words, given the complexity of most human, organizational, and animal behaviors, one predictor is usually not sufficient in terms of understanding the criterion. In order to account for a sufficient proportion of variability in the criterion, more than one predictor is necessary. This leads us to analyze the data via multiple regression analysis, where two or more predictors are used to predict or explain the criterion variable. Here we adopt the usual notation where the X's are defined as the independent or predictor variables, and Y as the dependent or criterion variable.

For example, our admissions officer might want to use more than just Graduate Record Exam (GRE) scores to predict graduate-level grade point averages (GPAs) to make admissions decisions for a sample of applicants to your favorite local university or college. Other potentially useful predictors might be undergraduate grade point averages (UGPAs), recommendation letters, writing samples, and/or an evaluation from a personal interview. The research question of interest would now be, how well do the GRE, UGPAs, recommendations, writing samples, and/or interview scores (the independent or predictor variables) predict performance in graduate school (the dependent or criterion variable)? This is an example of a situation where multiple regression analysis using multiple predictor variables might be the method of choice.
Most of the concepts used in simple linear regression from Chapter 17 carry over to multiple regression analysis. This chapter considers the concepts of partial, semipartial, and multiple correlations, standardized and unstandardized regression coefficients, and the coefficient of multiple determination, as well as introduces a number of other types of regression models. Our objectives are that by the end of this chapter, you will be able to (a) determine and interpret the results of partial and semipartial correlations, (b) understand the concepts underlying multiple linear regression, (c) determine and interpret the results of multiple linear regression, (d) understand and evaluate the assumptions of multiple linear regression, and (e) have a basic understanding of other types of regression models.
18.1 Partial and Semipartial Correlations
Marie has developed into quite a statistics guru. We see in this chapter that her statistical prowess has garnered her repeat business.

As you may recall from the previous chapter, Randall, an associate dean in the Graduate Student Services office, was assisted by Marie in determining if the GRE-Quantitative (GRE-Q) can be used to predict midterm grades. Having had such a good experience in working with Marie, Randall has requested that Jennifer, the assistant dean in the Graduate Student Services office, seek advice from Marie on a special project. Jennifer is interested in estimating the extent to which GGPA can be predicted by scores on the overall GRE total and UGPA. Marie suggests the following research question to Jennifer: Can GGPA be predicted by scores on the overall GRE total and UGPA? Marie determines that a multiple linear regression is the appropriate statistical procedure to use to answer Jennifer's question. Marie then proceeds to assist Jennifer in analyzing the data.
Prior to a discussion of regression analysis, we need to consider two related concepts in correlational analysis: partial and semipartial correlations. Multiple regression analysis involves the use of two or more predictor variables and one criterion variable; thus, there are at a minimum three variables involved in the analysis. If we think about these variables in the context of the Pearson correlation, we have a problem because this correlation can only be used to relate two variables at a time. How do we incorporate additional variables into a correlational analysis? The answer is through partial and semipartial correlations, and later in this chapter, multiple correlations.
18.1.1 Partial Correlation
First we discuss the concept of partial correlation. The simplest situation consists of three variables, which we label X1, X2, and X3. Here an example of a partial correlation would be the correlation between X1 and X2 where X3 is held constant (i.e., controlled or partialled out). That is, the influence of X3 is removed from both X1 and X2 (both have been adjusted for X3). Thus, the partial correlation here represents the linear relationship between X1 and X2 independent of the linear influence of X3. This particular partial correlation is denoted by r12.3, where the X's are not shown for simplicity and the dot indicates that the variables preceding it are to be correlated and the variable(s) following it are to be partialled out. We compute r12.3 as follows:
$$ r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} $$
Let us take an example of a situation where a partial correlation might be computed. Say a researcher is interested in the relationship between height (X1) and weight (X2). The sample consists of individuals ranging in age (X3) from 6 months to 65 years. The sample correlations are for height (X1) and weight (X2), r12 = .7; height (X1) and age (X3), r13 = .1; and weight (X2) and age (X3), r23 = .6. We compute the correlation between height and weight, controlling for age, r12.3, as follows:
$$ r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{.7 - (.1)(.6)}{\sqrt{(1 - .01)(1 - .36)}} = .8040 $$
We see here that the bivariate correlation between height and weight, ignoring age (r12 = .7), is smaller than the partial correlation between height and weight controlling for age (r12.3 = .8040). That is, the relationship between height and weight is stronger when age is held constant (i.e., for a particular age) than it is across all ages. Although we often talk about holding a particular variable constant, in reality variables such as age cannot be held constant artificially.
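The same computation is easily scripted. A minimal sketch of the partial correlation formula above (the function name is ours, not a standard library's):

```python
from math import sqrt

def partial_r(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 partialled
    out of both (the r12.3 formula)."""
    return (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))

# Height (X1), weight (X2), age (X3) example from the text
r12_3 = partial_r(r12=0.7, r13=0.1, r23=0.6)
print(f"{r12_3:.4f}")  # 0.8040
```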
Some rather interesting partial correlation results can occur in particular situations. At one extreme, if both the correlation between height (X1) and age (X3), r13, and weight (X2) and age (X3), r23, equal 0, then the correlation between height (X1) and weight (X2) will equal the partial correlation between height and weight controlling for age, r12 = r12.3. That is, if the variable being partialled out is uncorrelated with each of the other two variables, then the partialling process will logically not have any effect. At the other extreme, if either r13 or r23 equals 1, then r12.3 cannot be calculated, as the denominator is equal to 0
(in other words, at least one of the terms in the denominator is equal to 0, which results in the product of the two terms in the denominator equaling 0 and thus a denominator of 0, and you cannot divide by 0). Thus, in this situation (where either r13 or r23 is perfectly correlated at 1.0), the partial correlation (i.e., r12.3, the partial correlation between height and weight controlling for age) is not defined. Later in this chapter, we refer to this as perfect collinearity, which is a serious problem. In between these extremes, it is possible for the partial correlation to be greater than or less than its corresponding bivariate correlation (including a change in sign), and even for the partial correlation to be equal to 0 when its bivariate correlation is not. For significance tests of partial and semipartial correlations, we refer you to your favorite statistical software.
18.1.2 Semipartial (Part) Correlation
Next the concept of semipartial correlation (also called a part correlation) is discussed. The simplest situation consists again of three variables, which we label X1, X2, and X3. Here an example of a semipartial correlation would be the correlation between X1 and X2 where X3 is removed from X2 only. That is, the influence of X3 is removed from X2 only. Thus, the semipartial correlation here represents the linear relationship between X1 and X2 after that portion of X2 that can be linearly predicted from X3 has been removed from X2. This particular semipartial correlation is denoted by r1(2.3), where the X's are not shown for simplicity and, within the parentheses, the dot indicates that the variable(s) following it are to be removed from the variable preceding it. Another use of the semipartial correlation is when we want to examine the predictive power in the prediction of Y from X1 after removing X2 from the prediction. A method for computing r1(2.3) is as follows:
as�follows:
r
r r r
r
1 2 3
12 13 23
23
21
( . )
( )
=
−
−
Let us take an example of a situation where a semipartial correlation might be computed. Say a researcher is interested in the relationship between GPA (X1) and GRE scores (X2). The researcher would like to remove the influence of intelligence (IQ: X3) from GRE scores but not from GPA. The simple bivariate correlation between GPA and GRE is r12 = .5; between GPA and IQ, r13 = .3; and between GRE and IQ, r23 = .7. We compute the semipartial correlation that removes the influence of intelligence (IQ: X3) from GRE scores (X2) but not from GPA (X1) (i.e., r1(2.3)) as follows:
$$ r_{1(2.3)} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{23}^2}} = \frac{.5 - (.3)(.7)}{\sqrt{1 - .49}} = .4061 $$
Thus, the bivariate correlation between GPA (X1) and GRE scores (X2) ignoring IQ (X3) (r12 = .50) is larger than the semipartial correlation between GPA and GRE controlling for IQ in GRE (r1(2.3) = .4061). As was the case with partial correlations, various values of a semipartial correlation can be obtained depending on the combination of the bivariate correlations. For more information on partial and semipartial correlations, see Hays (1988), Glass and Hopkins (1996), or Pedhazur (1997).
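As with the partial correlation, the semipartial formula is one line of code. A minimal sketch (again, the function name is ours) that reproduces the worked example:

```python
from math import sqrt

def semipartial_r(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 removed
    from variable 2 only (the r1(2.3) formula)."""
    return (r12 - r13 * r23) / sqrt(1 - r23**2)

# GPA (X1), GRE (X2), IQ (X3) example from the text
r1_23 = semipartial_r(r12=0.5, r13=0.3, r23=0.7)
print(f"{r1_23:.4f}")  # 0.4061
```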
Now that we have considered the correlational relationships among two or more variables (i.e., partial and semipartial correlations), let us move on to an examination of the multiple regression model, where there are two or more predictor variables.
18.2 Multiple Linear Regression
Let us take the concepts we have learned in this and the previous chapter and place them into the context of multiple linear regression. For purposes of brevity, we do not consider the population situation, because the sample situation is invoked 99.44% of the time. In this section, we discuss the unstandardized and standardized multiple regression models, the coefficient of multiple determination, multiple correlation, tests of significance, and statistical assumptions.
18.2.1 Unstandardized Regression Model
The sample multiple linear regression model for predicting Y from m predictors X1, X2, …, Xm is

$$ Y_i = b_1 X_{1i} + b_2 X_{2i} + \cdots + b_m X_{mi} + a + e_i $$
where
Y is the criterion variable (also known as the dependent variable),
the Xk's are the predictor (or independent) variables, where k = 1, …, m,
bk is the sample partial slope of the regression line for Y as predicted by Xk,
a is the sample intercept of the regression line for Y as predicted by the set of Xk's,
ei represents the residuals or errors of prediction (the part of Y not predictable from the Xk's), and
i represents an index for an individual or object; the index i can take on values from 1 to n, where n is the size of the sample (i.e., i = 1, …, n).

The term partial slope is used because it represents the slope of Y for a particular Xk in which we have partialled out the influence of the other Xk's, much as we did with the partial correlation.
The sample prediction model is

$$ Y'_i = b_1 X_{1i} + b_2 X_{2i} + \cdots + b_m X_{mi} + a $$
where Y′i is the predicted value of Y for specific values of the Xk's, and the other terms are as before. The difference between the regression and prediction models is the same as in Chapter 17. We can compute the residuals, ei, for each of the i individuals or objects by comparing the actual Y values with the predicted Y values as

$$ e_i = Y_i - Y'_i $$

for all i = 1, …, n individuals or objects in the sample.
Determining the sample partial slopes and the intercept in the multiple predictor case is rather complicated. To keep it simple, we use a two-predictor model for illustrative purposes. Generally we rely on statistical software for implementing multiple regression analysis. For the two-predictor case, the sample partial slopes (b1 and b2) and the intercept (a) can be determined as follows:
$$ b_1 = \frac{(r_{Y1} - r_{Y2}r_{12})s_Y}{(1 - r_{12}^2)s_1} $$

$$ b_2 = \frac{(r_{Y2} - r_{Y1}r_{12})s_Y}{(1 - r_{12}^2)s_2} $$

$$ a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 $$
The sample partial slope b1 is referred to alternately as (a) the expected or predicted change in Y for a one-unit change in X1 with X2 held constant (or for individuals with the same score on X2) and (b) the unstandardized or raw regression coefficient for X1. Similar statements may be made for b2. Note the similarity of the partial slope equation to the semipartial correlation. The sample intercept is referred to as the value of the dependent variable Y when the values of the independent variables X1 and X2 are both 0.
An alternative method for computing the sample partial slopes that involves the use of a partial correlation is as follows:
$$ b_1 = r_{Y1.2}\,\frac{s_Y\sqrt{1 - r_{Y2}^2}}{s_1\sqrt{1 - r_{12}^2}} $$

$$ b_2 = r_{Y2.1}\,\frac{s_Y\sqrt{1 - r_{Y1}^2}}{s_2\sqrt{1 - r_{12}^2}} $$
What statistical criterion is used to arrive at the particular values for the partial slopes and intercept? The criterion usually used in multiple linear regression analysis [and in all general linear models (GLM), for that matter] is the least squares criterion. The least squares criterion arrives at those values for the partial slopes and intercept such that the sum of the squared prediction errors or residuals is smallest. That is, we want to find that regression model, defined by a particular set of partial slopes and an intercept, which has the smallest sum of the squared residuals. We often refer to this particular method for calculating the slope and intercept as least squares estimation, because a and the bk's represent sample estimates of the population parameters α and the βk's, which are obtained using the least squares criterion. Recall from simple linear regression that the residual is simply the vertical distance from the observed value of Y to the predicted value of Y, and the line of best fit minimizes this distance. This concept still applies to multiple linear regression, with the exception that we are now in a three-dimensional (or more) plane given there are multiple independent variables.
Consider now the analysis of a realistic example we will follow in this chapter. We use the GRE Quantitative + Verbal Total (GRETOT) and undergraduate grade point average (UGPA) to predict graduate grade point average (GGPA). GRETOT has a possible range of 40–160 points (if we remove the unnecessary last digit of 0), and GPA is defined as having a possible range of 0.00–4.00 points. Given the sample of 11 statistics students as shown in Table 18.1, let us work through a multiple linear regression analysis.
As sample statistics, we compute for GRETOT (X1 or subscript 1) that the mean is X̄1 = 112.7273 and the variance is s1² = 266.8182; for UGPA (X2 or subscript 2) that the mean is X̄2 = 3.1091 and the variance is s2² = 0.1609; and for GGPA (Y), a mean of Ȳ = 3.5000 and variance of sY² = 0.1100. In addition, we compute the bivariate correlation between the dependent variable (GGPA) and GRE total, rY1 = .7845; between the dependent variable (GGPA) and UGPA, rY2 = .7516; and between GRE total and UGPA, r12 = .3011. The sample partial slopes (b1 and b2) and intercept (a) are determined as follows:
$$ b_1 = \frac{(r_{Y1} - r_{Y2}r_{12})s_Y}{(1 - r_{12}^2)s_1} = \frac{[.7845 - (.7516)(.3011)](.3317)}{(1 - .3011^2)(16.3346)} = .0125 $$

$$ b_2 = \frac{(r_{Y2} - r_{Y1}r_{12})s_Y}{(1 - r_{12}^2)s_2} = \frac{[.7516 - (.7845)(.3011)](.3317)}{(1 - .3011^2)(.4011)} = .4687 $$
$$ a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 = 3.5000 - (.0125)(112.7273) - (.4687)(3.1091) = .6337 $$
Let us interpret the partial slope and intercept values. A partial slope of .0125 for GRETOT would mean that if your score on the GRETOT was increased by one point, then your GGPA would be increased by .0125 points, controlling for UGPA. Likewise, a partial slope of .4687 for UGPA would mean that if your UGPA was increased by one point, then your GGPA would be increased by .4687 points, controlling for GRETOT. An intercept of .6337 would mean that if your scores on the GRETOT and UGPA were both 0, then your GGPA would be .6337. However, it is impossible to obtain a GRETOT score of 0, because you receive 40 points for putting your name on the answer sheet. In a similar way, an undergraduate student could not obtain a UGPA of 0 and be admitted to graduate school. This is not to say that the regression equation is incorrect, but just to point out how the interpretation of "GRETOT and UGPA were both 0" is a bit meaningless in context.
To put all of this together, then, the sample multiple linear regression model is

$$ Y_i = b_1 X_{1i} + b_2 X_{2i} + a + e_i = .0125X_{1i} + .4687X_{2i} + .6337 + e_i $$
Table 18.1
GRE–GPA Example Data

Student   GRE Total (X1)   UGPA (X2)   GGPA (Y)
1         145              3.2         4.0
2         120              3.7         3.9
3         125              3.6         3.8
4         130              2.9         3.7
5         110              3.5         3.6
6         100              3.3         3.5
7         95               3.0         3.4
8         115              2.7         3.3
9         105              3.1         3.2
10        90               2.8         3.1
11        105              2.4         3.0
If your score on the GRETOT was 130 and your UGPA was 3.5, then your predicted score on the GGPA would be computed as follows:

$$ Y'_i = .0125(130) + .4687(3.5) + .6337 = 3.8992 $$
Based on the prediction equation, we predict your GGPA to be around 3.9; however, as we saw in Chapter 17, predictions are usually somewhat less than perfect, even with two predictors.
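The two-predictor slopes, intercept, and prediction above can be reproduced from the sample statistics alone. A Python sketch (it carries full precision rather than the rounded .0125 and .4687, so the intercept comes out near .638 rather than the text's rounded .6337):

```python
from math import sqrt

# Sample statistics reported in the text for the GRE-GPA example
rY1, rY2, r12 = 0.7845, 0.7516, 0.3011
sY, s1, s2 = sqrt(0.1100), sqrt(266.8182), sqrt(0.1609)
Y_mean, X1_mean, X2_mean = 3.5000, 112.7273, 3.1091

# Two-predictor partial slopes and the intercept
b1 = (rY1 - rY2 * r12) * sY / ((1 - r12**2) * s1)
b2 = (rY2 - rY1 * r12) * sY / ((1 - r12**2) * s2)
a = Y_mean - b1 * X1_mean - b2 * X2_mean

# Predicted GGPA for GRETOT = 130 and UGPA = 3.5
ggpa = b1 * 130 + b2 * 3.5 + a
print(f"b1={b1:.4f} b2={b2:.4f} a={a:.4f} GGPA'={ggpa:.4f}")
```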
18.2.2 Standardized Regression Model
Up until this point in the chapter, everything in multiple linear regression analysis has involved the use of raw scores. For this reason, we referred to the model as the unstandardized regression model. Often we may want to express the regression in terms of standard z score units rather than in raw score units (as in Chapter 17). The means and variances of the standardized variables (e.g., z1, z2, zY) are 0 and 1, respectively. The sample standardized linear prediction model becomes the following:

$$ z(Y'_i) = b_1^* z_{1i} + b_2^* z_{2i} + \cdots + b_m^* z_{mi} $$
where bk* represents a sample standardized partial slope (sometimes called a beta weight) and the other terms are as before. As was the case in simple linear regression, no intercept term is necessary in the standardized prediction model, as the mean of the z scores for all variables is 0. (Recall that the intercept is the value of the dependent variable when the scores on the independent variables are all 0. Thus, in a standardized prediction model, the dependent variable will equal 0 when the values of the independent variables are equal to their means, i.e., 0.) The sample standardized partial slopes are, in general, computed by the following equation:

$$ b_k^* = b_k\,\frac{s_k}{s_Y} $$
For the two-predictor case, the standardized partial slopes can be calculated by

$$ b_1^* = b_1\,\frac{s_1}{s_Y} \quad\text{or}\quad b_1^* = \frac{r_{Y1} - r_{Y2}r_{12}}{1 - r_{12}^2} $$

and

$$ b_2^* = b_2\,\frac{s_2}{s_Y} $$
or

$$ b_2^* = \frac{r_{Y2} - r_{Y1}r_{12}}{1 - r_{12}^2} $$
If the two predictors are uncorrelated (i.e., r12 = 0), then the standardized partial slopes are equal to the simple bivariate correlations between the dependent variable and the independent variables (i.e., b1* = rY1 and b2* = rY2), because the rest of the equation goes away. For example,
$$ b_1^* = \frac{r_{Y1} - r_{Y2}r_{12}}{1 - r_{12}^2} = \frac{r_{Y1} - r_{Y2}(0)}{1 - 0} = r_{Y1} $$
For our GGPA example, the standardized partial slopes are equal to

$$ b_1^* = b_1\,\frac{s_1}{s_Y} = .0125(16.3346/.3317) = .6156 $$

$$ b_2^* = b_2\,\frac{s_2}{s_Y} = .4687(.4011/.3317) = .5668 $$

The prediction model is then

$$ z(Y'_i) = .6156z_{1i} + .5668z_{2i} $$
The standardized partial slope of .6156 for GRETOT would be interpreted as the expected increase in GGPA in z score units for a one z score unit increase in GRETOT, controlling for UGPA. A similar statement may be made for the standardized partial slope of UGPA. The bk* can also be interpreted as the expected standard deviation change in the dependent variable Y associated with a one standard deviation change in the independent variable Xk when the other Xk's are held constant.
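These computations are easy to verify outside of SPSS. The following is a minimal Python sketch using the values from the GGPA example (the variable names are ours, chosen for illustration):

```python
# Standardized partial slopes from unstandardized slopes: b_k* = b_k (s_k / s_Y).
# Values are taken from the chapter's GGPA example.
b = [0.0125, 0.4687]        # unstandardized partial slopes (GRETOT, UGPA)
s = [16.3346, 0.4011]       # standard deviations of the two predictors
s_y = 0.3317                # standard deviation of GGPA

b_star = [bk * (sk / s_y) for bk, sk in zip(b, s)]
print([round(v, 4) for v in b_star])  # → [0.6156, 0.5668]
```

Note that each slope is simply rescaled by the ratio of the predictor's standard deviation to the criterion's standard deviation.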
When would you want to use the standardized versus unstandardized regression analyses? According to Pedhazur (1997), bk* is sample specific and is not very stable across different samples due to the variance of Xk changing (as the variance of Xk increases, the value of bk* also increases, all else being equal). For example, at Ivy-Covered University, bk* would vary across different graduating classes (or samples) while bk would be much more consistent across classes. Thus, most researchers prefer the use of bk to compare the influence of a particular predictor variable across different samples and/or populations. Pedhazur also states that the bk* is of "limited value" (p. 321), but could be reported along with the bk. As Pedhazur and others have reported, the bk* can be deceptive in determining the relative importance of the predictors as they are affected by the variances and covariances of both the included predictors and the predictors not included in the model. Thus, we recommend the bk for general purpose use.
18.2.3 Coefficient of Multiple Determination and Multiple Correlation
An obvious question now is, how well is the criterion variable predicted or explained by the set of predictor variables? For our example, we are interested in how well the GGPAs (the dependent variable) are predicted by the GRE total scores and the UGPAs. In other words, what is the utility of the set of predictor variables?
The simplest method involves the partitioning of the familiar total sum of squares in Y, which we denote as SStotal. In multiple linear regression analysis, we can write SStotal as follows:

SStotal = [n ΣYi² − (ΣYi)²] / n

or

SStotal = (n − 1) sY²

where we sum over Y from i = 1,…, n. Next we can conceptually partition SStotal as

SStotal = SSreg + SSres

Σ(Yi − Ȳ)² = Σ(Y′i − Ȳ)² + Σ(Yi − Y′i)²
where
SSreg is the regression sum of squares due to the prediction of Y from the Xk's (often written as SSY′)
SSres is the sum of squares due to the residuals

Before we consider computation of SSreg and SSres, let us look at the coefficient of multiple determination. Recall from Chapter 17 the coefficient of determination, rXY². Now consider the multiple predictor version of rXY², here denoted as R²Y.1,…,m. The subscript tells us that Y is the criterion (or dependent) variable and that X1,…, Xm are the predictor (or independent) variables. The simplest procedure for computing R² is as follows:

R²Y.1,…,m = b1* rY1 + b2* rY2 + … + bm* rYm

The coefficient of multiple determination tells us the proportion of total variation in the dependent variable Y that is predicted from the set of predictor variables (i.e., the X1,…, Xm). Often we see the coefficient in terms of SS as

R²Y.1,…,m = SSreg / SStotal
Thus, one method for computing the sums of squares regression and residual, SSreg and SSres, is from the coefficient of multiple determination, R², as follows:

SSreg = R² SStotal

SSres = (1 − R²) SStotal = SStotal − SSreg
As discussed in Chapter 17, there is no objective gold standard as to how large the coefficient of determination needs to be in order to say a meaningful proportion of variation has been predicted. The coefficient is determined not just by the quality of the predictor variables included in the model but also by the quality of relevant predictor variables not included in the model, as well as by the amount of total variation in the dependent variable Y. However, the coefficient of determination can be used as a measure of effect size. According to the subjective standard of Cohen (1988), a small effect size is defined as R² = .10, a medium effect size as R² = .30, and a large effect size as R² = .50. For additional information on effect size measures in regression, we suggest you consider Steiger and Fouladi (1992), Mendoza and Stafford (2001), and Smithson (2001; which also includes some discussion of power). Note also that RY.1,…,m is referred to as the multiple correlation coefficient so as not to confuse it with a simple bivariate correlation coefficient.
With the example of predicting GGPA from GRETOT and UGPA, let us examine the partitioning of the total sum of squares SStotal as follows:

SStotal = (n − 1) sY² = 10(.1100) = 1.1000

Next, we can determine the coefficient of multiple determination R² as

R²Y.1,…,m = b1* rY1 + b2* rY2 = .6156(.7845) + .5668(.7516) = .9089

We can also partition SStotal into SSreg and SSres, where

SSreg = R² SStotal = .9089(1.1000) = .9998

SSres = (1 − R²) SStotal = (1 − .9089)(1.1000) = .1002
Finally, let us summarize these results for the example data. We found that the coefficient of multiple determination (R²) was equal to .9089. Thus, the GRE total score and the UGPA predict around 91% of the variation in the GGPA. This would be quite satisfactory for the college admissions officer in that there is little variation left to be explained, although this result is quite unlikely in actual research in education and the behavioral sciences. Obviously there is a large effect size here.
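As a quick check on these figures, here is a minimal Python sketch of the same computations (the text carries rounded intermediate values such as sY² = .1100, so the results below agree with the text only to rounding):

```python
# R-squared from the standardized slopes and the predictor-criterion
# correlations, then the sum-of-squares partition (GGPA example values).
b_star = [0.6156, 0.5668]   # standardized partial slopes (GRETOT, UGPA)
r_y = [0.7845, 0.7516]      # correlations of each predictor with GGPA
n, s_y = 11, 0.3317         # sample size and standard deviation of GGPA

r_squared = sum(bs * r for bs, r in zip(b_star, r_y))  # ≈ .9089
ss_total = (n - 1) * s_y**2                            # ≈ 1.1002
ss_reg = r_squared * ss_total                          # ≈ 1.0001
ss_res = ss_total - ss_reg                             # ≈ .1002
```

The small discrepancy from the chapter's SSreg = .9998 comes entirely from using the unrounded sY² here.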
It should be noted that R² is sensitive to sample size and to the number of predictor variables. As sample size and/or the number of predictor variables increase, R² will increase as well. R is a biased estimate of the population multiple correlation due to sampling error in the bivariate correlations and in the standard deviations of X and Y. Because R systematically overestimates the population multiple correlation, an adjusted coefficient of multiple determination has been devised. The adjusted R² (R²adj) is calculated as follows:

R²adj = 1 − (1 − R²) [(n − 1) / (n − m − 1)]
Thus, R²adj adjusts for sample size and for the number of predictors in the model; this allows us to compare models fitted to the same set of data with different numbers of predictors or with different samples of data. The difference between R² and R²adj is called shrinkage.
When n is small relative to m, the amount of bias can be large as R² can be expected to be large by chance alone. In this case, the adjustment will be quite large, as it should be. In addition, with small samples, the regression coefficients (i.e., the bk's) may not be very good estimates of the population values. When n is large relative to m, bias will be minimized and generalizations about the population values are likely to be better.
With a large number of predictors, power is reduced, and there is an increased likelihood of a Type I error across the total number of significance tests (i.e., one for each predictor and overall, as we show in the next section). In multiple regression, power is a function of sample size, the number of predictors, the level of significance, and the size of the population effect (i.e., for a given predictor, or overall). To determine how large a sample you need relative to the number of predictors, we suggest that you consult power tables (e.g., Cohen, 1988) or power software (e.g., Murphy & Myors, 2004; Power and Precision; G*Power). Simple advice is to design your research such that the ratio of n to m is large.
For the example data, we determine the adjusted multiple coefficient of determination R²adj to be

R²adj = 1 − (1 − R²) [(n − 1) / (n − m − 1)] = 1 − (1 − .9089) [(11 − 1) / (11 − 2 − 1)] = .8861

which, in this case, indicates a very small adjustment in comparison to R².
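The adjustment formula is simple enough to wrap in a small helper; the following Python sketch reproduces the example value:

```python
# Adjusted R-squared for the GGPA example (n = 11 cases, m = 2 predictors).
def adjusted_r2(r2, n, m):
    """R2_adj = 1 - (1 - R2) * (n - 1) / (n - m - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)

print(round(adjusted_r2(0.9089, 11, 2), 4))  # → 0.8861
```

Try a smaller n (say, n = 5) with the same R² to see how quickly the adjustment grows when n is small relative to m.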
18.2.4 Significance Tests
Here we describe two procedures used in multiple linear regression analysis. These involve testing the significance of the overall regression model and of each individual partial slope (or regression coefficient).
18.2.4.1 Test of Significance of Overall Regression Model
The first test is the test of significance of the overall regression model, or alternatively the test of significance of the coefficient of multiple determination. This is a test of all of the bk's simultaneously, an examination of overall model fit of the independent variables in aggregate. The null and alternative hypotheses, respectively, are as follows:

H0: β1 = β2 = … = βm = 0

H1: not all the βk = 0

If H0 is rejected, then one or more of the individual regression coefficients (i.e., the bk) is statistically significantly different from 0 (if the assumptions are satisfied, as discussed later). If H0 is not rejected, then none of the individual regression coefficients will be significantly different from 0.
The test is based on the following test statistic:

F = (R² / m) / [(1 − R²) / (n − m − 1)]

where
F indicates that this is an F statistic
m is the number of predictors or independent variables
n is the sample size
The F test statistic is compared to the F critical value, always a one-tailed test (by default, this value can never be negative given the terms in the equation, so this will always be a nondirectional test) and at the designated level of significance, with degrees of freedom being m and (n − m − 1), as taken from the F table in Table A.4. That is, the tabled critical value is αFm,(n−m−1). The test statistic can also be written in equivalent form as

F = (SSreg / dfreg) / (SSres / dfres) = MSreg / MSres

where the degrees of freedom regression equals the number of independent variables, dfreg = m, and the degrees of freedom residual equals the difference between the sample size, number of independent variables, and 1, dfres = (n − m − 1).
For the GGPA example, we compute the overall F test statistic as the following:

F = (R² / m) / [(1 − R²) / (n − m − 1)] = (.9089 / 2) / [(1 − .9089) / (11 − 2 − 1)] = 39.9078

or as

F = (SSreg / dfreg) / (SSres / dfres) = (.9998 / 2) / (.1002 / 8) = 39.9122

The critical value, at the .05 level of significance, is .05F2,8 = 4.46. The test statistic exceeds the critical value, so we reject H0 and conclude that all of the partial slopes are not equal to 0 at the .05 level of significance (the two F test statistics differ slightly due to rounding error).
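A short Python sketch makes the equivalence of the two forms concrete (values from the GGPA example; the slight disagreement between the two statistics is the rounding error noted above):

```python
# Overall F test statistic computed two equivalent ways (GGPA example).
r2, n, m = 0.9089, 11, 2
f_from_r2 = (r2 / m) / ((1 - r2) / (n - m - 1))      # from R-squared

ss_reg, ss_res = 0.9998, 0.1002
f_from_ss = (ss_reg / m) / (ss_res / (n - m - 1))    # from the SS partition

print(round(f_from_r2, 4), round(f_from_ss, 4))
```

Both values are far above the tabled critical value of 4.46, so the conclusion is unchanged either way.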
18.2.4.2 Test of Significance of bk
The second test is the test of the statistical significance of each individual partial slope or regression coefficient, bk. That is, are the individual unstandardized regression coefficients statistically significantly different from 0? This is actually the same as the test of bk*, so we need not develop a separate test for bk*. The null and alternative hypotheses, respectively, are as follows:

H0: βk = 0

H1: βk ≠ 0

where βk is the population partial slope for Xk.
In multiple regression, it is necessary to compute a standard error for each regression coefficient bk. Recall from Chapter 17 the variance error of estimate concept. The variance error of estimate is similarly defined for multiple linear regression and computed as follows:

s²res = SSres / dfres = MSres

where dfres = (n − m − 1). Degrees of freedom are lost as we have to estimate the population partial slopes and intercept, the βk's and α, respectively, from the sample data. The variance error of estimate indicates the amount of variation among the residuals. The standard error of estimate is simply the positive square root of the variance error of estimate and is the standard deviation of the residuals or errors of estimate. We call it the standard error of estimate, denoted as sres.
Finally, we need to compute a standard error for each bk. Denote the standard error of bk as s(bk) and define it as

s(bk) = sres / √[(n − 1) sk² (1 − Rk²)]

where
sk² is the sample variance for predictor Xk
Rk² is the squared multiple correlation between Xk and the remaining Xk's

Rk² represents the overlap between that predictor (Xk) and the remaining predictors. In the case of two predictors, the squared multiple correlation, Rk², is equal to the squared simple bivariate correlation between the two independent variables, r12².
The test statistic for testing the significance of the regression coefficients, the bk's, is as follows:

t = bk / s(bk)

The test statistic t is compared to the critical values of t, a two-tailed test for a nondirectional H1, at the designated level of significance, and with degrees of freedom (n − m − 1), as taken from the t table in Table A.2. Thus, the tabled critical values are ±(α/2)t(n−m−1) for a two-tailed test.
We can also form a confidence interval (CI) around bk as follows:

CI(bk) = bk ± (α/2)t(n−m−1) s(bk)

Recall that the null hypothesis tested is H0: βk = 0. Therefore, if the CI contains 0, then the regression coefficient bk is not statistically significantly different from 0 at the specified α level. This is interpreted to mean that in (1 − α)% of the sample CIs that would be formed from multiple samples, βk will be included.
Let us compute the second test statistic for the GGPA example. We specify the null hypothesis to be βk = 0 (i.e., the slope is 0) and conduct two-tailed tests. First the variance error of estimate is

s²res = SSres / dfres = .1002 / 8 = .0125
The standard error of estimate, sres, is .1118. Next the standard errors of the bk are found to be

s(b1) = sres / √[(n − 1) s1² (1 − r12²)] = .1118 / √[10(266.8182)(1 − .3011²)] = .0023

s(b2) = sres / √[(n − 1) s2² (1 − r12²)] = .1118 / √[10(.1609)(1 − .3011²)] = .0924
Finally we find the t test statistics to be computed as follows:

t1 = b1 / s(b1) = .0125 / .0023 = 5.4348

t2 = b2 / s(b2) = .4687 / .0924 = 5.0725

To evaluate the null hypotheses, we compare these test statistics to the critical values of ±.025t8 = ±2.306. Both test statistics exceed the critical value; consequently H0 is rejected in favor of H1 for both predictors. We conclude that both partial slopes are indeed statistically significantly different from 0 at the .05 level of significance.
Finally, let us compute the CIs for the bk's as follows:

CI(b1) = b1 ± (α/2)t(n−m−1) s(b1) = .0125 ± 2.306(.0023) = (.0072, .0178)

CI(b2) = b2 ± (α/2)t(n−m−1) s(b2) = .4687 ± 2.306(.0924) = (.2556, .6818)

The intervals do not contain 0, the value specified in H0; thus, we again conclude that both bk's are significantly different from 0 at the .05 level of significance.
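The whole sequence — standard errors, t statistics, and CIs — can be sketched in a few lines of Python. Because nothing here is rounded to four decimals mid-stream, the t statistics differ slightly from the text's values (which use the rounded standard errors):

```python
import math

# Standard errors, t statistics, and 95% CIs for the partial slopes,
# using the chapter's GGPA example values.
b = [0.0125, 0.4687]          # unstandardized partial slopes (GRETOT, UGPA)
s_k2 = [266.8182, 0.1609]     # sample variances of the two predictors
r12 = 0.3011                  # correlation between the two predictors
n, s_res, t_crit = 11, 0.1118, 2.306   # t critical value for df = 8

se = [s_res / math.sqrt((n - 1) * v * (1 - r12**2)) for v in s_k2]
t = [bk / sek for bk, sek in zip(b, se)]
ci = [(bk - t_crit * sek, bk + t_crit * sek) for bk, sek in zip(b, se)]
```

Both t statistics exceed 2.306 and neither interval contains 0, matching the conclusions above.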
18.2.4.3 Other Tests
One can also form CIs for the predicted mean of Y and the prediction intervals for individual values of Y, as we described in Chapter 17.
18.2.5 Assumptions
A considerable amount of space in Chapter 17 was dedicated to the assumptions of simple linear regression. For the most part, the assumptions of multiple linear regression analysis are the same, and, thus, we need not devote as much space here. The assumptions are concerned with (a) independence, (b) homogeneity, (c) normality, (d) linearity, (e) fixed X, and (f) noncollinearity. This section also mentions those techniques appropriate for evaluating each assumption.
18.2.5.1 Independence
The first assumption is concerned with independence of the observations. The simplest procedure for assessing independence is to examine residual plots of e versus the predicted values of the dependent variable Y′ and of e versus each independent variable Xk (alternatively, one can look at plots of observed values of the dependent variable Y versus predicted values of the dependent variable Y′ and of observed values of the dependent variable Y versus each independent variable Xk). If the independence assumption is satisfied, the residuals should fall into a random display of points. If the assumption is violated, the residuals will fall into some sort of pattern. Lack of independence affects the estimated standard errors of the model. For serious violations, one could consider generalized or weighted least squares as the method of estimation (e.g., Myers, 1986; Weisberg, 1985), or some type of transformation. The residual plots shown in Figure 18.1 do not suggest any independence problems for the GGPA example, where Figure 18.1a represents the residual e versus the predicted value of the dependent variable Y′, Figure 18.1b represents e versus GRETOT, and Figure 18.1c represents e versus UGPA.
18.2.5.2 Homogeneity
The second assumption is homogeneity of variance, where the conditional distributions have the same constant variance for all values of X. In the residual plots, the consistency of the variance of the conditional distributions may be examined. If the homogeneity assumption is violated, estimates of the standard errors are larger, and the conditional distributions may also be nonnormal. As described in Chapter 17, solutions include variance-stabilizing transformations (such as the square root or log of Y), generalized or weighted least squares (e.g., Myers, 1986; Weisberg, 1985), or robust regression (Kleinbaum, Kupper, Muller, & Nizam, 1998; Myers, 1986; Wilcox, 1996, 2003; Wu, 1985). Due to the small sample size, homogeneity cannot really be assessed for the example data.
18.2.5.3 Normality
The third assumption is that the conditional distributions of the scores on Y, or the prediction errors, are normal in shape. Violation of the normality assumption may be the result of outliers. The simplest outlier detection procedure is to look for observations that are more than two standard errors from the mean. Other procedures were previously described in Chapter 17. Several methods for dealing with outliers are available, such as conducting regression analyses with and without suspected outliers, robust regression (Kleinbaum et al., 1998; Myers, 1986; Wilcox, 1996, 2003; Wu, 1985), and nonparametric regression (Miller, 1997; Rousseeuw & Leroy, 1987; Wu, 1985). The following can be used to detect normality violations: frequency distributions, normal probability [quantile–quantile (Q–Q)] plots, and skewness statistics. For the example data, the normal probability plot is shown in Figure 18.2, and even with a small sample it looks good. Violation can lead to imprecision in the partial slopes and in the coefficient of determination. There are also several statistical procedures available for the detection of nonnormality (e.g., Andrews, 1971; Belsley, Kuh, & Welsch, 1980; D'Agostino, 1971; Ruppert & Carroll, 1980; Shapiro & Wilk, 1965; Wu, 1985); transformations can also be used to normalize the data. Review Chapter 17 for more details.
18.2.5.4 Linearity
The fourth assumption is linearity, that there is a linear relationship between the observed scores on the dependent variable Y and the values of the independent variables, the Xk's. If satisfied, then the sample partial slopes and intercept are unbiased estimators of the population partial slopes and intercept, respectively. The linearity assumption is important because regardless of the value of Xk, we always expect Y to increase by bk units for a one-unit increase in Xk, controlling for the other Xk's. If a nonlinear relationship exists, this means that the expected increase in Y depends on the value of Xk; that is, the expected increase is not a constant value. Strictly speaking, linearity in a model refers to there being linearity in the parameters of the model (i.e., α and the βk's).
Violation of the linearity assumption can be detected through residual plots. The residuals should be located within a band of ±2sres (or standard errors), indicating no systematic pattern of points, as previously discussed in Chapter 17. Residual plots for the GGPA example are shown in Figure 18.1. Even with a very small sample, we see a fairly random pattern of residuals, and therefore feel fairly confident that the linearity assumption has been satisfied. Note also that there are other types of residual plots developed especially for multiple regression analysis, such as the added variable and partial residual plots (Larsen & McCleary, 1972; Mansfield & Conerly, 1987; Weisberg, 1985). Procedures to deal with nonlinearity include transformations (of one or more of the Xk's and/or of Y as described in Chapter 17) and other regression models (discussed later in this chapter).

[Figure 18.1 Residual plots for GRE–GPA example: (a) studentized residuals versus unstandardized predicted values; (b) studentized residuals versus GRE total score; (c) studentized residuals versus undergraduate grade point average.]
18.2.5.5 Fixed X
The fifth assumption is that the values of Xk are fixed, where the independent variables, the Xk's, are fixed variables rather than random variables. This results in the regression model being valid only for those particular values of Xk that were actually observed and used in the analysis. Thus, the same values of Xk would be used in replications or repeated samples.
Strictly speaking, the regression model and its parameter estimates are only valid for those values of Xk actually sampled. The use of a prediction model developed to predict the dependent variable Y, based on one sample of individuals, may be suspect for another sample of individuals. Depending on the circumstances, the new sample of individuals may actually call for a different set of parameter estimates. Expanding on our discussion in Chapter 17, generally we may not want to make predictions about individuals having combinations of Xk scores outside of the range of values used in developing the prediction model; this is defined as extrapolating beyond the sample predictor data. On the other hand, we may not be quite as concerned in making predictions about individuals having combinations of Xk scores within the range of values used in developing the prediction model; this is defined as interpolating within the range of the sample predictor data.

[Figure 18.2 Normal Q–Q plot of unstandardized residuals for GRE–GPA example.]

It has been shown that when other assumptions are met, regression analysis performs just as well when X is a random variable (e.g., Glass & Hopkins, 1996; Myers & Well, 1995; Pedhazur, 1997; Wonnacott & Wonnacott, 1981). There is no such assumption about Y.
18.2.5.6 Noncollinearity
The final assumption is unique to multiple linear regression analysis, being unnecessary in simple linear regression. A violation of this assumption is known as collinearity, where there is a very strong linear relationship between two or more of the predictors. The presence of severe collinearity is problematic in several respects. First, it will lead to instability of the regression coefficients across samples, where the estimates will bounce around quite a bit in terms of magnitude and even occasionally result in changes in sign (perhaps opposite of expectation). This occurs because the standard errors of the regression coefficients become larger, thus making it more difficult to achieve statistical significance. Another result that may occur involves an overall regression that is significant, but none of the individual predictors are significant. Collinearity will also restrict the utility and generalizability of the estimated regression model.
Recall from earlier in the chapter the notion of partial regression coefficients, where the other predictors were held constant. In the presence of severe collinearity, the other predictors cannot really be held constant because they are so highly intercorrelated. Collinearity may be indicated when there are large changes in estimated coefficients due to (a) a variable being added or deleted and/or (b) an observation being added or deleted (Chatterjee & Price, 1977). Collinearity is also likely when a composite variable as well as its component variables are used as predictors in the same regression model (e.g., including GRETOT, GRE-Quantitative, and GRE-Verbal as predictors).
How do we detect violations of this assumption? The simplest procedure is to conduct a series of special regression analyses, one for each X, where that predictor is predicted by all of the remaining X's (i.e., the criterion variable is not involved). If any of the resultant Rk² values are close to 1 (greater than .9 is a good rule of thumb), then there may be a collinearity problem. However, a large Rk² value may also be due to small sample size; thus, more data would be useful. For the example data, R12² = .091, and therefore collinearity is not a concern. Also, if the number of predictors is greater than or equal to n, then perfect collinearity is a possibility. Another statistical method for detecting collinearity is to compute a variance inflation factor (VIF) for each predictor, which is equal to 1/(1 − Rk²). The VIF is defined as the inflation that occurs for each regression coefficient above the ideal situation of uncorrelated predictors. Many suggest that the largest VIF should be less than 10 in order to satisfy this assumption (Myers, 1990; Stevens, 2009; Wetherill, 1986).
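A minimal Python sketch of the VIF computation for the two-predictor example (with two predictors, Rk² is just r12² for both):

```python
# Variance inflation factor: VIF_k = 1 / (1 - R_k^2).
# In the GGPA example, r12 = .3011, so R_k^2 = r12^2 for both predictors.
def vif(r2_k):
    return 1.0 / (1.0 - r2_k)

largest_vif = vif(0.3011**2)
print(round(largest_vif, 2))  # → 1.1, well under the common cutoff of 10
```

Note that VIF = 1 exactly when the predictors are uncorrelated, which is the "ideal situation" the definition refers to.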
There are several possible methods for dealing with a collinearity problem. First, one can remove one or more of the correlated predictors. Second, ridge regression techniques can be used (e.g., Hoerl & Kennard, 1970a, 1970b; Marquardt & Snee, 1975; Myers, 1986; Wetherill, 1986). Third, principal component scores resulting from principal component analysis can be utilized rather than raw scores on each variable (e.g., Kleinbaum et al., 1998; Myers, 1986; Weisberg, 1985; Wetherill, 1986). Fourth, transformations of the variables can be used to remove or reduce the extent of the problem. The final solution, and probably our last choice, is to use simple linear regression, as collinearity cannot exist with a single predictor.
18.2.5.7 Summary
For the GGPA example, although the sample size is quite small in terms of looking at conditional distributions, it would appear that all of our assumptions have been satisfied. All of the residuals are within two standard errors of 0, and there does not seem to be any systematic pattern in the residuals. The distribution of the residuals is nearly symmetric, and the normal probability plot looks good. A summary of the assumptions and the effects of their violation for multiple linear regression analysis is presented in Table 18.2.
18.3 Methods of Entering Predictors
The multiple predictor model which we have considered thus far can be viewed as simultaneous regression. That is, all of the predictors to be used are entered (or selected) simultaneously, such that all of the regression parameters are estimated simultaneously; here the set of predictors has been selected a priori. In computing these regression models, we have used the default setting in SPSS of the method of entry as "Enter," which enters the set of independent variables in aggregate. There are other methods of entering the independent variables where the predictor variables are entered (or selected) systematically; here the set of predictors has not been selected a priori. This class of models is referred to as sequential regression (also known as variable selection procedures). This section provides a brief description of the following sequential regression procedures: backward elimination, forward selection, stepwise selection, all possible subsets regression, and hierarchical regression.
Table 18.2
Assumptions and Violation of Assumptions: Multiple Linear Regression Analysis

Assumption / Effect of Assumption Violation

Independence
• Influences standard errors of the model

Homogeneity
• Bias in s²res
• May inflate standard errors and thus increase likelihood of a Type II error
• May result in nonnormal conditional distributions

Normality
• Less precise slopes, intercept, and R²

Linearity
• Bias in slope and intercept
• Expected change in Y is not a constant and depends on value of X

Fixed X values
• Extrapolating beyond the range of X combinations: prediction errors larger, may also bias slopes and intercept
• Interpolating within the range of X combinations: smaller effects than earlier; if other assumptions met, negligible effect

Noncollinearity of X's
• Regression coefficients can be quite unstable across samples (as standard errors are larger)
• R² may be significant, yet none of the predictors are significant
• Restricted generalizability of the model

18.3.1 Backward Elimination
First consider the backward elimination procedure. Here variables are eliminated from the model based on their minimal contribution to the prediction of the criterion variable. In the first stage of the analysis, all potential predictors are included in the model. In the second stage, the predictor that makes the smallest contribution to the prediction of the dependent variable is deleted from the model. This can be done by eliminating the variable having the smallest t or F statistic such that it is making the smallest contribution to R²adj. In subsequent stages, the predictor that makes the next smallest contribution to the prediction of the outcome Y is deleted. The analysis continues until each of the remaining predictors in the model is a significant predictor of Y. This could be determined by comparing the t or F statistics for each predictor to the critical value, at a preselected level of significance. Some computer programs use as a stopping rule the maximum F-to-remove criterion, where the procedure is stopped when all of the selected predictors' F values are greater than the specified F criterion. Another stopping rule is where the researcher stops at a predetermined number of predictors (see Hocking, 1976; Thompson, 1978). In SPSS, this is the backward method of entering predictors.
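The elimination loop with the maximum F-to-remove stopping rule can be sketched as follows. Here `f_values` is a hypothetical callable standing in for the refit step: given the currently retained predictors, it returns each one's F statistic (a program such as SPSS computes these by refitting the model at every stage).

```python
# Sketch of backward elimination with a maximum F-to-remove stopping rule.
# `f_values` is a hypothetical stand-in for refitting and computing each
# remaining predictor's F statistic.
def backward_eliminate(predictors, f_values, f_to_remove):
    remaining = set(predictors)
    while remaining:
        fs = f_values(remaining)           # F statistic per retained predictor
        weakest = min(fs, key=fs.get)      # smallest contribution to prediction
        if fs[weakest] > f_to_remove:      # every remaining predictor significant
            break                          # stopping rule met
        remaining.discard(weakest)         # delete the weakest and refit
    return remaining
```

This is only the control flow of the procedure; the statistical work lives entirely in the refit step.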
18.3.2 Forward Selection
In the forward selection procedure, variables are added or selected into the model based on their maximal contribution to the prediction of the criterion variable. Initially, none of the potential predictors are included in the model. In the first stage, the predictor that makes the largest contribution to the prediction of the dependent variable is added to the model. This can be done by selecting the variable having the largest t or F statistic, as it is making the largest contribution to R²adj. In subsequent stages, the predictor that makes the next largest contribution to the prediction of Y is selected. The analysis continues until each of the selected predictors in the model is a significant predictor of the outcome Y, whereas none of the unselected predictors is a significant predictor. This can be determined by comparing the t or F statistic for each predictor to the critical value at a preselected level of significance. Some computer programs use as a stopping rule the minimum F-to-enter criterion, where the procedure is stopped when all of the unselected predictors' F values are less than the specified F criterion. For the same set of data and at the same level of significance, the backward elimination and forward selection procedures may not necessarily result in the exact same final model, due to differences in how variables are selected. In SPSS, this is the forward method of entering predictors.
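The greedy logic of forward selection can be sketched outside of SPSS. The Python sketch below is not SPSS's implementation: the data are hypothetical, the helper names (ols_coef, adj_r2, forward_select) are invented, and candidates are scored by the change in adjusted R² rather than by an F-to-enter criterion, which rewards the same "largest remaining contribution" idea described above.

```python
def ols_coef(X, y):
    """Least squares via the normal equations (X includes an intercept column)."""
    p = len(X[0])
    A = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    b = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [v - f * w for v, w in zip(A[r], A[c])]
            b[r] -= f * b[c]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][k] * coef[k] for k in range(r + 1, p))) / A[r][r]
    return coef

def adj_r2(cols, y, data):
    """Adjusted R-squared for a model using the named predictor columns."""
    X = [[1.0] + [data[c][i] for c in cols] for i in range(len(y))]
    coef = ols_coef(X, y)
    yhat = [sum(c * x for c, x in zip(coef, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    n, p = len(y), len(cols)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def forward_select(data, y):
    """Add predictors one at a time while adjusted R-squared keeps improving."""
    remaining, selected, best = list(data), [], float("-inf")
    while remaining:
        scores = {c: adj_r2(selected + [c], y, data) for c in remaining}
        top = max(scores, key=scores.get)
        if scores[top] <= best:
            break  # no remaining predictor improves the model: stop
        best = scores[top]
        selected.append(top)
        remaining.remove(top)
    return selected

# Hypothetical data: y tracks x1 closely and x2 only loosely.
data = {"x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5]}
y = [1.1, 2.0, 2.9, 4.1, 5.0, 6.1]
order = forward_select(data, y)  # x1 enters first; x2 never improves the fit enough
```

With these made-up numbers x1 is selected at the first stage and x2 is never entered, illustrating how the procedure can stop well short of the full predictor set.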
18.3.3 Stepwise Selection
The stepwise selection procedure is a modification of the forward selection procedure with one important difference: predictors that have been selected into the model can, at a later step, be deleted from the model. Thus, the modification conceptually involves a backward elimination mechanism. This situation can occur for a predictor when a significant contribution at an earlier step later becomes a nonsignificant contribution, given the set of other predictors in the model; that is, a predictor loses its significance due to new predictors being added to the model.
The stepwise selection procedure is as follows. Initially, none of the potential predictors are included in the model. In the first step, the predictor that makes the largest contribution to the explanation of the dependent variable is added to the model. This can be done by selecting the variable having the largest t or F statistic, as it is making the largest contribution to R²adj. In subsequent stages, the predictor that makes the next largest contribution to the prediction of Y is selected. Those predictors that entered at earlier stages are also checked to see whether their contribution remains significant; if not, that predictor is eliminated from the model. The analysis continues until each of the predictors remaining in the model is a significant predictor of Y, while none of the other predictors is a significant predictor. This can be determined by comparing the t or F statistic for each predictor to the critical value at a specified level of significance. Some computer programs use as stopping rules the minimum F-to-enter and maximum F-to-remove criteria, where the F-to-enter value selected is usually equal to or slightly greater than the F-to-remove value selected (to prevent a predictor from continuously being entered and removed). For the same set of data and at the same level of significance, the backward elimination, forward selection, and stepwise selection procedures may not necessarily result in the exact same final model, due to differences in how variables are selected. In SPSS, this is the stepwise method of entering predictors.
18.3.4 All Possible Subsets Regression
Another sequential regression procedure is known as all possible subsets regression. Let us say, for example, that there are five potential predictors. In this procedure, all possible one-, two-, three-, and four-variable models are analyzed (with five predictors, there is only a single five-predictor model). Thus, there will be 5 one-predictor models, 10 two-predictor models, 10 three-predictor models, and 5 four-predictor models. The best k-predictor model can be selected as the model that yields the largest R²adj. For example, the best three-predictor model would be the model, of the 10 estimated, that yields the largest R²adj. With today's powerful computers, this procedure is easier and more cost efficient than in the past. However, the researcher is not advised to consider this procedure, or for that matter any of the other sequential regression procedures, when the number of potential predictors is large. Here the researcher is allowing number crunching to take precedence over thoughtful analysis. Also, the number of models will be equal to 2^m, so that for 10 predictors, there are 1024 possible subsets. Obviously, examining that number of models is not a thoughtful analysis.
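The model counts quoted in this section follow directly from binomial coefficients, which is easy to verify with a few lines of Python (standard library only):

```python
from math import comb

m = 5
# Number of k-predictor models from m = 5 candidates, for k = 1..5
counts = [comb(m, k) for k in range(1, m + 1)]
print(counts)   # [5, 10, 10, 5, 1], the counts given in the text
print(2 ** 10)  # 1024 possible subsets for 10 predictors
```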
18.3.5 Hierarchical Regression
In hierarchical regression, the researcher specifies a priori a sequence for the individual predictor variables (not to be confused with hierarchical linear models, a regression approach for analyzing nested data collected at multiple levels, such as child, classroom, and school). The analysis proceeds in a forward selection, backward elimination, or stepwise selection mode according to a researcher-specified, theoretically based sequence, rather than an unspecified, statistically based sequence. This variable selection method differs from those previously discussed in that the researcher determines the order of entry from a careful consideration of the available theory and research, instead of the software dictating the sequence.
A type of hierarchical regression is known as setwise regression (also called blockwise, chunkwise, or forced stepwise regression). Here the researcher specifies a priori a sequence for sets of predictor variables. This procedure is similar to hierarchical regression in that the researcher determines the order of entry of the predictors. The difference is that the setwise method uses sets of predictor variables at each stage rather than one individual predictor variable at a time. The sets of variables are determined by the researcher so that variables within a set share some common theoretical ground (e.g., home background variables in one set and aptitude variables in another set). Variables within a set are selected according to one of the sequential regression procedures. The variables selected for a particular set are then entered in the specified theoretically based sequence. In SPSS, this is conducted by entering predictors in blocks and selecting the desired method of entering variables in each block (e.g., simultaneously, forward, backward, stepwise).
18.3.6 Commentary on Sequential Regression Procedures
Let us make some comments and recommendations about the sequential regression procedures. First, numerous statisticians have noted problems with stepwise methods (i.e., backward elimination, forward selection, and stepwise selection) (e.g., Derksen & Keselman, 1992; Huberty, 1989; Mickey, Dunn, & Clark, 2004; Miller, 1984, 1990; Wilcox, 2003). These problems include the following: (a) selecting noise rather than important predictors; (b) highly inflated R² and R²adj values; (c) CIs for partial slopes that are too narrow; (d) p values that are not trustworthy; (e) important predictors being barely edged out of the model, making it possible to miss the true model; and (f) potentially heavy capitalization on chance given the number of models analyzed. Second, theoretically based regression models have become the norm in many disciplines (and the stepwise methods of entry are driven by the mathematics of the models rather than by theory). Thus, hierarchical regression either dominates, or soon will dominate, the landscape of the sequential regression procedures, and we strongly encourage you to consider more extended discussions of hierarchical regression (e.g., Bernstein, 1988; Cohen & Cohen, 1983; Pedhazur, 1997; Schafer, 1991; Tabachnick & Fidell, 2007).
If you are working in an area of inquiry where research evidence is scarce or nonexistent, then you are conducting exploratory research and are probably trying simply to identify the key variables. Here hierarchical regression is not appropriate, as there is no theory to guide the development of a theoretically driven sequence. In this situation, we recommend the use of all possible subsets regression (e.g., Kleinbaum et al., 1998). For additional information on the sequential regression procedures, see Cohen and Cohen (1983), Weisberg (1985), Miller (1990), Pedhazur (1997), and Kleinbaum et al. (1998).
18.4 Nonlinear Relationships
Here we continue our discussion of how to deal with nonlinearity from Chapter 17. We formally introduce several multiple regression models for when the criterion variable does not have a linear relationship with the predictor variables.
First consider polynomial regression models. In polynomial models, powers of the predictor variables (e.g., squared, cubed) are used. In general, a sample polynomial regression model that includes one predictor is as follows:

Y = b1X + b2X^2 + … + bmX^m + a + e

where the independent variable X is taken from the first power through the mth power, and the i subscript for observations has been deleted to simplify matters. If the model consists only of X taken to the first power, then this is a simple linear regression model (or first-degree polynomial; this is a straight line and what we have studied to this point). A second-degree polynomial includes X taken to the second power (or quadratic model; this is a curve with one bend in it rather than a straight line). A third-degree polynomial includes X taken to the third power (or cubic model; this is a curve with two bends in it).
A polynomial model with multiple predictors can also be utilized. An example of a second-degree polynomial model with two predictors is illustrated in the following equation:

Y = b1X1 + b2X1^2 + b3X2 + b4X2^2 + a + e
It is important to note that whenever a higher-order polynomial is included in a model (e.g., quadratic, cubic, and so on), the first-order polynomial must also be included in the model. In other words, it is not appropriate to include a quadratic term X^2 without also including the first-order term X. For more information on polynomial regression models, see Weisberg (1985), Bates and Watts (1988), Seber and Wild (1989), Pedhazur (1997), and Kleinbaum et al. (1998). Alternatively, one might transform the criterion variable and/or the predictor variables to obtain a more linear form, as previously discussed.
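To make the quadratic case concrete, here is a minimal least-squares sketch in plain Python (not SPSS output; the data are hypothetical and noise-free, and the name fit_poly is invented for illustration). Note that the design matrix keeps the first-order term X alongside X^2, as required above.

```python
def fit_poly(x, y, degree):
    """Fit Y = a + b1*X + ... + bdegree*X^degree by least squares."""
    # Design matrix with columns 1, x, x^2, ..., x^degree
    X = [[xi ** d for d in range(degree + 1)] for xi in x]
    p = degree + 1
    # Normal equations A coef = b, solved by Gaussian elimination with pivoting
    A = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    b = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [v - f * w for v, w in zip(A[r], A[c])]
            b[r] -= f * b[c]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][k] * coef[k] for k in range(r + 1, p))) / A[r][r]
    return coef  # [a, b1, b2, ...]

# Hypothetical data generated exactly from Y = 2 + 3X + 0.5X^2
x = [0, 1, 2, 3, 4, 5]
y = [2 + 3 * xi + 0.5 * xi ** 2 for xi in x]
a, b1, b2 = fit_poly(x, y, 2)  # recovers approximately (2, 3, 0.5)
```

Because the made-up data contain no error term, the fitted coefficients recover the generating values up to floating-point precision.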
18.5 Interactions
Another type of model involves the use of an interaction term, as previously discussed for factorial ANOVA (Chapter 13). Interaction terms can be implemented in any type of regression model. We can write a simple two-predictor interaction-type model as

Y = b1X1 + b2X2 + b3X1X2 + a + e
where X1X2 represents the interaction of predictor variables 1 and 2. An interaction can be defined as occurring when the relationship between Y and X1 depends on the level of X2; in other words, X2 is a moderator variable. For example, suppose one were to use years of education and age to predict political attitude. The relationship between education and attitude might be moderated by age; that is, the relationship between education and attitude may be different for older versus younger individuals. If age were a moderator, we would expect there to be an interaction between age and education in a regression model. Note that if the predictors are very highly correlated, collinearity is likely. For more information on interaction models, see Cohen and Cohen (1983), Berry and Feldman (1985), Kleinbaum et al. (1998), Weinberg and Abramowitz (2002), and Meyers, Gamst, and Guarino (2006).
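One way to read the interaction model: the slope of Y on X1 implied by Y = b1X1 + b2X2 + b3X1X2 + a + e is b1 + b3X2, so the effect of X1 shifts with the level of the moderator X2. A tiny sketch with made-up coefficients for the education/age example:

```python
# Made-up coefficients for Y = b1*X1 + b2*X2 + b3*X1*X2 + a + e,
# where X1 = years of education and X2 = age
b1, b3 = 0.50, -0.005

def simple_slope_x1(x2):
    """Slope of Y on X1 at a given value of the moderator X2."""
    return b1 + b3 * x2

# In this hypothetical example, the education-attitude slope weakens with age:
print(simple_slope_x1(20))  # about 0.40 at age 20
print(simple_slope_x1(60))  # about 0.20 at age 60
```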
18.6 Categorical Predictors
So far, we have only considered continuous predictors, that is, independent variables that are interval or ratio in scale. There may be times, however, when you wish to use a categorical predictor, an independent variable that is nominal or ordinal in scale. For example, gender, grade level (e.g., freshman, sophomore, junior, senior), and highest education earned (less than high school, high school graduate, etc.) are all categorical variables that may be very interesting and theoretically appropriate to include in either a simple or multiple regression model. Given their scale (i.e., nominal or ordinal), however, we must recode the values prior to analysis so that they are on a scale of 0 and 1. This is called "dummy coding," as this type of recoding makes the model work. For example, males might be coded as 0 and females coded as 1. When there are more than two categories in the categorical predictor, multiple dummy coded variables must be created: specifically, the number of levels or categories of the categorical variable minus 1. Thus, in the case of grade level, where there are four categories (freshman, sophomore, junior, senior), three of the four categories would be dummy coded and included in the regression model as predictors. The category that is "left out" is the reference category, the category to which all other levels are compared. The easiest way to understand this is perhaps to examine the data. In the screenshot that follows, the first column represents grade level, where 1 = freshman, 2 = sophomore, 3 = junior, and 4 = senior. Dummy coding three of the four grade levels, with "senior" as the reference category, will result in three additional columns (columns 2, 3, and 4 in the screenshot).
[Screenshot: grade level (column 1) and its three dummy-coded variables (columns 2-4).]
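The recoding itself is mechanical; a short sketch in Python (the cases are hypothetical, with senior, coded 4, as the reference category):

```python
# Hypothetical cases; grade level coded 1=freshman, 2=sophomore, 3=junior, 4=senior
grade = [1, 2, 3, 4, 4, 1]

# Four categories -> 4 - 1 = 3 dummy variables; senior (4) is the reference
dummy_levels = [1, 2, 3]  # columns for freshman, sophomore, junior
dummies = [[1 if g == level else 0 for level in dummy_levels] for g in grade]

print(dummies[0])  # freshman -> [1, 0, 0]
print(dummies[3])  # senior   -> [0, 0, 0]; all zeros marks the reference category
```

A senior scores 0 on every dummy, which is exactly why the intercept in the regression that follows is the mean of the reference category.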
In terms of generating the analysis and the point-and-click use of SPSS to compute the regression model, nothing changes: the steps are the same regardless of whether the predictors are continuous or categorical. Now let us discuss why dummy coding works in this situation. You may recall from Chapter 10 our discussion of point biserial correlations. The point biserial correlation is a variant of the Pearson product-moment correlation for use with one binary variable, so the Pearson machinery still applies. Thus, while we will not have a linear relationship between a continuous outcome and a binary variable, the mathematics that underlie the model still hold.
Consider an example output for predicting GPA based on grade level, where "senior" is the reference category. We see that the intercept (i.e., the "constant") is statistically significant, as is "freshman." The interpretation of the intercept remains the same regardless of the scale of the predictors: the intercept represents GPA (the dependent variable) when all the predictors are 0. In this case, this means that GPA is 3.267 for seniors (the reference category). The only statistically significant predictor is "freshman." This is interpreted to say that mean GPA decreases by .800 points for freshmen as compared to seniors. The nonstatistically significant regression coefficients for "sophomore" and "junior" indicate that mean GPA is similar for these grade levels as compared to seniors. The interpretation for dummy variable predictors is always in reference to the category that was "left out"; in this case, that was "seniors."
Coefficients
Model 1         B     Std. Error   Beta       t      Sig.
(Constant)    3.267      .183              17.892    .000
Freshman      -.800      .258     -.704    -3.098    .015
Sophomore      .233      .258      .205      .904    .393
Junior         .200      .258      .176      .775    .461
Note: B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient. Dependent variable: GPA.
It is important to note that even though "sophomore" and "junior" were not statistically significant, they should be retained in the model, as they represent (along with "freshman") a group. Dropping one or more dummy coded indicator variables that represent a group will change the reference category. For example, if "sophomore" and "junior" were dropped from the model, the interpretation would then become the mean GPA for freshmen as compared to all other grade levels. Thus, careful thought needs to be put into dropping one or more indicators that are part of a set.
18.7 SPSS
Next we consider SPSS for the multiple linear regression model. Before we conduct the analysis, let us review the data. With one dependent variable and two independent variables, the dataset must consist of three variables or columns, one for each independent variable and one for the dependent variable. Each row still represents one individual, indicating the values of the independent variables for that particular case and their score on the dependent variable. As seen in the following screenshot, for a multiple linear regression analysis the SPSS data are therefore in the form of three columns that represent the two independent variables (GRE total score and UGPA) and one dependent variable (GGPA).
[Screenshot annotation: The independent variables are labeled "GRE Total" and "UGPA," where each value represents the student's total score on the GRE and their undergraduate GPA. The dependent variable is "GGPA" and represents their graduate GPA.]
Step 1: To conduct a multiple linear regression, go to "Analyze" in the top pulldown menu, then select "Regression," and then select "Linear." Following the screenshot (Step 1) produces the "Linear Regression" dialog box.

[Screenshot: Multiple linear regression, Step 1]
Step 2: Click the dependent variable (e.g., "GGPA") and move it into the "Dependent" box by clicking the arrow button. Click the independent variables and move them into the "Independent(s)" box by clicking the arrow button (see screenshot Step 2).
[Screenshot: Multiple linear regression, Step 2, with the following annotations:]
•	Select the dependent variable from the list on the left and use the arrow to move it to the "Dependent" box on the right.
•	Select the independent variables from the list on the left and use the arrow to move them to the "Independent(s)" box on the right.
•	Clicking on "Statistics" will allow you to select various regression coefficients and residuals.
•	Clicking on "Plots" will allow you to select various residuals plots.
•	Clicking on "Save" will allow you to save various predicted values, residuals, and other statistics useful for diagnostics.
•	Clicking on "Next" will allow you to define the blocks when entering variables in sets.
•	Clicking on "Enter" will allow you to select different methods of entering the variables (e.g., stepwise, forward). "Enter" is the default, and all predictors are entered as one set.
Step 3: From the "Linear Regression" dialog box (see screenshot Step 2), clicking on "Statistics" will provide the option to select various regression coefficients and residuals. From the "Statistics" dialog box (see screenshot Step 3), place a checkmark in the box next to the following: (a) estimates, (b) CIs, (c) model fit, (d) R squared change, (e) descriptives, (f) part and partial correlations, (g) collinearity diagnostics, (h) Durbin-Watson, and (i) casewise diagnostics. For this example, we apply an α level of .05; thus, we will leave the default CI percentage at 95. If we were using a different α, the CI would be the complement of alpha (e.g., for α = .01, the CI = 1 − .01 = .99, or 99%). We will also leave the default of "three standard deviations" for defining outliers for the casewise diagnostics. Click on "Continue" to return to the original dialog box.

[Screenshot: Multiple linear regression, Step 3]
Step 4: From the "Linear Regression" dialog box (see screenshot Step 2), clicking on "Plots" will provide the option to select various residual plots. From the "Plots" dialog box, place a checkmark in the box next to the following: (a) histogram, (b) normal probability plot, and (c) produce all partial plots. Click on "Continue" to return to the original dialog box.

[Screenshot: Multiple linear regression, Step 4]
Step 5: From the "Linear Regression" dialog box (see screenshot Step 2), clicking on "Save" will provide the option to save various predicted values, residuals, and statistics that can be used for diagnostic examination. From the "Save" dialog box, under the heading Predicted Values, place a checkmark in the box next to: unstandardized. Under the heading Residuals, place checkmarks in the boxes next to: (a) unstandardized and (b) studentized. Under the heading Distances, place checkmarks in the boxes next to: (a) Mahalanobis, (b) Cook's, and (c) leverage values. Under the heading Influence Statistics, place a checkmark in the box next to: standardized DfBeta(s). Click on "Continue" to return to the original dialog box. From the "Linear Regression" dialog box, click on "OK" to generate the output.

[Screenshot: Multiple linear regression, Step 5]
Interpreting the output: Annotated results are shown in Table 18.3.

Table 18.3
SPSS Results for the Multiple Regression GRE-GPA Example

Descriptive Statistics
                                      Mean    Std. Deviation    N
Graduate grade point average         3.5000       .33166       11
GRE total score                    112.7273     16.33457       11
Undergraduate grade point average    3.1091       .40113       11

Note: The table labeled "Descriptive Statistics" provides basic descriptive statistics (means, standard deviations, and sample sizes) for the independent and dependent variables.

Correlations
                                      Graduate GPA   GRE Total   Undergraduate GPA
Pearson correlation  Graduate GPA         1.000         .784           .752
                     GRE total             .784        1.000           .301
                     Undergraduate GPA     .752         .301          1.000
Sig. (1-tailed)      Graduate GPA            .           .002           .004
                     GRE total             .002            .            .184
                     Undergraduate GPA     .004          .184             .
N = 11 for every pair.

Note: The table labeled "Correlations" provides the Pearson correlation coefficient values, p values, and sample sizes for the simple bivariate Pearson correlations between the independent and dependent variables. The correlation between graduate GPA and GRE total (p = .002) and the correlation between graduate GPA and undergraduate GPA (p = .004) are statistically significant.

Variables Entered/Removed
Model 1. Variables entered: undergraduate grade point average, GRE total score. Variables removed: none. Method: Enter.
a All requested variables entered.
b Dependent variable: Graduate grade point average.

Note: "Variables Entered/Removed" lists the independent variables included in the model and the method by which they were entered (i.e., "Enter").

(continued)
Table 18.3 (continued)
SPSS Results for the Multiple Regression GRE-GPA Example

Model Summary
Model 1: R = .953, R Square = .908, Adjusted R Square = .885, Std. Error of the Estimate = .11272
Change statistics: R Square Change = .908, F Change = 39.291, df1 = 2, df2 = 8, Sig. F Change = .000
Durbin-Watson = 2.116
a Predictors: (Constant), undergraduate grade point average, GRE total score.
b Dependent variable: Graduate grade point average.

Notes:
•	R is the multiple correlation coefficient. R² is the squared multiple correlation coefficient (a.k.a. the coefficient of determination); it represents the proportion of variance in the dependent variable that is explained by the independent variables.
•	"Adjusted R square" adjusts for the number of independent variables and the sample size; shrinkage is the difference between R² and adjusted R². When the sample size is small given the number of independent variables, the difference between R² and adjusted R² will be large to compensate for a large amount of bias. If an additional independent variable were entered into the model, an increase in adjusted R² would indicate that the new variable is adding value to the model. Negative adjusted R² values can occur and indicate that the model fits the data very poorly. Adjusted R² is interpreted as the percentage of variation in the dependent variable that is explained after adjusting for sample size and the number of predictors.
•	Durbin-Watson is a test for independence of residuals. Ranging from 0 to 4, values of 2 indicate uncorrelated errors; values less than 1 or greater than 3 indicate a likely violation of this assumption.
•	Change statistics are used when methods other than simultaneous entry (e.g., hierarchical, forward, backward) are used to enter the predictors in the model; in those cases, more than one row will be presented here. A p value less than α would indicate that the additional variables are explaining additional variation.
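The adjustment is a simple formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), which can be checked against the model summary values (R² = .908, n = 11, p = 2 predictors):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared; p counts the predictors (not the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adjusted_r2(0.908, 11, 2), 3))  # 0.885, matching the SPSS output
```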
ANOVA
Model 1       Sum of Squares   df   Mean Square      F      Sig.
Regression         .998         2      .499       39.291   .000
Residual           .102         8      .013
Total             1.100        10
a Predictors: (Constant), undergraduate grade point average, GRE total score.
b Dependent variable: Graduate grade point average.

Notes: The total sum of squares is partitioned into SS regression and SS residual. The regression sum of squares indicates variability explained by the regression model; the residual sum of squares indicates variability not explained by the regression model. The F statistic tests the overall regression model (i.e., that the population multiple correlation coefficient is zero). The p value (.000) indicates that we reject the null hypothesis: the probability of finding a sample multiple R² of .908 or larger when the true population multiple correlation coefficient is zero is less than 1%.
Table 18.3 (continued)
SPSS Results for the Multiple Regression GRE-GPA Example

Coefficients
Model 1         B     Std. Error   Beta      t      Sig.   95% CI for B
(Constant)    .638       .327              1.954    .087   [-.115, 1.391]
GRE total     .012       .002      .614    5.447    .001   [ .007,  .018]
UGPA          .469       .093      .567    5.030    .001   [ .254,  .684]

Model 1      Zero-Order   Partial   Part   Tolerance    VIF
GRE total       .784        .887    .585      .909     1.100
UGPA            .752        .872    .541      .909     1.100
a Dependent variable: Graduate grade point average.

Notes:
•	The "Constant" is the intercept; the unstandardized coefficient tells us that if the predictors were zero, graduate GPA (the dependent variable) would be .638. "GRE total" and "UGPA" are the slopes: for every one-point increase in GRE total, graduate GPA will increase by about .01 points (holding undergraduate GPA constant), and for every one-point increase in undergraduate GPA, graduate GPA will increase by about half a point (holding GRE total constant).
•	The test statistic, t, is calculated as the unstandardized coefficient divided by its standard error. Thus the t for the undergraduate GPA slope is .469/.093 = 5.043 (the difference from the tabled 5.030 is due to rounding).
•	The p value for the intercept (p = .087) indicates that the intercept is not statistically significantly different from zero (this finding is usually of less interest than the slopes). The p values for GRE total and undergraduate GPA (both p = .001) indicate that the slopes are statistically significantly different from zero.
•	Zero-order correlations are the simple bivariate Pearson correlations between the dependent variable and the independent variables.
•	The partial correlation of .887 is the correlation between GRE total and graduate GPA (the dependent variable) when the linear effect of undergraduate GPA has been removed from both GRE total and graduate GPA. Squaring this indicates that 78.7% of the variation in graduate GPA that is not explained by undergraduate GPA is explained by GRE total.
•	The part correlation of .585, when squared (i.e., .342), indicates that GRE total explains an additional 34% of the variance in graduate GPA over and above the variance explained by undergraduate GPA.
•	The collinearity statistics are reviewed under assumptions.

(continued)
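Tolerance and VIF in the coefficients table are reciprocals of one another, which is easy to verify against the tabled values:

```python
tolerance = 0.909      # from the collinearity statistics column
vif = 1 / tolerance    # variance inflation factor = 1 / tolerance
print(round(vif, 3))   # 1.1, i.e., the tabled VIF of 1.100
```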
Table 18.3 (continued)
SPSS Results for the Multiple Regression GRE-GPA Example

Collinearity Diagnostics
                                              Variance Proportions
Model 1  Dimension  Eigenvalue  Condition Index  (Constant)  GRE Total  Undergraduate GPA
             1         2.981         1.000          .00         .00          .00
             2          .012        15.727          .03         .86          .40
             3          .007        20.537          .97         .13          .60
a Dependent variable: Graduate grade point average.

Residuals Statistics (N = 11 for each row)
                                    Minimum    Maximum     Mean    Std. Deviation
Predicted value                      3.0714     3.9448    3.5000      .31597
Std. predicted value                 -1.357      1.408      .000      1.000
Standard error of predicted value      .038       .079      .058       .011
Adjusted predicted value             3.0599     3.9117    3.4954      .30917
Residual                            -.19943     .17207    .00000      .10082
Std. residual                        -1.769      1.527      .000       .894
Stud. residual                       -1.881      1.716      .017      1.008
Deleted residual                    -.22531     .21754    .00458      .12935
Stud. deleted residual               -2.355      2.020      .000      1.145
Mahal. distance                        .240      4.053     1.818      1.048
Cook's distance                        .012       .260      .092       .081
Centered leverage value                .024       .405      .182       .105
a Dependent variable: Graduate grade point average.

Notes: The "Collinearity diagnostics" will be examined in our discussion of assumptions. The "Residuals statistics" and related graphs (histogram and Q-Q plot of standardized residuals, not presented here) will also be examined in our discussion of assumptions.
Table 18.3 (continued)
SPSS Results for the Multiple Regression GRE-GPA Example

[Histogram of the regression standardized residuals; dependent variable: graduate grade point average. Mean = 3.61E-16, Std. dev. = 0.894, N = 11.]

[Normal P-P plot of the regression standardized residual (expected vs. observed cumulative probability); dependent variable: graduate grade point average.]

[Partial regression plot of graduate grade point average against GRE total score.]

[Partial regression plot of graduate grade point average against undergraduate grade point average.]
Examining Data for Assumptions for Multiple Linear Regression

As you may recall, there were a number of assumptions associated with multiple linear regression. These included (a) independence, (b) homogeneity of variance, (c) linearity, (d) normality, and (e) multicollinearity. Although fixed values of X were discussed in the assumptions, this is not an assumption that will be tested but is instead related to the use of the results (i.e., extrapolation and interpolation).
Before we begin to examine the assumptions, let us review the values that we requested to be saved to our dataset (see the dataset screenshot that follows).
� 1��PRE _ 1�represents�the�unstandardized�predicted�values�(i�e�,�Y′i)�
� 2��RES _ 1� represents� the� unstandardized� residuals,� simply� the� difference�
between� the� observed� and� predicted� values�� For� student� 1,� for� example,� the�
observed�value�for�the�GGPA�(i�e�,�the�dependent�variable)�was�4,�and�the�pre-
dicted�value�was�3�94483��Thus,�the�unstandardized�residual�is�simply�4�−�3�94483,�
or��05517�
� 3��SRE _ 1� represents� the� studentized� residuals,� a� type� of� standardized� resid-
ual� that� is� more� sensitive� to� outliers� as� compared� to� standardized� residuals��
Studentized� residuals� are� computed� as� the� unstandardized� residual� divided�
by� an� estimate� of� the� standard� deviation� with� that� case� removed�� As� a� rule� of�
thumb,�studentized�residuals�with�an�absolute�value�greater�than�3�are�consid-
ered�outliers�(Stevens,�1984)�
� 4��MAH _ 1� represents� Mahalanobis� distance� values� which� measure� how� far� that�
particular� case� is� from� the� average� of� the� independent� variable� and� thus� can� be�
691Multiple Regression
helpful in detecting outliers. These values can be reviewed to determine cases that are exerting leverage. Barnett and Lewis (1978) produced a table of critical values for evaluating Mahalanobis distance. Squared Mahalanobis distances divided by the number of variables (D²/df) which are greater than 2.5 (for small samples) or 3–4 (for large samples) are suggestive of outliers (Hair, Black, Babin, Anderson, & Tatham, 2006). Later, we follow another convention for examining these values using the chi-square distribution.
5. COO _ 1 represents Cook's distance values, which provide an indication of the influence of individual cases. As a rule of thumb, Cook's values greater than 1 suggest that the case is potentially problematic.
6. LEV _ 1 represents leverage values, a measure of the distance from a respective case to the average of the predictor.
7. SDB0 _ 1, SDB1 _ 1, and SDB2 _ 1 are standardized DFBETA values for the intercept and slopes, respectively, and are easier to interpret as compared to their unstandardized counterparts. Standardized DFBETA values greater than an absolute value of 2 suggest that the case may be exerting undue influence on the calculation of the parameters in the model (i.e., the slopes and intercept).
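These saved diagnostic values can also be reproduced by hand. Below is a minimal sketch in Python (the book itself uses SPSS) that computes predicted values, residuals, leverage, internally studentized residuals, and Cook's distance; all data values here are invented for illustration, not the book's GGPA dataset.

```python
import numpy as np

# Hypothetical toy data loosely mimicking the chapter's setup:
# GGPA predicted from GRE total and UGPA (values are invented).
X = np.column_stack([
    np.ones(6),
    [300.0, 400.0, 500.0, 550.0, 600.0, 700.0],  # GRE total (hypothetical)
    [2.8, 3.0, 3.2, 3.5, 3.6, 3.9],              # UGPA (hypothetical)
])
y = np.array([3.0, 3.1, 3.4, 3.5, 3.7, 3.9])     # GGPA (hypothetical)

n, p = X.shape
b, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ b                                  # PRE_1-style predicted values
resid = y - pred                              # RES_1-style residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat (projection) matrix
h = np.diag(H)                                # leverage values
s2 = resid @ resid / (n - p)                  # residual variance
stud = resid / np.sqrt(s2 * (1 - h))          # studentized-type residuals
cooks = stud**2 / p * h / (1 - h)             # COO_1-style Cook's distance

# Sanity check: leverages always sum to the number of estimated parameters.
print(round(h.sum(), 6))  # 3.0
```

Note that SPSS's SRE_1 uses a slightly different (case-deleted) variance estimate, so its studentized residuals differ a little from this internally studentized version.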
As we look at the raw data, we see nine new variables have been added to our
dataset. These are our predicted values, residuals, and other diagnostic
statistics. The residuals will be used for diagnostics to review the extent to
which our data meet the assumptions of multiple linear regression.
Independence
Here we will plot the following: (a) studentized residuals (which were requested and created through the "Save" option when generating our model) against unstandardized predicted values and (b) studentized residuals against each independent variable to examine the extent to which independence was met. The general steps for generating a simple scatterplot through "Scatter/dot" have been presented in a previous chapter (e.g., Chapter 10), and they will not be reiterated here. From the "Simple Scatterplot" dialog screen, click the studentized residual variable and move it into the "Y Axis" box by clicking on the arrow. Click the unstandardized predicted values and move them into the "X Axis" box by clicking on the arrow. Then click "Ok." Repeat these steps to plot the studentized residuals against each independent variable.
692 An Introduction to Statistical Concepts
If the assumption of independence is met, the points should fall randomly within a band of −2.0 to +2.0. In this illustration (see Figure 18.1), we have evidence of independence as all points for all graphs are within an absolute value of 2.0 and fall relatively randomly.
Homogeneity of Variance
We can use the same plots that were used to examine independence. To examine the extent to which homogeneity was met, we plot (a) studentized residuals (which were requested and created through the "Save" option when generating our model) against unstandardized predicted values and (b) studentized residuals against each independent variable. Recall that homogeneity is when the dependent variable has the same variance for all values of the independent variable.
Evidence of meeting the assumption of homogeneity is a plot where the spread of residuals appears fairly constant over the range of unstandardized predicted values (i.e., a random display of points) and observed values of the independent variables. If the display of residuals increases or decreases across the plot, then there may be an indication that the assumption of homogeneity has been violated. Here we see evidence of homogeneity.
Linearity
Since we have more than one independent variable, we have to take a different approach to examining linearity than what was done with simple linear regression. However, we can use the same information gleaned from our examination of independence and homogeneity for reviewing the assumption of linearity. As those steps have been presented previously in the discussion of independence, they will not be repeated here. From the scatterplot, there is a general positive linear relationship between the variables, and, thus, we have evidence of linearity. We can also review the partial regression plots that we asked for when generating the regression model. A separate partial regression plot is provided for each independent variable, where we are looking for linearity (rather than some type of polynomial). Even with a small sample size, the partial regression plots suggest evidence of linearity.
Normality
Generating normality evidence: Understanding the distributional shape, specifically the extent to which normality is a reasonable assumption, is important in multiple linear regression just as it was in simple linear regression. We will examine residuals for normality, following the same steps as with the previous procedures. We will also use various diagnostics to examine our data for influential cases. Let us begin by examining the unstandardized residuals for normality. Just as we saw with simple linear regression, for multiple linear regression, the distributional shape of the unstandardized residuals should be normal. Because the steps for generating normality evidence were presented in previous chapters, they will not be repeated here.
Interpreting normality evidence: By this point, we are well versed in interpreting quite a range of normality statistics and will do the same for multiple linear regression.
Descriptives

Unstandardized residual                           Statistic     Std. Error
  Mean                                            .0000000      .03039717
  95% Confidence interval     Lower bound        −.0677291
    for mean                  Upper bound         .0677291
  5% Trimmed mean                                 .0015202
  Median                                          .0281190
  Variance                                        .010
  Std. deviation                                  .10081601
  Minimum                                        −.19943
  Maximum                                         .17207
  Range                                           .37150
  Interquartile range                             .14051
  Skewness                                       −.336          .661
  Kurtosis                                        .484          1.279
The skewness statistic of the residuals is −.336 and kurtosis is .484, both being within the range of an absolute value of 2.0, suggesting some evidence of normality. Given the very small sample size, the following histogram reflects as normal a distribution as might be expected.
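As a quick check of this rule of thumb, moment-based skewness and excess kurtosis can be computed directly. The sketch below uses plain Python on invented residual-like values; note that SPSS applies small-sample corrections, so its reported statistics differ slightly from these uncorrected moments.

```python
def skew_kurtosis(x):
    """Moment-based sample skewness and excess kurtosis (no small-sample
    correction; SPSS's corrected values will differ slightly)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Invented residual-like values, roughly mimicking the example's scale.
g1, g2 = skew_kurtosis([0.05, -0.12, 0.08, -0.02, 0.17, -0.20,
                        0.01, 0.03, -0.07, 0.10, -0.03])
# Rule of thumb from the text: |skewness| and |kurtosis| within 2.
print(abs(g1) < 2 and abs(g2) < 2)  # True
```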
[Histogram of the unstandardized residuals (frequency by residual value): Mean = 3.82E−17, Std. dev. = .10082, N = 11]
There are a few other statistics that can be used to gauge normality. The formal test of normality, the Shapiro–Wilk (S–W) test (SW) (Shapiro & Wilk, 1965), provides evidence of the extent to which our sample distribution is statistically different from a normal distribution. The output for the S–W test is presented as follows and suggests that our sample distribution for the residual is not statistically significantly different than what would be expected from a normal distribution as the p value is greater than α (p = .918).
Tests of Normality

                             Kolmogorov–Smirnov(a)        Shapiro–Wilk
                             Statistic   df   Sig.    Statistic   df   Sig.
Unstandardized residual      .155        11   .200*   .973        11   .918

a. Lilliefors significance correction.
* This is a lower bound of the true significance.
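Outside SPSS, the same S–W test is available in SciPy. A minimal sketch (assuming SciPy is installed; the residual values below are invented for illustration, not the book's actual residuals):

```python
from scipy import stats  # assumes SciPy is available

# Invented residual-like values for illustration.
residuals = [0.055, -0.120, 0.080, -0.020, 0.172, -0.199,
             0.010, 0.030, -0.070, 0.100, -0.038]

w, p = stats.shapiro(residuals)
# A p value greater than alpha (.05) means we fail to reject normality.
print(p > 0.05)
```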
Q–Q plots are also often examined to determine evidence of normality. Q–Q plots graph quantiles of the theoretical normal distribution against quantiles of the sample distribution. Points that fall on or close to the diagonal line suggest evidence of normality. The Q–Q plot of residuals (see Figure 18.2) suggests relative normality. Examination of the following boxplot also suggests a relatively normal distribution of residuals with no outliers.
[Boxplot of the unstandardized residuals (y axis: −.20000 to .20000)]
Considering the forms of evidence we have examined (skewness and kurtosis statistics, the S–W test, the histogram, the Q–Q plot, and the boxplot), all suggest normality is a reasonable assumption.
Screening Data for Influential Points
Casewise diagnostics: Recall that we requested a number of statistics to help in diagnostics. One that we requested was for "Casewise diagnostics." If we had any cases with large values for the standardized residual (outside three standard deviations), information would have been included in our output to indicate the case number, value of the standardized residual, predicted value, and unstandardized residual. This information can be used to more closely examine case(s) with extreme values on the standardized residuals.
Cook's distance: Cook's distance provides an overall measure for the influence of individual cases. Values greater than 1 suggest that the case may be problematic in terms of undue influence on the model. Examining the residual statistics in our output (see following table), we see that the maximum value for Cook's distance is .260, well under the point at which we should be concerned.
Residuals Statistics(a)

                                    Minimum    Maximum    Mean      Std. Deviation   N
Predicted value                     3.0714     3.9448     3.5000    .31597           11
Std. predicted value               −1.357      1.408      .000      1.000            11
Standard error of predicted value   .038       .079       .058      .011             11
Adjusted predicted value            3.0599     3.9117     3.4954    .30917           11
Residual                           −.19943     .17207     .00000    .10082           11
Std. residual                      −1.769      1.527      .000      .894             11
Stud. residual                     −1.881      1.716      .017      1.008            11
Deleted residual                   −.22531     .21754     .00458    .12935           11
Stud. deleted residual             −2.355      2.020      .000      1.145            11
Mahal. distance                     .240       4.053      1.818     1.048            11
Cook's distance                     .012       .260       .092      .081             11
Centered leverage value             .024       .405       .182      .105             11

a. Dependent variable: graduate grade point average.
Mahalanobis distances: Mahalanobis distances are measures of the distance from each case to the mean of the independent variables for the remaining cases. We can use the value of Mahalanobis distance as a test statistic value with the chi-square distribution. With two independent variables and one dependent variable, we have three degrees of freedom. Given an alpha level of .05, the chi-square critical value is 7.82. Thus, any Mahalanobis distance greater than 7.82 suggests that case is an outlier. With a maximum of 4.053 (see previous table), there is no evidence to suggest there are outliers in our data.
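The chi-square critical value quoted above can be reproduced directly (a sketch assuming SciPy is installed):

```python
from scipy import stats  # assumes SciPy is available

# Critical value the text compares Mahalanobis distances against:
# chi-square with df = number of variables (2 IVs + 1 DV = 3), alpha = .05.
crit = stats.chi2.ppf(1 - 0.05, df=3)
print(round(crit, 2))  # 7.81 (the text's table rounds this to 7.82)
```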
Centered leverage values: Centered leverage values less than .20 suggest there are no problems with cases that are exerting undue influence. Values greater than .5 indicate problems.
DFBETA: We also asked to save DFBETA values. These values provide another indication of the influence of cases. DFBETA provides information on the change in the regression coefficients when the case is deleted from the model. For standardized DFBETA values, values greater than an absolute value of 2.0 should be examined more closely. Looking at the minimum and maximum DFBETA values, there are no cases suggestive of undue influence.
Descriptive Statistics

                                 N    Minimum    Maximum
Standardized DFBETA intercept    11   −.51278    .63170
Standardized DFBETA GRE total    11   −.75577    .59269
Standardized DFBETA UGPA         11   −.32176    .55938
Valid N (listwise)               11
Diagnostic plots: There are a number of diagnostic plots that can be generated from the values we saved. For example, a plot of Cook's distance against centered leverage values provides a way to identify influential cases (i.e., cases with leverage of .50 or above and Cook's distance of 1.0 or greater). Here there are no cases that suggest undue influence.
[Scatterplot of Cook's distance (y axis, .00000 to .30000) against centered leverage values (x axis, .00000 to .50000)]
Multicollinearity
Generating multicollinearity evidence: Multicollinearity, as you recall, refers to strong correlations between the independent variables. Detecting multicollinearity can be done by reviewing the VIF and tolerance statistics. From the following table, we see tolerance and VIF values. Tolerance is calculated as (1 − R²), and values close to 0 (a rule of thumb is .10 or less) suggest potential multicollinearity problems. Why? A tolerance of .10 suggests that 90% (or more) of the variance in one of the independent variables can be explained by another independent variable. VIF is the "variance inflation factor" and is the reciprocal of tolerance, where VIF = 1/tolerance. VIF values greater than 10 (which correspond to a tolerance of .10) suggest potential multicollinearity.
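The tolerance/VIF arithmetic is easy to verify. A minimal sketch using the R² value reported later in the text for regressing one independent variable on the other:

```python
# Tolerance and VIF from the text's R²_k = .091 (one IV regressed on the other).
r2_k = 0.091
tolerance = 1 - r2_k       # 1 − R²_k
vif = 1 / tolerance        # VIF is the reciprocal of tolerance
print(round(tolerance, 3), round(vif, 3))  # 0.909 1.1
```

These match the .909 and 1.100 values in the SPSS collinearity table below.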
Collinearity Statistics

            Tolerance   VIF
GRE total   .909        1.100
UGPA        .909        1.100
Collinearity diagnostics (see the following SPSS output) can also be reviewed. "Dimension 1" refers to the intercept; however, we are interested in reviewing data for "dimensions 2 and 3." Multiple eigenvalues close to 0 indicate independent variables that have strong intercorrelations. The condition index is calculated as the square root of the ratio of the largest eigenvalue to each respective eigenvalue (e.g., √(2.981/.012) = 15.76). Condition indices greater than 15 suggest there is a possible problem with multicollinearity, and values greater than 30 indicate a substantial multicollinearity problem. In this case, both the eigenvalues and condition indices suggest possible problems with multicollinearity.
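The condition indices can likewise be recomputed from the eigenvalues reported in the diagnostics table. The small discrepancies from the SPSS output (15.727, 20.537) arise because SPSS works with unrounded eigenvalues:

```python
import math

# Condition index = sqrt(largest eigenvalue / each eigenvalue),
# using the (rounded) eigenvalues from the collinearity diagnostics table.
eigenvalues = [2.981, 0.012, 0.007]
largest = max(eigenvalues)
indices = [math.sqrt(largest / e) for e in eigenvalues]
print([round(ci, 2) for ci in indices])  # [1.0, 15.76, 20.64]
```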
Collinearity Diagnostics(a)

                                                      Variance Proportions
                                                                  GRE Total   Undergraduate Grade
Model   Dimension   Eigenvalue   Condition Index     (Constant)   Score       Point Average
1       1           2.981        1.000               .00          .00         .00
        2           .012         15.727              .03          .86         .40
        3           .007         20.537              .97          .13         .60

a. Dependent variable: graduate grade point average.
Multicollinearity can also be examined by computing regression models where each independent variable is considered the outcome and is predicted by the remaining independent variables (the dependent variable is not included in these models). Because the steps for conducting regression have already been presented, they will not be repeated again. Click one of the independent variables (e.g., "UGPA") and move it into the "Dependent" box by clicking the arrow button. Click the remaining independent variable(s) and move those into the "Independent(s)" box by clicking the arrow button.
Interpreting multicollinearity evidence: If any of the resultant R²k values are close to 1 (greater than .9 is a good rule of thumb), then there may be a collinearity problem. For the example data, R² = .091, and therefore collinearity is not a concern. Note that in multiple regression situations where there are two independent variables (as in this example with GRE total and UGPA), only one regression needs to be conducted to check for multicollinearity as the results for regressing UGPA on GRE total are the same as regressing GRE total on UGPA.
Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .301   .091       −.010               16.41926
18.8 G*Power
A priori and post hoc power could again be determined using the specialized software described previously in this text (e.g., G*Power), or you can consult a priori power tables (e.g., Cohen, 1988). As an illustration, we use G*Power to compute the post hoc power of our test.
Post Hoc Power for Multiple Linear Regression Using G*Power
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. In our case, we conducted a multiple linear regression. To find regression, we select "Tests" in the top pulldown menu, then "Correlation and regression," and then "Linear multiple regression: Fixed model, R2 deviation from zero." This will allow us to determine power for the hypothesis that the overall multiple R2 is equal to 0 (i.e., power for the overall regression model). Once that selection is made, the "Test family" automatically changes to "F test."
Step 1
The "Type of Power Analysis" desired needs to be selected. To compute post hoc power, select "Post hoc: Compute achieved power—given α, sample size, and effect size."
Step 2
Screenshot notes:
• The default "Test family" selection is "t tests"; this changes to "F tests" when the linear multiple regression is selected.
• The default "Statistical Test" selection is "Correlation: Point biserial model"; following the procedures presented in Step 1 automatically changes it to "Linear multiple regression: Fixed model, R2 deviation from zero."
• The "Input Parameters" for computing post hoc power must be specified, including: (1) effect size f², (2) α level, (3) total sample size, and (4) number of predictors.
• Click on "Determine" to pop out the effect size calculator box, which allows you to compute the effect size, f², given the squared multiple correlation.
• Once the parameters are specified, click on "Calculate."
The "Input Parameters" must then be specified. We compute the effect size, f², last and so we skip that for the moment. The α level we used was .05, the total sample size was 11, and there were two independent variables. Next we use the pop-out effect size calculator in G*Power to compute the effect size f². To do this, click on "Determine" which is displayed under "Input Parameters." In the pop-out effect size calculator, input the value for the squared multiple correlation. Click on "Calculate" to compute the effect size f². Then click on "Calculate and Transfer to Main Window" to transfer the calculated effect size (i.e., 9.8695652) to the "Input Parameters." Once the parameters are specified, click on "Calculate" to find the power statistics.
[G*Power screenshot: post hoc power results]
The "Output Parameters" provide the relevant statistics given the input just specified. Here we were interested in determining post hoc power for a multiple linear regression with a computed effect size f² of 9.8695652, an alpha level of .05, total sample size of 11, and two predictors. Based on those criteria, the post hoc power for the overall multiple linear regression model was 1.0000. In other words, given the input parameters, the probability of rejecting the null hypothesis when it is really false (in this case, the probability that the multiple correlation coefficient is 0) was at the maximum (i.e., 1.00) (sufficient power is often .80 or above). Do not forget that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters). Conducting power for change in R² and for the slopes can be conducted similarly by selecting the test family of "Linear multiple regression: Fixed model, R2 increase" or "Linear multiple regression: Fixed model, single regression coefficient," respectively.
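G*Power's post hoc power computation for the overall F test can be approximated with the noncentral F distribution. A sketch (assuming SciPy is installed, and assuming G*Power's convention that the noncentrality parameter is λ = f²·N):

```python
from scipy import stats  # assumes SciPy is available

# Post hoc power for the overall F test, mirroring the setup described
# in the text: f² = 9.8695652, N = 11, 2 predictors, alpha = .05.
f2, n, k, alpha = 9.8695652, 11, 2, 0.05
df1, df2 = k, n - k - 1
nc = f2 * n                                  # noncentrality parameter
f_crit = stats.f.ppf(1 - alpha, df1, df2)
power = 1 - stats.ncf.cdf(f_crit, df1, df2, nc)
print(round(power, 4))  # 1.0, matching the G*Power output
```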
[G*Power screenshot: a priori power results]
A Priori Power for Multiple Linear Regression Using G*Power
For a priori power, we can determine the total sample size needed for multiple linear regression given the estimated effect size f², α level, desired power, and number of predictors. We follow Cohen's (1988) conventions for effect size (i.e., small r² = .02; moderate r² = .15; large r² = .35). If we had estimated a moderate effect r² of .15, alpha of .05, desired power of .80, and two independent variables, we would need a total sample size of 58.
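The same noncentral F machinery can be used to check this a priori result. The sketch below (assuming SciPy is installed, and converting r² = .15 to f² = r²/(1 − r²) as G*Power's effect size calculator does) verifies that the text's N = 58 reaches the desired power of .80:

```python
from scipy import stats  # assumes SciPy is available

def power_at(n, f2, k=2, alpha=0.05):
    """Power of the overall F test in multiple regression via the
    noncentral F distribution (G*Power-style calculation, lambda = f2*N)."""
    df1, df2 = k, n - k - 1
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, f2 * n)

# Moderate effect r² = .15 converted to f² = r² / (1 − r²).
f2 = 0.15 / 0.85
print(power_at(58, f2) >= 0.80)  # at the text's N = 58, power reaches .80
```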
18.9 Template and APA-Style Write-Up
Finally, here is an example paragraph for the results of the multiple linear regression analysis. Recall that our graduate research assistant, Marie, was assisting the assistant dean in Graduate Student Services, Jennifer. Jennifer wanted to know if GGPA could be predicted by the total score on the required graduate entrance exam (GRE total) and by UGPA. The research question presented to Jennifer from Marie included the following: Can GGPA be predicted from the GRE total and UGPA?
Marie then assisted Jennifer in generating a multiple linear regression as the test of inference, and a template for writing the research question for this design is presented as follows:
• Can [dependent variable] be predicted from [list independent variables]?
It may be helpful to preface the results of the multiple linear regression with information on an examination of the extent to which the assumptions were met. The assumptions include (a) independence, (b) homogeneity of variance, (c) normality, (d) linearity, (e) noncollinearity, and (f) values of X are fixed. Because the last assumption (fixed X) is based on interpretation, it will not be discussed here.
A multiple linear regression model was conducted to determine if GGPA
(dependent variable) could be predicted from GRE total scores and
UGPA (independent variables). The null hypotheses tested were that
the multiple R2 was equal to 0 and that the regression coefficients
(i.e., the slopes) were equal to 0. The data were screened for miss-
ingness and violation of assumptions prior to analysis. There were
no missing data.
Linearity: Review of the partial scatterplot of the independent vari-
ables (GRE total and UGPA) and the dependent variable (GGPA scores)
indicates linearity is a reasonable assumption. Additionally, with
a random display of points falling within an absolute value of 2, a
scatterplot of unstandardized residuals to predicted values provided
further evidence of linearity.
Normality: The assumption of normality was tested via examination
of the unstandardized residuals. Review of the S–W test for normal-
ity (SW = .973, df = 11, p = .918) and skewness (−.336) and kurtosis
(.484) statistics suggested that normality was a reasonable assump-
tion. The boxplot suggested a relatively normal distributional shape
(with no outliers) of the residuals. The Q–Q plot and histogram sug-
gested normality was reasonable. Examination of casewise diagnos-
tics, including Mahalanobis distance, Cook’s distance, DfBeta values,
and centered leverage values, suggested there were no cases exerting
undue influence on the model.
Independence: A relatively random display of points in the scat-
terplots of studentized residuals against values of the indepen-
dent variables and studentized residuals against predicted values
provided evidence of independence. The Durbin–Watson statistic was
computed to evaluate independence of errors and was 2.116, which is
considered acceptable. This suggests that the assumption of indepen-
dent errors has been met.
Homogeneity of variance: A relatively random display of points, where
the spread of residuals appears fairly constant over the range of
values of the independent variables (in the scatterplots of studen-
tized residuals against predicted values and studentized residuals
against values of the independent variables) provided evidence of
homogeneity of variance.
Multicollinearity: Tolerance was greater than .10 (.909), and the
variance inflation factor was less than 10 (1.100), suggesting that
multicollinearity was not an issue. However, the eigenvalues for the
predictors were close to 0 (.012 and .007). A review of GRE total
regressed on UGPA, however, produced a multiple R squared of .091,
which suggests noncollinearity. In aggregate, therefore, the evidence
suggests that multicollinearity is not an issue.
Here is an APA-style example paragraph of results for the multiple linear regression (remember that this will be prefaced by the previous paragraph reporting the extent to which the assumptions of the test were met).
The results of the multiple linear regression suggest that a sig-
nificant proportion of the total variation in GGPA was predicted by
GRE total and UGPA, F(2, 8) = 39.291, p < .001. Additionally, we find
the following:
1. For GRE total, the unstandardized partial slope (.012) and
standardized partial slope (.614) are statistically signifi-
cantly different from 0 (t = 5.447, df = 8, p < .001); with every
one-point increase in the GRE total, GGPA will increase by
approximately 1/100 of one point when controlling for UGPA.
2. For UGPA, the unstandardized partial slope (.469) and standard-
ized partial slope (.567) are statistically significantly dif-
ferent from 0 (t = 5.030, df = 8, p < .001); with every one-point
increase in UGPA, GGPA will increase by approximately one-half
of one point when controlling for GRE total.
3. The CIs around the unstandardized partial slopes do not include
0 (GRE total, .007, .018; UGPA, .254, .684), further confirming
that these variables are statistically significant predictors of
GGPA. Thus, GRETOT and UGPA were shown to be statistically sig-
nificant predictors of GGPA, both individually and collectively.
4. The intercept (or average GGPA when GRE total and UGPA are 0) was
.638, not statistically significantly different from 0 (t = 1.954,
df = 8, p = .087).
5. Multiple R2 indicates that approximately 91% of the variation
in GGPA was predicted by GRE total scores and UGPA. Interpreted
according to Cohen (1988), this suggests a large effect.
6. Estimated power to predict multiple R2 is at the maximum, 1.00.
We note that the more advanced regression models described in this chapter can all be conducted using SPSS. For further information on regression analysis with SPSS, see Morgan and Griego (1998), Weinberg and Abramowitz (2002), and Meyers et al. (2006).
18.10 Summary
In this chapter, methods involving multiple predictors in the regression context were considered. The chapter began with a look at partial and semipartial correlations. Next, a lengthy discussion of multiple linear regression analysis was conducted. Here we extended many of the basic concepts of simple linear regression to the multiple predictor context. In addition, several new concepts were introduced, including the coefficient of multiple determination, the multiple correlation, and tests of the individual regression coefficients. Finally we examined a number of other regression models, such as forward selection, backward elimination, stepwise selection, all possible subsets regression, hierarchical regression, and nonlinear regression. At this point, you should have met the following objectives: (a) be able to determine and interpret the results of part and semipartial correlations, (b) be able to understand the concepts underlying multiple linear regression, (c) be able to determine and interpret the results of multiple linear regression, (d) be able to understand and evaluate the assumptions of multiple linear regression, and (e) be able to have a basic understanding of other types of regression models. In Chapter 19, we conclude the text by considering logistic regression analysis.
Problems
Conceptual problems
18.1 The correlation of salary and cumulative GPA controlling for socioeconomic status is an example of which one of the following?
a. Bivariate correlation
b. Partial correlation
c. Regression correlation
d. Semipartial correlation
18.2 Variable 1 is to be predicted from a combination of variable 2 and one of variables 3, 4, 5, and 6. The correlations of importance are as follows:
r13 = .8   r23 = .2
r14 = .6   r24 = .5
r15 = .6   r25 = .2
r16 = .8   r26 = .5
Which of the following multiple correlation coefficients will have the largest value?
a. r1.23
b. r1.24
c. r1.25
d. r1.26
18.3 The most accurate predictions are made when the standard error of estimate equals which one of the following?
a. Ȳ
b. sY
c. 0
d. 1
18.4 The intercept can take on a positive value only. True or false?
18.5 Adding an additional predictor to a regression equation will necessarily result in an increase in R². True or false?
18.6 The best prediction in multiple regression analysis will result when each predictor has a high correlation with the other predictor variables and a high correlation with the dependent variable. True or false?
18.7 Consider the following two situations:
Situation 1: rY1 = .6, rY2 = .5, r12 = .0
Situation 2: rY1 = .6, rY2 = .5, r12 = .2
I assert that the value of R² will be greater in situation 2. Am I correct?
18.8 Values of variables X1, X2, and X3 are available for a sample of 50 students. The value of r12 = .6. I assert that if the partial correlation r12.3 were calculated, it would be larger than .6. Am I correct?
18.9 A researcher is building a regression model. There is theory to suggest that science ability can be predicted by literacy skills when controlling for child characteristics (e.g., age and socioeconomic status). Which one of the following variable selection procedures is suggested?
a. Backward elimination
b. Forward selection
c. Hierarchical regression
d. Stepwise selection
18.10 I assert that the forward selection, backward elimination, and stepwise regression methods will always arrive at the same final model, given the same dataset and level of significance. Am I correct?
18.11 I assert that R²adj will always be larger for the model with the most predictors. Am I correct?
18.12 In a two-predictor regression model, if the correlation among the predictors is .95 and VIF is 20, then we should be concerned about collinearity. True or false?
Computational problems
18.1 You are given the following data, where X1 (hours of professional development) and X2 (aptitude test scores) are used to predict Y (annual salary in thousands):
Y X1 X2
40 100 10
50 200 20
50 300 10
70 400 30
65 500 20
65 600 20
80 700 30
Determine the following values: intercept, b1, b2, SSres, SSreg, F, s²res, s(b1), s(b2), t1, t2.
18.2 You are given the following data, where X1 (final percentage in science class) and X2 (number of absences) are used to predict Y (standardized science test score in third grade):
Y X1 X2
300 65 7
480 98 0
350 70 3
420 80 2
400 82 0
335 70 3
370 75 4
390 80 1
485 99 0
415 95 2
375 88 3
Determine the following values: intercept, b1, b2, SSres, SSreg, F, s²res, s(b1), s(b2), t1, t2.
18.3 Complete the missing information for this regression model (df = 23).
Y′ = 25.1 + 1.2X1 + 1.0X2 − .50X3
Standard errors:    (2.1)    (1.5)   (1.3)   (.06)
t ratios:           (11.9)   (__)    (__)    (__)
Significant at .05?          (__)    (__)    (__)
18.4 Consider a sample of elementary school children. Given that r(strength, weight) = .6, r(strength, age) = .7, and r(weight, age) = .8, what is the first-order partial correlation coefficient between strength and weight holding age constant?
18.5 For a sample of 100 adults, you are given that r12 = .55, r13 = .80, and r23 = .70. What is the value of r1(2.3)?
18.6 A researcher would like to predict salary from a set of four predictor variables for a sample of 45 subjects. Multiple linear regression analysis was utilized. Complete the following summary table (α = .05) for the test of significance of the overall regression model:
Source       SS    df   MS   F    Critical Value and Decision
Regression   —     —    20   —    —
Residual     400   —    —
Total        —     —
18.7 Calculate the partial correlation r12.3 and the part correlation r1(2.3) from the following bivariate correlations: r12 = .5, r13 = .8, r23 = .9.
18.8 Calculate the partial correlation r13.2 and the part correlation r1(3.2) from the following bivariate correlations: r12 = .21, r13 = .40, r23 = −.38.
18.9 You are given the following data, where X1 (verbal aptitude) and X2 (prior reading achievement) are to be used to predict Y (reading achievement):
Y X1 X2
2 2 5
1 2 4
1 1 5
1 1 3
5 3 6
4 4 4
7 5 6
6 5 4
7 7 3
8 6 3
3 4 3
3 3 6
6 6 9
6 6 8
10 8 9
9 9 6
6 10 4
6 9 5
9 4 8
10 4 9
Determine the following values: intercept, b1, b2, SSres, SSreg, F, s²res, s(b1), s(b2), t1, t2.
18.10 You are given the following data, where X1 (years of teaching experience) and X2 (salary in thousands) are to be used to predict Y (morale):
Y X1 X2
125 1 24
130 2 30
145 3 32
115 2 28
170 6 40
180 7 38
165 5 48
150 4 42
195 9 56
180 10 52
120 2 33
190 8 50
170 7 49
175 9 53
160 6 49
Determine the following values: intercept, b1, b2, SSres, SSreg, F, s2res, s(b1), s(b2), t1, t2.
708 An Introduction to Statistical Concepts
Interpretive problems
18.1 Use SPSS to develop a multiple regression model with the example survey 1 dataset on the website. Utilize current GPA as the dependent variable and find at least two strong predictors from among the continuous variables in the dataset. Write up your results, including interpretation of effect size and testing of assumptions.
18.2 Use SPSS to develop a multiple regression model with the example survey 1 dataset on the website. Utilize how many hours of television watched per week as the dependent variable and find at least two strong predictors from among the continuous variables in the dataset. Write up your results, including interpretation of effect size and testing of assumptions.
19
Logistic Regression
Chapter Outline
19.1 How Logistic Regression Works
19.2 Logistic Regression Equation
 19.2.1 Probability
 19.2.2 Odds and Logit (or Log Odds)
19.3 Estimation and Model Fit
19.4 Significance Tests
 19.4.1 Test of Significance of Overall Regression Model
 19.4.2 Test of Significance of Logistic Regression Coefficients
19.5 Assumptions and Conditions
 19.5.1 Assumptions
 19.5.2 Conditions
19.6 Effect Size
19.7 Methods of Predictor Entry
 19.7.1 Simultaneous Logistic Regression
 19.7.2 Stepwise Logistic Regression
 19.7.3 Hierarchical Regression
19.8 SPSS
19.9 G*Power
19.10 Template and APA-Style Write-Up
19.11 What Is Next?
Key Concepts
1. Logit
2. Odds
3. Odds ratio
In the past two chapters, we have examined ordinary least squares (OLS) regression—simple and multiple regression models—that allow us to examine the relationship between one or more predictors and a continuous outcome. In this chapter, we are introduced to logistic regression, which can be used when the outcome is categorical. For the purposes of this chapter, we will concentrate on binary logistic regression, which is used when the outcome has only two categories (i.e., a dichotomous, binary, or sometimes Bernoulli outcome). The logistic regression procedure appropriate for more than two categories is called multinomial (or polytomous) logistic regression. Readers interested in learning more about multinomial logistic regression will be provided some additional references later in this chapter. Also in this chapter, we discuss methods that can be used to enter predictors in logistic regression models. Our objectives are that by the end of this chapter, you will be able to (a) understand the concepts underlying logistic regression, (b) determine and interpret the results of logistic regression, (c) understand and evaluate the assumptions of logistic regression, and (d) have a basic understanding of methods of entering the covariates.
19.1 How Logistic Regression Works
We conclude the textbook as Marie embarks on her most challenging statistical project to date.

With excitement, Marie is finishing up her graduate program in educational research and has been assigned by her faculty advisor to one additional consultation. Malani is a faculty member in the early childhood department and has collected data on 20 children who will be entering kindergarten in the fall. Interested in kindergarten readiness issues, Malani wants to know if a teacher observation scale for social development and family structure (single-parent vs. two-parent home) can predict whether children are prepared or unprepared to enter kindergarten. Marie suggests the following research question to Malani: Can kindergarten readiness (prepared vs. unprepared) be predicted by social development and family structure (single-parent vs. two-parent home)? Given that the outcome is dichotomous, Marie determines that binary logistic regression is the appropriate statistical procedure to use to answer Malani's question. Marie then proceeds with assisting Malani in analyzing the data.
If the dependent variable is binary (i.e., dichotomous or having only two categories), then none of the regression methods described so far in this text is appropriate. Although simple and multiple regression can easily accommodate dichotomous independent variables through dummy coding (i.e., assignment of 1 and 0 to the categories), it is an entirely different case when the outcome is dichotomous. Applying OLS regression to a binary outcome creates problems. For example, a dichotomous outcome violates the normality and homogeneity assumptions of OLS regression. In addition, OLS estimates are based on linear relationships between the independent and dependent variables, and forcing a linear relationship (as seen in Figure 19.1) in the case of a binary outcome is erroneous [although we found at least one author (Hellevik, 2009) who argues that OLS regression can be used with dichotomous outcomes].
As part of the regression family, logistic regression still allows a prediction to be made; however, now the prediction is whether or not the unit under investigation falls into one of the two categories of the dependent variable. Initially used mostly in the hard sciences, this method has become more broadly popular in recent years, as there are many situations where researchers want to examine outcomes that are discrete, rather than continuous, in nature. Some examples of dichotomous dependent variables are pass/fail, surviving surgery/not, admit/reject, vote for/against, employ/not, win/lose, or purchase/not. The idea of using a dichotomous variable was introduced in Chapter 18 as the concept of a dummy variable, where the first condition is indicated by a value of 1 (e.g., prepared for kindergarten), whereas a value of 0 indicates the opposite condition (e.g., unprepared for kindergarten). For the purposes of this text, our discussion will concentrate on dichotomous outcomes, for which binary logistic regression is appropriate (referred to throughout this chapter simply as logistic regression). For conditions in which there are more than two possible categories of the dependent variable (e.g., three categories, such as remain in the teaching profession, remain in teaching but change schools, or leave the teaching profession entirely), multinomial logistic regression may be appropriate. An example of the data structure for a logistic regression model with a binary outcome (prepared vs. unprepared for kindergarten), one continuous predictor (social development), and one dichotomous dummy coded predictor (family structure: single-parent vs. two-parent home) is presented in Table 19.1.
19.2 Logistic Regression Equation
As we learned previously with OLS regression, knowledge of the independent variable(s) provides the information necessary to be able to estimate a precise numerical value of the dependent variable, a predicted value. The following formula recaps the sample multiple regression equation, where Yi is the predicted outcome for individual i based on (a) the Y intercept, a, the value of Y when all predictor values are 0; (b) the products of the values of the independent variables, Xs, and the regression coefficients, bk; and (c) the residual, εi:

Yi = a + b1X1 + ... + bmXm + εi
[Figure 19.1. Nonlinearity of binary outcome. The figure plots the probability of being reading proficient (vertical axis, .00 to 1.00) against age in months at kindergarten entry (horizontal axis, 50 to 80), with a straight line forced through the binary outcome.]
As we will see, the logistic regression equation is similar in concept to the simple and multiple linear regression equations but operates much differently. In logistic regression, the binary dependent variable is transformed into a logit variable (which is the natural log of the odds of the dependent variable occurring or not occurring), and the parameters are then estimated using maximum likelihood. The end result is that the odds of an event occurring are estimated through the logistic regression model (whereas OLS estimates a precise numerical value of the dependent variable).

To understand how the logistic regression equation operates, there are three primary computational concepts that must be understood: probability, odds, and the logit. These express the same thing, only in different ways (Menard, 2000). Let us first consider probability.
19.2.1 Probability
The overarching difference between OLS regression (i.e., simple and multiple linear regression) and logistic regression is the measurement scale of the outcome. With OLS regression, our outcome is continuous in scale (i.e., interval or ratio measurement scale). In binary logistic regression, our outcome is dichotomous—one of two categories. Let us use kindergarten readiness ("prepared for kindergarten" coded as "1" vs. unprepared coded as "0") as an example of our logistic regression outcome. Therefore, what the regression equation allows us to predict is substantially different for OLS as compared to logistic regression. In comparison to OLS, which allows us to compute a precise numerical value (e.g., a specific predicted score on the dependent variable), the logistic regression equation allows us to compute a probability—more specifically, the probability that the dependent variable will
Table 19.1
Kindergarten Readiness Example Data

Child  Social Development (X1)  Family Structure (X2)    Kindergarten Readiness (Y)
1      15                       Single-parent home (0)   Unprepared (0)
2      12                       Single-parent home (0)   Unprepared (0)
3      18                       Single-parent home (0)   Prepared (1)
4      20                       Single-parent home (0)   Prepared (1)
5      11                       Single-parent home (0)   Unprepared (0)
6      17                       Single-parent home (0)   Prepared (1)
7      14                       Single-parent home (0)   Unprepared (0)
8      18                       Single-parent home (0)   Prepared (1)
9      13                       Single-parent home (0)   Unprepared (0)
10     10                       Single-parent home (0)   Unprepared (0)
11     22                       Two-parent home (1)      Unprepared (0)
12     25                       Two-parent home (1)      Prepared (1)
13     23                       Two-parent home (1)      Prepared (1)
14     21                       Two-parent home (1)      Prepared (1)
15     30                       Two-parent home (1)      Prepared (1)
16     27                       Two-parent home (1)      Prepared (1)
17     26                       Two-parent home (1)      Prepared (1)
18     28                       Two-parent home (1)      Prepared (1)
19     24                       Two-parent home (1)      Unprepared (0)
20     30                       Two-parent home (1)      Prepared (1)
occur. The logistic regression equation, therefore, generates predicted probabilities that fall between values of 0 and 1. The probability of a case or unit being classified into the lowest numerical category [i.e., P(Y = 0), or in the case of our example, the probability that a child will be unprepared for kindergarten] is equal to 1 minus the probability that it falls within the highest numerical category [i.e., P(Y = 1), or the probability that a child will be prepared for kindergarten]. This equates to P(Y = 0) = 1 − P(Y = 1). Applied to our example, the probability that a child will be unprepared for kindergarten is equal to 1 minus the probability that a child will be prepared for kindergarten. In other words, knowledge of the probability of one category occurring (e.g., unprepared for kindergarten) allows us to easily determine the probability that the other category will occur (e.g., prepared), as the total probability must equal 1.0. Remember, however, that probabilities have to fall within the range of 0 to 1. As we know from Chapter 5, it is not possible to have a negative probability, nor is it possible to have a probability greater than 1 (i.e., greater than 100%). If we try to model the probability as the dependent variable in our OLS equation, it is mathematically possible that the predicted values would be negative or greater than 1—values that are outside the range of what is feasible when considering probabilities. Therefore, this is where our logistic regression equation takes a turn from what we learned with linear regression.
19.2.2 Odds and Logit (or Log Odds)
So far, we have talked about the outcome of our logistic regression equation as being a probability, and we also know that predicted probabilities must be between 0 and 1. As we think about how to estimate probabilities, we will see that this takes a few steps to achieve. Rather than the dependent variable being a probability, if it were an odds value, then values greater than 1 would be possible and appropriate. Odds are simply the ratio of the probabilities of the dependent variable's two outcomes. The odds that the outcome of a binary variable is 1 (e.g., prepared for kindergarten) rather than 0 (e.g., unprepared for kindergarten) are simply the ratio of the probability that Y equals 1 to the probability that Y does not equal 1. In mathematical terms, this can be written as follows:

Odds(Y = 1) = P(Y = 1) / [1 − P(Y = 1)]
As we see in Table 19.2, when the probability that Y = 1 (e.g., prepared for kindergarten) equals .50 (column 1 in Table 19.2), then 1 − P(Y = 1) (or unprepared for kindergarten) is .50 (column 2) and the odds are equal to 1.00 (column 3). When the probability of Y = 1
Table 19.2
Illustration of Logged Odds

P(Y = 1)  1 − P(Y = 1)  Odds(Y = 1) = P(Y = 1)/[1 − P(Y = 1)]  ln[Odds(Y = 1)]
.001      .999          .001/.999 = .001                        ln(.001) = −6.908
.100      .900          .100/.900 = .111                        ln(.111) = −2.198
.300      .700          .300/.700 = .429                        ln(.429) = −.846
.500      .500          .500/.500 = 1.000                       ln(1.000) = .000
.700      .300          .700/.300 = 2.333                       ln(2.333) = .847
.900      .100          .900/.100 = 9.000                       ln(9.000) = 2.197
.999      .001          .999/.001 = 999.000                     ln(999.000) = 6.907
(e.g., prepared) is very small (say, .100 or less), then the odds of being prepared for kindergarten are also very small and approach 0 the smaller the probability that Y = 1 (i.e., the smaller the probability that a child is prepared for kindergarten). However, as the probability of Y = 1 (e.g., being prepared for kindergarten) increases, the odds (column 3) increase tremendously. Thus, the issue that we are faced with when using odds is that while odds can be infinitely large, we are still limited in that the minimum value is 0, and we still do not have data that can be modeled linearly.
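The probability-to-odds-to-log-odds chain illustrated in Table 19.2 is easy to verify with a few lines of code. A minimal sketch in Python (not part of the original text):

```python
import math

def odds(p):
    """Odds that Y = 1: P(Y = 1) / [1 - P(Y = 1)]."""
    return p / (1 - p)

def logit(p):
    """Natural log of the odds (the log odds, or logit)."""
    return math.log(odds(p))

# reproduce the rows of Table 19.2
for p in (.001, .100, .300, .500, .700, .900, .999):
    print(f"P = {p:.3f}   odds = {odds(p):8.3f}   ln(odds) = {logit(p):7.3f}")
```

Note the symmetry around P = .5, where the odds equal 1 and the logit equals 0.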
Changing the scale of the odds by taking the natural logarithm of the odds (also called logit Y or log odds) provides us with a value of the dependent variable that can theoretically range from negative infinity to positive infinity. Thus, taking the log odds of Y creates a linear relationship between X and the probability of Y (Pampel, 2000). The natural log of the odds is calculated as follows, with the residual being the difference between the predicted probability and the actual value of the dependent variable (0 or 1):

ln{P(Y = 1)/[1 − P(Y = 1)]} = Logit(Y)
In column 4 of Table 19.2, we see what happens when the logit transformation is made. As the odds increase from 1 toward positive infinity, the logit (or log odds) of Y becomes larger and larger (and remains positive). As the odds decrease from 1 to 0, the logit (or log odds) of Y is negative and grows larger and larger in absolute value.

The logit of Y equation is interpreted very similarly to that of OLS. For each one-unit change in an independent variable, the logistic regression coefficient represents the change in the predicted log odds of being in a category. In that sense, the regression coefficients have the same interpretation as in OLS regression. The difference in interpretation with logistic regression is that the outcome now represents a log odds rather than a precise numerical value, as we saw with OLS regression. Linking the logit back to probabilities, a one-unit change in the logit equals a bigger change in probabilities near the center of the distribution than at the extreme values. This happens because of the linearization once we take the natural log. Taking the natural log stretches the S-shaped curve into a linear form; thus, the values at the extremes are stretched less, so to speak, than the values in the middle (Pampel, 2000). By working with log odds, our familiar additive regression equation is applicable:
ln{P(Y = 1)/[1 − P(Y = 1)]} = Logit(Y) = α + β1X1 + β2X2 + ... + βmXm
It is important to note that although we were accustomed to examining standardized regression coefficients in OLS regression, statistical software does not ordinarily compute standardized coefficients for logistic regression models. Standardization is ordinarily accomplished by taking the product of the unstandardized regression coefficient and the ratio of the standard deviation of X to the standard deviation of Y. The interpretation of a standard deviation change in a continuous variable thus makes sense; however, this is not the case for a dichotomous variable, nor is it the case for the log odds (which is the predicted outcome and which does not have a standard deviation).

While interpretation of the logistic equation is relatively straightforward, as it holds many similarities to OLS regression, log odds are not a metric that we use often. Therefore, understanding what it means when a predictor, X, has some effect on the log odds of Y can be difficult. This is where odds come back into the picture.
If we exponentiate the logit(Y) (i.e., the outcome of our logistic regression equation), then it converts back to the odds (see the following equation). Now we can interpret the independent variables as affecting the odds (rather than the log odds) of the outcome:

Odds(Y = 1) = e^logit(Y) = e^ln[Odds(Y = 1)] = e^(α + β1X1 + β2X2 + ... + βmXm) = (e^α)(e^(β1X1))(e^(β2X2)) ... (e^(βmXm))
As can be seen here, the exponentiation creates an equation that is multiplicative rather than additive, and this changes the interpretation of the exponentiated coefficients. In the previous regression equations we have studied, when the product of the regression coefficient and its predictor is 0, that variable adds nothing to the prediction of the dependent variable. In a multiplicative environment, the corresponding neutral value is a coefficient of 1. In other words, an exponentiated coefficient of 1 will not change the value of the odds (i.e., the outcome). Coefficients greater than 1 increase the odds, and coefficients less than 1 decrease the odds. In addition, the farther the value is from 1, the more the odds change.
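As a quick numeric illustration (a sketch using hypothetical coefficients, not values from the chapter), exponentiating a logistic coefficient yields the multiplicative change in the odds per one-unit change in the predictor:

```python
import math

b = 0.693           # hypothetical logistic coefficient (log odds scale)
print(round(math.exp(b), 2))   # about 2.0: a one-unit increase doubles the odds

b_neg = -0.693      # a negative coefficient decreases the odds
print(round(math.exp(b_neg), 2))  # about 0.5: a one-unit increase halves the odds
```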
Converting the odds back to a probability can be done through the following formula:

P(Y = 1) = Odds(Y = 1)/[1 + Odds(Y = 1)] = e^(α + β1X1 + β2X2 + ... + βmXm) / [1 + e^(α + β1X1 + β2X2 + ... + βmXm)]
Probability values close to 1 indicate an increased likelihood of occurrence. In our example, since "1" indicates being prepared for kindergarten, a probability close to 1 would indicate that a child was likely to be prepared. Children with probabilities close to 0 have a decreased probability of being prepared for kindergarten (and an increased probability of being unprepared).
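The chain from logit to odds to probability can be sketched as follows (Python; the intercept and coefficient are hypothetical, not estimates from the chapter's data):

```python
import math

def probability(logit):
    """Convert a logit (log odds) to a probability: e^L / (1 + e^L)."""
    return math.exp(logit) / (1 + math.exp(logit))

# hypothetical model: logit(Y) = -6.0 + 0.35 * social_development
for x in (10, 17, 24):
    L = -6.0 + 0.35 * x
    print(f"X = {x}: logit = {L:5.2f}, odds = {math.exp(L):6.3f}, "
          f"P(Y = 1) = {probability(L):.3f}")
```

A logit of 0 corresponds to odds of 1 and a probability of exactly .5, the usual classification cut point.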
19.3 Estimation and Model Fit
Now that we understand the logistic regression process and resulting equations a bit better, it is time to turn our attention to how the equation is estimated and how we can determine how well the model fits. We previously learned with simple and multiple regression that the observed values of the independent variables in the sample were used to estimate or predict the values of the dependent variable. In logistic regression, we are also using knowledge of the values of our predictor(s) to estimate the outcome (i.e., the log odds). Now, however, we use a method called maximum likelihood estimation to estimate the values of the parameters (i.e., the logistic coefficients). As we just learned, the dependent variable in a logistic regression model is transformed into a logit value, which is the natural log of the odds of the dependent variable occurring or not occurring. Maximum likelihood estimation is then applied to the model and estimates the odds of occurrence after transformation into the logit. Very simply, maximum likelihood estimates the parameters most likely to have produced the patterns in the sample data. Whereas in OLS the sum of squared distances of the observed data from the regression line was minimized, in maximum likelihood the log likelihood is maximized.
The log of the likelihood function (sometimes abbreviated as LL) that results from ML estimation then reflects the likelihood of observing the sample statistics given the
population parameters. The log likelihood provides an index of how much has not been explained in the model after the parameters have been estimated, and as such, the LL can be used as an indicator of model fit. The values of the log likelihood function range from 0 to negative infinity, with values closer to 0 suggesting better model fit and larger values (in absolute value terms) indicating poorer fit. The log likelihood value will approach 0 the closer the likelihood value is to 1; when this happens, it suggests the observed data could well have been generated from these population parameters. In other words, the smaller the log likelihood (in absolute value), the better the model fit. It follows, therefore, that the log likelihood value will grow more negative the closer the likelihood function is to 0, which suggests that the observed data are less likely to have been generated from these population parameters.
Maximum likelihood estimation performed by statistical software usually begins the estimation process with all regression coefficients set to a conservative starting estimate (e.g., the least squares estimates). Better model fit is then accomplished through the use of an algorithm that generates new sets of regression coefficients producing larger log likelihoods. This is an iterative process that stops when the selection of new parameters creates very little change in the regression coefficients and very small increases in the log likelihood—so small that there is little value in any further estimation.
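The idea of "pick the parameters that make the observed data most likely" can be conveyed with a crude sketch. Real software maximizes the log likelihood with an iterative algorithm (e.g., Newton-Raphson); the grid search below (Python, with hypothetical data in the spirit of Table 19.1) only illustrates the principle:

```python
import math

def log_likelihood(a, b, xs, ys):
    """Binary logistic log likelihood for intercept a and slope b."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(a + b * x)))   # predicted P(Y = 1)
        p = min(max(p, 1e-12), 1 - 1e-12)      # guard against log(0)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# hypothetical readiness data: social development score, prepared (1) or not (0)
xs = [15, 12, 18, 20, 11, 17, 14, 18, 13, 10]
ys = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]

# crude grid search: keep the (a, b) pair with the largest log likelihood
candidates = [(a / 2, b / 10) for a in range(-40, 1) for b in range(0, 21)]
best = max(candidates, key=lambda ab: log_likelihood(ab[0], ab[1], xs, ys))
print("best (a, b):", best)
```

Any slope/intercept pair that separates the groups gives a larger (closer to 0) log likelihood than the intercept-only start.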
19.4 Significance Tests
As with multiple regression, there are two types of significance tests in logistic regression. Specifically, these involve testing the significance of the overall logistic regression model and testing the significance of each of the logistic regression coefficients.
19.4.1 Test of Significance of Overall Regression Model
The first test is the test of statistical significance used to determine overall model fit; it provides evidence of the extent to which the predicted values accurately represent the observed values (Xie, Pendergast, & Clarke, 2008). We consider several overall model tests, including (a) the change in log likelihood, (b) the Hosmer-Lemeshow goodness-of-fit test, (c) pseudovariance explained, and (d) predicted group membership. Additional work (e.g., Xie et al., 2008) has recently been conducted on new methods to assess model fit, but these are not currently available in statistical software nor easily computed. Also in this section, we briefly address sensitivity, specificity, false positives, false negatives, and cross-validation.
19.4.1.1 Change in Log Likelihood
One way to test overall model fit is the likelihood ratio test. This test is based on the change in the log likelihood function from a smaller model (often the baseline or intercept-only model) to a larger model that includes one or more predictors (sometimes referred to as the fitted model). Although we indicate that the smaller model is often the intercept-only model, this test can also be used to examine changes in model fit from one fitted model to another fitted model, as we will discuss in a bit. This likelihood ratio test is similar to the overall F test in OLS regression and tests the null hypothesis that all the regression coefficients are equal to 0. Using statistical notation, we can denote the null and alternative hypotheses for the regression coefficients as follows:

H0: β1 = β2 = ... = βm = 0
H1: H0 is false
For explanation purposes, we assume the smaller model is the baseline or intercept-only model. The baseline log likelihood is estimated from a logistic regression model that includes only the constant (i.e., intercept) term. The model log likelihood is estimated from the logistic regression model that includes the constant and the relevant predictor(s). Multiplying the difference in these log likelihood functions by −2 produces a chi-square test with degrees of freedom equal to the difference in the degrees of freedom of the two models (df = dfmodel − dfbaseline), where "model" refers to the fitted model that includes one or more predictors. In the case of the constant-only model, there is only one parameter estimated (i.e., the intercept), so there is only one degree of freedom. In models that include independent variables, the degrees of freedom are equal to the number of independent variables in the model plus one for the constant. The larger the difference between the baseline and model LL values, the better the model fit. It is important to note that the log likelihood difference test assumes nested models. In other words, all elements that are included in the baseline or smaller model must also be included in the fitted model. As alluded to previously, the change in log likelihood test can be used for more than just comparing the intercept-only model to a fitted model. Researchers often use this test in the model building process to determine whether adding predictors (or sets of predictors) improves model fit by comparing one fitted model to another fitted model. In general, the change in log likelihood is computed as follows:

χ² = −2(LLbaseline − LLmodel)
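For example, with hypothetical log likelihood values (not taken from the chapter), the change in log likelihood works out as follows:

```python
def lr_chi_square(ll_baseline, ll_model):
    """Likelihood ratio (change in log likelihood) chi-square statistic."""
    return -2 * (ll_baseline - ll_model)

# hypothetical values: intercept-only LL = -13.86, fitted model LL = -8.74
chi2 = lr_chi_square(-13.86, -8.74)
print(round(chi2, 2))  # 10.24

# df = df(model) - df(baseline); with two predictors: (2 + 1) - 1 = 2
# 10.24 exceeds the .05 chi-square critical value of 5.99 for 2 df,
# so the predictors improve fit over the baseline model
df = (2 + 1) - 1
```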
19.4.1.2 Hosmer–Lemeshow Goodness-of-Fit Test
The Hosmer-Lemeshow goodness-of-fit test is another tool that can be used to examine overall model fit. The Hosmer-Lemeshow statistic is computed by dividing cases into deciles (i.e., 10 groups) based on their predicted probabilities. Then a chi-square value is computed based on the observed and expected frequencies. This is a chi-square test for which the researcher does not want to find statistical significance. Nonstatistically significant results for the Hosmer-Lemeshow test indicate that the model has acceptable fit; in other words, the predicted or estimated model is not statistically significantly different from the observed values. Although the Hosmer-Lemeshow test can easily be requested in SPSS, it has been criticized for being conservative (i.e., lacking sufficient power to detect lack of fit in instances such as nonlinearity of an independent variable), for being too likely to indicate model fit when five or fewer groups (based on the decile groups created in computing the statistic) are used to calculate the statistic, and for offering little diagnostic help to the researcher when the test indicates poor model fit (Hosmer, Hosmer, Le Cessie, & Lemeshow, 1997).
718 An Introduction to Statistical Concepts
19.4.1.3 Pseudovariance Explained
Another overall model fit index for logistic regression is pseudovariance explained. This index is akin to multiple R² (or the coefficient of determination) in OLS regression and can also be considered an effect size measure for the model. The reason these values are considered pseudovariance explained in logistic regression is that the variance of a dichotomous outcome, as in logistic regression, differs from the variance of a continuous outcome, as in OLS regression.

There are a number of multiple R² pseudovariance explained values that can be computed in logistic regression. We discuss the following: (a) Cox and Snell (1989), (b) Nagelkerke (1991), (c) Hosmer and Lemeshow (1989), (d) Aldrich and Nelson (1984), (e) Harrell (1986), and (f) the traditional R². Of these, SPSS automatically computes the Cox and Snell and Nagelkerke indices. There is, however, no consensus on which (if any) of the pseudovariance explained indices is best, and many researchers choose not to report any of them in their published results. If you do choose to use and/or report one or more of these values, they should be used only as a guide "without attributing great importance to a precise figure" (Pampel, 2000, p. 50).
The Cox and Snell R² (1989) is computed from the ratio of the likelihood values raised to the power of 2/n (where n is the sample size). A problem is that the computation is such that the theoretical maximum of 1 cannot be obtained, even when there is perfect prediction:

R²CS = 1 − (Lbaseline / Lmodel)^(2/n)
Nagelkerke (1991) adjusts the Cox and Snell value so that the maximum value of 1 can be achieved. It is computed as follows:

R²N = R²CS / [1 − (Lbaseline)^(2/n)]
Hosmer and Lemeshow's (1989) R² is the proportional reduction in the log likelihood (in absolute value terms). Although not provided by SPSS, it can easily be computed from the model and baseline −2LL values. Ranging from 0 to 1, this value provides an indication of how much the badness of fit of the baseline model is improved by the inclusion of the predictors in the fitted model. Hosmer and Lemeshow's (1989) R² is computed as

R²L = [−2LLbaseline − (−2LLmodel)] / (−2LLbaseline)
Harrell (1986) proposed that Hosmer and Lemeshow's R² be adjusted for the number of parameters (i.e., independent variables) in the model. This adjustment (where m equals the number of independent variables in the model) makes this R² value akin to the adjusted R² in OLS regression. It is computed as

R²LA = [−2LLbaseline − (−2LLmodel) − 2m] / (−2LLbaseline)
Aldrich and Nelson (1984) provided an alternative to R²L that is equivalent to the squared contingency coefficient. This measure has the same problem as the Cox and Snell R²: the theoretical maximum of 1 cannot be obtained even when the independent variable(s) perfectly predict the outcome. It is computed as

pseudo R² = (−2LLmodel) / (−2LLmodel + n)
The traditional R², the coefficient of determination as used in simple and multiple regression, can also be used in logistic regression (only with binary logistic regression, as the mean and variance of a dichotomous variable make sense; the mean of a dummy coded variable, for example, is equal to the proportion of cases in the category labeled 1). R² can be computed by correlating the observed values of the binary dependent variable with the predicted values (i.e., predicted probabilities) obtained from the logistic regression model and then squaring the correlation. Predicted probability values can easily be saved when generating logistic regression models in SPSS.
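Several of these indices can be computed directly from the two log likelihood values. A sketch in Python with hypothetical LL values (using L = e^LL, so the likelihood ratio in the Cox and Snell formula becomes an exponential of the LL difference):

```python
import math

def cox_snell(ll_base, ll_model, n):
    """R^2_CS = 1 - (L_baseline / L_model)^(2/n), computed via log likelihoods."""
    return 1 - math.exp(2 * (ll_base - ll_model) / n)

def nagelkerke(ll_base, ll_model, n):
    """Rescales Cox and Snell so a perfect model reaches 1."""
    return cox_snell(ll_base, ll_model, n) / (1 - math.exp(2 * ll_base / n))

def hosmer_lemeshow_r2(ll_base, ll_model):
    """Proportional reduction in -2LL."""
    return (-2 * ll_base - (-2 * ll_model)) / (-2 * ll_base)

# hypothetical values
ll_base, ll_model, n = -13.86, -8.74, 20
print(round(cox_snell(ll_base, ll_model, n), 3))
print(round(nagelkerke(ll_base, ll_model, n), 3))
print(round(hosmer_lemeshow_r2(ll_base, ll_model), 3))
```

Note that the Cox and Snell value stays below 1 even for a perfect model, while the Nagelkerke rescaling reaches exactly 1 when the model LL is 0.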
19.4.1.4 Predicted Group Membership
Another test of model fit for logistic regression can be accomplished by evaluating predicted against observed group membership. Assuming a cut value of .50, cases with predicted probabilities at .5 or above are predicted as 1, and cases with predicted probabilities below .5 are predicted as 0. A crosstab table of predicted against observed values provides the frequency and percentage of cases correctly classified. Correct classification would be seen in cases that have the same value for both the predicted and observed values. A perfect model produces 100% correctly classified cases. A model that classifies no better than chance would provide 50% correctly classified cases. Press's Q is a chi-square statistic with one degree of freedom and can be used as a formal test of classification accuracy. It is computed as

Q = \frac{[N - (nK)]^2}{N(K - 1)}
where
N is the total sample size
n represents the number of cases that were correctly classified
K equals the number of groups

As with other chi-square statistics we have examined, this test is sensitive to sample size. Also, it is important to note that focusing solely on overall correct classification (as is done with Press's Q) may result in overlooking one or more groups that have unacceptable classification. The researcher should evaluate the classification of each group in addition to the overall classification.
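A minimal sketch of Press's Q. The usage note plugs in the counts from the kindergarten readiness example discussed below (N = 20 cases, 18 of them correctly classified, K = 2 groups).

```python
def press_q(N, n_correct, K):
    # Q = [N - (n*K)]^2 / (N*(K - 1)); compare the result against a
    # chi-square distribution with one degree of freedom.
    return (N - n_correct * K) ** 2 / (N * (K - 1))
```

With N = 20, n = 18, and K = 2, Q = (20 − 36)^2 / 20 = 12.8, which exceeds the .05 chi-square critical value of 3.84 for one degree of freedom.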
Sensitivity is the probability that a case coded as 1 for the dependent variable (a.k.a. "positive") is classified correctly. In other words, sensitivity is the percentage of correct predictions among the cases that are coded as 1 for the dependent variable. In the kindergarten readiness example that we will review later, of the 12 children who were prepared for kindergarten (i.e., coded as 1 for the dependent variable), 11 were correctly classified. Thus, the sensitivity is 11/12, or about 92%.
720 An Introduction to Statistical Concepts
Specificity is the probability that a case coded as 0 for the dependent variable (a.k.a. "negative") is classified correctly. In other words, specificity is the percentage of correct predictions among the cases that are coded as 0 for the dependent variable. In the kindergarten readiness example that we will review later, of the 8 children who were unprepared for kindergarten (i.e., coded as 0 for the dependent variable), 7 were correctly classified. Thus, the specificity is 7/8, or 87.5%.
False positive rate is the probability that a case coded as 0 for the dependent variable (a.k.a. "negative") is classified incorrectly. In other words, this is the percentage of cases in error where the dependent variable is predicted to be 1 (i.e., prepared), but in fact the observed value is 0 (i.e., unprepared). In the kindergarten readiness example that we will review later, of the 8 children who were unprepared for kindergarten (i.e., coded as 0 for the dependent variable), 1 was incorrectly classified. Thus, the false positive rate is 1/8, or 12.5%. The false positive rate is also computed as 1 minus specificity.
False negative rate is the probability that a case coded as 1 for the dependent variable (a.k.a. "positive") is classified incorrectly. In other words, this is the percentage of cases in error where the dependent variable is predicted to be 0 (i.e., unprepared), but in fact the observed value is 1 (i.e., prepared). In the kindergarten readiness example that we will review later, of the 12 children who were prepared for kindergarten (i.e., coded as 1 for the dependent variable), 1 was incorrectly classified. Thus, the false negative rate is 1/12, or about 8%. The false negative rate is also computed as 1 minus sensitivity.
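The four classification rates above can be computed together from the cells of the crosstab table. The function name and cell labels (tp, fn, tn, fp) are ours; the counts in the usage line come from the kindergarten readiness example (11 of 12 prepared and 7 of 8 unprepared children classified correctly).

```python
def classification_rates(tp, fn, tn, fp):
    # tp: 1s predicted as 1; fn: 1s predicted as 0;
    # tn: 0s predicted as 0; fp: 0s predicted as 1.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
    }

# Kindergarten readiness example from the text:
rates = classification_rates(tp=11, fn=1, tn=7, fp=1)
```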
19.4.1.5 Cross Validation
A recommended best practice in logistic regression is to cross-validate the results. If the sample size is sufficient, this can be accomplished by using 75%–80% of the sample to derive the model and then using the remaining cases (the holdout sample) to determine its accuracy. With cross-validation, you are in essence testing the model on two samples: a primary sample (which represents the larger percentage of the sample) and a holdout sample (the cases that remain). If the classification accuracy of the holdout sample is within 10% of that of the primary sample, this provides evidence of the utility of the logistic regression model.
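A sketch of the holdout procedure. The function names are ours, and we interpret "within 10%" as within 10 percentage points of accuracy, which is an assumption about the rule of thumb rather than something the text spells out.

```python
import random

def holdout_split(cases, primary_share=0.8, seed=1):
    # Shuffle a copy of the cases, then split into a primary sample
    # (used to derive the model) and a holdout sample.
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * primary_share)
    return shuffled[:cut], shuffled[cut:]

def holdout_supports_model(primary_accuracy, holdout_accuracy):
    # Evidence of utility: holdout accuracy within 10 percentage
    # points of the primary sample's accuracy (assumed reading).
    return abs(primary_accuracy - holdout_accuracy) <= 0.10
```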
19.4.2 Test of Significance of Logistic Regression Coefficients
The second test in logistic regression is the test of the statistical significance of each regression coefficient, b_k. This test allows us to determine if the individual coefficients are statistically significantly different from 0. The null and alternative hypotheses can be illustrated in the same mathematical notation as we used with OLS regression:

H_0: \beta_k = 0
H_1: \beta_k \neq 0
Interpreting the test provides evidence of the probability of obtaining the observed sample coefficient by chance if the null hypothesis were true (i.e., if the population regression coefficient value were 0). The Wald statistic, which follows a chi-square distribution, is used as the test statistic for regression coefficients in SPSS. For continuous predictors, it is calculated by squaring the ratio of the regression coefficient to its standard error:

W = \frac{\beta_k^2}{SE_{\beta_k}^2}
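A one-line sketch of the Wald computation; the coefficient and standard error in the test are hypothetical values, not SPSS output from the chapter's example.

```python
def wald_statistic(coefficient, standard_error):
    # Square of the coefficient-to-standard-error ratio; the result is
    # compared against a chi-square distribution with one degree of freedom.
    return (coefficient / standard_error) ** 2
```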
When the logistic regression coefficients are large (in absolute value), rounding error can create imprecision in the estimation of the standard errors. This can result in inaccuracies in testing the null hypothesis, and more specifically, increased Type II errors (i.e., failing to reject the null hypothesis when the null hypothesis is false). An alternative to the Wald test, in situations such as this, is the difference in log likelihood test previously described to compare models with and without the variable of interest (Pampel, 2000).
Raftery (1995) proposed a Bayesian information criterion (BIC), computed as the difference between the chi-square value and the natural log of the sample size, that could also be applied to testing logistic regression coefficients:

BIC = \chi^2 - \ln(n)
To reject the null hypothesis, the BIC should be positive (i.e., greater than 0). That is, the chi-square value must be greater than the natural log of the sample size. BIC values below 0 suggest that the variable contributes little to the model. BIC values between 0 and 2 are considered weak; between 2 and 6, positive; between 6 and 10, strong; and more than 10, very strong.
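The BIC computation and the verbal labels above can be sketched as follows. The exact handling of the boundary values (e.g., a BIC of exactly 2) is our judgment call, since the text only gives ranges; the chi-square value in the test is hypothetical.

```python
import math

def raftery_bic(chi_square, n):
    # BIC = chi-square minus the natural log of the sample size.
    return chi_square - math.log(n)

def bic_evidence(bic):
    # Verbal labels from the text; boundary treatment is assumed.
    if bic <= 0:
        return "little contribution"
    if bic <= 2:
        return "weak"
    if bic <= 6:
        return "positive"
    if bic <= 10:
        return "strong"
    return "very strong"
```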
Beyond determining statistical significance of the individual predictors, you may also want to assess which predictors are adding the most to the model. In OLS regression, we examined the standardized regression coefficients. There are no traditional standardized regression coefficients provided in SPSS for logistic regression, but they are easy to calculate. Simply standardize the predictors before generating the logistic regression model, and then run the model as desired. You can then interpret the logistic regression coefficients as standardized regression coefficients (if necessary, review Chapter 18).
We can also form a confidence interval (CI) around the logistic regression coefficient, b_k. The CI formula is the same as in OLS regression: the logistic regression coefficient plus or minus the product of the tabled critical value and the standard error:

CI(b_k) = b_k \pm t_{\alpha/2,(n-m-1)} \, s_{b_k}

The null hypothesis that we tested was H_0: \beta_k = 0. It follows that if our CI contains 0, then the logistic regression coefficient (b_k) is not statistically significantly different from 0 at the specified significance level. We can interpret this to say that \beta_k will be included in (1 − α)% of the sample CIs formed from multiple samples.
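A sketch of the interval and the contains-zero decision rule. The coefficient, standard error, and tabled critical value in the test are hypothetical.

```python
def coefficient_ci(b, se, t_critical):
    # b plus or minus the tabled critical value times the standard error.
    return (b - t_critical * se, b + t_critical * se)

def differs_from_zero(ci):
    # The coefficient is statistically different from 0 at the chosen
    # alpha when the interval excludes 0.
    lower, upper = ci
    return not (lower <= 0.0 <= upper)
```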
19.5 Assumptions and Conditions
Compared to OLS regression, the assumptions of logistic regression are somewhat relaxed; however, four primary assumptions must still be considered: (a) noncollinearity, (b) linearity, (c) independence of errors, and (d) values of X are fixed. In this section, we also discuss conditions that are needed in logistic regression as well as diagnostics that can be performed to more closely examine the data.
19.5.1 Assumptions
19.5.1.1 Noncollinearity
Noncollinearity is applicable to logistic regression models with multiple predictors, just as it was in multiple regression (but is not applicable when there is only one predictor in any regression model). This assumption has already been explained in detail in Chapter 18 and thus will not be reiterated other than to explain tools that can be used to detect multicollinearity. Although SPSS does not provide an option to easily generate collinearity statistics in logistic regression, you can generate an OLS regression model (i.e., a traditional multiple linear regression) with the same variables used in the logistic regression model and request collinearity statistics there. Because it is only the collinearity statistics that are of interest, do not be concerned that this OLS regression model violates some of OLS's basic assumptions (e.g., normality). We have previously discussed tolerance and the variance inflation factor (VIF) as two collinearity diagnostics, where tolerance is computed as 1 - R_k^2 (with R_k^2 being the variance in each independent variable, X, explained by the other independent variables) and VIF is 1 / (1 - R_k^2). In reviewing these statistics, tolerance values less than .20 suggest multicollinearity exists, and values less than .10 suggest serious multicollinearity. VIF values greater than 10 indicate a violation of noncollinearity.
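The tolerance and VIF rules of thumb above can be sketched as a small helper; the function names are ours, and the R_k^2 values in the test are hypothetical.

```python
def tolerance(r2_k):
    # 1 - R_k^2, where R_k^2 is the variance in predictor k
    # explained by the other predictors.
    return 1 - r2_k

def vif(r2_k):
    return 1 / (1 - r2_k)

def collinearity_check(r2_k):
    tol = tolerance(r2_k)
    return {
        "multicollinearity": tol < 0.20,          # tolerance below .20
        "serious_multicollinearity": tol < 0.10,  # tolerance below .10
        "vif_violation": vif(r2_k) > 10,          # VIF above 10
    }
```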
The effects of a violation of noncollinearity in logistic regression are the same as those described in Chapter 18. First, it will lead to instability of the regression coefficients across samples, where the estimates will bounce around quite a bit in terms of magnitude and even occasionally change sign (perhaps opposite of expectation). This occurs because the standard errors of the regression coefficients become larger, thus making it more difficult to achieve statistical significance. Another result that may occur is an overall regression that is significant while none of the individual predictors are significant. Violation will also restrict the utility and generalizability of the estimated regression model.
19.5.1.2 Linearity
In OLS regression, the dependent variable is assumed to have a linear relationship with the continuous independent variable(s), but this does not hold in logistic regression. Because the outcome in logistic regression is a logit, the assumption of linearity in logistic regression refers to linearity between the logit of the dependent variable and the continuous independent variable(s). Hosmer and Lemeshow (1989) suggest several strategies for detecting nonlinearity, the easiest of which to apply is likely the Box–Tidwell transformation. This strategy is also valuable in that it is not overly sensitive to minor violations of linearity. It involves generating a logistic regression model that includes all independent variables of interest along with an interaction term for each, the interaction term being the product of the continuous independent variable and its natural log [i.e., X*ln(X)]. Statistically significant interaction terms suggest nonlinearity. It is important to note that the assumption of linearity is applicable only for continuous predictors. A violation of linearity can result in biased parameter estimates, as well as the expected change in the logit of Y not being constant across the values of X.
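Constructing the Box–Tidwell terms is simple; note that X*ln(X) is only defined for positive predictor values. The function name and the example scores are ours, not the chapter's data.

```python
import math

def box_tidwell_term(x):
    # Interaction of a continuous predictor with its natural log:
    # X * ln(X). Adding these terms to the model and finding them
    # statistically significant flags nonlinearity in the logit.
    return x * math.log(x)

# Hypothetical positive social development scores:
social = [3.0, 5.5, 7.25]
terms = [box_tidwell_term(x) for x in social]
```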
19.5.1.3 Independence of Errors
Independence of errors is applicable to logistic regression models just as it was with OLS regression, and a violation of this assumption can result in underestimated standard errors (and thus overestimated test statistic values, perhaps finding statistical significance more often than is really viable, as well as affecting CIs). This assumption has already been explained in detail during the discussion of assumptions in Chapters 17 and 18, and thus additional information will not be provided here.
19.5.1.4 Fixed X
The last assumption is that the values of X_k are fixed, where the independent variables X_k are fixed variables rather than random variables. Because this assumption was discussed in detail in Chapters 17 and 18, we only summarize the main points. When X is fixed, the regression model is only valid for those particular values of X_k that were actually observed and used in the analysis. Thus, the same values of X_k would be used in replications or repeated samples. As discussed in the previous two chapters, generally we may not want to make predictions about individuals having combinations of X_k scores outside of the range of values used in developing the prediction model; this is defined as extrapolating beyond the sample predictor data. On the other hand, we may not be quite as concerned in making predictions about individuals having combinations of X_k scores within the range of values used in developing the prediction model; this is defined as interpolating within the range of the sample predictor data. Table 19.3 summarizes the assumptions of logistic regression and the impact of their violation.
19.5.2 Conditions
Although not assumptions, the following conditions should be met with logistic regression: nonzero cell counts, nonseparation of data, lack of influential points, and sufficient sample size.
19.5.2.1 Nonzero Cell Counts
The first condition is related to nonzero cell counts in the case of nominal independent variables. A zero cell count occurs when the outcome is constant for one or more categories of a nominal variable (e.g., all females pass the course). This results in high standard errors because entire groups of individuals have odds of 0 or 1. Strategies to remove zero cell counts include recoding the categories (e.g., collapsing categories) or adding a constant to each cell of the crosstab table. If the overall model fit is what is of primary interest, then you may choose not to do anything about zero cell counts. The overall relationship between the set of predictors and the dependent variable is not generally impacted by zero cell counts. However, if zero cell counts are retained and the results of the individual predictors are what is of interest, it would be wise to note as a limitation of your results the higher standard errors that are produced due to zero cell counts, as well as to caution that the values of the individual regression coefficients may be affected. Careful review of the data prior to computing the logistic regression model can help thwart potential problems with zero cell counts.

Table 19.3
Assumptions and Violation of Assumptions: Logistic Regression Analysis

Assumption / Effect of Assumption Violation

Noncollinearity of Xs
• Regression coefficients can be quite unstable across samples (as standard errors are larger)
• Restricted generalizability of the model

Linearity
• Bias in slopes and intercept
• Expected change in logit of Y is not a constant and depends on value of X

Independence
• Influences standard errors of the model and thus hypothesis tests and CIs

Values of Xs are fixed
• Extrapolating beyond the range of X combinations: prediction errors larger, may also bias slopes and intercept
• Interpolating within the range of X combinations: smaller effects than when extrapolating; if other assumptions met, negligible effect
19.5.2.2 Nonseparation of Data
Another condition that should be examined is that of complete or quasi-complete separation. Complete separation arises when the dependent variable is perfectly predicted and results in an inability to estimate the model. Quasi-complete separation occurs when there is less than complete separation and results in extremely large coefficients and standard errors. These conditions may occur when the number of variables equals (or nearly equals) the number of cases in the dataset, such that large coefficients and standard errors result.
19.5.2.3 Lack of Influential Points
Outliers and influential cases are problematic in logistic regression analysis just as with OLS regression. Severe outliers can cause the maximum likelihood estimator to reduce to 0 (Croux, Flandre, & Haesbroeck, 2002). Residual analysis and other diagnostic tests are as beneficial for detecting miscoded data and unusual (and potentially influential) cases in logistic regression as they are in OLS regression. SPSS provides the option for saving a number of values, including predicted values, residuals, and influence statistics. Both probabilities and group membership predicted values can be saved. Residuals that can be saved include (a) unstandardized, (b) logit, (c) studentized, (d) standardized, and (e) deviance. The three types of influence values that can be saved include Cook's, leverage values, and DfBeta. The wide variety of values that can be saved suggests that there are many types of diagnostics that can be performed. Review should be conducted when standardized or studentized residuals are greater than an absolute value of 3.0 and when DfBeta values are greater than 1. Leverage values greater than (m + 1)/N (where m equals the number of independent variables) indicate an influential case (values closer to 1 suggest problems, while those closer to 0 suggest little influence). If outliers or influential cases are found, it is up to you to decide if removal of the case is warranted. It may be that they, while uncommon, are completely plausible, so that they are retained in the model. If they are removed from the model, it is important to report the number of cases that were removed prior to analysis (and evidence to suggest what caused you to remove them). A review of Chapters 17 and 18 provides further details on diagnostic analysis of outliers and influential cases.
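The screening thresholds above can be applied mechanically to saved diagnostic values. The function name is ours, and the residual, DfBeta, and leverage values in the test are made up; in practice these would be the values saved from SPSS.

```python
def flag_influential(std_residuals, dfbetas, leverages, m):
    # Thresholds from the text: |standardized residual| > 3.0,
    # DfBeta > 1, leverage > (m + 1)/N, where m is the number of
    # independent variables and N the number of cases.
    N = len(leverages)
    cutoff = (m + 1) / N
    flagged = []
    for i in range(N):
        if (abs(std_residuals[i]) > 3.0
                or dfbetas[i] > 1
                or leverages[i] > cutoff):
            flagged.append(i)
    return flagged
```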
19.5.2.4 Sample Size
Simulation research suggests that logistic regression is best used with large samples. Samples of size 100 or greater are needed to accurately conduct tests of significance for logistic regression coefficients (Long, 1997). Note that for illustrative purposes, the example in this chapter uses a sample size of 20. We recognize this is insufficient in practice but have used it for greater ease in presenting the data.
19.6 Effect Size
We have already talked about multiple pseudovariance-explained R2 values, which can be used not only to gauge model fit but also as measures of effect size. Another important statistic in logistic regression is the odds ratio (OR), also an effect size index, similar to R2. The odds ratio is computed by exponentiating the logistic regression coefficient, e^(b_k). Conceptually this is the odds for one category (e.g., prepared for kindergarten) divided by the odds for the other category (e.g., unprepared for kindergarten). The null hypothesis to be tested is that OR = 1, which indicates that there is no relationship between a predictor variable and the dependent variable. Thus, we want to find an OR that is significantly different from 1.

When the independent variable is continuous, the odds ratio represents the amount by which the odds change for a one-unit increase in the independent variable. When the odds ratio is greater than 1, the independent variable increases the odds of occurrence. When the odds ratio is less than 1, the independent variable decreases the odds of occurrence. The odds ratio is provided in SPSS output as "Exp(B)" in the table labeled "Variables in the Equation." In predicting kindergarten readiness, social development is a continuous covariate with a resulting odds ratio of 2.631. We can interpret this odds ratio to mean that for every one-unit increase in social development, the odds of being ready for kindergarten (i.e., prepared) are multiplied by 2.631 (an increase of about 163%), controlling for the other variables in the model.
In the case of categorical variables, including dichotomous, multinomial, and ordinal variables, odds ratios are often interpreted in terms of their relative size or the change in odds ratios when comparing models. Consider first the case of a dichotomous variable. In the model predicting kindergarten readiness, type of household is one independent variable included in the model, where a two-parent home is coded as "1" and a single-parent home as "0." An odds ratio of .002 indicates that the odds of being prepared for kindergarten (compared to unprepared for kindergarten) are decreased by a factor of .002 by being in a single-parent home (as opposed to living in a two-parent home). We could also state that the odds that a child from a single-parent home will be prepared for kindergarten are .998 (i.e., 1 − .002).
In the case of a categorical variable with more than two categories, the odds ratio is interpreted relative to the reference (or left out) category. For example, say we have a predictor in our model that is mother's education level with categories that include (1) less than high school diploma, (2) high school diploma or GED, and (3) at least some college. Say we set the last category ("at least some college") as the reference category. An odds ratio of .86 for the category of "high school diploma or GED" for mother's education level suggests that the odds of being prepared for kindergarten (as compared to unprepared) decrease by a factor of .86 when the child's mother has a high school diploma or GED, relative to when the child's mother has at least some college, when the other variables in the model are controlled.
Odds ratio values can also be converted to Cohen's d using the following equation:

d = \frac{\ln(OR)}{1.81}
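Both the exponentiation that produces the odds ratio and the conversion to Cohen's d are one-liners; the function names are ours, and the coefficient values in the test are arbitrary.

```python
import math

def odds_ratio(b_k):
    # Exponentiate the logistic regression coefficient: OR = e^(b_k).
    return math.exp(b_k)

def odds_ratio_to_d(or_value):
    # d = ln(OR) / 1.81
    return math.log(or_value) / 1.81
```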
19.7 Methods of Predictor Entry
The three categories of model building that will be discussed include (a) simultaneous logistic regression, (b) stepwise logistic regression, and (c) hierarchical regression.
19.7.1 Simultaneous Logistic Regression
With simultaneous logistic regression, all the independent variables of interest are included in the model in one set. This method of model building is usually used when the researcher does not hypothesize that some predictors are more important than others. This method of entry allows you to evaluate the contribution of an independent variable over and above that of all other predictors in the model (i.e., each independent variable is evaluated as if it were the last one to enter the equation). One problem that may be encountered with this method of entry is related to strong correlations between the predictor and the outcome. An independent variable that has a strong bivariate correlation with the dependent variable may show a weak relationship when entered simultaneously with other predictors. In SPSS, this method of entry is referred to as "Enter."
19.7.2 Stepwise Logistic Regression
Stepwise logistic regression is a data-driven model building technique where the computer algorithms drive variable entry rather than theory. Issues with this type of technique have previously been outlined in the discussion associated with this method in multiple regression and thus are not rehashed here. If stepwise logistic regression is determined to be the most appropriate strategy to build your model, Hosmer and Lemeshow (2000) suggest setting a more liberal criterion for variable inclusion (e.g., α = .15 to .20). They also provide specific recommendations on dealing with interaction terms and scales of variables. Because it is only in unusual instances that this method of model building is appropriate (e.g., exploratory research), additional coverage of the suggestions by Hosmer and Lemeshow is not presented.
SPSS offers forward and backward stepwise methods. For both forward and backward methods, options include conditional, LR, and Wald. The differences between these options are mathematically driven. The LR method of entry uses the −2LL for estimating entry of independent variables. The conditional method also uses the likelihood ratio test, but one that is considered to be computationally quicker. The Wald method applies the Wald test to determining entry of the independent variables. With forward stepwise methods, the model begins with a constant only, and based on some criterion, independent variables are added one at a time until a specified cutoff is achieved (e.g., all independent variables included in the model are statistically significant, and any additional variables not included in the model are not statistically significant). Backward stepwise methods work in the reverse fashion, where initially all independent variables (and the constant) are included. Independent variables are then removed until only those that are statistically significant remain in the model, and including an omitted independent variable would not improve the model.
19.7.3 Hierarchical Regression
In hierarchical regression, the researcher specifies a priori a sequence for the individual predictor variables (not to be confused with hierarchical linear models, which is a regression approach for analyzing nested data collected at multiple levels, such as child, classroom, and school). The analysis proceeds in a forward selection, backward elimination, or stepwise selection mode according to a researcher-specified, theoretically based sequence, rather than an unspecified, statistically based sequence. In SPSS, this is conducted by entering predictors in blocks and selecting the desired method of entering variables in each block (e.g., simultaneously, forward, backward, stepwise). Because this method was explained in detail in Chapter 18 and operation of this method of variable selection is the same in logistic regression, additional information will not be presented.
19.8 SPSS
Next we consider SPSS for the logistic regression model. Before we conduct the analysis, let us review the data (note that we recognize the sample size of 20 does not meet the minimum sample size criteria previously specified; however, for illustrative purposes, we felt it important to be able to show the entire dataset, and this would have been more difficult with the recommended sample size for logistic regression). With one dependent variable and two independent variables, the dataset must consist of three variables or columns, one for each independent variable and one for the dependent variable. Each row still represents one individual. As seen in the following screenshot, the SPSS data are in the form of three columns that represent the two independent variables (a continuous teacher-administered social development scale and household, a dichotomous variable, single- vs. two-adult household) and one binary dependent variable (kindergarten readiness screening test, prepared vs. not prepared). As our dependent variable is dichotomous, we will conduct binary logistic regression. When the dependent variable consists of more than two categories, multinomial logistic regression is appropriate (although not illustrated here).
[Screenshot: SPSS data view. The independent variables are labeled "Social" and "Household," where each value represents the child's score on the teacher-reported social development scale (interval measurement) and whether the child lives with one or two parents (nominal measurement); a "1" for household indicates two parents and a "0" represents a single-parent family. The dependent variable is "Readiness" and represents whether or not the child is prepared for kindergarten; this is a binary variable where "1" represents "prepared" and "0" represents "unprepared."]
Step 1: To conduct a binary logistic regression, go to "Analyze" in the top pulldown menu, then select "Regression," and then select "Binary Logistic." Following the screenshot (step 1) that follows produces the "Logistic Regression" dialog box.
[Screenshot: logistic regression, step 1.]
Step 2: Click the dependent variable (e.g., "Readiness") and move it into the "Dependent" box by clicking the arrow button. Click the independent variables and move them into the "Covariate(s)" box by clicking the arrow button (see screenshot step 2).
[Screenshot: logistic regression, step 2. Select the dependent variable from the list on the left and use the arrow to move it to the "Dependent" box on the right; select the independent variables and use the arrow to move them to the "Covariates" box. Clicking on "Categorical" will allow you to specify variables that are categorical. Clicking on "Save" will allow you to save various predicted values, residuals, and other statistics useful for diagnostics. Clicking on "Options" will allow you to select various statistics and plots. Clicking on "Enter" will allow you to select different types of methods of entering the variables (e.g., forward, backward); "Enter" is the default, and all predictors are entered as one set. Had we been entering our variables hierarchically, we would have used the "Next" button to enter each set of variables in the order of progression.]
Step 3: From the "Logistic Regression" dialog box (see screenshot step 2), clicking on "Categorical" will provide the option to define as categorical those variables that are nominal or ordinal in scale, as well as to select which category of the variable is the reference category, through the "Define Categorical Variables" dialog box (see screenshot step 3a). From the list of covariates on the left, click the categorical covariate(s) (e.g., "Household") and move it into the "Categorical Covariates" box by clicking the arrow button. By default, "(Indicator)" will appear next to the variable name. Indicator refers to traditional dummy coding, and you have the option of selecting which value is the reference category. For binary variables (only two categories), using the "Last" value as the reference category means that the category coded with the largest value will be the category "left out" of the model (or referent), and using the "First" value as the reference category means that the category coded with the smallest value will be the category "left out" of the model. Here two-parent households were coded as 1 and single-parent households as 0. We use single-parent households (coded as 0) as the reference category. Thus, we select the radio button for "First" (see screenshot step 3a) to define single-parent households as the reference category.
[Screenshot: logistic regression, step 3a. Selecting "First" means that the category coded with the smallest value is the reference category; selecting "Last" means that the category coded with the largest value is the reference category.]
Next, we need to click the button labeled “Change” (see screenshot step 3b) to define the first value (i.e., 0, or single-parent household) as the reference (or “left out”) category. By doing that, the name of our categorical covariate will now read Household(Indicator(first)). Had we had a categorical variable with more than two categories, we could just define the variable as categorical within logistic regression and select either the first or last value as the reference category. If neither the first nor the last value is what you want as the reference category, then some recoding of the data is necessary.
730 An Introduction to Statistical Concepts
Logistic regression: Step 3b
[Screenshot annotation: Clicking “Change” will define the smallest value (0 in this illustration) as the reference category that is “left out” of the model.]
Before we move on, notice that the button for “Contrast” is a toggle menu with “Indicator” as the default option. Selecting the toggle menu allows you to select other types of contrasts often discussed in relation to analysis of variance (ANOVA) contrasts (e.g., Simple, Difference, Helmert). These will not be reviewed here; should a more complex contrast be desired, these additional options are available in SPSS. Click on “Continue” to return to the “Logistic Regression” dialog box.
Step 4: From the “Logistic Regression” dialog box (see screenshot step 2), clicking on “Save” will provide the option to save various predicted values, residuals, and statistics that can be used for diagnostic examination. From the “Save” dialog box, under the heading of Predicted Values, place a checkmark in the box next to the following: (1) probabilities and (2) group membership. Under the heading of Residuals, place a checkmark in the box next to the following: standardized. Under the heading of Influence, place a checkmark in the box next to the following: (1) Cook’s, (2) Leverage values, and (3) DfBeta(s). Click on “Continue” to return to the original dialog box.
Logistic regression:
Step 4
Step 5: From the “Logistic Regression” dialog box (see screenshot step 2), clicking on “Options” will allow you to generate various statistics and plots. From the “Options” dialog box, under the heading of Statistics and Plots, place a checkmark in the box next to the following: (1) Classification plots, (2) Hosmer–Lemeshow goodness-of-fit, (3) Casewise listing of residuals, (4) Outliers outside, and (5) CI for exp(B). For Outliers outside, you must specify a numeric value of standard deviations to define what you consider to be an outlier. Common values may be 2 (in a normal distribution, about 95% of cases will be within ±2 standard deviations), 3 (in a normal distribution, about 99% of cases will be within ±3 standard deviations), or 3.29 (in a normal distribution, about 99.9% of cases will be within ±3.29 standard deviations). For this illustration, we will use a value of 2. For CI for exp(B), you must specify a CI. This should be the complement of the alpha being tested. If you are using an alpha of .05, then the CI will be 1 − .05, or .95. All the remaining options in the “Options” dialog box will be left as the default settings. Click on “Continue” to return to the original dialog box. From the “Logistic Regression” dialog box, click on “OK” to generate the output.
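The coverage figures and the CI level quoted above can be verified with a few lines of Python. This is a quick check outside of SPSS (the function name `normal_coverage` is ours, not an SPSS term):

```python
import math

def normal_coverage(z):
    """Proportion of a normal distribution lying within +/- z standard deviations."""
    return math.erf(z / math.sqrt(2))

# Coverage for the common outlier cutoffs discussed above
print(round(normal_coverage(2), 3))     # 0.954 -> "about 95%"
print(round(normal_coverage(3), 3))     # 0.997 -> "about 99%"
print(round(normal_coverage(3.29), 4))  # 0.999 -> "about 99.9%"

# The CI level is the complement of alpha
alpha = 0.05
print(1 - alpha)  # 0.95
```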
Logistic regression:
Step 5
Interpreting the output: Annotated results are presented in Table 19.4.
Table 19.4
SPSS�Results�for�the�Binary�Logistic�Regression�Kindergarten�Readiness�Example
Case Processing Summary
Unweighted Cases(a)                        N     Percent
Selected cases   Included in analysis     20      100.0
                 Missing cases             0         .0
                 Total                    20      100.0
Unselected cases                           0         .0
Total                                     20      100.0
a If weight is in effect, see classification table for the total number of cases.
Dependent Variables Encodings
Original Value Internal Value
Unprepared 0
Prepared 1
Categorical Variables Codings
                                                     Parameter
                                        Frequency    Coding (1)
Type of      Single-parent household       10           .000
household    Two-parent household          10          1.000
[Table annotations: This table provides information on sample size and missing data. The sample size is 20, and we have no missing data. Information on how the values of the dependent variable are coded is provided under “Internal Value”: “Unprepared” is coded as 0 and “Prepared” is coded as 1. Information on how the values of the categorical variable(s) are coded is provided as “Parameter Coding”: “Single-Parent Household” is coded as 0 and “Two-Parent Household” is coded as 1. The sample size per group is presented in the “Frequency” column.]
Block 0: Beginning Block
Classification Table(a,b)
                                              Predicted
                                   Kindergarten Readiness     Percentage
Observed                           Unprepared     Prepared      Correct
Step 0  Kindergarten   Unprepared       0             8             .0
        readiness      Prepared         0            12          100.0
        Overall percentage                                        60.0
a Constant is included in the model.
b The cut value is .500.
Block 0 is a summary of the model with the constant only (i.e., none of the predictors are included). The classification table provides the percentage of cases correctly predicted given the constant only. Without including covariates, we can correctly predict children who are prepared for kindergarten 100% of the time but fail to predict any children (0%) who are unprepared. Here all children are predicted to be prepared.
Variables in the Equation
                      B      SE     Wald    df    Sig.    Exp(B)
Step 0   Constant    .405   .456    .789     1    .374     1.500
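Both numbers in this Block 0 row can be reproduced by hand: with 12 children prepared and 8 unprepared, the odds of being prepared are 12/8, which is Exp(B), and B is the natural log of those odds. A quick Python check:

```python
import math

prepared, unprepared = 12, 8   # observed group sizes in the sample

odds = prepared / unprepared   # Exp(B) for the constant-only model
b = math.log(odds)             # B for the constant (the logit)

print(round(odds, 3))  # 1.5
print(round(b, 3))     # 0.405
```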
Variables Not in the Equation
                                       Score    df    Sig.
Step 0   Variables   Social development  8.860    1    .003
                     Household(1)        3.333    1    .068
         Overall statistics            11.168    2    .004
Variables not in the equation provides an indication of whether each covariate will statistically significantly contribute to predicting the outcome. Only social development (p = .003) is of value in the logistic model. The value of 11.168 for overall statistics is a residual chi-square statistic. Since its p value indicates statistical significance (p = .004), this indicates that including the two covariates improves the model as compared to the constant-only model.
Block 1: Method = Enter
Omnibus Tests of Model Coefficients
                  Chi-Square    df    Sig.
Step 1   Step       15.793       2    .000
         Block      15.793       2    .000
         Model      15.793       2    .000
Model Summary
         –2 Log        Cox and Snell    Nagelkerke
Step     Likelihood    R Square         R Square
1        11.128(a)     .546             .738
a Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.
Method = Enter indicates that the method of entering the predictors was simultaneous entry (recall this is the default method in SPSS and is called “Enter”).
Model summary statistics provide overall model fit. For good model fit, the value of –2LL for the full model (11.128) should be less than –2LL for the constant-only model (26.921). The difference is a chi-square value with degrees of freedom equal to the number of parameters in the full model (i.e., two predictors plus one constant) minus the number of parameters in the baseline model (i.e., 1). Thus there are two df. Using the chi-square table, with an alpha of .05 and two df, the critical value is 5.99. Since the model chi-square of 15.793 is larger than the critical value, we reject the null hypothesis that the best prediction model is the constant-only model. In other words, the full model (with predictors) is better at predicting kindergarten readiness than the constant-only model.
The –2LL for the constant-only model is computed as the sum of the chi-square for the model and the –2LL for the full model: χ²(Model) + (–2LL full) = 15.793 + 11.128 = 26.921.
The two R² values are pseudo R² and are interpreted similarly to multiple R². These can be used as effect size indices for logistic regression, and Cohen’s interpretations for correlation can be used to interpret them. Both values indicate a large effect.
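The arithmetic in this annotation is easy to confirm: the model chi-square plus –2LL for the full model gives –2LL for the constant-only model, and the difference is tested on 2 df. A short Python check of the reported values:

```python
neg2ll_full = 11.128       # -2 log likelihood, full model (from Model Summary)
model_chi_square = 15.793  # chi-square from Omnibus Tests of Model Coefficients

# -2LL for the constant-only model is the sum of the two
neg2ll_constant_only = model_chi_square + neg2ll_full
print(round(neg2ll_constant_only, 3))  # 26.921

# df = parameters in full model (2 predictors + 1 constant) minus baseline (1)
df = 3 - 1
critical_value = 5.99  # chi-square critical value at alpha = .05, df = 2
print(model_chi_square > critical_value)  # True -> reject the constant-only model
```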
Contingency Table for Hosmer and Lemeshow Test
                 Kindergarten Readiness       Kindergarten Readiness
                      = Unprepared                 = Prepared
            Observed      Expected         Observed      Expected      Total
Step 1   1      2           1.988              0            .012         2
         2      2           1.922              0            .078         2
         3      1           1.651              1            .349         2
         4      2           1.292              0            .708         2
         5      0            .607              2           1.393         2
         6      1            .404              2           2.596         3
         7      0            .100              2           1.900         2
         8      0            .030              2           1.970         2
         9      0            .005              3           2.995         3
The classification table provides information on how well group membership was predicted. Cells on the diagonal indicate correct classification. For example, children who were prepared for kindergarten were accurately classified 91.7% of the time as compared to unprepared children (87.5%). Overall, 90% of children were correctly classified. This is computed as the number of correctly classified cases divided by total sample size: (7 + 11)/20 = .90.
Using Press’s Q and given the chi-square critical value of 3.841 (df = 1), we find:
Q = [N − (nK)]² / [N(K − 1)] = [20 − (18)(2)]² / [20(2 − 1)] = 12.8
We reject the null hypothesis. There is evidence to suggest that the predictions are statistically significantly better than chance.
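The overall accuracy and Press’s Q follow directly from the classification counts. A minimal Python check:

```python
correct = 7 + 11  # diagonal of the classification table
N, K = 20, 2      # sample size and number of outcome groups
n = correct       # number of correctly classified cases

accuracy = correct / N
press_q = (N - n * K) ** 2 / (N * (K - 1))

print(accuracy)         # 0.9
print(press_q)          # 12.8
print(press_q > 3.841)  # True -> predictions better than chance (df = 1)
```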
Classification Table(a)
                                              Predicted
                                   Kindergarten Readiness     Percentage
Observed                           Unprepared     Prepared      Correct
Step 1  Kindergarten   Unprepared       7             1           87.5
        readiness      Prepared         1            11           91.7
        Overall percentage                                        90.0
a The cut value is .500.
Hosmer and Lemeshow Test
Step Chi-Square df Sig.
1 4.691 7 .698
As a measure of classification accuracy, nonstatistical significance (p = .698) indicates good model fit for the Hosmer and Lemeshow test. This test is affected by small sample size, however; caution should be used when interpreting the results of this test when sample size is less than 50.
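The Hosmer–Lemeshow statistic can be recomputed from the observed and expected counts in the contingency table above using the usual Pearson form, Σ(O − E)²/E, with df equal to the number of groups minus 2. A Python sketch (values keyed in from the table; the result matches the reported 4.691 within rounding):

```python
# Observed and expected counts for the nine Hosmer-Lemeshow groups
obs_unprep = [2, 2, 1, 2, 0, 1, 0, 0, 0]
exp_unprep = [1.988, 1.922, 1.651, 1.292, .607, .404, .100, .030, .005]
obs_prep = [0, 0, 1, 0, 2, 2, 2, 2, 3]
exp_prep = [.012, .078, .349, .708, 1.393, 2.596, 1.900, 1.970, 2.995]

chi_square = sum((o - e) ** 2 / e
                 for o, e in zip(obs_unprep + obs_prep, exp_unprep + exp_prep))
df = len(obs_unprep) - 2  # 9 groups minus 2

print(round(chi_square, 2))  # 4.69 (reported as 4.691)
print(df)                    # 7
```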
Variables in the Equation
                                                                        95% CI for Exp(B)
                          B        SE      Wald    df   Sig.   Exp(B)    Lower     Upper
Step 1(a)  Social          .967     .446   4.696    1   .030    2.631    1.097     6.313
             development
           Household(1)  –6.216    3.440   3.265    1   .071     .002     .000     1.693
           Constant     –15.404    7.195   4.584    1   .032     .000
a Variable(s) entered on step 1: Social development, household.
[Table annotations:
Since an odds ratio of 1.00 (which indicates similar odds for falling into either category of the outcome) is not contained within the interval for social development, this suggests the odds ratio is statistically significantly different from 1 (equivalently, the coefficient differs from zero). Note that the odds ratio is only computed for the predictors and not for the intercept (i.e., constant).
The p value for “Social” (p = .030) indicates that the slope is statistically significantly different from zero. This tells us that the independent variable is contributing to predicting kindergarten preparedness. The intercept (p = .032) is also statistically significantly different from zero.
The Wald statistic is used to test the statistical significance of each covariate.
Exp(B) values are the odds ratios. The odds ratio of 2.631 for social development indicates that the odds of being prepared for kindergarten are over 2½ times as large (263% of the previous odds) for every one-point increase in social development. The odds ratio for household is near zero; however, because its interval contains 1.00 (and p = .071), there is no statistically reliable evidence that the odds of being prepared for kindergarten differ by the child’s household structure (single- versus two-parent home).
The B coefficient is interpreted as the change in the logit of the dependent variable given a one-unit change in the independent variable. Recall that the logit is the natural log of the odds of the dependent variable occurring. With B equal to .967, this tells us that a one-unit change in social development will result in nearly a one-unit change in the logit of kindergarten preparedness. The constant is the expected value of the logit of kindergarten readiness for children of single parents (recall this was coded as 0) and when social development is zero.
A positive B indicates that an increase in the value of that independent variable will result in an increase in the predicted probability of the dependent variable. A negative B indicates that an increase in the value of that independent variable will result in a decrease in the predicted probability of the dependent variable.
NOTE! Interpretations of B coefficients are usually done via odds ratios.]
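The odds ratios in the table are just exp(B), and the fitted coefficients can also be turned into a predicted probability through the inverse logit. A Python sketch (the social development score of 22 is a hypothetical value chosen for illustration, not a case from the data):

```python
import math

# Coefficients from the "Variables in the Equation" table
b_constant = -15.404
b_social = 0.967
b_household = -6.216  # Household(1): two-parent home coded 1

# Odds ratios are exp(B)
print(round(math.exp(b_social), 3))     # 2.63 (reported as 2.631 from unrounded B)
print(round(math.exp(b_household), 3))  # 0.002

# Predicted probability of "prepared" for a hypothetical child:
# social development = 22, single-parent home (household = 0)
logit = b_constant + b_social * 22 + b_household * 0
prob = 1 / (1 + math.exp(-logit))
print(round(prob, 3))  # 0.997
```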
Casewise List(a)
                    Observed                              Temporary Variable
         Selected   Kindergarten              Predicted
Case     Status(b)  Readiness     Predicted   Group        Resid     ZResid
8        S          U**             .832      P            –.832     –2.226
15       S          P**             .214      U             .786      1.918
a Cases with studentized residuals greater than 2.000 are listed.
b S = Selected, U = Unselected cases, and ** = Misclassified cases.
Recall we told SPSS to identify residuals that were outside two standard deviations. Based on that decision, cases 8 and 15 were identified as potential outliers. We review this output in the discussion on outliers.
[Classification plot annotation: “P” indicates “Prepared for Kindergarten” and “U” indicates “Unprepared for Kindergarten.” P’s to the left of .50 indicate misclassified cases. U’s to the right of .50 indicate misclassified cases. Although there are 4 P’s, this represents a frequency of one.]
Examining Data for Assumptions for Logistic Regression
Previously we described a number of assumptions used in logistic regression. These included (a) noncollinearity, (b) linearity between the predictors and logit of the dependent variable, and (c) independence of errors. We also review the data to ensure there are no outliers.
Before we begin to examine assumptions, let us review the values that we requested to be saved to our data file (see dataset screenshot that follows):
 1. PRE_1 represents the predicted probabilities.
 2. PGR_1 is the predicted group membership (here group membership is either prepared or unprepared for kindergarten).
 3. COO_1 represents Cook’s influence statistics. As a rule of thumb, Cook’s values greater than 1 suggest that case is potentially problematic.
 4. LEV_1 represents leverage values. As a general guide, leverage values less than .20 suggest there are no problems with cases exerting undue influence. Values greater than .5 indicate problems.
 5. ZRE_1 pertains to standardized residuals, computed as the residual divided by an estimate of the standard deviation of the residual. Standardized residuals have a mean of 0 and standard deviation of 1.
 6. DFB0_1, DFB1_1, and DFB2_1 are DfBeta values and indicate the difference in a beta coefficient if that particular case were excluded from the model.
[Dataset screenshot annotation: As we look at the raw data, we see eight new variables have been added to our dataset. These are predicted values, residuals, and other diagnostic statistics.]
Noncollinearity
It is not possible to request multicollinearity statistics, such as tolerance and VIF, using logistic regression in SPSS. We can, however, estimate those values by running the same variables in a multiple regression model (see Chapter 18) and requesting only the collinearity statistics. We are not interested in the parameter estimates of the model, only the collinearity statistics. Tolerance values less than .10 and VIF values greater than 10 indicate multicollinearity (Menard, 1995). Because the steps for generating multiple regression were presented in Chapter 18, we will not reiterate them here. Rather, we will merely present the applicable portion of the output of this model. From the output that follows, with a tolerance of .248 and VIF of 4.037, we have evidence that we do not have multicollinearity. In examining collinearity diagnostics, condition index values that are substantially larger than others listed indicate potential problems with multicollinearity (although “substantially larger” is a subjective measure). Here the condition index of dimension 3 (14.259) is about five times larger than the next largest condition index. The last three columns refer to variance proportions. Multiplying these values by 100 provides the percentage of the variance of the regression coefficient that is related to a particular eigenvalue. Multicollinearity is suggested when covariates have high percentages associated with a small eigenvalue. Thus, for purposes of reviewing for multicollinearity, concentrate only on the rows with small eigenvalues. In this example, 100% of the variance of the regression coefficient for social development and 73% for type of household are related to eigenvalue 3 (the dimension with the smallest eigenvalue). This suggests there may be some multicollinearity. In summary, we have met the assumption of noncollinearity with the tolerance and VIF values, but there is some suggestion of multicollinearity with the condition index and variance proportion values.
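Two of the quantities in the output that follows are simple functions of the others: VIF is the reciprocal of tolerance, and each condition index is the square root of the largest eigenvalue divided by that dimension’s eigenvalue. A Python check against the reported (rounded) values:

```python
import math

tolerance = 0.248
vif = 1 / tolerance
print(round(vif, 2))  # 4.03 (reported as 4.037, computed from unrounded tolerance)

# Condition index: sqrt(largest eigenvalue / eigenvalue for the dimension)
eigenvalues = [2.683, 0.303, 0.013]
condition_indices = [math.sqrt(eigenvalues[0] / e) for e in eigenvalues]
# Close to the reported 1.000, 2.974, and 14.259; the third differs slightly
# because the smallest eigenvalue is printed with only three decimals.
print([round(ci, 2) for ci in condition_indices])
```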
Coefficients(a)
                               Collinearity Statistics
Model                          Tolerance        VIF
1    Social development          .248          4.037
     Type of household           .248          4.037
a Dependent Variable: Kindergarten readiness.

Collinearity Diagnostics(a)
                                                       Variance Proportions
                                   Condition                   Social        Type of
Model   Dimension   Eigenvalue     Index        (Constant)     Development   Household
1       1             2.683         1.000          .00            .00           .01
        2              .303         2.974          .05            .00           .25
        3              .013        14.259          .95           1.00           .73
a Dependent Variable: Kindergarten readiness.
Linearity
Recall that the linearity assumption is applicable only to continuous variables. Thus, we will test this assumption only for social development. The Box-Tidwell transformation test can be used to test that the assumption of linearity has been met. To generate this test, for each continuous independent variable, we must first create an interaction term that is the product of the independent variable and its natural log (ln). Here we have only one continuous independent variable, social development. Thus, only one interaction term will be created.
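The Box-Tidwell term is simply the variable multiplied by its own natural log, exactly what the “Compute Variable” steps below build in SPSS. A small Python illustration (the three social development scores are hypothetical):

```python
import math

# Hypothetical social development scores
social = [12.0, 18.0, 25.0]

# Box-Tidwell interaction term: x * ln(x), i.e., Social*LN(Social)
social_ln = [x * math.log(x) for x in social]
print([round(v, 2) for v in social_ln])  # [29.82, 52.03, 80.47]
```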
Step 1: To create an interaction term of our continuous variable and the natural log of this variable, go to “Transform” in the top pulldown menu, then select “Compute Variable.” Following the screenshot (step 1) produces the “Compute Variable” dialog box.
Creating an interaction term: Step 1
Step 2: In the “Target Variable” box in the upper left corner, enter the variable name that you want to appear as the column header. Since this is the column header name, this name cannot begin with special characters or numbers and cannot have any spaces. If you wish to define the label for this variable (i.e., what will appear on the output; this can include special characters, spaces, and numbers), then click on the “Type & Label” box directly underneath “Target Variable,” where additional text to define the name of the variable can be included. Next, click on the continuous covariate (i.e., social development) and move it into the “Numeric Expression” box by clicking on the arrow in the middle of the screen. Using either the keyboard on screen or your keyboard, click on the asterisk key (i.e., *). This will be used as the multiplication sign. Next, under “Function group,” click on Arithmetic to display all of the basic mathematical functions. From this alphabetized list, click on “Ln” (natural log). To move this function into the “Numeric Expression” box, click on the arrow key in the right central part of the dialog box.
Creating an interaction term: Step 2
[Screenshot annotations: Select the continuous covariate from the list on the left and use the arrow to move it to the “Numeric Expression” box; then use the keyboard to insert an * directly after the covariate. Select “Arithmetic” under “Function group” to display the basic mathematical functions, select “Ln” (the natural log) from the list, and use the arrow key to move it into the “Numeric Expression” box.]
Step 3: Once the natural log function is displayed in the “Numeric Expression” box, a question mark enclosed inside parentheses will appear (see screenshot step 3a). This is SPSS’s way of asking which variable you want the natural log computed for. Here it is the continuous covariate, social development.
Creating an interaction term: Step 3a
[Screenshot annotation: Delete the question mark and replace it with the variable for which the natural log should be computed.]
Here we want to compute the natural log for the continuous covariate, social development. To move this variable into the parentheses, use the backspace or delete key to remove the question mark. Then, click on the continuous covariate, social development, and move it into the parentheses next to LN in the “Numeric Expression” box by clicking on the arrow in the middle of the screen (see screenshot step 3b). The numeric expression should then read “Social*LN(Social).” Click “OK” to compute and create the new variable in the dataset.
Creating an
interaction term:
Step 3b
Step 4: The next step is to include the newly created variable (i.e., the interaction of the continuous variable with its natural log) in the logistic regression model, along with the other predictors. As those steps have been presented previously, they will not be reiterated here. The output indicates that the interaction term is not statistically significant (p = .300), which suggests we have met the assumption of linearity.
Variables in the Equation
                                                                               95% CI for Exp(B)
                             B        SE       Wald   df   Sig.   Exp(B)       Lower    Upper
Step 1(a)  Social           12.953   11.897    1.185   1   .276   421981.259    .000    5.647E15
           Household(1)     –8.208    5.264    2.432   1   .119         .000    .000    8.236
           Social*LnSocial  –2.948    2.845    1.074   1   .300         .052    .000    13.845
           Constant        –76.228   64.345    1.403   1   .236         .000
a Variable(s) entered on step 1: Social, household, Social*LnSocial.
Independence
We plot the standardized residuals (which were requested and created through the “Save” option) against the values of X to examine the extent to which independence was met. The general steps for generating a simple scatterplot through “Scatter/dot” have been presented in a previous chapter (e.g., Chapter 10), and they will not be repeated here. From the “Simple Scatterplot” dialog screen, click the standardized residual (called “normalized residual” in SPSS) variable and move it into the “Y Axis” box by clicking on the arrow. Click the independent variable X and move it into the “X Axis” box by clicking on the arrow. Then click “OK.”
Interpreting independence evidence: If the assumption of independence is met, the points should fall randomly within a band of −2.0 to +2.0. Here we have pretty good evidence of independence, especially given the small sample size relative to logistic regression, as all but one point (case 19) are within an absolute value of 2.0.
[Scatterplots: normalized residuals (Y axis, −3.0 to +2.0) plotted against social development (10–30) and against type of household (0–1); case 19 falls below −2.0 in both plots.]
Absence of Outliers
Just as we saw in multiple regression, there are a number of diagnostics that can be used to examine the data for outliers.
Cook’s distance: Cook’s distance provides an overall measure for the influence of individual cases. Values greater than 1 suggest that a case may be problematic in terms of undue influence on the model. Examining the residual statistics provided in the binary logistic regression output (see following table), we see that the maximum value for Cook’s distance is 1.58, which indicates at least one influential point.
Leverage values: These values range from 0 to 1, with values close to 1 indicating greater leverage. As a general rule, leverage values greater than (m + 1)/n [where m equals the number of independent variables; here (2 + 1)/20 = .15] indicate an influential case. With a maximum of .307, there is evidence to suggest one or more cases are exerting leverage.
DfBeta: We saved the DfBeta values as another indication of the influence of a case. The DfBeta values provide information on the change in the predicted value when the case is deleted from the model. For logistic regression, the DfBeta values should be smaller than 1. Looking at the minimum and maximum DfBeta values for the intercept (labeled “constant”) and for household, we have at least one case that is suggestive of undue influence.
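The three rules of thumb just described can be applied mechanically to the extreme values reported in the descriptive statistics that follow. A Python sketch of the checks:

```python
# Diagnostic cutoffs applied to the saved values
m, n = 2, 20              # number of independent variables, sample size
cooks_max = 1.58721       # maximum analog of Cook's influence statistic
leverage_max = 0.30726    # maximum leverage value
dfbeta_abs_max = 6.53464  # largest |DfBeta| (for the constant)

leverage_cutoff = (m + 1) / n
print(leverage_cutoff)                 # 0.15
print(cooks_max > 1)                   # True -> at least one influential case
print(leverage_max > leverage_cutoff)  # True -> undue leverage suggested
print(dfbeta_abs_max > 1)              # True -> undue influence suggested
```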
Descriptive Statistics
                                  N     Minimum      Maximum
Analog of Cook's influence
  statistics                      20      .00000      1.58721
Leverage value                    20      .00691       .30726
Normalized residual               20    –2.22568      1.91780
DfBeta for constant               20    –1.68367      6.53464
DfBeta for social                 20     –.41034       .09948
DfBeta for household(1)           20    –1.36519      4.10130
Valid N (listwise)                20
From our logistic regression output, we can review the “Casewise List” to determine cases with studentized residuals larger than two standard deviations (recall from the “Options” dialog box that we told SPSS to identify residuals outside two standard deviations). Here there were two cases (cases 8 and 15) that were identified as outliers, and the relevant statistics (e.g., observed group, predicted value, predicted group, residual, and standardized residual) are provided. We examine these cases to make sure there was not a data entry error. If the data are correct, then we determine whether to keep or filter out the case(s).
Casewise List(a)
                    Observed                              Temporary Variable
         Selected   Kindergarten              Predicted
Case     Status(b)  Readiness     Predicted   Group        Resid     ZResid
8        S          U**             .832      P            –.832     –2.226
15       S          P**             .214      U             .786      1.918
a Cases with studentized residuals greater than 2.000 are listed.
b S = Selected, U = Unselected cases, and ** = Misclassified cases.
Since we have a small dataset, we can easily review the values of our diagnostics and see which cases are problematic in terms of exerting undue influence and/or outliers. Those that are circled are values that fall outside of the recommended guidelines and thus are suggestive of outlying or influential cases. Due to the already small sample size, we will not filter out any of these potentially problematic cases. However, in this situation (i.e., with diagnostics that suggest one or more influential cases), you may want to consider filtering out those cases or, at a minimum, reviewing the data to be sure that there was not a data entry error for that case.
Assessing Classification Accuracy
In addition to examining Press’s Q for classification accuracy, we can generate a kappa statistic. Kappa is the proportion of agreement above that expected by chance. A kappa statistic of 1.0 indicates perfect agreement, whereas a kappa of 0 indicates chance agreement. Negative values can occur and indicate weaker-than-chance agreement. General rules of interpretation for kappa are as follows: small, <.30; moderate, .30 to .50; large, >.50.
Step 1: Kappa statistics are generated through the “Crosstabs” procedure. Because the process for creating a crosstab has been presented previously (see Chapter 8), it will not be reiterated here. Once the “Crosstabs” dialog box is open, select the dependent variable from the list on the left and use the arrow key to move it to “Row(s).” Select the predicted group (PGR_1) from the list on the left and use the arrow key to move it to “Column(s)” (see step 1).
Kappa statistic: Step 1
[Screenshot annotations: Select the dependent variable from the list on the left and use the arrow to move it to the “Row(s)” box on the right. Select the predicted group from the list on the left and use the arrow to move it to the “Column(s)” box on the right. Clicking on “Statistics” will allow you to select the kappa statistic. Clicking on “Cells” will allow you to display expected counts and column/row/total percentages.]
Step 2: Click on the “Statistics” option button. Place a checkmark in the box next to “Kappa” (step 2). Then click on “Continue” to return to the main dialog box.
Kappa statistic: Step 2
Step 3: Click on the “Cells” option button. In the “Cell Display” dialog box, place a checkmark in the box next to Observed, Expected, and Row (step 3). Then click on “Continue” to return to the main dialog box. Then click “OK” to generate the output.
Kappa statistic: Step 3
The crosstab table is interpreted as we have seen in the past. The columns represent the predicted group membership, and the rows represent the observed group membership. This table should look similar to the one that was provided to us with the logistic regression results.
Kindergarten Readiness * Predicted Group Crosstabulation
                                                      Predicted Group
                                                  Unprepared   Prepared    Total
Kindergarten   Unprepared   Count                      7           1          8
readiness                   Expected count            3.2         4.8        8.0
                            % within Kindergarten
                              readiness              87.5%       12.5%     100.0%
               Prepared     Count                      1          11         12
                            Expected count            4.8         7.2       12.0
                            % within Kindergarten
                              readiness               8.3%       91.7%     100.0%
Total                       Count                      8          12         20
                            Expected count            8.0        12.0       20.0
                            % within Kindergarten
                              readiness              40.0%       60.0%     100.0%
What is of most interest is the table labeled “Symmetric Measures,” as this table contains the kappa statistic. With a kappa statistic of .792, and using our rules of thumb for interpretation, this is considered to be a large value, which suggests strong agreement.
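The kappa value can be reproduced from the crosstab counts: observed agreement is the diagonal proportion, chance agreement comes from the row and column totals, and kappa is their standardized difference. A Python check:

```python
# Crosstab counts: rows = observed group, columns = predicted group
table = [[7, 1],   # observed Unprepared: predicted U, predicted P
         [1, 11]]  # observed Prepared:   predicted U, predicted P
n = 20

p_observed = (table[0][0] + table[1][1]) / n
row_totals = [sum(row) for row in table]
col_totals = [table[0][j] + table[1][j] for j in range(2)]
p_chance = sum(row_totals[i] * col_totals[i] for i in range(2)) / n ** 2

kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(kappa, 3))  # 0.792
```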
Symmetric Measures
                                        Asymp. Std.
                               Value    Error(a)      Approx. T(b)   Approx. Sig.
Measure of agreement   Kappa    .792      .140           3.540           .000
N of valid cases                 20
a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
19.9 G*Power
A priori and post hoc power can again be determined using the specialized software described previously in this text (e.g., G*Power), or you can consult a priori power tables (e.g., Cohen, 1988). As an illustration, we use G*Power to first compute post hoc power for our example.
Post Hoc Power for Logistic Regression Using G*Power
The first thing that must be done when using G*Power for computing post hoc power is to select the correct test family. For logistic regression, we select “Tests” in the top pulldown menu, then “Correlation and regression,” and finally “Logistic regression.” Once that selection is made, the “Test family” automatically changes to “z tests.”
Step 1
The “Type of Power Analysis” desired then needs to be selected. To compute post hoc power, select “Post hoc: Compute achieved power—given α, sample size, and effect size.” For this illustration, we will compute power for the continuous covariate.
[Screenshot annotations: Following Step 1 will change the test family to z tests and will automatically change the statistical test to “Logistic regression.” The “Input Parameters” for computing post hoc power must be specified. Once the parameters are specified, click on “Calculate” to display the post hoc power results.]
Step 2
The "Input Parameters" must then be specified. In our example, we conducted a two-tailed test. The odds ratio for our continuous variable social development was 2.631. The probability that Y = 1 given that X = 1 under the null hypothesis is set to .50. The alpha level we used was .05, and the total sample size was 20. "R2 other X" refers to the squared correlation between social development and our other covariate. In this case, the simple bivariate correlation between these variables is .867, and the squared correlation is .752. Social development is a continuous variable; thus, it follows a normal distribution. The last two parameters to be specified are the mean and standard deviation of our covariate. In this case, the mean of social development was 20.20, and the standard deviation was 6.39. Once the parameters are specified, click on "Calculate" to find the power statistics.

The "Output Parameters" provide the relevant statistics for the input just specified. In this example, we were interested in determining post hoc power for a logistic regression model. Based on the criteria specified, the post hoc power was substantially less than 1. In other words, the probability of rejecting the null hypothesis when it is really false was significantly less than 1% (sufficient power is often .80 or above). This finding is not surprising given the very small sample size. Keep in mind that conducting power analysis a priori is recommended so that you avoid a situation where, post hoc, you find that the sample size was not sufficient to reach the desired level of power (given the observed parameters).
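G*Power computes this power analytically from a Wald z test; power for a logistic regression Wald test can also be approximated by Monte Carlo simulation. The following is only a generic sketch (one standardized covariate, an assumed odds ratio of 2.0 per standard deviation, n = 50), not a reproduction of the chapter example or of G*Power's algorithm; `fit_logistic` and `simulated_power` are illustrative helpers.

```python
import numpy as np

def fit_logistic(X, y, iters=30):
    """Newton-Raphson MLE for logistic regression; returns (beta, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None])                  # observed information matrix
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * W[:, None]))
    return beta, cov

def simulated_power(n=50, beta1=np.log(2.0), reps=400, seed=1):
    """Proportion of replications in which the two-tailed Wald test (alpha = .05)
    on the covariate rejects the null hypothesis."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal(n)                  # standardized covariate
        logit = beta1 * x                           # intercept 0 -> base rate .50
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
        X = np.column_stack([np.ones(n), x])
        try:
            b, cov = fit_logistic(X, y)
        except np.linalg.LinAlgError:               # skip (quasi-)separated samples
            continue
        if abs(b[1] / np.sqrt(cov[1, 1])) > 1.959963985:
            rejections += 1
    return rejections / reps

power = simulated_power()
```

With these assumed inputs, the simulated power comes out moderate (well below the conventional .80), echoing the general point above that small samples often leave logistic regression underpowered.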
A Priori Power for Logistic Regression Using G*Power
For a priori power, we can determine the total sample size needed for logistic regression given the same parameters just discussed. In this example, had we wanted an a priori power of .80 given the same parameters just defined, we would need a total sample size of 7094.
[Screenshot: G*Power — the a priori power results]
19.10 Template and APA-Style Write-Up
Finally, here is an example paragraph for the results of the logistic regression analysis. Recall that our graduate research assistant, Marie, was assisting Malani, a faculty member in the early childhood department. Malani wanted to know if kindergarten readiness (prepared vs. unprepared) could be predicted by social development (a continuous variable) and type of household (single- vs. two-parent home). The research question presented to Malani from Marie included the following: Can kindergarten readiness be predicted from social development and type of household?
Marie then assisted Malani in generating a logistic regression as the test of inference, and a template for writing the research question for this design is presented as follows:
• Can [dependent variable] be predicted from [list independent variables]?
It may be helpful to preface the results of the logistic regression with information on an examination of the extent to which the assumptions were met. The assumptions include (a) independence, (b) linearity, and (c) noncollinearity. We will also examine the data for outliers and influential points.
Logistic regression was conducted to determine whether social devel-
opment and type of household (single- vs. two-parent home) could
predict kindergarten readiness.
The assumptions of logistic regression were tested. Specifically,
these include (a) noncollinearity, (b) linearity, and (c) indepen-
dence of errors.
In terms of noncollinearity, a VIF value of 4.037 (below the value
of 10.0 which indicates the point of concern) and tolerance of
.248 (above the value of .10 which suggests multicollinearity) pro-
vided evidence of noncollinearity. However, there was some indica-
tion that multicollinearity existed. In examining the collinearity
diagnostics, a condition index value of 14.259 was observed, about
five times larger than the next largest condition index. Review
of the variance proportions suggested that 100% of the variance of
the regression coefficient for social development and 73% for type
of household were related to the smallest eigenvalue. This also
suggests multicollinearity.
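The tolerance and VIF quoted above are reciprocals of one another, so each can be checked from the other; the same quantity also recovers the "R2 other X" value used in the power analysis. A quick arithmetic check of the reported values:

```python
vif = 4.037                      # reported VIF for the predictor
tolerance = 1.0 / vif            # tolerance is the reciprocal of VIF
r2_other = 1.0 - tolerance       # squared multiple correlation with the other predictor(s)

print(round(tolerance, 3))       # 0.248, as reported in the write-up
print(round(r2_other, 3))        # 0.752, the "R2 other X" used in the power analysis
```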
Linearity was assessed by reestimating the model and including, along
with the original predictors, an interaction term which was the prod-
uct of the continuous independent variable (i.e., social development)
and its natural logarithm. The interaction term was not statistically
significant, thus providing evidence of linearity [social*ln(social),
B = −2.948, SE = 2.845, Wald = 1.074, df = 1, p = .300].
Independence was assessed by examining a plot of the standardized
residuals against values of each independent variable. With the
exception of one case which was slightly outside the band, all cases
were within an absolute value of 2.0, thus indicating the assumption
of independence has been met.
In reviewing for outliers and influential points, Cook’s distance
values were generally within the recommended range of less than 1.0,
although the maximum value was 1.587. Leverage values ranged from
.007 to .307, well under the recommended .50, suggesting outliers were
not problematic. DfBeta values beyond 1 also suggested cases that may
be exerting influence on the model. Based on the evidence reviewed,
there are some cases that are suggestive of outlying and influen-
tial points. Due to the small sample size, however, these cases were
retained. Readers are urged to interpret the results with caution
given the possible influence of outliers.
Here is an APA-style example paragraph of results for the logistic regression (remember that this will be prefaced by the previous paragraph reporting the extent to which the assumptions of the test were met).
Logistic regression analysis was then conducted to determine
whether kindergarten readiness (prepared vs. unprepared) could be
predicted from social development and type of household (single-
vs. two-parent home). Good model fit was evidenced by nonstatisti-
cally significant results on the Hosmer–Lemeshow test, χ2 (n = 20) =
4.691, df = 7, p = .698, and large effect size indices when interpreted
using Cohen (1988) (Cox and Snell R2 = .546; Nagelkerke R2 = .738).
These results suggest that the predictors, as a set, reliably dis-
tinguished between children who are ready for kindergarten (i.e.,
prepared) versus unprepared. Of the two predictors in the model,
only social development was a statistically significant predic-
tor of kindergarten readiness (Wald = 4.696, df = 1, p = .030). The
odds ratio for social development suggests that for every one-point
increase in social development, the odds are about two and two-
thirds greater for being prepared for kindergarten as compared to
unprepared. Type of household was not statistically significant,
which suggests that the odds for being prepared for kindergarten
(relative to unprepared) are similar regardless of being raised in
a single-parent versus a two-parent household. The following table
presents the results for the model including the regression coef-
ficients, Wald statistics, odds ratios, and 95% CIs for the odds
ratios. This is followed by a table which presents the group means
and standard deviations of each predictor for both children who are
prepared and unprepared for kindergarten.
Logistic Regression Results

                                                           95% CI for Exp(B)
Predictor                             B        SE     Wald   p     Exp(B)  Lower   Upper
Intercept (constant)                 −15.404   7.195  4.584  .032  NA
Social development                      .967    .446  4.696  .030  2.631   1.097   6.313
Type of household (two-parent home)   −6.216   3.440  3.265  .071  .002    .000    1.693
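The odds ratio and its confidence limits in the table are simple transforms of B and SE, so the social development row can be checked directly (small discrepancies in the last digit reflect rounding of B and SE in the table):

```python
import math

B, SE = 0.967, 0.446      # coefficient and standard error for social development
z = 1.96                  # critical value for a 95% CI

odds_ratio = math.exp(B)           # about 2.63
lower = math.exp(B - z * SE)       # about 1.10
upper = math.exp(B + z * SE)       # about 6.30

print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```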
Group Means (and Standard Deviations) of Predictors

Predictor                            Prepared for Kindergarten   Unprepared for Kindergarten
Social development                   23.58 (4.74)                15.13 (5.14)
Type of household (two-parent home)  .67 (.49)                   .25 (.46)
Overall, the logistic regression model accurately predicted 90% of the
children in our sample, with children who are prepared for kindergar-
ten slightly more likely to be classified correctly (91.7% of children
prepared for kindergarten and 87.5% of children unprepared correctly
classified). To account for chance agreement in classification, the
kappa coefficient was computed and found to be .792, a large value.
Additionally, Press’s Q was calculated to be 12.8, providing evidence
that the predictions based on the logistic regression model are sta-
tistically significantly better than chance.
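The chance-corrected classification statistics reported above can be reproduced from the 2 × 2 classification table, with cell counts inferred from the reported percentages (11 of 12 prepared and 7 of 8 unprepared children classified correctly):

```python
# Classification table: rows = observed group, columns = predicted group
table = [[11, 1],   # prepared:   11 classified prepared, 1 unprepared
         [1, 7]]    # unprepared:  1 classified prepared, 7 unprepared
N = 20
K = 2                                           # number of groups
n_correct = table[0][0] + table[1][1]           # 18 of 20 (90%)

# Cohen's kappa: observed agreement corrected for chance agreement
p_o = n_correct / N
row = [sum(r) for r in table]                   # observed marginals
col = [table[0][0] + table[1][0],
       table[0][1] + table[1][1]]               # predicted marginals
p_e = sum(row[i] * col[i] for i in range(K)) / N**2
kappa = (p_o - p_e) / (1 - p_e)

# Press's Q: tests classification accuracy against chance
Q = (N - n_correct * K) ** 2 / (N * (K - 1))

print(round(kappa, 3), Q)   # 0.792 12.8, matching the reported values
```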
19.11 What Is Next?
As we conclude this text, the natural question to ask is, what do we consider next in statistics? There are two likely key alternatives. First, you could consider more advanced regression models such as multinomial logistic regression, propensity score analysis, or regression discontinuity. In terms of more advanced regression readings, consider Cohen and Cohen (1983), Grimm and Arnold (1995), Kleinbaum, Kupper, Muller, and Nizam (1998), Meyers, Gamst, and Guarino (2006), and Pedhazur (1997). For more information on logistic regression, consider Christensen (1997), Glass and Hopkins (1996), Hosmer and Lemeshow (2000), Huck (2004), Kleinbaum et al. (1998), Meyers et al. (2006), Pampel (2000), Pedhazur (1997), and Wright (1995).
In the regression framework, one of the hottest topics relates to multilevel models that allow for the examination of nested cases (e.g., children within classrooms, employees within organizations, residents within states). There are a number of excellent resources for learning more about multilevel modeling, including Heck and Thomas (2000), Kreft and de Leeuw (1998), O'Connell and McCoach (2008), Reise and Dunn (2003), and Snijders and Bosker (1999).
Alternatively, you could consider multivariate analysis methods, either in terms of readings or in a multivariate course. Briefly, the major methods of multivariate analysis include multivariate analysis of variance (MANOVA), discriminant analysis, factor and principal components analysis, canonical correlation analysis, cluster analysis, multidimensional scaling, multivariate regression, and structural equation modeling. For multivariate readings, take a look at Grimm and Arnold (1995, 2000), Johnson and Wichern (1998), Kleinbaum et al. (1998), Manly (2004), Marcoulides and Hershberger (1997), Meyers et al. (2006), Stevens (2002), and Timm (2002).
19.12 Summary
In this chapter, a regression method appropriate for binary categorical outcomes was considered. The chapter began with an examination of how logistic regression works and the logistic regression equation. This was followed by estimation, model fit, significance tests, and assumptions within the context of logistic regression. Effect size indices for logistic regression models were also discussed. In addition, several new concepts were introduced, including the logit, odds, and the odds ratio. Finally, we examined a number of methods of variable entry, such as simultaneous, stepwise selection, and hierarchical regression. At this point, you should have met the following objectives: (a) be able to understand the concepts underlying logistic regression, (b) be able to determine and interpret the results of logistic regression, (c) be able to understand and evaluate the assumptions of logistic regression, and (d) have a basic understanding of methods of entering the covariates. This concludes our statistical concepts text. We wish you the best of luck in your future statistical adventures.
Problems
Conceptual problems
19.1 Which one of the following represents the primary difference between OLS regression and logistic regression?
 a. Computer processing time to estimate the model
 b. The measurement scales of the independent variables that can be included in the model
 c. The measurement scale of the dependent variable
 d. The statistical software that must be used to estimate the model
19.2 Which one of the following is NOT an appropriate dependent variable for binary logistic regression?
 a. Bernoulli
 b. Dichotomous
 c. Multinomial
 d. One variable with two categories
19.3 Which of the following would NOT be appropriate outcomes to examine with binary logistic regression?
 a. Employment status (employed, unemployed not looking for work, unemployed looking for work)
 b. Enlisted member of the military (member vs. nonmember)
 c. Marital status (married vs. not married)
 d. Recreational athlete (athlete vs. nonathlete)
19.4 Which of the following represents what is being predicted in binary logistic regression?
 a. Mean difference between two groups
 b. Odds that the unit of analysis belongs to one of two groups
 c. Precise numerical value
 d. Relationship between one group compared to the other group
19.5 While probability, odds, and log odds may be computationally different, they all relay the same basic information.
 a. True
 b. False
19.6 A researcher is studying diet soda drinking habits and has coded "diet soda drinker" as "1" and "non-diet soda drinker" as "0." Which of the following is a correct interpretation given a probability value of .52?
 a. The odds of being a diet soda drinker are about equal to those of not being a diet soda drinker.
 b. The odds of being a diet soda drinker are substantially greater than not being a diet soda drinker.
 c. The odds of being a diet soda drinker are substantially less than not being a diet soda drinker.
 d. Cannot be determined from the information provided.
19.7 Which of the following is a correct interpretation of the logit?
 a. The log odds become larger as the odds increase from 1 to 100.
 b. The log odds become smaller as the odds increase from 1 to 100.
 c. The log odds stay relatively stable as the odds decrease from 1 to 0.
 d. The change in log odds becomes larger when the independent variables are categorical rather than continuous.
19.8 Which of the following correctly contrasts the estimation of OLS regression as compared to logistic regression?
 a. The sum of the squared distance of the observed data to the regression line is minimized in logistic regression. The log likelihood function is maximized in OLS regression.
 b. The sum of the squared distance of the observed data to the regression line is maximized in logistic regression. The log likelihood function is minimized in OLS regression.
 c. The sum of the squared distance of the observed data to the regression line is maximized in OLS regression. The log likelihood function is minimized in logistic regression.
 d. The sum of the squared distance of the observed data to the regression line is minimized in OLS regression. The log likelihood function is maximized in logistic regression.
19.9 Which of the following is NOT a test that can be used to evaluate overall model fit for logistic regression models?
 a. Change in log likelihood
 b. Hosmer–Lemeshow goodness-of-fit
 c. Cox and Snell R squared
 d. Wald test
19.10 A researcher is studying diet soda drinking habits and has coded "diet soda drinker" as "1" and "non-diet soda drinker" as "0." She has predicted drinking habits based on the individual's weight (measured in pounds). Given this scenario, which of the following is a correct interpretation of an odds ratio of 1.75?
 a. For every one-unit increase in being a diet soda drinker, the odds of putting on an additional pound increase by 75%.
 b. For every one-unit increase in being a diet soda drinker, the odds of putting on an additional pound decrease by 75%.
 c. For every 1-pound increase in weight, the odds of being a diet soda drinker decrease by 75%.
 d. For every 1-pound increase in weight, the odds of being a diet soda drinker increase by 75%.
Computational problems
19.1 You are given the following data, where X1 (high school cumulative grade point average) and X2 (participation in school-sponsored athletics; 0 = nonathlete and 1 = athlete; use 0 as the reference category) are used to predict Y (college enrollment immediately after high school, "1," vs. delayed college enrollment or no enrollment, "0").
X1    X2  Y
4.15  1   1
2.72  0   1
3.16  0   0
3.89  1   1
4.02  1   1
1.89  0   0
2.10  0   1
2.36  1   1
3.55  0   0
1.70  0   0
Determine the following values based on simultaneous entry of independent variables: intercept, −2LL, constant, b1, b2, se(b1), se(b2), odds ratios, Wald1, Wald2.
19.2 You are given the following data, where X1 (participation in high school honors classes; yes = 1, no = 0; use 0 as the reference category) and X2 (participation in co-op program in college; yes = 1, no = 0; use 0 as the reference category) are used to predict Y (baccalaureate graduation with honors = 1 vs. graduation without honors = 0).
X1 X2 Y
0 1 1
0 0 1
1 0 0
1 1 1
1 1 1
0 0 0
1 0 1
0 1 1
1 0 0
0 0 0
Determine the following values based on simultaneous entry of independent variables: intercept, −2LL, constant, b1, b2, se(b1), se(b2), odds ratios, Wald1, Wald2.
Interpretive problem
19.1 Use SPSS to develop a logistic regression model with the example survey 1 dataset on the website. Utilize "do you smoke" as the dependent (binary) variable to find at least two strong predictors from among the continuous and/or categorical variables in the dataset. Write up the results in APA style, including testing for the assumptions. Determine and interpret a measure of effect size.
Appendix: Tables
Table a.1
The Standard Unit Normal Distribution

z    P(z)       z    P(z)       z     P(z)       z     P(z)
.00  .5000000   .50  .6914625   1.00  .8413447   1.50  .9331928
.01  .5039894   .51  .6949743   1.01  .8437524   1.51  .9344783
.02  .5079783   .52  .6984682   1.02  .8461358   1.52  .9357445
.03  .5119665   .53  .7019440   1.03  .8484950   1.53  .9369916
.04  .5159534   .54  .7054015   1.04  .8508300   1.54  .9382198
.05  .5199388   .55  .7088403   1.05  .8531409   1.55  .9394292
.06  .5239222   .56  .7122603   1.06  .8554277   1.56  .9406201
.07  .5279032   .57  .7156612   1.07  .8576903   1.57  .9417924
.08  .5318814   .58  .7190427   1.08  .8599289   1.58  .9429466
.09  .5358564   .59  .7224047   1.09  .8621434   1.59  .9440826
.10  .5398278   .60  .7257469   1.10  .8643339   1.60  .9452007
.11  .5437953   .61  .7290691   1.11  .8665005   1.61  .9463011
.12  .5477584   .62  .7323711   1.12  .8686431   1.62  .9473839
.13  .5517168   .63  .7356527   1.13  .8707619   1.63  .9484493
.14  .5556700   .64  .7389137   1.14  .8728568   1.64  .9494974
.15  .5596177   .65  .7421539   1.15  .8749281   1.65  .9505285
.16  .5635595   .66  .7453731   1.16  .8769756   1.66  .9515428
.17  .5674949   .67  .7485711   1.17  .8789995   1.67  .9525403
.18  .5714237   .68  .7517478   1.18  .8809999   1.68  .9535213
.19  .5753454   .69  .7549029   1.19  .8829768   1.69  .9544860
.20  .5792597   .70  .7580363   1.20  .8849303   1.70  .9554345
.21  .5831662   .71  .7611479   1.21  .8868606   1.71  .9563671
.22  .5870644   .72  .7642375   1.22  .8887676   1.72  .9572838
.23  .5909541   .73  .7673049   1.23  .8906514   1.73  .9581849
.24  .5948349   .74  .7703500   1.24  .8925123   1.74  .9590705
.25  .5987063   .75  .7733726   1.25  .8943502   1.75  .9599408
.26  .6025681   .76  .7763727   1.26  .8961653   1.76  .9607961
.27  .6064199   .77  .7793501   1.27  .8979577   1.77  .9616364
.28  .6102612   .78  .7823046   1.28  .8997274   1.78  .9624620
.29  .6140919   .79  .7852361   1.29  .9014747   1.79  .9632730
.30  .6179114   .80  .7881446   1.30  .9031995   1.80  .9640697
.31  .6217195   .81  .7910299   1.31  .9049021   1.81  .9648521
.32  .6255158   .82  .7938919   1.32  .9065825   1.82  .9656205
.33  .6293000   .83  .7967306   1.33  .9082409   1.83  .9663750
.34  .6330717   .84  .7995458   1.34  .9098773   1.84  .9671159
.35  .6368307   .85  .8023375   1.35  .9114920   1.85  .9678432
.36  .6405764   .86  .8051055   1.36  .9130850   1.86  .9685572
(continued)
Table a.1 (continued)
The Standard Unit Normal Distribution

z     P(z)       z     P(z)       z     P(z)       z     P(z)
.37   .6443088   .87   .8078498   1.37  .9146565   1.87  .9692581
.38   .6480273   .88   .8105703   1.38  .9162067   1.88  .9699460
.39   .6517317   .89   .8132671   1.39  .9177356   1.89  .9706210
.40   .6554217   .90   .8159399   1.40  .9192433   1.90  .9712834
.41   .6590970   .91   .8185887   1.41  .9207302   1.91  .9719334
.42   .6627573   .92   .8212136   1.42  .9221962   1.92  .9725711
.43   .6664022   .93   .8238145   1.43  .9236415   1.93  .9731966
.44   .6700314   .94   .8263912   1.44  .9250663   1.94  .9738102
.45   .6736448   .95   .8289439   1.45  .9264707   1.95  .9744119
.46   .6772419   .96   .8314724   1.46  .9278550   1.96  .9750021
.47   .6808225   .97   .8339768   1.47  .9292191   1.97  .9755808
.48   .6843863   .98   .8364569   1.48  .9305634   1.98  .9761482
.49   .6879331   .99   .8389129   1.49  .9318879   1.99  .9767045
.50   .6914625   1.00  .8413447   1.50  .9331928   2.00  .9772499
2.00  .9772499   2.50  .9937903   3.00  .9986501   3.50  .9997674
2.01  .9777844   2.51  .9939634   3.01  .9986938   3.51  .9997759
2.02  .9783083   2.52  .9941323   3.02  .9987361   3.52  .9997842
2.03  .9788217   2.53  .9942969   3.03  .9987772   3.53  .9997922
2.04  .9793248   2.54  .9944574   3.04  .9988171   3.54  .9997999
2.05  .9798178   2.55  .9946139   3.05  .9988558   3.55  .9998074
2.06  .9803007   2.56  .9947664   3.06  .9988933   3.56  .9998146
2.07  .9807738   2.57  .9949151   3.07  .9989297   3.57  .9998215
2.08  .9812372   2.58  .9950600   3.08  .9989650   3.58  .9998282
2.09  .9816911   2.59  .9952012   3.09  .9989992   3.59  .9998347
2.10  .9821356   2.60  .9953388   3.10  .9990324   3.60  .9998409
2.11  .9825708   2.61  .9954729   3.11  .9990646   3.61  .9998469
2.12  .9829970   2.62  .9956035   3.12  .9990957   3.62  .9998527
2.13  .9834142   2.63  .9957308   3.13  .9991260   3.63  .9998583
2.14  .9838226   2.64  .9958547   3.14  .9991553   3.64  .9998637
2.15  .9842224   2.65  .9959754   3.15  .9991836   3.65  .9998689
2.16  .9846137   2.66  .9960930   3.16  .9992112   3.66  .9998739
2.17  .9849966   2.67  .9962074   3.17  .9992378   3.67  .9998787
2.18  .9853713   2.68  .9963189   3.18  .9992636   3.68  .9998834
2.19  .9857379   2.69  .9964274   3.19  .9992886   3.69  .9998879
2.20  .9860966   2.70  .9965330   3.20  .9993129   3.70  .9998922
2.21  .9864474   2.71  .9966358   3.21  .9993363   3.71  .9998964
2.22  .9867906   2.72  .9967359   3.22  .9993590   3.72  .9999004
2.23  .9871263   2.73  .9968333   3.23  .9993810   3.73  .9999043
2.24  .9874545   2.74  .9969280   3.24  .9994024   3.74  .9999080
2.25  .9877755   2.75  .9970202   3.25  .9994230   3.75  .9999116
2.26  .9880894   2.76  .9971099   3.26  .9994429   3.76  .9999150
2.27  .9883962   2.77  .9971972   3.27  .9994623   3.77  .9999184
2.28  .9886962   2.78  .9972821   3.28  .9994810   3.78  .9999216
2.29  .9889893   2.79  .9973646   3.29  .9994991   3.79  .9999247
2.30  .9892759   2.80  .9974449   3.30  .9995166   3.80  .9999277
Table a.1 (continued)
The Standard Unit Normal Distribution

z     P(z)       z     P(z)       z     P(z)       z     P(z)
2.31  .9895559   2.81  .9975229   3.31  .9995335   3.81  .9999305
2.32  .9898296   2.82  .9975988   3.32  .9995499   3.82  .9999333
2.33  .9900969   2.83  .9976726   3.33  .9995658   3.83  .9999359
2.34  .9903581   2.84  .9977443   3.34  .9995811   3.84  .9999385
2.35  .9906133   2.85  .9978140   3.35  .9995959   3.85  .9999409
2.36  .9908625   2.86  .9978818   3.36  .9996103   3.86  .9999433
2.37  .9911060   2.87  .9979476   3.37  .9996242   3.87  .9999456
2.38  .9913437   2.88  .9980116   3.38  .9996376   3.88  .9999478
2.39  .9915758   2.89  .9980738   3.39  .9996505   3.89  .9999499
2.40  .9918025   2.90  .9981342   3.40  .9996631   3.90  .9999519
2.41  .9920237   2.91  .9981929   3.41  .9996752   3.91  .9999539
2.42  .9922397   2.92  .9982498   3.42  .9996869   3.92  .9999557
2.43  .9924506   2.93  .9983052   3.43  .9996982   3.93  .9999575
2.44  .9926564   2.94  .9983589   3.44  .9997091   3.94  .9999593
2.45  .9928572   2.95  .9984111   3.45  .9997197   3.95  .9999609
2.46  .9930531   2.96  .9984618   3.46  .9997299   3.96  .9999625
2.47  .9932443   2.97  .9985110   3.47  .9997398   3.97  .9999641
2.48  .9934309   2.98  .9985588   3.48  .9997493   3.98  .9999655
2.49  .9936128   2.99  .9986051   3.49  .9997585   3.99  .9999670
2.50  .9937903   3.00  .9986501   3.50  .9997674   4.00  .9999683

Source: Reprinted from Pearson, E.S. and Hartley, H.O., Biometrika Tables for Statisticians, Cambridge University Press, Cambridge, U.K., 1966, Table 1. With permission of Biometrika Trustees.
P(z) represents the area below that value of z.
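The P(z) values in Table a.1 can also be generated directly: the standard normal CDF is expressible through the error function in Python's standard library, and reproduces the tabled entries to all seven decimals.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for the standard unit normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few entries from Table a.1
print(round(normal_cdf(1.00), 7))   # 0.8413447
print(round(normal_cdf(1.96), 7))   # 0.9750021
print(round(normal_cdf(0.50), 7))   # 0.6914625
```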
Table a.2
Percentage Points of the t Distribution

v     α1 = .10  .05    .025    .01     .005    .0025   .001    .0005
      α2 = .20  .10    .050    .02     .010    .0050   .002    .0010
1       3.078   6.314  12.706  31.821  63.657  127.32  318.31  636.62
2       1.886   2.920   4.303   6.965   9.925  14.089  22.327  31.598
3       1.638   2.353   3.182   4.541   5.841   7.453  10.214  12.924
4       1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5       1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
6       1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7       1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8       1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9       1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
10      1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
11      1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
12      1.356   1.782   2.179   2.681   3.055   3.428   3.930   4.318
13      1.350   1.771   2.160   2.650   3.012   3.372   3.852   4.221
14      1.345   1.761   2.145   2.624   2.977   3.326   3.787   4.140
15      1.341   1.753   2.131   2.602   2.947   3.286   3.733   4.073
16      1.337   1.746   2.120   2.583   2.921   3.252   3.686   4.015
17      1.333   1.740   2.110   2.567   2.898   3.222   3.646   3.965
18      1.330   1.734   2.101   2.552   2.878   3.197   3.610   3.922
19      1.328   1.729   2.093   2.539   2.861   3.174   3.579   3.883
20      1.325   1.725   2.086   2.528   2.845   3.153   3.552   3.850
21      1.323   1.721   2.080   2.518   2.831   3.135   3.527   3.819
22      1.321   1.717   2.074   2.508   2.819   3.119   3.505   3.792
23      1.319   1.714   2.069   2.500   2.807   3.104   3.485   3.767
24      1.318   1.711   2.064   2.492   2.797   3.091   3.467   3.745
25      1.316   1.708   2.060   2.485   2.787   3.078   3.450   3.725
26      1.315   1.706   2.056   2.479   2.779   3.067   3.435   3.707
27      1.314   1.703   2.052   2.473   2.771   3.057   3.421   3.690
28      1.313   1.701   2.048   2.467   2.763   3.047   3.408   3.674
29      1.311   1.699   2.045   2.462   2.756   3.038   3.396   3.659
30      1.310   1.697   2.042   2.457   2.750   3.030   3.385   3.646
40      1.303   1.684   2.021   2.423   2.704   2.971   3.307   3.551
60      1.296   1.671   2.000   2.390   2.660   2.915   3.232   3.460
120     1.289   1.658   1.980   2.358   2.617   2.860   3.160   3.373
∞       1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291

Source: Reprinted from Pearson, E.S. and Hartley, H.O., Biometrika Tables for Statisticians, Cambridge University Press, Cambridge, U.K., 1966, Table 12. With permission of Biometrika Trustees.
α1 is the upper-tail value of the distribution with v degrees of freedom; appropriate for use in a one-tailed test.
Use α2 for a two-tailed test.
Table a.3
Percentage Points of the χ2 Distribution

                                        α
v     0.990        0.975        0.950       0.900      0.100    0.050    0.025    0.010
1     0.000157088  0.000982069  0.00393214  0.0157908  2.70554  3.84146  5.02389  6.63490
2     0.0201007  0.0506356  0.102587   0.210721   4.60517  5.99146  7.37776  9.21034
3     0.114832   0.215795   0.351846   0.584374   6.25139  7.81473  9.34840  11.3449
4     0.297109   0.484419   0.710723   1.063623   7.77944  9.48773  11.1433  13.2767
5     0.554298   0.831212   1.145476   1.61031    9.23636  11.0705  12.8325  15.0863
6     0.872090   1.23734    1.63538    2.20413    10.6446  12.5916  14.4494  16.8119
7     1.239043   1.68987    2.16735    2.83311    12.0170  14.0671  16.0128  18.4753
8     1.64650    2.17973    2.73264    3.48954    13.3616  15.5073  17.5345  20.0902
9     2.08790    2.70039    3.32511    4.16816    14.6837  16.9190  19.0228  21.6660
10    2.55821    3.24697    3.94030    4.86518    15.9872  18.3070  20.4832  23.2093
11    3.05348    3.81575    4.57481    5.57778    17.2750  19.6751  21.9200  24.7250
12    3.57057    4.40379    5.22603    6.30380    18.5493  21.0261  23.3367  26.2170
13    4.10692    5.00875    5.89186    7.04150    19.8119  22.3620  24.7356  27.6882
14    4.66043    5.62873    6.57063    7.78953    21.0641  23.6848  26.1189  29.1412
15    5.22935    6.26214    7.26094    8.54676    22.3071  24.9958  27.4884  30.5779
16    5.81221    6.90766    7.96165    9.31224    23.5418  26.2962  28.8454  31.9999
17    6.40776    7.56419    8.67176    10.0852    24.7690  27.5871  30.1910  33.4087
18    7.01491    8.23075    9.39046    10.8649    25.9894  28.8693  31.5264  34.8053
19    7.63273    8.90652    10.1170    11.6509    27.2036  30.1435  32.8523  36.1909
20    8.26040    9.59078    10.8508    12.4426    28.4120  31.4104  34.1696  37.5662
21    8.89720    10.28293   11.5913    13.2396    29.6151  32.6706  35.4789  38.9322
22    9.54249    10.9823    12.3380    14.0415    30.8133  33.9244  36.7807  40.2894
23    10.19567   11.6886    13.0905    14.8480    32.0069  35.1725  38.0756  41.6384
24    10.8564    12.4012    13.8484    15.6587    33.1962  36.4150  39.3641  42.9798
25    11.5240    13.1197    14.6114    16.4734    34.3816  37.6525  40.6465  44.3141
26    12.1981    13.8439    15.3792    17.2919    35.5632  38.8851  41.9232  45.6417
27    12.8785    14.5734    16.1514    18.1139    36.7412  40.1133  43.1945  46.9629
28    13.5647    15.3079    16.9279    18.9392    37.9159  41.3371  44.4608  48.2782
29    14.2565    16.0471    17.7084    19.7677    39.0875  42.5570  45.7223  49.5879
30    14.9535    16.7908    18.4927    20.5992    40.2560  43.7730  46.9792  50.8922
40    22.1643    24.4330    26.5093    29.0505    51.8051  55.7585  59.3417  63.6907
50    29.7067    32.3574    34.7643    37.6886    63.1671  67.5048  71.4202  76.1539
60    37.4849    40.4817    43.1880    46.4589    74.3970  79.0819  83.2977  88.3794
70    45.4417    48.7576    51.7393    55.3289    85.5270  90.5312  95.0232  100.425
80    53.5401    57.1532    60.3915    64.2778    96.5782  101.879  106.629  112.329
90    61.7541    65.6466    69.1260    73.2911    107.565  113.145  118.136  124.116
100   70.0649    74.2219    77.9295    82.3581    118.498  124.342  129.561  135.807

Source: Reprinted from Pearson, E.S. and Hartley, H.O., Biometrika Tables for Statisticians, Cambridge University Press, Cambridge, U.K., 1966, Table 8. With permission of Biometrika Trustees.
Table a.4
Percentage Points of the F Distribution

α = .10

        v1
v2      1      2      3      4      5      6      7      8      9      10     12     15     20     24     30     40     60     120    ∞
1     39.86  49.50  53.59  55.83  57.24  58.20  58.91  59.44  59.86  60.19  60.71  61.22  61.74  62.00  62.26  62.53  62.79  63.06  63.33
2      8.53   9.00   9.16   9.24   9.29   9.33   9.35   9.37   9.38   9.39   9.41   9.42   9.44   9.45   9.46   9.47   9.47   9.48   9.49
3      5.54   5.46   5.39   5.34   5.31   5.28   5.27   5.25   5.24   5.23   5.22   5.20   5.18   5.18   5.17   5.16   5.15   5.14   5.13
4      4.54   4.32   4.19   4.11   4.05   4.01   3.98   3.95   3.94   3.92   3.90   3.87   3.84   3.83   3.82   3.80   3.79   3.78   3.76
5      4.06   3.78   3.62   3.52   3.45   3.40   3.37   3.34   3.32   3.30   3.27   3.24   3.21   3.19   3.17   3.16   3.14   3.12   3.10
6      3.78   3.46   3.29   3.18   3.11   3.05   3.01   2.98   2.96   2.94   2.90   2.87   2.84   2.82   2.80   2.78   2.76   2.74   2.72
7      3.59   3.26   3.07   2.96   2.88   2.83   2.78   2.75   2.72   2.70   2.67   2.63   2.59   2.58   2.56   2.54   2.51   2.49   2.47
8      3.46   3.11   2.92   2.81   2.73   2.67   2.62   2.59   2.56   2.54   2.50   2.46   2.42   2.40   2.38   2.36   2.34   2.32   2.29
9      3.36   3.01   2.81   2.69   2.61   2.55   2.51   2.47   2.44   2.42   2.38   2.34   2.30   2.28   2.25   2.23   2.21   2.18   2.16
10     3.29   2.92   2.73   2.61   2.52   2.46   2.41   2.38   2.35   2.32   2.28   2.24   2.20   2.18   2.16   2.13   2.11   2.08   2.06
11     3.23   2.86   2.66   2.54   2.45   2.39   2.34   2.30   2.27   2.25   2.21   2.17   2.12   2.10   2.08   2.05   2.03   2.00   1.97
12     3.18   2.81   2.61   2.48   2.39   2.33   2.28   2.24   2.21   2.19   2.15   2.10   2.06   2.04   2.01   1.99   1.96   1.93   1.90
13     3.14   2.76   2.56   2.43   2.35   2.28   2.23   2.20   2.16   2.14   2.10   2.05   2.01   1.98   1.96   1.93   1.90   1.88   1.85
14     3.10   2.73   2.52   2.39   2.31   2.24   2.19   2.15   2.12   2.10   2.05   2.01   1.96   1.94   1.91   1.89   1.86   1.83   1.80
15     3.07   2.70   2.49   2.36   2.27   2.21   2.16   2.12   2.09   2.06   2.02   1.97   1.92   1.90   1.87   1.85   1.82   1.79   1.76
16     3.05   2.67   2.46   2.33   2.24   2.18   2.13   2.09   2.06   2.03   1.99   1.94   1.89   1.87   1.84   1.81   1.78   1.75   1.72
17     3.03   2.64   2.44   2.31   2.22   2.15   2.10   2.06   2.03   2.00   1.96   1.91   1.86   1.84   1.81   1.78   1.75   1.72   1.69
18     3.01   2.62   2.42   2.29   2.20   2.13   2.08   2.04   2.00   1.98   1.93   1.89   1.84   1.81   1.78   1.75   1.72   1.69   1.66
19     2.99   2.61   2.40   2.27   2.18   2.11   2.06   2.02   1.98   1.96   1.91   1.86   1.81   1.79   1.76   1.73   1.70   1.67   1.63
20     2.97   2.59   2.38   2.25   2.16   2.09   2.04   2.00   1.96   1.94   1.89   1.84   1.79   1.77   1.74   1.71   1.68   1.64   1.61
21     2.96   2.57   2.36   2.23   2.14   2.08   2.02   1.98   1.95   1.92   1.87   1.83   1.78   1.75   1.72   1.69   1.66   1.62   1.59
22     2.95   2.56   2.35   2.22   2.13   2.06   2.01   1.97   1.93   1.90   1.86   1.81   1.76   1.73   1.70   1.67   1.64   1.60   1.57
23     2.94   2.55   2.34   2.21   2.11   2.05   1.99   1.95   1.92   1.89   1.84   1.80   1.74   1.72   1.69   1.66   1.62   1.59   1.55
24     2.93   2.54   2.33   2.19   2.10   2.04   1.98   1.94   1.91   1.88   1.83   1.78   1.73   1.70   1.67   1.64   1.61   1.57   1.53
25     2.92   2.53   2.32   2.18   2.09   2.02   1.97   1.93   1.89   1.87   1.82   1.77   1.72   1.69   1.66   1.63   1.59   1.56   1.52
26     2.91   2.52   2.31   2.17   2.08   2.01   1.96   1.92   1.88   1.86   1.81   1.76   1.71   1.68   1.65   1.61   1.58   1.54   1.50
27     2.90   2.51   2.30   2.17   2.07   2.00   1.95   1.91   1.87   1.85   1.80   1.75   1.70   1.67   1.64   1.60   1.57   1.53   1.49
28     2.89   2.50   2.29   2.16   2.06   2.00   1.94   1.90   1.87   1.84   1.79   1.74   1.69   1.66   1.63   1.59   1.56   1.52   1.48
29     2.89   2.50   2.28   2.15   2.06   1.99   1.93   1.89   1.86   1.83   1.78   1.73   1.68   1.65   1.62   1.58   1.55   1.51   1.47
30     2.88   2.49   2.28   2.14   2.05   1.98   1.93   1.88   1.85   1.82   1.77   1.72   1.67   1.64   1.61   1.57   1.54   1.50   1.46
40     2.84   2.44   2.23   2.09   2.00   1.93   1.87   1.83   1.79   1.76   1.71   1.66   1.61   1.57   1.54   1.51   1.47   1.42   1.38
60     2.79   2.39   2.18   2.04   1.95   1.87   1.82   1.77   1.74   1.71   1.66   1.60   1.54   1.51   1.48   1.44   1.40   1.35   1.29
120    2.75   2.35   2.13   1.99   1.90   1.82   1.77   1.72   1.68   1.65   1.60   1.55   1.48   1.45   1.41   1.37   1.32   1.26   1.19
∞      2.71   2.30   2.08   1.94   1.85   1.77   1.72   1.67   1.63   1.60   1.55   1.49   1.42   1.38   1.34   1.30   1.24   1.17   1.00

α = .05

        v1
v2      1      2      3      4      5      6      7      8      9      10     12     15     20     24     30     40     60     120    ∞
1     161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  243.9  245.9  248.0  249.1  250.1  251.1  252.2  253.3  254.3
2     18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.41  19.43  19.45  19.45  19.46  19.47  19.48  19.49  19.50
3     10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.74   8.70   8.66   8.64   8.62   8.59   8.57   8.55   8.53
4      7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.91   5.86   5.80   5.77   5.75   5.72   5.69   5.66   5.63
5      6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.68   4.62   4.56   4.53   4.50   4.46   4.43   4.40   4.36
6      5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.00   3.94   3.87   3.84   3.81   3.77   3.74   3.70   3.67
7      5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.57   3.51   3.44   3.41   3.38   3.34   3.30   3.27   3.23
8      5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.28   3.22   3.15   3.12   3.08   3.04   3.01   2.97   2.93
9      5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.07   3.01   2.94   2.90   2.86   2.83   2.79   2.75   2.71
10     4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.91   2.85
(continued)
77
2�
74
2�
70
2�
66
2�
62
2�
58
2�
54
1 1
4�
84
3�
98
3�
59
3�
36
3�
20
3�
09
3�
01
2�
95
2�
90
2�
85
2�
79
2�
72
2�
65
2�
61
2�
57
2�
53
2�
49
2�
45
2�
40
12
4�
75
3�
89
3�
49
3�
26
3�
11
3�
00
2�
91
2�
85
2�
80
2�
75
2�
69
2�
62
2�
54
2�
51
2�
47
2�
43
2�
38
2�
34
2�
30
13
4�
67
3�
81
3�
41
3�
18
3�
03
2�
92
2�
83
2�
77
2�
71
2�
67
2�
60
2�
53
2�
46
2�
42
2�
38
2�
34
2�
30
2�
25
2�
21
14
4�
60
3�
74
3�
34
3�
1 1
2�
96
2�
85
2�
76
2�
70
2�
65
2�
60
2�
53
2�
46
2�
39
2�
35
2�
31
2�
27
2�
22
2�
18
2�
13
15
4�
54
3�
68
3�
29
3�
06
2�
90
2�
79
2�
71
2�
64
2�
59
2�
54
2�
48
2�
40
2�
33
2�
29
2�
25
2�
20
2�
16
2�
1 1
2�
07
16
4�
49
3�
63
3�
24
3�
01
2�
85
2�
74
2�
66
2�
59
2�
54
2�
49
2�
42
2�
35
2�
28
2�
24
2�
19
2�
15
2�
1 1
2�
06
2�
01
17
4�
45
3�
59
3�
20
2�
96
2�
81
2�
70
2�
61
2�
55
2�
49
2�
45
2�
38
2�
31
2�
23
2�
19
2�
15
2�
10
2�
06
2�
01
1�
96
18
4�
41
3�
55
3�
16
2�
93
2�
77
2�
66
2�
58
2�
51
2�
46
2�
41
2�
34
2�
27
2�
19
2�
15
2�
1 1
2�
06
2�
02
1�
97
1�
92
19
4�
38
3�
52
3�
13
2�
90
2�
74
2�
63
2�
54
2�
48
2�
42
2�
38
2�
31
2�
23
2�
16
2�
1 1
2�
07
2�
03
1�
98
1�
93
1�
88
20
4�
35
3�
49
3�
10
2�
87
2�
71
2�
60
2�
51
2�
45
2�
39
2�
35
2�
28
2�
20
2�
12
2�
08
2�
04
1�
99
1�
95
1�
90
1�
84
21
4�
32
3�
47
3�
07
2�
84
2�
68
2�
57
2�
49
2�
42
2�
37
2�
32
2�
25
2�
18
2�
10
2�
05
2�
01
1�
96
1�
92
1�
87
1�
81
22
4�
30
3�
44
3�
05
2�
82
2�
66
2�
55
2�
46
2�
40
2�
34
2�
30
2�
23
2�
15
2�
07
2�
03
1�
98
1�
94
1�
89
1�
84
1�
78
23
4�
28
3�
42
3�
03
2�
80
2�
64
2�
53
2�
44
2�
37
2�
32
2�
27
2�
20
2�
13
2�
05
2�
01
1�
96
1�
91
1�
86
1�
81
1�
76
24
4�
26
3�
40
3�
01
2�
78
2�
62
2�
51
2�
42
2�
36
2�
30
2�
25
2�
18
2�
1 1
2�
03
1�
98
1�
94
1�
89
1�
84
1�
79
1�
73
( c
on
ti
n
u
ed
)
764 Appendix: Tables
Ta
b
le
a
.4
(
co
n
ti
n
u
ed
)
P
er
ce
n
ta
g
e�
P
o i
n
ts
�o
f�
th
e�
F
�D
i s
tr
ib
u
ti
o
n
v
2
v
1
1
2
3
4
5
6
7
8
9
10
12
15
20
24
30
40
60
12
0
∞
α
�=
.0
5
25
4 �
24
3�
39
2�
99
2�
76
2�
60
2�
49
2�
40
2�
34
2�
28
2�
24
2�
16
2�
09
2�
01
1�
96
1�
92
1�
87
1�
82
1�
77
1�
71
26
4 �
23
3�
37
2�
98
2�
74
2�
59
2�
47
2�
39
2�
32
2�
27
2�
22
2�
15
2�
07
1�
99
1�
95
1�
90
1�
85
1�
80
1�
75
1�
69
27
4 �
21
3�
35
2�
96
2�
73
2�
57
2�
46
2�
37
2�
31
2�
25
2�
20
2�
13
2�
06
1�
97
1�
93
1�
88
1�
84
1�
79
1�
73
1�
67
28
4 �
20
3�
34
2�
95
2�
71
2�
56
2�
45
2�
36
2�
29
2�
24
2�
19
2�
12
2�
04
1�
96
1�
91
1�
87
1�
82
1�
77
1�
71
1�
65
29
4 �
18
3�
33
2�
93
2�
70
2�
55
2�
43
2�
35
2�
28
2�
22
2�
18
2�
10
2�
03
1�
94
1�
90
1�
85
1�
81
1�
75
1�
70
1�
64
30
4 �
17
3�
32
2�
92
2�
69
2�
53
2�
42
2�
33
2�
27
2�
21
2�
16
2�
09
2�
01
1�
93
1�
89
1�
84
1�
79
1�
74
1�
68
1�
62
40
4 �
08
3�
23
2�
84
2�
61
2�
45
2�
34
2�
25
2�
18
2�
12
2�
08
2�
00
1�
92
1�
84
1�
79
1�
74
1�
69
1�
64
1�
58
1�
51
60
4 �
00
3�
15
2�
76
2�
53
2�
37
2�
25
2�
17
2�
10
2�
04
1�
99
1�
92
1�
84
1�
75
1�
70
1�
65
1�
59
1�
53
1�
47
1�
39
12
0
3 �
92
3�
07
2�
68
2�
45
2�
29
2�
17
2�
09
2�
02
1�
96
1�
91
1�
83
1�
75
1�
66
1�
61
1�
55
1�
50
1�
43
1�
35
1�
25
∞
3 �
84
3�
00
2�
60
2�
37
2�
21
2�
10
2�
01
1�
94
1�
88
1�
83
1�
75
1�
67
1�
57
1�
52
1�
46
1�
39
1�
32
1�
22
1�
00
α
�=
.0
1
1
40
52
49
99
�5
54
03
56
25
57
64
58
59
59
28
59
81
60
22
60
56
61
06
61
57
62
09
62
35
62
61
62
87
63
13
63
39
63
66
2
98
�5
0
99
�0
0
99
�1
7
99
�2
5
99
�3
0
99
�3
3
99
�3
6
99
�3
7
99
�3
9
99
�4
0
99
�4
2
99
�4
3
99
�4
5
99
�4
6
99
�4
7
99
�4
7
99
�4
8
99
�4
9
99
�5
0
3
34
�1
2
30
�8
2
29
�4
6
28
�7
1
28
�2
4
27
�9
1
27
�6
7
27
�4
9
27
�3
5
27
�2
3
27
�0
5
26
�8
7
26
�6
9
26
�6
0
26
�5
0
26
�4
1
26
�3
2
25
�2
2
26
�1
3
4
21
�2
0
18
�0
0
16
�6
9
15
�9
8
15
�5
2
15
�2
1
14
�9
8
14
�8
0
14
�6
6
14
�5
5
14
�3
7
14
�2
0
14
�0
2
13
�9
3
13
�8
4
13
�7
5
13
�5
5
13
�5
6
13
�4
6
5
16
�2
6
13
�2
7
12
�0
6
1 1
�3
9
10
�9
7
10
�6
7
10
�4
6
10
�2
9
10
�1
6
10
�0
5
9�
89
9�
72
9�
55
9�
47
9�
38
9�
29
9�
20
9�
11
9�
02
6
13
�7
5
10
�9
2
9�
78
9�
15
8�
75
8�
47
8�
26
8�
10
7�
98
7�
87
7�
72
7�
56
7�
40
7�
31
7�
23
7�
14
7�
06
6�
97
6�
88
7
12
�2
5
9�
55
8�
45
7�
85
7�
46
7�
19
6�
99
6�
84
6�
72
6�
62
6�
47
6�
31
6�
16
6�
07
5�
99
5�
91
5�
82
5�
74
5�
65
8
1 1
�2
6
8�
65
7�
59
7�
01
6�
63
6�
37
6�
18
6�
03
5�
91
5�
81
5�
67
5�
52
5�
36
5�
28
5�
20
5�
12
5�
03
4�
95
4�
86
9
10
�5
6
8�
02
6�
99
6�
42
6�
06
5�
80
5�
61
5�
47
5�
35
5�
26
5�
11
4�
96
4�
81
4�
73
4�
65
4�
57
4�
48
4�
40
4�
31
10
10
�0
4
7�
56
6�
55
5�
99
5�
64
5�
39
5�
20
5�
06
4�
94
4�
85
4�
71
4�
56
4�
41
4�
33
4�
25
4�
17
4�
08
4�
00
3�
91
11
9�
65
7�
21
6�
22
5�
67
5�
32
5�
07
4�
89
4�
74
4�
63
4�
54
4�
40
4�
25
4�
10
4�
02
3�
94
3�
86
3�
78
3�
69
3�
60
12
9 �
33
6�
93
5�
95
5�
41
5�
06
4�
82
4�
64
4�
50
4�
39
4�
30
4�
16
4�
01
3�
86
3�
78
3�
70
3�
62
3�
54
3�
45
3�
36
765Appendix: Tables
13
9�
07
6�
70
5�
74
5�
21
4�
86
4�
62
4�
44
4�
30
4�
19
4�
10
3�
96
3�
82
3�
66
3�
59
3�
51
3�
43
3�
34
3�
25
3�
17
14
8�
86
6�
51
5�
56
5�
04
4�
69
4�
46
4�
28
4�
14
4�
03
3�
94
3�
80
3�
66
3�
51
3�
43
3�
35
3�
27
3�
18
3�
09
3�
00
15
8�
68
6�
36
5�
42
4�
89
4�
56
4�
32
4�
14
4�
00
3�
89
3�
80
3�
67
3�
52
3�
37
3�
29
3�
21
3�
13
3�
05
2�
96
2�
87
16
8�
53
6�
23
5�
29
4�
77
4�
44
4�
20
4�
03
3�
89
3�
78
3�
69
3�
55
3�
41
3�
26
3�
18
3�
10
3�
02
2�
93
2�
84
2�
75
17
8�
40
6�
1 1
5�
18
4�
67
4�
34
4�
10
3�
93
3�
79
3�
68
3�
59
3�
46
3�
31
3�
16
3�
08
3�
00
2�
92
2�
83
2�
75
2�
65
18
8�
29
6�
01
5�
09
4�
58
4�
25
4�
01
3�
84
3�
71
3�
60
3�
51
3�
37
3�
23
3�
08
3�
00
2�
92
2�
84
2�
75
2�
66
2�
57
19
8�
18
5�
93
5�
01
4�
50
4�
17
3�
94
3�
77
3�
63
3�
52
3�
43
3�
30
3�
15
3�
00
2�
92
2�
84
2�
76
2�
67
2�
58
2�
49
20
8�
10
5�
85
4�
94
4�
43
4�
10
3�
87
3�
70
3�
56
3�
46
3�
37
3�
23
3�
09
2�
94
2�
86
2�
78
2�
69
2�
61
2�
52
2�
42
21
8�
02
5�
78
4�
87
4�
37
4�
04
3�
81
3�
64
3�
51
3�
40
3�
31
3�
17
3�
03
2�
88
2�
80
2�
72
2�
64
2�
55
2�
46
2�
36
22
7�
95
5�
72
4�
82
4�
31
3�
99
3�
76
3�
59
3�
45
3�
35
3�
26
3�
12
2�
98
2�
83
2�
75
2�
67
2�
58
2�
50
2�
40
2�
31
23
7�
88
5�
66
4�
76
4�
26
3�
94
3�
71
3�
54
3�
41
3�
30
3�
21
3�
07
2�
93
2�
78
2�
70
2�
62
2�
54
2�
45
2�
35
2�
26
24
7�
82
5�
61
4�
72
4�
22
3�
90
3�
67
3�
50
3�
36
3�
26
3�
17
3�
03
2�
89
2�
74
2�
66
2�
58
2�
49
2�
40
2�
31
2�
21
25
7�
77
5�
57
4�
68
4�
18
3�
85
3�
63
3�
46
3�
32
3�
22
3�
13
2�
99
2�
85
2�
70
2�
62
2�
54
2�
45
2�
36
2�
27
2�
17
26
7�
72
5�
53
4�
64
4�
14
3�
82
3�
59
3�
42
3�
29
3�
18
3�
09
2�
96
2�
81
2�
66
2�
58
2�
50
2�
42
2�
33
2�
23
2�
18
27
7�
68
5�
49
4�
60
4�
1 1
3�
78
3�
56
3�
39
3�
26
3�
15
3�
06
2�
93
2�
78
2�
63
2�
55
2�
47
2�
38
2�
29
2�
20
2�
10
28
7�
64
5�
45
4�
57
4�
07
3�
75
3�
53
3�
36
3�
23
3�
12
3�
03
2�
90
2�
75
2�
60
2�
52
2�
44
2�
35
2�
26
2�
17
2�
06
29
7�
60
5�
42
4�
54
4�
04
3�
73
3�
50
3�
33
3�
20
3�
09
3�
00
2�
87
2�
73
2�
57
2�
49
2�
41
2�
33
2�
23
2�
14
2�
03
30
7�
56
5�
39
4�
51
4�
02
3�
70
3�
47
3�
30
3�
17
3�
07
2�
98
2�
84
2�
70
2�
55
2�
47
2�
39
2�
30
2�
21
2�
1 1
2�
01
40
7�
31
5�
18
4�
31
3�
83
3�
51
3�
29
3�
12
2�
99
2�
89
2�
80
2�
66
2�
52
2�
37
2�
29
2�
20
2�
1 1
2�
02
1�
92
1�
80
60
7�
08
4�
98
4�
13
3�
65
3�
34
3�
12
2�
95
2�
82
2�
72
2�
63
2�
50
2�
35
2�
20
2�
12
2�
03
1�
94
1�
84
1�
73
1�
60
12
0
6�
85
4�
79
3�
95
3�
48
3�
17
2�
96
2�
79
2�
66
2�
56
2�
47
2�
34
2�
19
2�
03
1�
95
1�
86
1�
76
1�
66
1�
53
1�
38
∞
6�
63
4�
61
3�
78
3�
32
3�
02
2�
80
2�
64
2�
51
2�
41
2�
32
2�
18
2�
04
1�
88
1�
79
1�
70
1�
59
1�
47
1�
32
1�
00
S
ou
r c
e:
�
R
ep
ri
n
te
d
� f
ro
m
� P
ea
rs
o
n
,�
E
�S
��
an
d
� H
ar
tl
ey
,�
H
�O
�,�
B
io
m
et
ri
ka
T
ab
le
s
fo
r
S
ta
ti
st
ic
ia
n
s,
� C
am
b
ri
d
g
e�
U
n
iv
er
si
ty
� P
re
ss
,�
C
am
b
ri
d
g
e,
� U
�K
�,�
19
66
,�
Ta
b
le
� 1
8�
� W
it
h
�
p
er
m
is
si
o
n
�o
f�
B
io
m
et
ri
k
a�
T r
u
st
ee
s�
v 1
is
�t
h
e�
n
u
m
er
at
o
r�
d
eg
re
es
�o
f�
fr
ee
d
o
m
,�a
n
d
�v
2�
is
�t
h
e�
d
en
o
m
in
at
o
r�
d
eg
re
es
�o
f�
fr
ee
d
o
m
�
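The critical values tabled in Table A.4 can also be obtained from statistical software rather than read from the printed table. A minimal sketch using SciPy's F distribution (an assumption on our part — the text itself works in SPSS, which reports exact p values directly):

```python
from scipy.stats import f

def f_critical(alpha, v1, v2):
    # Right-tailed critical value of F: the (1 - alpha) quantile with
    # v1 numerator and v2 denominator degrees of freedom.
    return f.ppf(1 - alpha, v1, v2)

# Example: alpha = .05, v1 = 5, v2 = 10 gives about 3.33,
# matching the tabled value.
print(round(f_critical(0.05, 5, 10), 2))
```

Because the quantile function is exact, software values may differ from the table in the last printed digit due to rounding.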
Table A.5
Fisher's Z Transformed Values
r Z r Z
.00 .0000 .50 .5493
1 .0100 1 .5627
2 .0200 2 .5763
3 .0300 3 .5901
4 .0400 4 .6042
.05 .0500 .55 .6184
6 .0601 6 .6328
7 .0701 7 .6475
8 .0802 8 .6625
9 .0902 9 .6777
.10 .1003 .60 .6931
1 .1104 1 .7089
2 .1206 2 .7250
3 .1307 3 .7414
4 .1409 4 .7582
.15 .1511 .65 .7753
6 .1614 6 .7928
7 .1717 7 .8107
8 .1820 8 .8291
9 .1923 9 .8480
.20 .2027 .70 .8673
1 .2132 1 .8872
2 .2237 2 .9076
3 .2342 3 .9287
4 .2448 4 .9505
.25 .2554 .75 0.973
6 .2661 6 0.996
7 .2769 7 1.020
8 .2877 8 1.045
9 .2986 9 1.071
.30 .3095 .80 1.099
1 .3205 1 1.127
2 .3316 2 1.157
3 .3428 3 1.188
4 .3541 4 1.221
.35 .3654 .85 1.256
6 .3769 6 1.293
7 .3884 7 1.333
8 .4001 8 1.376
9 .4118 9 1.422
Table A.5 (continued)
Fisher's Z Transformed Values
r Z r Z
.40 .4236 .90 1.472
1 .4356 1 1.528
2 .4477 2 1.589
3 .4599 3 1.658
4 .4722 4 1.738
.45 .4847 .95 1.832
6 .4973 6 1.946
7 .5101 7 2.092
8 .5230 8 2.298
9 .5361 9 2.647
Source: Reprinted from Pearson, E.S. and Hartley, H.O., Biometrika Tables for Statisticians, Cambridge University Press, Cambridge, U.K., 1966, Table 14. With permission of Biometrika Trustees.
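Fisher's transformation has a closed form, Z = ½ ln[(1 + r)/(1 − r)] = arctanh(r), so the values in Table A.5 can be reproduced directly; a minimal sketch in Python:

```python
import math

def fisher_z(r):
    # Fisher's r-to-Z transformation: Z = 0.5 * ln((1 + r) / (1 - r)),
    # which equals the inverse hyperbolic tangent of r.
    return math.atanh(r)

print(round(fisher_z(0.50), 4))  # 0.5493, matching the table
print(round(fisher_z(0.95), 3))  # 1.832
```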
Table A.6
Orthogonal Polynomials
J Trend j = 1 2 3 4 5 6 7 8 9 10 Σcj²
J = 3 Linear −1 0 1 2
Quadratic 1 −2 1 6
J = 4 Linear −3 −1 1 3 20
Quadratic 1 −1 −1 1 4
Cubic −1 3 −3 1 20
J = 5 Linear −2 −1 0 1 2 10
Quadratic 2 −1 −2 −1 2 14
Cubic −1 2 0 −2 1 10
Quartic 1 −4 6 −4 1 70
J = 6 Linear −5 −3 −1 1 3 5 70
Quadratic 5 −1 −4 −4 −1 5 84
Cubic −5 7 4 −4 −7 5 180
Quartic 1 −3 2 2 −3 1 28
Quintic −1 5 −10 10 −5 1 252
J = 7 Linear −3 −2 −1 0 1 2 3 28
Quadratic 5 0 −3 −4 −3 0 5 84
Cubic −1 1 1 0 −1 −1 1 6
Quartic 3 −7 1 6 1 −7 3 154
Quintic −1 4 −5 0 5 −4 1 84
J = 8 Linear −7 −5 −3 −1 1 3 5 7 168
Quadratic 7 1 −3 −5 −5 −3 1 7 168
Cubic −7 5 7 3 −3 −7 −5 7 264
Quartic 7 −13 −3 9 9 −3 −13 7 616
Quintic −7 23 −17 −15 15 17 −23 7 2184
J = 9 Linear −4 −3 −2 −1 0 1 2 3 4 60
Quadratic 28 7 −8 −17 −20 −17 −8 7 28 2772
Cubic −14 7 13 9 0 −9 −13 −7 14 990
Quartic 14 −21 −11 9 18 9 −11 −21 14 2002
Quintic −4 11 −4 −9 0 9 4 −11 4 468
J = 10 Linear −9 −7 −5 −3 −1 1 3 5 7 9 330
Quadratic 6 2 −1 −3 −4 −4 −3 −1 2 6 132
Cubic −42 14 35 31 12 −12 −31 −35 −14 42 8580
Quartic 18 −22 −17 3 18 18 3 −17 −22 18 2860
Quintic −6 14 −1 −11 −6 6 11 1 −14 6 780
Source: Reprinted from Pearson, E.S. and Hartley, H.O., Biometrika Tables for Statisticians, Cambridge University Press, Cambridge, U.K., 1966, Table 47. With permission of Biometrika Trustees.
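Within a given J, each pair of trend contrasts is orthogonal (the products of corresponding coefficients sum to zero), and the final column is the sum of the squared coefficients. A quick check of the J = 5 rows from the table:

```python
# Trend contrast coefficients for J = 5 groups, taken from Table A.6.
linear    = [-2, -1,  0,  1,  2]
quadratic = [ 2, -1, -2, -1,  2]

# Orthogonality: the cross-products sum to zero.
print(sum(l * q for l, q in zip(linear, quadratic)))  # 0

# Sums of squared coefficients match the table's last column.
print(sum(c ** 2 for c in linear), sum(c ** 2 for c in quadratic))  # 10 14
```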
Table A.7
Critical Values for Dunnett's Procedure
df 1 2 3 4 5 6 7 8 9
One tailed, α = .05
5 2.02 2.44 2.68 2.85 2.98 3.08 3.16 3.24 3.30
6 1.94 2.34 2.56 2.71 2.83 2.92 3.00 3.07 3.12
7 1.89 2.27 2.48 2.62 2.73 2.82 2.89 2.95 3.01
8 1.86 2.22 2.42 2.55 2.66 2.74 2.81 2.87 2.92
9 1.83 2.18 2.37 2.50 2.60 2.68 2.75 2.81 2.86
10 1.81 2.15 2.34 2.47 2.56 2.64 2.70 2.76 2.81
11 1.80 2.13 2.31 2.44 2.53 2.60 2.67 2.72 2.77
12 1.78 2.11 2.29 2.41 2.50 2.58 2.64 2.69 2.74
13 1.77 2.09 2.27 2.39 2.48 2.55 2.61 2.66 2.71
14 1.76 2.08 2.25 2.37 2.46 2.53 2.59 2.64 2.69
15 1.75 2.07 2.24 2.36 2.44 2.51 2.57 2.62 2.67
16 1.75 2.06 2.23 2.34 2.43 2.50 2.56 2.61 2.65
17 1.74 2.05 2.22 2.33 2.42 2.49 2.54 2.59 2.64
18 1.73 2.04 2.21 2.32 2.41 2.48 2.53 2.58 2.62
19 1.73 2.03 2.20 2.31 2.40 2.47 2.52 2.57 2.61
20 1.72 2.03 2.19 2.30 2.39 2.46 2.51 2.56 2.60
24 1.71 2.01 2.17 2.28 2.36 2.43 2.48 2.53 2.57
30 1.70 1.99 2.15 2.25 2.33 2.40 2.45 2.50 2.54
40 1.68 1.97 2.13 2.23 2.31 2.37 2.42 2.47 2.51
60 1.67 1.95 2.10 2.21 2.28 2.35 2.39 2.44 2.48
120 1.66 1.93 2.08 2.18 2.26 2.32 2.37 2.41 2.45
∞ 1.64 1.92 2.06 2.16 2.23 2.29 2.34 2.38 2.42
One tailed, α = .01
5 3.37 3.90 4.21 4.43 4.60 4.73 4.85 4.94 5.03
6 3.14 3.61 3.88 4.07 4.21 4.33 4.43 4.51 4.59
7 3.00 3.42 3.66 3.83 3.96 4.07 4.15 4.23 4.30
8 2.90 3.29 3.51 3.67 3.79 3.88 3.96 4.03 4.09
9 2.82 3.19 3.40 3.55 3.66 3.75 3.82 3.89 3.94
10 2.76 3.11 3.31 3.45 3.56 3.64 3.71 3.78 3.83
11 2.72 3.06 3.25 3.38 3.48 3.56 3.63 3.69 3.74
12 2.68 3.01 3.19 3.32 3.42 3.50 3.56 3.62 3.67
13 2.65 2.97 3.15 3.27 3.37 3.44 3.51 3.56 3.61
14 2.62 2.94 3.11 3.23 3.32 3.40 3.46 3.51 3.56
15 2.60 2.91 3.08 3.20 3.29 3.36 3.42 3.47 3.52
16 2.58 2.88 3.05 3.17 3.26 3.33 3.39 3.44 3.48
17 2.57 2.86 3.03 3.14 3.23 3.30 3.36 3.41 3.45
18 2.55 2.84 3.01 3.12 3.21 3.27 3.33 3.38 3.42
19 2.54 2.83 2.99 3.10 3.18 3.25 3.31 3.36 3.40
20 2.53 2.81 2.97 3.08 3.17 3.23 3.29 3.34 3.38
24 2.49 2.77 2.92 3.03 3.11 3.17 3.22 3.27 3.31
(continued)
Table A.7 (continued)
Critical Values for Dunnett's Procedure
df 1 2 3 4 5 6 7 8 9
One tailed, α = .01
30 2.46 2.72 2.87 2.97 3.05 3.11 3.16 3.21 3.24
40 2.42 2.68 2.82 2.92 2.99 3.05 3.10 3.14 3.18
60 2.39 2.64 2.78 2.87 2.94 3.00 3.04 3.08 3.12
120 2.36 2.60 2.73 2.82 2.89 2.94 2.99 3.03 3.06
∞ 2.33 2.56 2.68 2.77 2.84 2.89 2.93 2.97 3.00
Two tailed, α = .05
5 2.57 3.03 3.29 3.48 3.62 3.73 3.82 3.90 3.97
6 2.45 2.86 3.10 3.26 3.39 3.49 3.57 3.64 3.71
7 2.36 2.75 2.97 3.12 3.24 3.33 3.41 3.47 3.53
8 2.31 2.67 2.88 3.02 3.13 3.22 3.29 3.35 3.41
9 2.26 2.61 2.81 2.95 3.05 3.14 3.20 3.26 3.32
10 2.23 2.57 2.76 2.89 2.99 3.07 3.14 3.19 3.24
11 2.20 2.53 2.72 2.84 2.94 3.02 3.08 3.14 3.19
12 2.18 2.50 2.68 2.81 2.90 2.98 3.04 3.09 3.14
13 2.16 2.48 2.65 2.78 2.87 2.94 3.00 3.06 3.10
14 2.14 2.46 2.63 2.75 2.84 2.91 2.97 3.02 3.07
15 2.13 2.44 2.61 2.73 2.82 2.89 2.95 3.00 3.04
16 2.12 2.42 2.59 2.71 2.80 2.87 2.92 2.97 3.02
17 2.11 2.41 2.58 2.69 2.78 2.85 2.90 2.95 3.00
18 2.10 2.40 2.56 2.68 2.76 2.83 2.89 2.94 2.98
19 2.09 2.39 2.55 2.66 2.75 2.81 2.87 2.92 2.96
20 2.09 2.38 2.54 2.65 2.73 2.80 2.86 2.90 2.95
24 2.06 2.35 2.51 2.61 2.70 2.76 2.81 2.86 2.90
30 2.04 2.32 2.47 2.58 2.66 2.72 2.77 2.82 2.86
40 2.02 2.29 2.44 2.54 2.62 2.68 2.73 2.77 2.81
60 2.00 2.27 2.41 2.51 2.58 2.64 2.69 2.73 2.77
120 1.98 2.24 2.38 2.47 2.55 2.60 2.65 2.69 2.73
∞ 1.96 2.21 2.35 2.44 2.51 2.57 2.61 2.65 2.69
Two tailed, α = .01
5 4.03 4.63 4.98 5.22 5.41 5.56 5.69 5.80 5.89
6 3.71 4.21 4.51 4.71 4.87 5.00 5.10 5.20 5.28
7 3.50 3.95 4.21 4.39 4.53 4.64 4.74 4.82 4.89
8 3.36 3.77 4.00 4.17 4.29 4.40 4.48 4.56 4.62
9 3.25 3.63 3.85 4.01 4.12 4.22 4.30 4.37 4.43
10 3.17 3.53 3.74 3.88 3.99 4.08 4.16 4.22 4.28
11 3.11 3.45 3.65 3.79 3.89 3.98 4.05 4.11 4.16
12 3.05 3.39 3.58 3.71 3.81 3.89 3.96 4.02 4.07
13 3.01 3.33 3.52 3.65 3.74 3.82 3.89 3.94 3.99
14 2.98 3.29 3.47 3.59 3.69 3.76 3.83 3.88 3.93
15 2.95 3.25 3.43 3.55 3.64 3.71 3.78 3.83 3.88
16 2.92 3.22 3.39 3.51 3.60 3.67 3.73 3.78 3.83
Table A.7 (continued)
Critical Values for Dunnett's Procedure
df 1 2 3 4 5 6 7 8 9
Two tailed, α = .01
17 2.90 3.19 3.36 3.47 3.56 3.63 3.69 3.74 3.79
18 2.88 3.17 3.33 3.44 3.53 3.60 3.66 3.71 3.75
19 2.86 3.15 3.31 3.42 3.50 3.57 3.63 3.68 3.72
20 2.85 3.13 3.29 3.40 3.48 3.55 3.60 3.65 3.69
24 2.80 3.07 3.22 3.32 3.40 3.47 3.52 3.57 3.61
30 2.75 3.01 3.15 3.25 3.33 3.39 3.44 3.49 3.52
40 2.70 2.95 3.09 3.19 3.26 3.32 3.37 3.41 3.44
60 2.66 2.90 3.03 3.12 3.19 3.25 3.29 3.33 3.37
120 2.62 2.85 2.97 3.06 3.12 3.18 3.22 3.26 3.29
∞ 2.58 2.79 2.92 3.00 3.06 3.11 3.15 3.19 3.22
Sources: Reprinted from Dunnett, C.W., J. Am. Stat. Assoc., 50, 1096, 1955, Table 1a and Table 1b. With permission of the American Statistical Association; Dunnett, C.W., Biometrics, 20, 482, 1964, Table II and Table III. With permission of the Biometric Society.
The columns represent J = number of treatment means (excluding the control).
Table A.8
Critical Values for Dunn's (Bonferroni's) Procedure
Number of Contrasts
ν α 2 3 4 5 6 7 8 9 10 15 20
2 0.01 14.071 17.248 19.925 22.282 24.413 26.372 28.196 29.908 31.528 38.620 44.598
0.05 6.164 7.582 8.774 9.823 10.769 11.639 12.449 13.208 13.927 17.072 19.721
0.10 4.243 5.243 6.081 6.816 7.480 8.090 8.656 9.188 9.691 11.890 13.741
0.20 2.828 3.531 4.116 4.628 5.089 5.512 5.904 6.272 6.620 8.138 9.414
3 0.01 7.447 8.565 9.453 10.201 10.853 11.436 11.966 12.453 12.904 14.796 16.300
0.05 4.156 4.826 5.355 5.799 6.185 6.529 6.842 7.128 7.394 8.505 9.387
0.10 3.149 3.690 4.115 4.471 4.780 5.055 5.304 5.532 5.744 6.627 7.326
0.20 2.294 2.734 3.077 3.363 3.610 3.829 4.028 4.209 4.377 5.076 5.626
4 0.01 5.594 6.248 6.751 7.166 7.520 7.832 8.112 8.367 8.600 9.556 10.294
0.05 3.481 3.941 4.290 4.577 4.822 5.036 5.228 5.402 5.562 6.214 6.714
0.10 2.751 3.150 3.452 3.699 3.909 4.093 4.257 4.406 4.542 5.097 5.521
0.20 2.084 2.434 2.697 2.911 3.092 3.250 3.391 3.518 3.635 4.107 4.468
5 0.01 4.771 5.243 5.599 5.888 6.133 6.346 6.535 6.706 6.862 7.491 7.968
0.05 3.152 3.518 3.791 4.012 4.197 4.358 4.501 4.630 4.747 5.219 5.573
0.10 2.549 2.882 3.129 3.327 3.493 3.638 3.765 3.880 3.985 4.403 4.718
0.20 1.973 2.278 2.503 2.683 2.834 2.964 3.079 3.182 3.275 3.649 3.928
6 0.01 4.315 4.695 4.977 5.203 5.394 5.559 5.704 5.835 5.954 6.428 6.782
0.05 2.959 3.274 3.505 3.690 3.845 3.978 4.095 4.200 4.296 4.675 4.956
0.10 2.428 2.723 2.939 3.110 3.253 3.376 3.484 3.580 3.668 4.015 4.272
0.20 1.904 2.184 2.387 2.547 2.681 2.795 2.895 2.985 3.066 3.385 3.620
7 0.01 4.027 4.353 4.591 4.782 4.941 5.078 5.198 5.306 5.404 5.791 6.077
0.05 2.832 3.115 3.321 3.484 3.620 3.736 3.838 3.929 4.011 4.336 4.574
0.10 2.347 2.618 2.814 2.969 3.097 3.206 3.302 3.388 3.465 3.768 3.990
0.20 1.858 2.120 2.309 2.457 2.579 2.684 2.775 2.856 2.929 3.214 3.423
8 0.01 3.831 4.120 4.331 4.498 4.637 4.756 4.860 4.953 5.038 5.370 5.613
0.05 2.743 3.005 3.193 3.342 3.464 3.589 3.661 3.743 3.816 4.105 4.316
0.10 2.289 2.544 2.726 2.869 2.967 3.088 3.176 3.254 3.324 3.598 3.798
0.20 1.824 2.075 2.254 2.393 2.508 2.605 2.690 2.765 2.832 3.095 3.286
9 0.01 3.688 3.952 4.143 4.294 4.419 4.526 4.619 4.703 4.778 5.072 5.287
0.05 2.677 2.923 3.099 3.237 3.351 3.448 3.532 3.607 3.675 3.938 4.129
0.10 2.246 2.488 2.661 2.796 2.907 3.001 3.083 3.155 3.221 3.474 3.658
0.20 1.799 2.041 2.212 2.345 2.454 2.546 2.627 2.696 2.761 3.008 3.185
10 0.01 3.580 3.825 4.002 4.141 4.256 4.354 4.439 4.515 4.584 4.852 5.046
0.05 2.626 2.860 3.027 3.157 3.264 3.355 3.434 3.505 3.568 3.813 3.989
0.10 2.213 2.446 2.611 2.739 2.845 2.934 3.012 3.080 3.142 3.380 3.552
0.20 1.779 2.014 2.180 2.308 2.413 2.501 2.578 2.646 2.706 2.941 3.106
11 0.01 3.495 3.726 3.892 4.022 4.129 4.221 4.300 4.371 4.434 4.682 4.860
0.05 2.586 2.811 2.970 3.094 3.196 3.283 3.358 3.424 3.484 3.715 3.880
0.10 2.166 2.412 2.571 2.695 2.796 2.881 2.955 3.021 3.079 3.306 3.468
0.20 1.763 1.993 2.154 2.279 2.380 2.465 2.539 2.605 2.663 2.888 3.048
12 0.01 3.427 3.647 3.804 3.927 4.029 4.114 4.189 4.256 4.315 4.547 4.714
0.05 2.553 2.770 2.924 3.044 3.141 3.224 3.296 3.359 3.416 3.636 3.793
0.10 2.164 2.384 2.539 2.658 2.756 2.838 2.910 2.973 3.029 3.247 3.402
0.20 1.750 1.975 2.133 2.254 2.353 2.436 2.508 2.571 2.628 2.845 2.999
Table A.8 (continued)
Critical Values for Dunn's (Bonferroni's) Procedure
Number of Contrasts
ν α 2 3 4 5 6 7 8 9 10 15 20
13 0.01 3.371 3.582 3.733 3.850 3.946 4.028 4.099 4.162 4.218 4.438 4.595
0.05 2.526 2.737 2.886 3.002 3.096 3.176 3.245 3.306 3.361 3.571 3.722
0.10 2.146 2.361 2.512 2.628 2.723 2.803 2.872 2.933 2.988 3.198 3.347
0.20 1.739 1.961 2.116 2.234 2.331 2.412 2.482 2.544 2.599 2.809 2.958
14 0.01 3.324 3.528 3.673 3.785 3.878 3.956 4.024 4.084 4.138 4.347 4.497
0.05 2.503 2.709 2.854 2.967 3.058 3.135 3.202 3.261 3.314 3.518 3.662
0.10 2.131 2.342 2.489 2.603 2.696 2.774 2.841 2.900 2.953 3.157 3.301
0.20 1.730 1.949 2.101 2.217 2.312 2.392 2.460 2.520 2.574 2.779 2.924
15 0.01 3.285 3.482 3.622 3.731 3.820 3.895 3.961 4.019 4.071 4.271 4.414
0.05 2.483 2.685 2.827 2.937 3.026 3.101 3.166 3.224 3.275 3.472 3.612
0.10 2.118 2.325 2.470 2.582 2.672 2.748 2.814 2.872 2.924 3.122 3.262
0.20 1.722 1.938 2.088 2.203 2.296 2.374 2.441 2.500 2.553 2.754 2.896
16 0.01 3.251 3.443 3.579 3.684 3.771 3.844 3.907 3.963 4.013 4.206 4.344
0.05 2.467 2.665 2.804 2.911 2.998 3.072 3.135 3.191 3.241 3.433 3.569
0.10 2.106 2.311 2.453 2.563 2.652 2.726 2.791 2.848 2.898 3.092 3.228
0.20 1.715 1.929 2.077 2.190 2.282 2.359 2.425 2.483 2.535 2.732 2.871
17 0.01 3.221 3.409 3.541 3.644 3.728 3.799 3.860 3.914 3.963 4.150 4.284
0.05 2.452 2.647 2.783 2.889 2.974 3.046 3.108 3.163 3.212 3.399 3.532
0.10 2.096 2.296 2.439 2.547 2.634 2.706 2.771 2.826 2.876 3.066 3.199
0.20 1.709 1.921 2.068 2.179 2.270 2.346 2.411 2.488 2.519 2.713 2.849
18 0.01 3.195 3.379 3.508 3.609 3.691 3.760 3.820 3.872 3.920 4.102 4.231
0.05 2.439 2.631 2.766 2.869 2.953 3.024 3.085 3.138 3.186 3.370 3.499
0.10 2.088 2.287 2.426 2.532 2.619 2.691 2.753 2.806 2.857 3.043 3.174
0.20 1.704 1.914 2.059 2.170 2.259 2.334 2.399 2.455 2.505 2.696 2.830
19 0.01 3.173 3.353 3.479 3.578 3.658 3.725 3.784 3.835 3.881 4.059 4.185
0.05 2.427 2.617 2.750 2.852 2.934 3.004 3.064 3.116 3.163 3.343 3.470
0.10 2.080 2.277 2.415 2.520 2.605 2.676 2.738 2.791 2.839 3.023 3.152
0.20 1.699 1.908 2.052 2.161 2.250 2.324 2.388 2.443 2.493 2.682 2.813
20 0.01 3.152 3.329 3.454 3.550 3.629 3.695 3.752 3.802 3.848 4.021 4.144
0.05 2.417 2.605 2.736 2.836 2.918 2.986 3.045 3.097 3.143 3.320 3.445
0.10 2.073 2.269 2.405 2.508 2.593 2.663 2.724 2.777 2.824 3.005 3.132
0.20 1.695 1.902 2.045 2.154 2.241 2.315 2.378 2.433 2.482 2.668 2.798
21 0.01 3.134 3.308 3.431 3.525 3.602 3.667 3.724 3.773 3.817 3.987 4.108
0.05 2.408 2.594 2.723 2.822 2.903 2.970 3.028 3.080 3.125 3.300 3.422
0.10 2.067 2.261 2.396 2.498 2.581 2.651 2.711 2.764 2.810 2.989 3.114
0.20 1.691 1.897 2.039 2.147 2.234 2.306 2.369 2.424 2.472 2.656 2.785
22 0.01 3.118 3.289 3.410 3.503 3.579 3.643 3.698 3.747 3.790 3.957 4.075
0.05 2.400 2.584 2.712 2.810 2.889 2.956 3.014 3.064 3.109 3.281 3.402
0.10 2.061 2.254 2.387 2.489 2.572 2.641 2.700 2.752 2.798 2.974 3.096
0.20 1.688 1.892 2.033 2.141 2.227 2.299 2.361 2.415 2.463 2.646 2.773
(continued)
Table A.8 (continued)
Critical Values for Dunn's (Bonferroni's) Procedure
Number of Contrasts
ν α 2 3 4 5 6 7 8 9 10 15 20
23 0.01 3.103 3.272 3.392 3.483 3.558 3.621 3.675 3.723 3.766 3.930 4.046
0.05 2.392 2.574 2.701 2.798 2.877 2.943 3.000 3.050 3.094 3.264 3.383
0.10 2.056 2.247 2.380 2.481 2.563 2.631 2.690 2.741 2.787 2.961 3.083
0.20 1.685 1.888 2.028 2.135 2.221 2.292 2.354 2.407 2.455 2.636 2.762
24 0.01 3.089 3.257 3.375 3.465 3.539 3.601 3.654 3.702 3.744 3.905 4.019
0.05 2.385 2.566 2.692 2.788 2.866 2.931 2.988 3.037 3.081 3.249 3.366
0.10 2.051 2.241 2.373 2.473 2.554 2.622 2.680 2.731 2.777 2.949 3.070
0.20 1.682 1.884 2.024 2.130 2.215 2.286 2.347 2.400 2.448 2.627 2.752
25 0.01 3.077 3.243 3.359 3.449 3.521 3.583 3.635 3.682 3.723 3.882 3.995
0.05 2.379 2.558 2.683 2.779 2.856 2.921 2.976 3.025 3.069 3.235 3.351
0.10 2.047 2.236 2.367 2.466 2.547 2.614 2.672 2.722 2.767 2.938 3.058
0.20 1.679 1.881 2.020 2.125 2.210 2.280 2.341 2.394 2.441 2.619 2.743
26 0.01 3.066 3.230 3.345 3.433 3.505 3.566 3.618 3.664 3.705 3.862 3.972
0.05 2.373 2.551 2.675 2.770 2.847 2.911 2.966 3.014 3.058 3.222 3.337
0.10 2.043 2.231 2.361 2.460 2.540 2.607 2.664 2.714 2.759 2.928 3.047
0.20 1.677 1.878 2.016 2.121 2.205 2.275 2.335 2.388 2.435 2.612 2.735
27 0.01 3.056 3.218 3.332 3.419 3.491 3.550 3.602 3.647 3.688 3.843 3.952
0.05 2.368 2.545 2.668 2.762 2.838 2.902 2.956 3.004 3.047 3.210 3.324
0.10 2.039 2.227 2.356 2.454 2.534 2.600 2.657 2.707 2.751 2.919 3.036
0.20 1.675 1.875 2.012 2.117 2.201 2.270 2.330 2.383 2.429 2.605 2.727
28 0.01 3.046 3.207 3.320 3.407 3.477 3.536 3.587 3.632 3.672 3.825 3.933
0.05 2.383 2.539 2.661 2.755 2.830 2.893 2.948 2.995 3.038 3.199 3.312
0.10 2.036 2.222 2.351 2.449 2.528 2.594 2.650 2.700 2.744 2.911 3.027
0.20 1.672 1.872 2.009 2.113 2.196 2.266 2.326 2.378 2.424 2.599 2.720
29 0.01 3.037 3.197 3.309 3.395 3.464 3.523 3.574 3.618 3.658 3.809 3.916
0.05 2.358 2.534 2.655 2.748 2.823 2.886 2.940 2.967 3.029 3.189 3.301
0.10 2.033 2.218 2.346 2.444 2.522 2.588 2.644 2.693 2.737 2.903 3.018
0.20 1.671 1.869 2.006 2.110 2.193 2.262 2.321 2.373 2.419 2.593 2.713
30 0.01 3.029 3.188 3.298 3.384 3.453 3.511 3.561 3.605 3.644 3.794 3.900
0.05 2.354 2.528 2.649 2.742 2.816 2.878 2.932 2.979 3.021 3.180 3.291
0.10 2.030 2.215 2.342 2.439 2.517 2.582 2.638 2.687 2.731 2.895 3.010
0.20 1.669 1.867 2.003 2.106 2.189 2.258 2.317 2.369 2.414 2.587 2.707
40 0.01 2.970 3.121 3.225 3.305 3.370 3.425 3.472 3.513 3.549 3.689 3.787
0.05 2.323 2.492 2.606 2.696 2.768 2.827 2.878 2.923 2.963 3.113 3.218
0.10 2.009 2.189 2.312 2.406 2.481 2.544 2.597 2.644 2.686 2.843 2.952
0.20 1.656 1.850 1.983 2.083 2.164 2.231 2.288 2.338 2.382 2.548 2.663
60 0.01 2.914 3.056 3.155 3.230 3.291 3.342 3.386 3.425 3.459 3.589 3.679
0.05 2.294 2.456 2.568 2.653 2.721 2.777 2.826 2.869 2.906 3.049 3.146
0.10 1.989 2.163 2.283 2.373 2.446 2.506 2.558 2.603 2.643 2.793 2.897
0.20 1.643 1.834 1.963 2.061 2.139 2.204 2.259 2.308 2.350 2.511 2.621
Table A.8 (continued)
Critical Values for Dunn's (Bonferroni's) Procedure
Number of Contrasts
ν α 2 3 4 5 6 7 8 9 10 15 20
120 0.01 2.859 2.994 3.067 3.158 3.215 3.263 3.304 3.340 3.372 3.493 3.577
0.05 2.265 2.422 2.529 2.610 2.675 2.729 2.776 2.816 2.852 2.967 3.081
0.10 1.968 2.138 2.254 2.342 2.411 2.469 2.519 2.562 2.600 2.744 2.843
0.20 1.631 1.817 1.944 2.039 2.115 2.178 2.231 2.278 2.319 2.474 2.580
∞ 0.01 2.806 2.934 3.022 3.089 3.143 3.188 3.226 3.260 3.289 3.402 3.480
0.05 2.237 2.388 2.491 2.569 2.631 2.683 2.727 2.766 2.800 2.928 3.016
0.10 1.949 2.114 2.226 2.311 2.378 2.434 2.482 2.523 2.560 2.697 2.791
0.20 1.618 1.801 1.925 2.018 2.091 2.152 2.204 2.249 2.289 2.438 2.540
Source: Reprinted from Games, P.A., J. Am. Stat. Assoc., 72, 531, 1977, Table 1. With permission of the American Statistical Association.
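Dunn's (Bonferroni's) procedure tests each of C contrasts at level α/C, so for two-tailed tests the tabled values are, to a close approximation, Student's t quantiles at 1 − α/(2C). A sketch using SciPy's t distribution (an assumption — the tabled values from Games (1977) may differ from these quantiles by a few thousandths):

```python
from scipy.stats import t

def dunn_critical(alpha, n_contrasts, df):
    # Two-tailed Bonferroni-adjusted critical t value: each of the
    # n_contrasts tests is run at alpha / n_contrasts, split two-tailed.
    return t.ppf(1 - alpha / (2 * n_contrasts), df)

# Example: alpha = .05, 2 contrasts, df = 10 gives about 2.63,
# close to the tabled 2.626.
print(round(dunn_critical(0.05, 2, 10), 3))
```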
Table A.9
Critical Values for the Studentized Range Statistic
J or r
v 2 3 4 5 6 7 8 9 10
α = .10
1 8.929 13.44 16.36 18.49 20.15 21.51 22.64 23.62 24.48
2 4.130 5.733 6.773 7.538 8.139 8.633 9.049 9.409 9.725
3 3.328 4.467 5.199 5.738 6.162 6.511 6.806 7.062 7.287
4 3.015 3.976 4.586 5.035 5.388 5.679 5.926 6.139 6.327
5 2.850 3.717 4.264 4.664 4.979 5.238 5.458 5.648 5.816
6 2.748 3.559 4.065 4.435 4.726 4.966 5.168 5.344 5.499
7 2.680 3.451 3.931 4.280 4.555 4.780 4.972 5.137 5.283
8 2.630 3.374 3.834 4.169 4.431 4.646 4.829 4.987 5.126
9 2.592 3.316 3.761 4.084 4.337 4.545 4.721 4.873 5.007
10 2.563 3.270 3.704 4.018 4.264 4.465 4.636 4.783 4.913
11 2.540 3.234 3.658 3.965 4.205 4.401 4.568 4.711 4.838
12 2.521 3.204 3.621 3.922 4.156 4.349 4.511 4.652 4.776
13 2.505 3.179 3.589 3.885 4.116 4.305 4.464 4.602 4.724
14 2.491 3.158 3.563 3.854 4.081 4.267 4.424 4.560 4.680
15 2.479 3.140 3.540 3.828 4.052 4.235 4.390 4.524 4.641
16 2.469 3.124 3.520 3.804 4.026 4.207 4.360 4.492 4.608
17 2.460 3.110 3.503 3.784 4.004 4.183 4.334 4.464 4.579
18 2.452 3.098 3.488 3.767 3.984 4.161 4.311 4.440 4.554
19 2.445 3.087 3.474 3.751 3.966 4.142 4.290 4.418 4.531
20 2.439 3.078 3.462 3.736 3.950 4.124 4.271 4.398 4.510
24 2.420 3.047 3.423 3.692 3.900 4.070 4.213 4.336 4.445
30 2.400 3.017 3.386 3.648 3.851 4.016 4.155 4.275 4.381
40 2.381 2.988 3.349 3.605 3.803 3.963 4.099 4.215 4.317
60 2.363 2.959 3.312 3.562 3.755 3.911 4.042 4.155 4.254
120 2.344 2.930 3.276 3.520 3.707 3.859 3.987 4.096 4.191
∞ 2.326 2.902 3.240 3.478 3.661 3.808 3.931 4.037 4.129
v
J or r
11 12 13 14 15 16 17 18 19
α = .10
1 25.24 25.92 26.54 27.10 27.62 28.10 28.54 28.96 29.35
2 10.01 10.26 10.49 10.70 10.89 11.07 11.24 11.39 11.54
3 7.487 7.667 7.832 7.982 8.120 8.249 8.368 8.479 8.584
4 6.495 6.645 6.783 6.909 7.025 7.133 7.233 7.327 7.414
5 5.966 6.101 6.223 6.336 6.440 6.536 6.626 6.710 6.789
6 5.637 5.762 5.875 5.979 6.075 6.164 6.247 6.325 6.398
7 5.413 5.530 5.637 5.735 5.826 5.910 5.838 6.061 6.130
8 5.250 5.362 5.464 5.558 5.644 5.724 5.799 5.869 5.935
9 5.127 5.234 5.333 5.423 5.506 5.583 5.655 5.723 5.786
10 5.029 5.134 5.229 5.317 5.397 5.472 5.542 5.607 5.668
11 4.951 5.053 5.146 5.231 5.309 5.382 5.450 5.514 5.573
12 4.886 4.986 5.077 5.160 5.236 5.308 5.374 5.436 5.495
13 4.832 4.930 5.019 5.100 5.176 5.245 5.311 5.372 5.429
Table A.9 (continued)
Critical Values for the Studentized Range Statistic
J or r
v 11 12 13 14 15 16 17 18 19
α = .10
14 4.786 4.882 4.970 5.050 5.124 5.192 5.256 5.316 5.373
15 4.746 4.841 4.927 5.006 5.079 5.147 5.209 5.269 5.324
16 4.712 4.805 4.890 4.968 5.040 5.107 5.169 5.227 5.282
17 4.682 4.774 4.858 4.935 5.005 5.071 5.133 5.190 5.244
18 4.655 4.746 4.829 4.905 4.975 5.040 5.101 5.158 5.211
19 4.631 4.721 4.803 4.879 4.948 5.012 5.073 5.129 5.182
20 4.609 4.699 4.780 4.855 4.924 4.987 5.047 5.103 5.155
24 4.541 4.628 4.708 4.780 4.847 4.909 4.966 5.021 5.071
30 4.474 4.559 4.635 4.706 4.770 4.830 4.886 4.939 4.988
40 4.408 4.490 4.564 4.632 4.695 4.752 4.807 4.857 4.905
60 4.342 4.421 4.493 4.558 4.619 4.675 4.727 4.775 4.821
120 4.276 4.353 4.422 4.485 4.543 4.597 4.647 4.694 4.738
∞ 4.211 4.285 4.351 4.412 4.468 4.519 4.568 4.612 4.654
v
J or r
2 3 4 5 6 7 8 9 10
α = .05
1 17.97 26.98 32.82 37.08 40.41 43.12 45.40 47.36 49.07
2 6.085 8.331 9.798 10.88 11.74 12.44 13.03 13.54 13.99
3 4.501 5.910 6.825 7.502 8.037 8.478 8.853 9.177 9.462
4 3.927 5.040 5.757 6.287 6.707 7.053 7.347 7.602 7.826
5 3.635 4.602 5.218 5.673 6.033 6.330 6.582 6.802 6.995
6 3.461 4.339 4.896 5.305 5.628 5.895 6.122 6.319 6.493
7 3.344 4.165 4.681 5.060 5.359 5.606 5.815 5.998 6.158
8 3.261 4.041 4.529 4.886 5.167 5.399 5.597 5.767 5.918
9 3.199 3.949 4.415 4.756 5.024 5.244 5.432 5.595 5.739
10 3.151 3.877 4.327 4.654 4.912 5.124 5.305 5.461 5.599
11 3.113 3.820 4.256 4.574 4.823 5.028 5.202 5.353 5.487
12 3.082 3.773 4.199 4.508 4.751 4.950 5.119 5.265 5.395
13 3.055 3.735 4.151 4.453 4.690 4.885 5.049 5.192 5.318
14 3.033 3.702 4.111 4.407 4.639 4.829 4.990 5.131 5.254
15 3.014 3.674 4.076 4.367 4.595 4.782 4.940 5.077 5.198
16 2.998 3.649 4.046 4.333 4.557 4.741 4.897 5.031 5.150
17 2.984 3.628 4.020 4.303 4.524 4.705 4.858 4.991 5.108
18 2.971 3.609 3.997 4.277 4.495 4.673 4.824 4.956 5.071
19 2.960 3.593 3.977 4.253 4.469 4.645 4.794 4.924 5.038
20 2.950 3.578 3.958 4.232 4.445 4.620 4.768 4.896 5.008
24 2.919 3.532 3.901 4.166 4.373 4.541 4.684 4.807 4.915
30 2.888 3.486 3.845 4.102 4.302 4.464 4.602 4.720 4.824
40 2.858 3.442 3.791 4.039 4.232 4.389 4.521 4.635 4.735
60 2.829 3.399 3.737 3.977 4.163 4.314 4.441 4.550 4.646
120 2.800 3.356 3.685 3.917 4.096 4.241 4.363 4.468 4.560
∞ 2.772 3.314 3.633 3.858 4.030 4.170 4.286 4.387 4.474
(continued)
778 Appendix: Tables
Table A.9 (continued)
Critical Values for the Studentized Range Statistic
J or r
v 11 12 13 14 15 16 17 18 19
α = .05
1 50.59 51.96 53.20 54.33 55.36 56.32 57.22 58.04 58.83
2 14.39 14.75 15.08 15.38 15.65 15.91 16.14 16.37 16.57
3 9.717 9.946 10.15 10.35 10.53 10.69 10.84 10.98 11.11
4 8.027 8.208 8.373 8.525 8.664 8.794 8.914 9.028 9.134
5 7.168 7.324 7.466 7.596 7.717 7.828 7.932 8.030 8.122
6 6.649 6.789 6.917 7.034 7.143 7.244 7.338 7.426 7.508
7 6.302 6.431 6.550 6.658 6.759 6.852 6.939 7.020 7.097
8 6.054 6.175 6.287 6.389 6.483 6.571 6.653 6.729 6.802
9 5.867 5.983 6.089 6.186 6.276 6.359 6.437 6.510 6.579
10 5.722 5.833 5.935 6.028 6.114 6.194 6.269 6.339 6.405
11 5.605 5.713 5.811 5.901 5.984 6.062 6.134 6.202 6.265
12 5.511 5.615 5.710 5.798 5.878 5.953 6.023 6.089 6.151
13 5.431 5.533 5.625 5.711 5.789 5.862 5.931 5.995 6.055
14 5.364 5.463 5.554 5.637 5.714 5.786 5.852 5.915 5.974
15 5.306 5.404 5.493 5.574 5.649 5.720 5.785 5.846 5.904
16 5.256 5.352 5.439 5.520 5.593 5.662 5.720 5.786 5.843
17 5.212 5.307 5.392 5.471 5.544 5.612 5.675 5.734 5.790
18 5.174 5.267 5.352 5.429 5.501 5.568 5.630 5.688 5.743
19 5.140 5.231 5.315 5.391 5.462 5.528 5.589 5.647 5.701
20 5.108 5.199 5.282 5.357 5.427 5.493 5.553 5.610 5.663
24 5.012 5.099 5.179 5.251 5.319 5.381 5.439 5.494 5.545
30 4.917 5.001 5.077 5.147 5.211 5.271 5.327 5.379 5.429
40 4.824 4.904 4.977 5.044 5.106 5.163 5.216 5.266 5.313
60 4.732 4.808 4.878 4.942 5.001 5.056 5.107 5.154 5.199
120 4.641 4.714 4.781 4.842 4.898 4.950 4.998 5.044 5.086
∞ 4.552 4.622 4.685 4.743 4.796 4.845 4.891 4.934 4.974
J or r
v 2 3 4 5 6 7 8 9 10
α = .01
1 90.03 135.0 164.3 185.6 202.2 215.8 227.2 237.0 245.6
2 14.04 19.02 22.29 24.72 26.63 28.20 29.53 30.68 31.69
3 8.261 10.62 12.17 13.33 14.24 15.00 15.64 16.20 16.69
4 6.512 8.120 9.173 9.958 10.58 11.10 11.55 11.93 12.27
5 5.702 6.976 7.804 8.421 8.913 9.321 9.669 9.972 10.24
6 5.243 6.331 7.033 7.556 7.973 8.318 8.613 8.869 9.097
7 4.949 5.919 6.543 7.005 7.373 7.679 7.939 8.166 8.368
8 4.746 5.635 6.204 6.625 6.960 7.237 7.474 7.681 7.863
9 4.596 5.428 5.957 6.348 6.658 6.915 7.134 7.325 7.495
10 4.482 5.270 5.769 6.136 6.428 6.669 6.875 7.055 7.213
11 4.392 5.146 5.621 5.970 6.247 6.476 6.672 6.842 6.992
12 4.320 5.046 5.502 5.836 6.101 6.321 6.507 6.670 6.814
13 4.260 4.964 5.404 5.727 5.981 6.192 6.372 6.528 6.667
Table A.9 (continued)
Critical Values for the Studentized Range Statistic
J or r
v 2 3 4 5 6 7 8 9 10
α = .01
14 4.210 4.895 5.322 5.634 5.881 6.085 6.258 6.409 6.543
15 4.168 4.836 5.252 5.556 5.796 5.994 6.162 6.309 6.439
16 4.131 4.786 5.192 5.489 5.722 5.915 6.079 6.222 6.349
17 4.099 4.742 5.140 5.430 5.659 5.847 6.007 6.147 6.270
18 4.071 4.703 5.094 5.379 5.603 5.788 5.944 6.081 6.201
19 4.046 4.670 5.054 5.334 5.554 5.735 5.889 6.022 6.141
20 4.024 4.639 5.018 5.294 5.510 5.688 5.839 5.970 6.087
24 3.956 4.546 4.907 5.168 5.374 5.542 5.685 5.809 5.919
30 3.889 4.455 4.799 5.048 5.242 5.401 5.536 5.653 5.756
40 3.825 4.367 4.696 4.931 5.114 5.265 5.392 5.502 5.599
60 3.762 4.282 4.595 4.818 4.991 5.133 5.253 5.356 5.447
120 3.702 4.200 4.497 4.709 4.872 5.005 5.118 5.214 5.299
∞ 3.643 4.120 4.403 4.603 4.757 4.882 4.987 5.078 5.157
J or r
v 11 12 13 14 15 16 17 18 19
α = .01
1 253.2 260.0 266.2 271.8 277.0 281.8 286.3 290.4 294.3
2 32.59 33.40 34.13 34.81 35.43 36.00 36.53 37.03 37.50
3 17.13 17.53 17.89 18.22 18.52 18.81 19.07 19.32 19.55
4 12.57 12.84 13.09 13.32 13.53 13.73 13.91 14.08 14.24
5 10.48 10.70 10.89 11.08 11.24 11.40 11.55 11.68 11.81
6 9.301 9.485 9.653 9.808 9.951 10.08 10.21 10.32 10.43
7 8.548 8.711 8.860 8.997 9.124 9.242 9.353 9.456 9.554
8 8.027 8.176 8.312 8.436 8.552 8.659 8.760 8.854 8.943
9 7.647 7.784 7.910 8.025 8.132 8.232 8.325 8.412 8.495
10 7.356 7.485 7.603 7.712 7.812 7.906 7.993 8.076 8.153
11 7.128 7.250 7.362 7.465 7.560 7.649 7.732 7.809 7.883
12 6.943 7.060 7.167 7.265 7.356 7.441 7.520 7.594 7.665
13 6.791 6.903 7.006 7.101 7.188 7.269 7.345 7.417 7.485
14 6.664 6.772 6.871 6.962 7.047 7.126 7.199 7.268 7.333
15 6.555 6.660 6.757 6.845 6.927 7.003 7.074 7.142 7.204
16 6.462 6.564 6.658 6.744 6.823 6.898 6.967 7.032 7.093
17 6.381 6.480 6.572 6.656 6.734 6.806 6.873 6.937 6.997
18 6.310 6.407 6.497 6.579 6.655 6.725 6.792 6.854 6.912
19 6.247 6.342 6.430 6.510 6.585 6.654 6.719 6.780 6.837
20 6.191 6.285 6.371 6.450 6.523 6.591 6.654 6.714 6.771
24 6.017 6.106 6.186 6.261 6.330 6.394 6.453 6.510 6.563
30 5.849 5.932 6.008 6.078 6.143 6.203 6.259 6.311 6.361
40 5.686 5.764 5.835 5.900 5.961 6.017 6.069 6.119 6.165
60 5.528 5.601 5.667 5.728 5.785 5.837 5.886 5.931 5.974
120 5.375 5.443 5.505 5.562 5.614 5.662 5.708 5.750 5.790
∞ 5.227 5.290 5.348 5.400 5.448 5.493 5.535 5.574 5.611
Source: Reprinted from Harter, H. L., Ann. Math. Statist., 31, 1122, 1960, Table 3. With permission of the Institute of Mathematical Statistics.
J for Tukey; r for Newman–Keuls.
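For readers working in software rather than from the printed page, the percentage points in Table A.9 can be reproduced numerically. The following is a minimal sketch, assuming SciPy 1.7 or later (which provides `scipy.stats.studentized_range`); the helper `q_crit` is an illustrative name, not part of any library:

```python
# Reproduce studentized range critical values q(alpha; J, v), as tabled
# in Table A.9. J is the number of means (or the stretch size r for
# Newman-Keuls); v is the error degrees of freedom.
# Assumes SciPy >= 1.7, which provides scipy.stats.studentized_range.
from scipy.stats import studentized_range

def q_crit(alpha, J, v):
    """Upper-tail critical value of the studentized range statistic."""
    return studentized_range.ppf(1 - alpha, J, v)

# Spot checks against Table A.9:
print(round(q_crit(0.05, 3, 10), 3))  # tabled value for alpha=.05, J=3, v=10: 3.877
print(round(q_crit(0.01, 2, 5), 3))   # tabled value for alpha=.01, J=2, v=5: 5.702
```

Because `ppf` inverts the exact distribution function, this also gives values for combinations of J and v that fall between the rows and columns of the printed table.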
Table A.10
Critical Values for the Bryant–Paulson Procedure
α = .05
v J = 2 J = 3 J = 4 J = 5 J = 6 J = 7 J = 8 J = 10 J = 12 J = 16 J = 20
X = 1
2 7.96 11.00 12.99 14.46 15.61 16.56 17.36 18.65 19.68 21.23 22.40
3 5.42 7.18 8.32 9.17 9.84 10.39 10.86 11.62 12.22 13.14 13.83
4 4.51 5.84 6.69 7.32 7.82 8.23 8.58 9.15 9.61 10.30 10.82
5 4.06 5.17 5.88 6.40 6.82 7.16 7.45 7.93 8.30 8.88 9.32
6 3.79 4.78 5.40 5.86 6.23 6.53 6.78 7.20 7.53 8.04 8.43
7 3.62 4.52 5.09 5.51 5.84 6.11 6.34 6.72 7.03 7.49 7.84
8 3.49 4.34 4.87 5.26 5.57 5.82 6.03 6.39 6.67 7.10 7.43
10 3.32 4.10 4.58 4.93 5.21 5.43 5.63 5.94 6.19 6.58 6.87
12 3.22 3.95 4.40 4.73 4.98 5.19 5.37 5.67 5.90 6.26 6.53
14 3.15 3.85 4.28 4.59 4.83 5.03 5.20 5.48 5.70 6.03 6.29
16 3.10 3.77 4.19 4.49 4.72 4.91 5.07 5.34 5.55 5.87 6.12
18 3.06 3.72 4.12 4.41 4.63 4.82 4.98 5.23 5.44 5.75 5.98
20 3.03 3.67 4.07 4.35 4.57 4.75 4.90 5.15 5.35 5.65 5.88
24 2.98 3.61 3.99 4.26 4.47 4.65 4.79 5.03 5.22 5.51 5.73
30 2.94 3.55 3.91 4.18 4.38 4.54 4.69 4.91 5.09 5.37 5.58
40 2.89 3.49 3.84 4.09 4.29 4.45 4.58 4.80 4.97 5.23 5.43
60 2.85 3.43 3.77 4.01 4.20 4.35 4.48 4.69 4.85 5.10 5.29
120 2.81 3.37 3.70 3.93 4.11 4.26 4.38 4.58 4.73 4.97 5.15
X = 2
2 9.50 13.18 15.59 17.36 18.75 19.89 20.86 22.42 23.66 25.54 26.94
3 6.21 8.27 9.60 10.59 11.37 12.01 12.56 13.44 14.15 15.22 16.02
4 5.04 6.54 7.51 8.23 8.80 9.26 9.66 10.31 10.83 11.61 12.21
5 4.45 5.68 6.48 7.06 7.52 7.90 8.23 8.76 9.18 9.83 10.31
6 4.10 5.18 5.87 6.37 6.77 7.10 7.38 7.84 8.21 8.77 9.20
7 3.87 4.85 5.47 5.92 6.28 6.58 6.83 7.24 7.57 8.08 8.46
8 3.70 4.61 5.19 5.61 5.94 6.21 6.44 6.82 7.12 7.59 7.94
10 3.49 4.31 4.82 5.19 5.49 5.73 5.93 6.27 6.54 6.95 7.26
12 3.35 4.12 4.59 4.93 5.20 5.43 5.62 5.92 6.17 6.55 6.83
14 3.26 3.99 4.44 4.76 5.01 5.22 5.40 5.69 5.92 6.27 6.54
16 3.19 3.90 4.32 4.63 4.88 5.07 5.24 5.52 5.74 6.07 6.33
18 3.14 3.82 4.24 4.54 4.77 4.96 5.13 5.39 5.60 5.92 6.17
20 3.10 3.77 4.17 4.46 4.69 4.88 5.03 5.29 5.49 5.81 6.04
24 3.04 3.69 4.08 4.35 4.57 4.75 4.90 5.14 5.34 5.63 5.86
30 2.99 3.61 3.98 4.25 4.46 4.62 4.77 5.00 5.18 5.46 5.68
40 2.93 3.53 3.89 4.15 4.34 4.50 4.64 4.86 5.04 5.30 5.50
60 2.88 3.46 3.80 4.05 4.24 4.39 4.52 4.73 4.89 5.14 5.33
120 2.82 3.38 3.72 3.95 4.13 4.28 4.40 4.60 4.75 4.99 5.17
Table A.10 (continued)
Critical Values for the Bryant–Paulson Procedure
α = .05
v J = 2 J = 3 J = 4 J = 5 J = 6 J = 7 J = 8 J = 10 J = 12 J = 16 J = 20
X = 3
2 10.83 15.06 17.82 19.85 21.45 22.76 23.86 25.66 27.08 29.23 30.83
3 6.92 9.23 10.73 11.84 12.72 13.44 14.06 15.05 15.84 17.05 17.95
4 5.51 7.18 8.25 9.05 9.67 10.19 10.63 11.35 11.92 12.79 13.45
5 4.81 6.16 7.02 7.66 8.17 8.58 8.94 9.52 9.98 10.69 11.22
6 4.38 5.55 6.30 6.84 7.28 7.64 7.94 8.44 8.83 9.44 9.90
7 4.11 5.16 5.82 6.31 6.70 7.01 7.29 7.73 8.08 8.63 9.03
8 3.91 4.88 5.49 5.93 6.29 6.58 6.83 7.23 7.55 8.05 8.42
10 3.65 4.51 5.05 5.44 5.75 6.01 6.22 6.58 6.86 7.29 7.62
12 3.48 4.28 4.78 5.14 5.42 5.65 5.85 6.17 6.43 6.82 7.12
14 3.37 4.13 4.59 4.93 5.19 5.41 5.59 5.89 6.13 6.50 6.78
16 3.29 4.01 4.46 4.78 5.03 5.23 5.41 5.69 5.92 6.27 6.53
18 3.23 3.93 4.35 4.66 4.90 5.10 5.27 5.54 5.76 6.09 6.34
20 3.18 3.86 4.28 4.57 4.81 5.00 5.16 5.42 5.63 5.96 6.20
24 3.11 3.76 4.16 4.44 4.67 4.85 5.00 5.25 5.45 5.75 5.98
30 3.04 3.67 4.05 4.32 4.53 4.70 4.85 5.08 5.27 5.56 5.78
40 2.97 3.57 3.94 4.20 4.40 4.56 4.70 4.92 5.10 5.37 5.57
60 2.90 3.49 3.83 4.08 4.27 4.43 4.56 4.77 4.93 5.19 5.38
120 2.84 3.40 3.73 3.97 4.15 4.30 4.42 4.62 4.77 5.01 5.19
α = .01
v J = 2 J = 3 J = 4 J = 5 J = 6 J = 7 J = 8 J = 10 J = 12 J = 16 J = 20
X = 1
2 19.09 26.02 30.57 33.93 36.58 38.76 40.60 43.59 45.95 49.55 52.24
3 10.28 13.32 15.32 16.80 17.98 18.95 19.77 21.12 22.19 23.82 25.05
4 7.68 9.64 10.93 11.89 12.65 13.28 13.82 14.70 15.40 16.48 17.29
5 6.49 7.99 8.97 9.70 10.28 10.76 11.17 11.84 12.38 13.20 13.83
6 5.83 7.08 7.88 8.48 8.96 9.36 9.70 10.25 10.70 11.38 11.90
7 5.41 6.50 7.20 7.72 8.14 8.48 8.77 9.26 9.64 10.24 10.69
8 5.12 6.11 6.74 7.20 7.58 7.88 8.15 8.58 8.92 9.46 9.87
10 4.76 5.61 6.15 6.55 6.86 7.13 7.35 7.72 8.01 8.47 8.82
12 4.54 5.31 5.79 6.15 6.48 6.67 6.87 7.20 7.46 7.87 8.18
14 4.39 5.11 5.56 5.89 6.15 6.36 6.55 6.85 7.09 7.47 7.75
16 4.28 4.96 5.39 5.70 5.95 6.15 6.32 6.60 6.83 7.18 7.45
18 4.20 4.86 5.26 5.56 5.79 5.99 6.15 6.42 6.63 6.96 7.22
20 4.14 4.77 5.17 5.45 5.68 5.86 6.02 6.27 6.48 6.80 7.04
24 4.05 4.65 5.02 5.29 5.50 5.68 5.83 6.07 6.26 6.56 6.78
30 3.96 4.54 4.89 5.14 5.34 5.50 5.64 5.87 6.05 6.32 6.53
40 3.88 4.43 4.76 5.00 5.19 5.34 5.47 5.68 5.85 6.10 6.30
60 3.79 4.32 4.64 4.86 5.04 5.18 5.30 5.50 5.65 5.89 6.07
120 3.72 4.22 4.52 4.73 4.89 5.03 5.14 5.32 5.47 5.69 5.85
(continued)
Table A.10 (continued)
Critical Values for the Bryant–Paulson Procedure
α = .01
v J = 2 J = 3 J = 4 J = 5 J = 6 J = 7 J = 8 J = 10 J = 12 J = 16 J = 20
X = 2
2 23.11 31.55 37.09 41.19 44.41 47.06 49.31 52.94 55.82 60.20 63.47
3 11.97 15.56 17.91 19.66 21.05 22.19 23.16 24.75 26.01 27.93 29.38
4 8.69 10.95 12.43 13.54 14.41 15.14 15.76 16.77 17.58 18.81 19.74
5 7.20 8.89 9.99 10.81 11.47 12.01 12.47 13.23 13.84 14.77 15.47
6 6.36 7.75 8.64 9.31 9.85 10.29 10.66 11.28 11.77 12.54 13.11
7 5.84 7.03 7.80 8.37 8.83 9.21 9.53 10.06 10.49 11.14 11.64
8 5.48 6.54 7.23 7.74 8.14 8.48 8.76 9.23 9.61 10.19 10.63
10 5.02 5.93 6.51 6.93 7.27 7.55 7.79 8.19 8.50 8.99 9.36
12 4.74 5.56 6.07 6.45 6.75 7.00 7.21 7.56 7.84 8.27 8.60
14 4.56 5.31 5.78 6.13 6.40 6.63 6.82 7.14 7.40 7.79 8.09
16 4.42 5.14 5.58 5.90 6.16 6.37 6.55 6.85 7.08 7.45 7.73
18 4.32 5.00 5.43 5.73 5.98 6.18 6.35 6.63 6.85 7.19 7.46
20 4.25 4.90 5.31 5.60 5.84 6.03 6.19 6.46 6.67 7.00 7.25
24 4.14 4.76 5.14 5.42 5.63 5.81 5.96 6.21 6.41 6.71 6.95
30 4.03 4.62 4.98 5.24 5.44 5.61 5.75 5.98 6.16 6.44 6.66
40 3.93 4.48 4.82 5.07 5.26 5.41 5.54 5.76 5.93 6.19 6.38
60 3.83 4.36 4.68 4.90 5.08 5.22 5.35 5.54 5.70 5.94 6.12
120 3.73 4.24 4.54 4.75 4.91 5.05 5.16 5.35 5.49 5.71 5.88
X = 3
2 26.54 36.26 42.64 47.36 51.07 54.13 56.71 60.90 64.21 69.25 73.01
3 13.45 17.51 20.17 22.15 23.72 25.01 26.11 27.90 29.32 31.50 33.13
4 9.59 12.11 13.77 15.00 15.98 16.79 17.47 18.60 19.50 20.87 21.91
5 7.83 9.70 10.92 11.82 12.54 13.14 13.65 14.48 15.15 16.17 16.95
6 6.85 8.36 9.34 10.07 10.65 11.13 11.54 12.22 12.75 13.59 14.21
7 6.23 7.52 8.36 8.98 9.47 9.88 10.23 10.80 11.26 11.97 12.51
8 5.81 6.95 7.69 8.23 8.67 9.03 9.33 9.84 10.24 10.87 11.34
10 5.27 6.23 6.84 7.30 7.66 7.96 8.21 8.63 8.96 9.48 9.88
12 4.94 5.80 6.34 6.74 7.05 7.31 7.54 7.90 8.20 8.65 9.00
14 4.72 5.51 6.00 6.36 6.65 6.89 7.09 7.42 7.69 8.10 8.41
16 4.56 5.30 5.76 6.10 6.37 6.59 6.77 7.08 7.33 7.71 8.00
18 4.44 5.15 5.59 5.90 6.16 6.36 6.54 6.83 7.06 7.42 7.69
20 4.35 5.03 5.45 5.75 5.99 6.19 6.36 6.63 6.85 7.19 7.45
24 4.22 4.86 5.25 5.54 5.76 5.94 6.10 6.35 6.55 6.87 7.11
30 4.10 4.70 5.06 5.33 5.54 5.71 5.85 6.08 6.27 6.56 6.78
40 3.98 4.54 4.88 5.13 5.32 5.48 5.61 5.83 6.00 6.27 6.47
60 3.86 4.39 4.72 4.95 5.12 5.27 5.39 5.59 5.75 6.00 6.18
120 3.75 4.25 4.55 4.77 4.94 5.07 5.18 5.37 5.51 5.74 5.90
Source: Reprinted from Bryant, J. L. and Paulson, A. S., Biometrika, 63, 631, 1976, Table 1(a) and Table 1(b). With permission of Biometrika Trustees.
X is the number of covariates.
References
Agresti, A., & Finlay, B. (1986). Statistical methods for the social sciences (2nd ed.). San Francisco: Dellen.
Agresti, A., & Pendergast, J. (1986). Comparing mean ranks for repeated measures data. Communications in Statistics—Theory and Methods, 15, 1417–1433.
Aldrich, J. H., & Nelson, F. D. (1984). Linear probability, logit, and probit models. Beverly Hills, CA: Sage.
Algina, J., Blair, R. C., & Coombs, W. T. (1995). A maximum test for scale: Type I error rates and power. Journal of Educational and Behavioral Statistics, 20, 27–39.
Algina, J., & Keselman, H. J. (2003). Approximate confidence intervals for effect sizes. Educational and Psychological Measurement, 63(4), 537–553.
Algina, J., Keselman, H. J., & Penfield, R. D. (2005). Effect sizes and their intervals: The two-level repeated measures case. Educational and Psychological Measurement, 65(2), 241–258.
American Psychological Association. (2010). Publication manual of the American Psychological Association. Washington, DC: Author.
Andrews, D. F. (1971). Significance tests based on residuals. Biometrika, 58, 139–148.
Andrews, D. F., & Pregibon, D. (1978). Finding the outliers that matter. Journal of the Royal Statistical Society, Series B, 40, 85–93.
Applebaum, M. I., & Cramer, E. M. (1974). Some problems in the nonorthogonal analysis of variance. Psychological Bulletin, 81, 335–343.
Atiqullah, M. (1964). The robustness of the covariance analysis of a one-way classification. Biometrika, 51, 365–373.
Atkinson, A. C. (1985). Plots, transformations, and regression. Oxford, U.K.: Oxford University Press.
Barnett, V., & Lewis, T. (1978). Outliers in statistical data. New York: Wiley.
Barnett, V., & Lewis, T. (1994). Outliers in statistical data (3rd ed.). New York: Wiley.
Basu, S., & DasGupta, A. (1995). Robustness of standard confidence intervals for location parameters under departure from normality. Annals of Statistics, 23, 1433–1442.
Bates, D. M., & Watts, D. G. (1988). Nonlinear regression analysis and its applications. New York: Wiley.
Beal, S. L. (1987). Asymptotic confidence intervals for the difference between two binomial parameters for use with small samples. Biometrics, 43, 941–950.
Beckman, R., & Cook, R. D. (1983). Outliers…s. Technometrics, 25, 119–149.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics. New York: Wiley.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300.
Bernstein, I. H. (1988). Applied multivariate analysis. New York: Springer-Verlag.
Berry, W. D., & Feldman, S. (1985). Multiple regression in practice. Beverly Hills, CA: Sage.
Boik, R. J. (1979). Interactions, partial interactions, and interaction contrasts in the analysis of variance. Psychological Bulletin, 86, 1084–1089.
Boik, R. J. (1981). A priori tests in repeated measures designs: Effects of nonsphericity. Psychometrika, 46, 241–255.
Box, G. E. P. (1954a). Some theorems on quadratic forms applied in the study of analysis of variance problems, I: Effects of inequality of variance in the one-way model. Annals of Mathematical Statistics, 25, 290–302.
Box, G. E. P. (1954b). Some theorems on quadratic forms applied in the study of analysis of variance problems, II: Effects of inequality of variance and of correlation between errors in the two-way classification. Annals of Mathematical Statistics, 25, 484–498.
Box, G. E. P., & Anderson, S. L. (1962). Robust tests for variances and effect of non-normality and variance heterogeneity on standard tests. Tech. Rep. No. 7, Ordinance Project No. TB 2-0001 (832), Dept. of Army Project No. 599-01-004.
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B, 26, 211–246.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
Bradley, J. V. (1982). The insidious L-shaped distribution. Bulletin of the Psychonomic Society, 20(2), 85–88.
Brown, M. B., & Forsythe, A. (1974). The ANOVA and multiple comparisons for data with heterogeneous variances. Biometrics, 30, 719–724.
Brunner, E., Dette, H., & Munk, A. (1997). Box-type approximations in nonparametric factorial designs. Journal of the American Statistical Association, 92, 1494–1502.
Bryant, J. L., & Paulson, A. S. (1976). An extension of Tukey’s method of multiple comparisons to experimental designs with random concomitant variables. Biometrika, 63, 631–638.
Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Carlson, J. E., & Timm, N. H. (1974). Analysis of nonorthogonal fixed-effects designs. Psychological Bulletin, 81, 563–570.
Carroll, R. J., & Ruppert, D. (1982). Robust estimation in heteroscedastic linear models. Annals of Statistics, 10, 429–441.
Chakravarti, I. M., Laha, R. G., & Roy, J. (1967). Handbook of methods of applied statistics (Vol. 1). New York: Wiley.
Chambers, J. M., Cleveland, W. S., Kleiner, B., & Tukey, P. A. (1983). Graphical methods for data analysis. Belmont, CA: Wadsworth.
Chatterjee, S., & Price, B. (1977). Regression analysis by example. New York: Wiley.
Christensen, R. (1997). Log-linear models and logistic regression (2nd ed.). New York: Springer-Verlag.
Cleveland, W. S. (1993). Elements of graphing data. New York: Chapman & Hall.
Clinch, J. J., & Keselman, H. J. (1982). Parametric alternatives to the analysis of variance. Journal of Educational Statistics, 7, 207–214.
Coe, P. R., & Tamhane, A. C. (1993). Small sample confidence intervals for the difference, ratio and odds ratio of two success probabilities. Communications in Statistics—Simulation and Computation, 22, 925–938.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Conover, W., & Iman, R. (1981). Rank transformations as a bridge between parametric and nonparametric statistics. The American Statistician, 35, 124–129.
Conover, W., & Iman, R. (1982). Analysis of covariance using the rank transformation. Biometrics, 38, 715–724.
Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics, 19, 15–18.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. London: Chapman & Hall.
Coombs, W. T., Algina, J., & Ottman, D. O. (1996). Univariate and multivariate omnibus hypothesis tests selected to control Type I error rates when population variances are not necessarily equal. Review of Educational Research, 66, 137–179.
Cotton, J. W. (1998). Analyzing within-subjects experiments. Mahwah, NJ: Lawrence Erlbaum Associates.
Cox, D. R., & Snell, E. J. (1989). Analysis of binary data (2nd ed.). London: Chapman & Hall.
Cramer, E. M., & Applebaum, M. I. (1980). Nonorthogonal analysis of variance—Once again. Psychological Bulletin, 87, 51–57.
Croux, C., Flandre, C., & Haesbroeck, G. (2002). The breakdown behavior of the maximum likelihood estimator in the logistic regression model. Statistics and Probability Letters, 60, 377–386.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532–574.
D’Agostino, R. B. (1971). An omnibus test of normality for moderate and large size samples. Biometrika, 58, 341–348.
Derksen, S., & Keselman, H. J. (1992). Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology, 45, 265–282.
Duncan, G. T., & Layard, M. W. J. (1973). A Monte-Carlo study of asymptotically robust tests for correlation coefficients. Biometrika, 60, 551–558.
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56, 52–64.
Dunn, O. J. (1974). On multiple tests and confidence intervals. Communications in Statistics, 3, 101–103.
Dunn, O. J., & Clark, V. A. (1987). Applied statistics: Analysis of variance and regression (2nd ed.). New York: Wiley.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association, 50, 1096–1121.
Dunnett, C. W. (1964). New tables for multiple comparisons with a control. Biometrics, 20, 482–491.
Dunnett, C. W. (1980). Pairwise multiple comparisons in the unequal variance case. Journal of the American Statistical Association, 75, 796–800.
Durbin, J., & Watson, G. S. (1950). Testing for serial correlation in least squares regression, I. Biometrika, 37, 409–428.
Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression, II. Biometrika, 38, 159–178.
Durbin, J., & Watson, G. S. (1971). Testing for serial correlation in least squares regression, III. Biometrika, 58, 1–19.
Educational and Psychological Measurement. (2000, October). Special section: Statistical significance with comments by editors of marketing journals. Educational and Psychological Measurement, 60, 661–696.
Educational and Psychological Measurement. (2001a, April). Special section: Colloquium on effect sizes: The roles of editors, textbook authors, and the publication manual. Educational and Psychological Measurement, 61, 181–228.
Educational and Psychological Measurement. (2001b, August). Special section: Confidence intervals for effect sizes. Educational and Psychological Measurement, 61, 517–674.
Elashoff, J. D. (1969). Analysis of covariance: A delicate instrument. American Educational Research Journal, 6, 383–401.
Feldt, L. S. (1958). A comparison of the precision of three experimental designs employing a concomitant variable. Psychometrika, 23, 335–354.
Ferguson, G. A., & Takane, Y. (1989). Statistical analysis in psychology and education (6th ed.). New York: McGraw-Hill.
Fidler, F., & Thompson, B. (2001). Computing correct confidence intervals for ANOVA fixed- and random-effects effect sizes. Educational and Psychological Measurement, 61, 575–604.
Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34(9), 903–916.
Fink, A. (1995). How to sample in surveys. Thousand Oaks, CA: Sage.
Fisher, R. A. (1949). The design of experiments. Edinburgh, U.K.: Oliver & Boyd, Ltd.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675–701.
Games, P. A., & Howell, J. F. (1976). Pairwise multiple comparison procedures with unequal n’s and/or variances: A Monte Carlo study. Journal of Educational Statistics, 1, 113–125.
Geisser, S., & Greenhouse, S. (1958). Extension of Box’s results on the use of the F distribution in multivariate analysis. Annals of Mathematical Statistics, 29, 855–891.
Ghosh, B. K. (1979). A comparison of some approximate confidence intervals for the binomial parameter. Journal of the American Statistical Association, 74, 894–900.
Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Boston: Allyn & Bacon.
Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the fixed effects analyses of variance and covariance. Review of Educational Research, 42, 237–288.
Grimm, L. G., & Arnold, P. R. (Eds.). (1995). Reading and understanding multivariate statistics. Washington, DC: American Psychological Association.
Grimm, L. G., & Arnold, P. R. (Eds.). (2002). Reading and understanding more multivariate statistics. Washington, DC: American Psychological Association.
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Harlow, L., Mulaik, S., & Steiger, J. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Lawrence Erlbaum Associates.
Harrell, F. E. J. (1986). The LOGIST procedure. In SAS Institute (Ed.), SUGI supplemental library user’s guide (5th ed., pp. 269–293). Cary, NC: SAS Institute, Inc.
Harwell, M. (2003). Summarizing Monte Carlo results in methodological research: The single-factor, fixed-effects ANCOVA case. Journal of Educational and Behavioral Statistics, 28, 45–70.
Hawkins, D. M. (1980). Identification of outliers. London: Chapman & Hall.
Hays, W. L. (1988). Statistics (4th ed.). New York: Holt, Rinehart and Winston.
Hayter, A. J. (1986). The maximum familywise error rate of Fisher’s least significant difference test. Journal of the American Statistical Association, 81, 1000–1004.
Heck, R. H., & Thomas, S. L. (2000). An introduction to multilevel modeling techniques. Mahwah, NJ: Lawrence Erlbaum.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2010). Multilevel and longitudinal modeling with IBM SPSS. New York: Routledge.
Hellevik, O. (2009). Linear versus logistic regression when the dependent variable is a dichotomy. Quality & Quantity, 43(1), 59–74.
Heyde, C. C., Seneta, E., Crepel, P., Feinberg, S. E., & Gani, J. (Eds.). (2001). Statisticians of the centuries. New York: Springer.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–802.
Hochberg, Y., & Tamhane, A. C. (1987). Multiple comparison procedures. New York: Wiley.
Hochberg, Y., & Varon-Salomon, Y. (1984). On simultaneous pairwise comparisons in analysis of covariance. Journal of the American Statistical Association, 79, 863–866.
Hocking, R. R. (1976). The analysis and selection of variables in linear regression. Biometrics, 32, 1–49.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19–24.
Hoerl, A. E., & Kennard, R. W. (1970a). Ridge regression: Biased estimation for non-orthogonal models. Technometrics, 12, 55–67.
Hoerl, A. E., & Kennard, R. W. (1970b). Ridge regression: Application to non-orthogonal models. Technometrics, 12, 591–612.
Hogg, R. V., & Craig, A. T. (1970). Introduction to mathematical statistics. New York: Macmillan.
Hosmer, D. W., Hosmer, T., LeCessie, S., & Lemeshow, S. (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 16, 965–980.
Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: Wiley.
Howell, D. (1997). Statistical methods for psychology (4th ed.). Belmont, CA: Wadsworth.
Huberty, C. J. (1989). Problems with stepwise methods—Better alternatives. In B. Thompson (Ed.), Advances in social science methodology (Vol. 1, pp. 43–70). Greenwich, CT: JAI Press.
Huck, S. W. (2004). Reading statistics and research (4th ed.). Boston: Allyn & Bacon.
Huck, S. W., & McLean, R. A. (1975). Using a repeated measures ANOVA to analyze data from a pretest-posttest design: A potentially confusing task. Psychological Bulletin, 82, 511–518.
Huitema, B. E. (1980). The analysis of covariance and alternatives. New York: Wiley.
Huberty, C. J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62(2), 227–240.
Huynh, H., & Feldt, L. S. (1970). Conditions under which mean square ratios in repeated measurement designs have exact F-distributions. Journal of the American Statistical Association, 65, 1582–1589.
Jaeger, R. M. (1984). Sampling in education and the social sciences. New York: Longman.
James, G. S. (1951). The comparison of several groups of observations when the ratios of the population variances are unknown. Biometrika, 38, 324–329.
Jennings, E. (1988). Models for pretest-posttest data: Repeated measures ANOVA revisited. Journal of Educational Statistics, 13, 273–280.
Johansen, S. (1980). The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika, 67, 85–93.
Johnson, P. O., & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1, 57–93.
Johnson, R. A., & Wichern, D. W. (1998). Applied multivariate statistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Kaiser, L., & Bowden, D. (1983). Simultaneous confidence intervals for all linear contrasts of means with heterogeneous variances. Communications in Statistics—Theory and Methods, 12, 73–88.
Kalton, G. (1983). Introduction to survey sampling. Thousand Oaks, CA: Sage.
Keppel, G. (1982). Design and analysis: A researcher’s handbook (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher’s handbook (3rd ed.). Upper Saddle River, NJ: Pearson.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Monterey, CA: Brooks/Cole.
Kleinbaum, D. G., Kupper, L. L., Muller, K. E., & Nizam, A. (1998). Applied regression analysis and other multivariable methods (3rd ed.). Pacific Grove, CA: Duxbury.
Kramer, C. Y. (1956). Extension of multiple range test to group means with unequal numbers of replications. Biometrics, 12, 307–310.
Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA: Sage.
Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks on one-criterion variance analysis. Journal of the American Statistical Association, 47, 583–621 (with corrections in 48, 907–911).
Lamb, G. S. (1984). What you always wanted to know about six but were afraid to ask. The Journal of Irreproducible Results, 29, 18–20.
Larsen, W. A., & McCleary, S. J. (1972). The use of partial residual plots in regression analysis. Technometrics, 14, 781–790.
Levy, P. S., & Lemeshow, S. (1999). Sampling of populations: Methods and applications (3rd ed.). New York: Wiley.
Li, J., & Lomax, R. G. (2011). Analysis of variance: What is your statistical software actually doing? Journal of Experimental Education, 73, 279–294.
Lilliefors, H. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399–402.
Lomax, R. G., & Surman, S. H. (2007). Factorial ANOVA in SPSS: Fixed-, random-, and mixed-effects models. In S. S. Sawilowsky (Ed.), Real data analysis. Greenwich, CT: Information Age.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage.
Lord, F. M. (1960). Large-sample covariance analysis when the control variable is fallible. Journal of the American Statistical Association, 55, 307–321.
Lord, F. M. (1967). A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304–305.
Lord, F. M. (1969). Statistical adjustments when comparing preexisting groups. Psychological Bulletin, 72, 336–337.
788 References
Manly,�B��F��J��(2004)��Multivariate statistical methods: A primer�(3rd�ed�)��London:�Chapman�&�Hall�
Mansfield,� E�� R�,� &� Conerly,� M�� D�� (1987)�� Diagnostic� value� of� residual� and� partial� residual� plots��
The American Statistician,�41,�107–116�
Marascuilo,� L�� A�,� &� Levin,� J�� R�� (1970)�� Appropriate� post� hoc� comparisons� for� interactions� and�
nested�hypotheses�in�analysis�of�variance�designs:�The�elimination�of�type�IV�errors��American
Educational Research Journal,�7,�397–421�
Marascuilo,� L�� A�,� &� Levin,� J�� R�� (1976)�� The� simultaneous� investigation� of� interaction� and� nested�
hypotheses� in� two-factor� analysis� of� variance� designs�� American Educational Research Journal,�
13,�61–65�
Marascuilo,� L��A�,� &� McSweeney,� M�� (1977)�� Nonparametric and distribution-free methods for the social
sciences��Monterey,�CA:�Brooks/Cole�
Marascuilo,�L��A�,�&�Serlin,�R��C��(1988)��Statistical methods for the social and behavioral sciences��New�
York:�Freeman�
Marcoulides,�G��A�,�&�Hershberger,�S��L��(1997)��Multivariate statistical methods: A first course��Mahwah,�
NJ:�Lawrence�Erlbaum�Associates�
Marquardt,� D�� W�,� &� Snee,� R�� D�� (1975)�� Ridge� regression� in� practice�� The American Statistician,�
29, 3–19�
Maxwell,� S�� E�� (1980)�� Pairwise� multiple� comparisons� in� repeated� measures� designs�� Journal of
Educational Statistics,�5,�269–287�
Maxwell,�S��E�,�&�Delaney,�H��D��(1990)��Designing experiments and analyzing data: A model comparison
perspective��Belmont,�CA:�Wadsworth�
Maxwell,� S�� E�,� Delaney,� H�� D�,� &� Dill,� C�� A�� (1984)�� Another� look� at� ANOVA� versus� blocking��
Psychological Bulletin,�95,�136–147�
McCulloch,�C��E��(2005)��Repeated�measures�ANOVA,�RIP?�Chance,�18,�29–33�
Menard,�S��(1995)��Applied logistic regression analysis��Thousand�Oaks,�CA:�Sage�
Menard,�S��(2000)��Applied logistic regression analysis�(2nd�ed�)��Thousand�Oaks,�CA:�Sage�
Mendoza,� J�� L�,� &� Stafford,� K�� L�� (2001)�� Confidence� intervals,� power� calculation,� and� sample� size�
estimation�for�the�squared�multiple�correlation�coefficient�under�the�fixed�and�random�regres-
sion� models:� A� computer� program� and� useful� standard� tables�� Educational and Psychological
Measurement,�61,�650–667�
Meyers,�L��S�,�Gamst,�G�,�&�Guarino,�A��J��(2006)��Applied multivariate research: Design and interpretation��
Thousand�Oaks,�CA:�Sage�
Mickey,�R��M�,�Dunn,�O��J�,�&�Clark,�V��A��(2004)��Applied statistics: Analysis of variance and regression�
(3rd�ed�)��Hoboken,�NJ:�Wiley�
Miller,�A��J��(1984)��Selection�of�subsets�of�regression�variables�(with�discussion)��Journal of the Royal
Statistical Society, A,�147,�389–425�
Miller,�A��J��(1990)��Subset selection in regression��New�York:�Chapman�&�Hall�
Miller,�R��G��(1997)��Beyond ANOVA: Basics of applied statistics��Boca�Raton,�FL:�CRC�Press�
Morgan,�G��A�,�Leech,�N��L�,�Gloeckner,�&�Barrett,�K��C���(2011)���IBM SPSS for introductory statistics:�
Use and interpretation�(4th�edition)���New�York:�Routledge�
Morgan,� G�� A�,� &� Griego,� O�� V�� (1998)�� Easy use and interpretation of SPSS for Windows: Answering
research questions with statistics��Mahwah,�NJ:�Lawrence�Erlbaum�Associates�
Morgan,�G��A�,�Leech,�N��L�,�Gloeckner,�G��W�,�&�Barrett,�K��C��(2005)��IBM SPSS for introductory statis-
tics: Use and interpretation�(4th�ed�)��New�York:�Routledge�
Mosteller,�F�,�&�Tukey,�J�W���(1977)���Data analysis and regression���Reading,�MA:�Addision-Wesley�
Murphy,�K��R�,�&�Myors,�B��(2004)��Statistical power analysis: A simple and general model for traditional
and modern hypothesis tests�(2nd�ed�)��Mahwah,�NJ:�Lawrence�Erlbaum�Associates�
Murphy,�K��R�,�Myors,�B�,�&�Wolach,�A��(2008)��Statistical power analysis:�A simple and general model for
traditional and modern hypothesis tests�(3rd�ed�)��New�York:�Routledge
Myers,�R��H��(1979)��Fundamentals of experimental design�(3rd�ed�)��Boston:�Allyn�and�Bacon�
Myers,�R��H��(1986)��Classical and modern regression with applications��Boston:�Duxbury�
Myers,�R��H��(1990)��Classical and modern regression with applications�(2nd�ed�)��Boston:�Duxbury�
789References
Myers,� J�� L�,� &� Well,� A�� D�� (1995)�� Research design and statistical analysis�� Mahwah,� NJ:� Lawrence�
Erlbaum�Associates�
Nagelkerke,� N�� J�� D�� (1991)�� A� note� on� a� general� definition� of� the� coefficient� of� determination��
Biometrika,�78,�691–692�
Noreen,�E��W��(1989)��Computer intensive methods for testing hypotheses��New�York:�Wiley�
O’Connell,�A��A�,�&�McCoach,�D��B��(Eds�)��(2008)��Multilevel modeling of educational data��Charlotte,�
NC:�Information�Age�Publishing�
O’Grady,� K�� E�� (1982)�� Measures� of� explained� variance:� Cautions� and� limitations�� Psychological
Bulletin,�92,�766–777�
Olejnik,�S��F�,�&�Algina,�J��(1987)��Type�I�error�rates�and�power�estimates�of�selected�parametric�and�
nonparametric�tests�of�scale��Journal of Educational Statistics,�21,�45–61�
Overall,�J��E�,�Lee,�D��M�,�&�Hornick,�C��W��(1981)��Comparison�of�two�strategies�for�analysis�of�vari-
ance�in�nonorthogonal�designs��Psychological Bulletin,�90,�367–375�
Overall,� J�� E�,� &� Spiegel,� D�� K�� (1969)�� Concerning� least� squares� analysis� of� experimental� data��
Psychological Bulletin,�72,�311–322�
Page,� M�� C�,� Braver,� S�� L�,� &� MacKinnon,� D�� P�� (2003)�� Levine’s guide to SPSS for analysis of variance��
Mahwah,�NJ:�Lawrence�Erlbaum�Associates�
Pampel,�F��C��(2000)��Logistic regression: A primer��Thousand�Oaks,�CA:�Sage�
Pavur,� R�� (1988)�� Type� I� error� rates� for� multiple� comparison� procedures� with� dependent� data�� The
American Statistician,�42,�171–173�
Pearson,�E��S��(Ed�)��(1978)��The history of statistics in the 17th and 18th centuries��New�York:�Macmillan�
Peckham,� P�� D�� (1968)�� An investigation of the effects of non-homogeneity of regression slopes upon the
F-test of analysis of covariance��Laboratory� of� Educational� Research,� Rep�� No�� 16,� University� of�
Colorado,�Boulder,�CO�
Pedhazur,� E�� J�� (1997)�� Multiple regression in behavioral research� (3rd� ed�)�� Fort� Worth,� TX:� Harcourt�
Brace�
Pingel,� L�� A�� (1969)�� A comparison of the effects of two methods of block formation on design precision��
Paper�presented�at�the�annual�meeting�of�the�American�Educational�Research�Association,�Los�
Angeles�
Porter,�A��C��(1967)��The effects of using fallible variables in the analysis of covariance��Unpublished�doc-
toral�dissertation,�University�of�Wisconsin,�Madison,�WI�
Porter,�A��C�,�&�Raudenbush,�S��W��(1987)��Analysis�of�covariance:�Its�model�and�use�in�psychological�
research��Journal of Counseling Psychology,�34,�383–392�
Puri,� M�� L�,� &� Sen,� P�� K�� (1969)�� Analysis� of� covariance� based� on� general� rank� scores�� Annals of
Mathematical Statistic,�40,�610–618�
Quade,� D�� (1967)�� Rank� analysis� of� covariance�� Journal of the American Statistical Association,� 62,�
1187–1200�
Raferty,�A��E��(1995)��Bayesian�model�selection�in�social�research��In�P��V��Marsden�(Ed�),�Sociological
methodology 1995�(pp��111–163)��London:�Tavistock�
Ramsey,� P�� H�� (1989)�� Critical� values� of� Spearman’s� rank� order� correlation�� Journal of Educational
Statistics,�14,�245–253�
Ramsey,� P�� H�� (1994)�� Testing� variances� in� psychological� and� educational� research�� Journal of
Educational Statistics,�19,�23–42�
Reichardt,�C��S��(1979)��The�statistical�analysis�of�data�from�nonequivalent�control�group�designs��In�
T��D��Cook�&�D��T��Campbell�(Eds�),�Quasi-experimentation: Design and analysis issues for field set-
tings��Chicago:�Rand�McNally�
Reise,�S��P�,�&�Duan,�N��(Eds�)��(2003)��Multilevel modeling: Methodological advances, issues, and applica-
tions��Mahwah,�NJ:�Lawrence�Erlbaum�
Robbins,�N��B��(2004)��Creating more effective graphs��San�Francisco:�Jossey-Bass�
Rogosa,�D��R��(1980)��Comparing�non-parallel�regression�lines��Psychological Bulletin,�88,�307–321�
Rosenthal,�R�,�&�Rosnow,�R��L��(1985)��Contrast analysis: Focused comparisons in the analysis of variance��
Cambridge,�U�K�:�Cambridge�University�Press�
790 References
Rousseeuw,�P��J�,�&�Leroy,�A��M��(1987)��Robust regression and outlier detection��New�York:�Wiley�
Rudas,�T��(2004)��Probability theory: A primer��Thousand�Oaks,�CA:�Sage�
Ruppert,�D�,�&�Carroll,�R��J��(1980)��Trimmed�least�squares�estimation�in�the�linear�model��Journal of
the American Statistical Association,�75,�828–838�
Rutherford,�A��(1992)��Alternatives�to�traditional�analysis�of�covariance��British Journal of Mathematical
and Statistical Psychology,�45,�197–223�
Sawilowsky,�S��S�,�&�Blair,�R��C��(1992)��A�more�realistic�look�at�the�robustness�and�type�II�error�prop-
erties�of�the�t-test�to�departures�from�population�normality��Psychological Bulletin,�111,�352–360�
Scariano,�S��M�,�&�Davenport,�J��M��(1987)��The�effects�of�violations�of�independence�assumptions�in�
the�one-way�ANOVA��The American Statistician,�41,�123–129�
Schafer,� W�� D�� (1991)�� Reporting� hierarchical� regression� results�� Measurement and Evaluation in
Counseling and Development,�24,�98–100�
Scheffe’,� H�� (1953)�� A� method� for� judging� all� contrasts� in� the� analysis� of� variance�� Biometrika,�
40, 87–104�
Schmid,�C��F��(1983)��Statistical graphics: Design principles and practices��New�York:�Wiley�
Seber,�G��A��F�,�&�Wild,�C��J��(1989)��Nonlinear regression��New�York:�Wiley�
Shapiro,�S��S�,�&�Wilk,�M��B��(1965)��An�analysis�of�variance�test�for�normality�(complete�samples)��
Biometrika,�52,�591–611�
Shadish,�W��R�,�Cook,�T��D�,�&�Campbell,�D��T��(2002)��Experimental and quasi-experimental designs for
generalized causal inference��Boston:�Houston�Mifflin�
Shavelson,�R��J��(1988)��Statistical reasoning for the behavioral sciences�(2nd�ed�)��Boston:�Allyn�&�Bacon�
Sidak,�Z��(1967)��Rectangular�confidence�regions�for�the�means�of�multivariate�normal�distributions��
Journal of the American Statistical Association,�62,�626–633�
Smithson,�M��(2001)��Correct�confidence�intervals�for�various�regression�effect�sizes�and�parameters:�
The�importance�of�noncentral�distributions�in�computing�intervals��Educational and Psychological
Measurement,�61,�605–632�
Snijders,�T��A��B�,�&�Bosker,�R��J��(1999)��Multilevel analysis: An introduction to basic and advanced multi-
level modeling��Thousand�Oaks,�CA:�Sage�
Steiger,�J��H�,�&�Fouladi,�R��T��(1992)��R2:�A�computer�program�in�interval�estimation,�power�calcula-
tion,� and� hypothesis� testing� for� the� squared� multiple� correlation�� Behavior Research Methods,
Instruments, and Computers,�4,�581–582�
Stevens,�J��P��(1984)��Outliers�and�influential�data�points�in�regression�analysis��Psychological Bulletin,�
95(2),�334–344�
Stevens,�J��P��(2002)��Applied multivariate statistics for the social sciences�(4th�ed�)��Mahwah,�NJ:�Lawrence�
Erlbaum�Associates�
Stevens,� J�� P�� (2009)�� Applied multivariate statistics for the social sciences� (5th� ed�)�� New� York:�
Routledge�
Stigler,�S��M��(1986)��The history of statistics: The measurement of uncertainty before 1900��Cambridge,�MA:�
Harvard�
Storer,�B��E�,�&�Kim,�C��(1990)��Exact�properties�of�some�exact�test�statistics�for�comparing�two�bino-
mial�proportions��Journal of the American Statistical Association,�85,�146–155�
Sudman,�S��(1976)��Applied sampling��New�York:�Academic�
Tabachnick,�B��G�,�&�Fidell,�L��S��(2007)��Using multivariate statistics�(5th�ed�)��Boston:�Pearson�
Tabatabai,�M�,�&�Tan,�W��(1985)��Some�comparative�studies�on�testing�parallelism�of�several�straight�
lines�under�heteroscedastic�variances��Communications in Statistics—Simulation and Computation,�
14,�837–844�
Thompson,�M��L��(1978)��Selection�of�variables�in�multiple�regression��Part�I:�A�review�and�evalua-
tion�� Part� II:� Chosen� procedures,� computations� and� examples�� International Statistical Review,�
46,�1–19�and�129–146�
Tijms,� H�� (2004)�� Understanding probability: Chance rules in everyday life�� New� York:� Cambridge�
University�Press�
Tiku,� M�� L�,� &� Singh,� M�� (1981)�� Robust� test� for� means� when� population� variances� are� unequal��
Communications in Statistics—Theory and Methods,�10,�2057–2071�
791References
Timm,�N��H��(2002)��Applied multivariate analysis��New�York:�Springer-Verlag�
Timm,� N�� H�,� &� Carlson,� J�� E�� (1975)�� Analysis� of� variance� through� full� rank� models�� Multivariate
Behavioral Research Monographs,�No��75-1�
Tomarken,�A�,�&�Serlin,�R��(1986)��Comparison�of�ANOVA�alternatives�under�variance�heterogeneity�
and�specific�noncentrality�structures��Psychological Bulletin,�99,�90–99�
Tufte,�E��R��(1992)��The visual display of quantitative information��Cheshire,�CT:�Graphics�Press�
Tukey,�J��W��(1949)��One�degree�of�freedom�for�nonadditivity��Biometrics,�5,�232–242�
Tukey,�J��W��(1953)��The problem of multiple comparisons�(396pp)��Ditto:�Princeton�University�
Tukey,�J��W��(1977)��Exploratory data analysis��Reading,�MA:�Addison-Wesley�
Wainer,�H��(1984)��How�to�display�data�badly��The American Statistician,�38,�137–147�
Wainer,�H��(1992)��Understanding�graphs�and�tables��Educational Researcher,�21,�14–23�
Wainer,�H��(2000)��Visual revelations��Mahwah,�NJ:�Lawrence�Erlbaum�Associates�
Wallgren,�A�,�Wallgren,�B�,�Persson,�R�,�Jorner,�U�,�&�Haaland,�J�-A��(1996)��Graphing statistics & data��
Thousand�Oaks,�CA:�Sage�
Weinberg,� S�� L�,� &� Abramowitz,� S�� K�� (2002)�� Data analysis for the behavioral sciences using SPSS��
Cambridge,�U�K�:�Cambridge�University�Press�
Weisberg,�H��I��(1979)��Statistical�adjustments�and�uncontrolled�studies��Psychological Bulletin,�86,�1149–1164�
Weisberg,�S��(1985)��Applied linear regression�(2nd�ed�)��New�York:�Wiley�
Welch,�B��L��(1951)��On�the�comparison�of�several�mean�values:�An�alternative�approach��Biometrika,�
38,�330–336�
Wetherill,�G��B��(1986)��Regression analysis with applications��London:�Chapman�&�Hall�
Wilcox,� R�� R�� (1986)�� Controlling� power� in� a� heteroscedastic� ANOVA� procedure�� British Journal of
Mathematical and Statistical Psychology,�39,�65–68�
Wilcox,�R��R��(1987)��New statistical procedures for the social sciences: Modern solutions to basic problems��
Hillsdale,�NJ:�Lawrence�Erlbaum�Associates�
Wilcox,� R�� R�� (1988)��A� new� alternative� to� the�ANOVA� F� and� new� results� on� James’� second-� order�
method��British Journal of Mathematical and Statistical Psychology,�41,�109–117
Wilcox,�R��R��(1989)��Adjusting�for�unequal�variances�when�comparing�means�in�one-way�and�two-
way�fixed�effects�ANOVA�models��Journal of Educational Statistics,�14,�269–278�
Wilcox,� R�� R�� (1993)�� Comparing� one-step� M-estimators� of� location� when� there� are� more� than� two�
groups��Psychometrika,�58,�71–78�
Wilcox,�R��R��(1996)��Statistics for the social sciences��San�Diego,�CA:�Academic�
Wilcox,�R��R��(1997)��Introduction to robust estimation and hypothesis testing��San�Diego,�CA:�Academic�
Wilcox,� R�� R�� (2002)�� Comparing� the� variances� of� two� independent� groups�� British Journal of
Mathematical and Statistical Psychology,�55,�169–175�
Wilcox,�R��R��(2003)��Applying contemporary statistical procedures��San�Diego,�CA:�Academic�
Wilkinson,�L��(2005)��The grammar of statistics�(2nd�ed�)��New�York:�Springer�
Wonnacott,�T��H�,�&�Wonnacott,�R��J��(1981)��Regression: A second course in statistics��New�York:�Wiley�
Wright,�R��E��(1995)��Logistic�regression��In�L��G��Grimm�&�P��R��Arnold�(Eds�)��Reading and understand-
ing multivariate statistics�(pp��217–244)��Washington,�DC:�American�Psychological�Association�
Wu,� L�� L�� (1985)�� Robust� M-estimation� of� location� and� regression�� In� N�� B�� Tuma� (Ed�),� Sociological
methodology, 1985��San�Francisco:�Jossey-Bass�
Xie,�X�-J�,�Pendergast,�J�,�&�Clarke,�W��(2008)��Increasing�the�power:�A�practical�approach�to�goodness-
of-fit�test�for�logistic�regression�models�with�continuous�predictors��Computational Statistics &
Data Analysis,�52(5),�2703–2713�
Yu,�M��C�,�&�Dunn,�O��J��(1982)��Robust�tests�for�the�equality�of�two�correlation�coefficients:�A�Monte�
Carlo�study��Educational and Psychological Measurement,�42,�987–1004�
Yuan,� K�-H�,� &� Maxwell�� S�� (2005)�� On� the� post� hoc� power� in� testing� mean� differences�� Journal of
Educational and Behavioral Statistics,�30,�141–167�
Zimmerman,�D��W��(1997)��A�note�of�interpretation�of�the�paired-samples�t-test��Journal of Educational
and Behavioral Statistics,�22,�349–360�
Zimmerman,� D�� W�� (2003)�� A� warning� about� the� large-sample� Wilcoxon-Mann-Whitney� test��
Understanding Statistics,�2,�267–280�
Odd-Numbered Answers to Problems
Chapter 1
Conceptual Problems
1.1 Constant (all individuals in the study are married; thus, the marital status will be "married" for everyone participating; in other words, there is no variation in "marital status" for this particular scenario).
1.3 c (true ratios cannot be formed with interval variables).
1.5 d (true ratios can only be formed with ratio variables).
1.7 d (an absolute value of zero would indicate an absence of what was measured—i.e., the number of years playing in a band—and thus ratio is the scale of measure; although an answer of zero is not likely given that the students in the band are those being measured, if someone were to respond with an answer of zero, that value would truly indicate "no years playing an instrument").
1.9 True (there are only population parameters and sample statistics; no other combinations exist).
1.11 True (categorical variables can have any number of qualitative values; dichotomous variables are limited to only two values).
1.13 c (equal intervals is not a characteristic of an ordinal variable).
1.15 No (equal intervals is not a characteristic of an ordinal variable).
Computational Problems
1.1
Value Rank
10 7
15 5
12 6
8 8
20 2
17 4
5 9
21 1
3 10
19 3
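As a quick check on the ranks in 1.1, the table can be reproduced in a few lines of Python. This is a sketch added here (not part of the original text); it assumes rank 1 goes to the largest value and that there are no ties, as in these data.

```python
# Assign descending ranks (1 = largest), as in Computational Problem 1.1.
values = [10, 15, 12, 8, 20, 17, 5, 21, 3, 19]
ordered = sorted(values, reverse=True)          # largest first
ranks = [ordered.index(v) + 1 for v in values]  # 1-based position in the ordering
print(ranks)  # [7, 5, 6, 8, 2, 4, 9, 1, 10, 3]
```

Note that `list.index` breaks down with tied scores (tied values would all get the rank of the first occurrence rather than a mean rank); for tied data a proper mid-rank routine would be needed.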
1.3
Value Rank
8 6
6 8
3 10
12 4
19 3
7 7
10 5
25 2
4 9
42 1
Chapter 2
Conceptual Problems
2.1 c (percentile and percentile rank are two sides of the same coin; if the 50th percentile = 100, then PR(100) = 50).
2.3 a (for 96, crf = .09 for both X and Y and crf = .10 for Z).
2.5 d (ethnicity is not continuous, so only a bar graph is appropriate).
2.7 c (see Section 2.2.3).
2.9 False (the proportion is .25 by definition).
2.11 a (eye color is nominal and not continuous).
2.13 True (with the same interval width, each is based on exactly the same information).
2.15 No (it is most likely that Q1 will be smaller for the negatively skewed variable).
2.17 c (if the relative frequency for the value 55 is 20% and for 70 is 30%, the cumulative relative frequency for the value 70 is 50%).
Computational Problems
2.1 (a–d) Frequency distributions:
X f cf rf crf
41 2 2 f/n = 2/50 = .04 .04
42 2 4 .04 .08
43 4 8 .08 .16
44 5 13 .10 .26
45 6 19 .12 .38
46 8 27 .16 .54
47 11 38 .22 .76
48 4 42 .08 .84
49 5 47 .10 .94
50 3 50 .06 1.00
n = 50 1.00
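The cumulative and relative columns of this table follow mechanically from the frequencies, so they can be verified with a short script (a sketch added here, not part of the original text):

```python
from itertools import accumulate

# Frequencies for X = 41..50 from the table in 2.1; n = 50.
f = [2, 2, 4, 5, 6, 8, 11, 4, 5, 3]
n = sum(f)
cf = list(accumulate(f))       # cumulative frequencies
rf = [fi / n for fi in f]      # relative frequencies, f/n
crf = [c / n for c in cf]      # cumulative relative frequencies
print(cf)       # [2, 4, 8, 13, 19, 27, 38, 42, 47, 50]
print(crf[-1])  # 1.0
```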
 (e) Frequency polygon [figure not reproduced: frequency (0–12) of x (41–50)]
 (g) Q1 = 44.4, Q2 = 46.25, Q3 = 47.4545 (using "values are group midpoints" option)
 (h) P10 = 42.75, P90 = 49.1
 (i) PR(41) = 2%, PR(49.5) = 94%
 (j) Box-and-Whisker plot [figure not reproduced: x from 40 to 52]
 (k) Stem-and-leaf display
Frequency Stem & Leaf
2.00 41 . 00
2.00 42 . 00
4.00 43 . 0000
5.00 44 . 00000
6.00 45 . 000000
8.00 46 . 00000000
11.00 47 . 00000000000
4.00 48 . 0000
5.00 49 . 00000
3.00 50 . 000
2.3 (a–c) Q1 = 4.4, Q2 = 5.375, Q3 = 7.3333 (using "values are group midpoints" option)
 (d) P44.5 = 5.169
 (e) PR(7) = 71.6667%
 (f) Box-and-Whisker plot [figure not reproduced: x from 0 to 10]
 (g) Histogram [figure not reproduced: count of x from 0 to 12; Mean = 5.8, Std. dev = 2.041, N = 30]
Chapter 3
Conceptual Problems
3.1 b (will affect variance the most).
3.3 d (variance cannot be negative).
3.5 False (that proportion is always .25).
3.7 No (class rank is ordinal, so mean inappropriate).
3.9 Yes (middle score still the same).
3.11 No (will be different for small samples).
3.13 True (they are based on the same measurement scales).
3.15 No (impossible as, by nature of the median being the second quartile, the median must be larger than the first quartile; fire the statistician).
3.17 d (range as it is computed as the difference between the two extreme values in the data).
3.19 No (interval or ratio data must be used to compute the variance).
Computational Problems
3.1 Mode = 47, median = 46.25, mean = 46, exclusive range = 9, inclusive range = 10, H = 3.0546, variance = 5.28, standard deviation = 2.2978.
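Most of these descriptive values can be checked in Python. The sketch below (not part of the original text) assumes the 3.1 data are the 50 scores tabled in Problem 2.1, which is consistent with the mode of 47 and mean of 46; note the variance here divides by n, matching the answer, and that the median of 46.25 comes from SPSS's "values are group midpoints" interpolation, so `statistics.median` returns 46 instead.

```python
import statistics

# Rebuild the raw scores from the 2.1 frequency table (assumed to be the 3.1 data).
x = [v for v, f in zip(range(41, 51), [2, 2, 4, 5, 6, 8, 11, 4, 5, 3])
     for _ in range(f)]
mode = statistics.mode(x)        # 47
mean = statistics.fmean(x)       # 46.0
var = statistics.pvariance(x)    # 5.28 (n in the denominator, as in the answer)
sd = var ** 0.5                  # ≈ 2.2978
rng = max(x) - min(x)            # exclusive range = 9
```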
3.3 Mode = 5, median = 5.375, mean = 5.80, exclusive range = 8, inclusive range = 9, H = 2.9334, variance = 4.1655, standard deviation = 2.041.
3.5 Mode = 12, median = 11.5, mean = 12, exclusive range = 12, inclusive range = 13, H = 2, variance = 8.0690, standard deviation = 2.8406.
3.7 Distribution Z (it has more extreme scores than the other distributions).
Chapter 4
Conceptual Problems
4.1 d (skewness is zero for normal).
4.3 b (±2 standard deviations).
4.5 b (only median is a value of X).
4.7 c (positive value = leptokurtic).
4.9 True (see z score equation).
4.11 False (mean can be any value).
4.13 a (a long left tail due to the substantial negative skewness, and a very flat distribution, platykurtic, due to the large negative kurtosis value).
4.15 c (where there is the highest concentration of scores in the middle).
4.17 False (the variance of z is always 1 while the variance of the raw scores can be any non-negative value).
4.19 a (a is 90th percentile, b is 84th percentile, c is 75th percentile, and d is 84th percentile).
4.21 a (once standardized into a unit normal distribution, the mean is always zero, regardless of the values of the original distribution).
Computational Problems
4.1 a = .0485, b = .6970, c = 10.16, d = 46.31, e = approximately 79.67%, f = approximately 21.48%, g = 76.12%.
4.3 a = .9332, b = .7611, c = 8.97, d = 104,200, e = approximately 97.72%, f = approximately 62.93%, g = 78.87%.
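Answers of this kind are areas and percentiles of the unit normal distribution, traditionally read from a z table but easy to compute directly. The sketch below (not part of the original text) uses illustrative z values, since the raw problem data are not reproduced in this answer key:

```python
from statistics import NormalDist

z = NormalDist()                 # unit normal: mean 0, SD 1
below = z.cdf(1.28)              # area below z = 1.28, ≈ .8997 (about the 90th percentile)
within = z.cdf(2) - z.cdf(-2)    # area within ±2 SD, ≈ .9545 (cf. Conceptual 4.3)
z90 = z.inv_cdf(0.90)            # z cutting off the top 10%, ≈ 1.2816 (cf. 4.19)
```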
Chapter 5
Conceptual Problems
5.1 c (see definition in Section 5.2.2).
5.3 a (2 out of 9).
5.5 a (see Section 5.2.2).
5.7 True (less sampling error as n increases).
5.9 False (90% CI has a wider range than 68% CI).
5.11 Yes (extreme mean more likely with smaller n).
5.13 b (probability of winning the lottery is the same for each attempt, regardless of how long it has been since a winner was announced).
5.15 c (for all teachers to have an equal and independent probability of being selected, the sampling procedure must be a type of simple random sampling; the nature of Malani's research is such that this should be done without replacement as she would not want to survey the same teacher twice).
5.17 c (due to the central limit theorem with large size samples).
Computational Problems
5.1 (a) population mean = 5; population variance = 6; (b) construct table of possible sample means like Table 5.1; (c) mean of the sampling distribution of the mean = 5; variance of the sampling distribution of the mean = 3.
5.3 256.
5.5 Standard error of the mean = .6325; 90% CI = 1.9595–4.0405.
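The 90% interval in 5.5 is mean ± z(.95) × SE. The sketch below (not part of the original text) assumes a sample mean of 3.0, inferred from the interval's midpoint; the last digit differs slightly from the key because the key rounds z to 1.645 from the table.

```python
from statistics import NormalDist

mean, se = 3.0, 0.6325              # assumed sample mean; SE from the answer
z = NormalDist().inv_cdf(0.95)      # two-tailed 90% CI → z ≈ 1.645
lo, hi = mean - z * se, mean + z * se
print(round(lo, 4), round(hi, 4))   # ≈ 1.9596 4.0404
```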
Chapter 6
Conceptual Problems
6.1 c (see definition).
6.3 b (willing to reject only if sample mean is below 100).
6.5 a (cannot make Type II error there).
6.7 e (most extreme value regardless of sign).
6.9 False (cannot make a Type I error there).
6.11 Yes (the p value is less than the alpha level, so there is statistical significance).
6.13 No (cannot tell just from mean difference, need more information).
6.15 No (the range will be wider for the 99% CI).
6.17 False (the mean is zero for any t distribution).
6.19 True (the width of the CI only depends on the critical value and the standard error).
Computational Problems
6.1 (a) B may or may not reject; (b) A also rejects; (c) B also fails to reject; (d) A may or may not fail to reject.
6.3 (a) 95th, (b) 90th, (c) 97.5th, (d) 0, (e) 0, (f) 1.25, (g) 1.761.
6.5 (a) t = −.884, critical values = −2.093 and +2.093, fail to reject H0;
 (b) (2.3265, 3.2735), includes hypothesized value of 3.0 and thus fail to reject H0.
Chapter 7
Conceptual Problems
7.1 e (if null hypothesis is true and you reject, then you have definitely made a Type I error).
7.3 c (see definition).
7.5 False (sampling error is less for larger samples).
7.7 Yes (smaller value when all of critical region is in one tail; see t table).
7.9 d (there is no such test; the tests mentioned all deal with means).
7.11 a (the independent t test is appropriate to use for testing mean differences between groups—as is the case here).
7.13 No (it will decrease, as shown in Table A.2).
7.15 d (homogeneity of variances, via Levene's test, is provided by default in SPSS when conducting the independent t test).
Computational Problems
7.1 (a) t = −2.1097, critical values are approximately −2.041 and +2.041, reject H0. (b) (−9.2469, −.1531), does not include hypothesized value of 0 and thus reject H0.
7.3 (a) t = −3.185, critical values are −2.074 and +2.074, reject H0. (b) (−6.742, −1.4248), does not include hypothesized value of 0 and thus reject H0.
7.5 (a) t = 4.117, critical values are −2.145 and +2.145, reject H0. (b) (9.7396, 30.9271), does not include hypothesized value of 0 and thus reject H0.
7.7 t = 2.4444, critical value is 1.658, reject H0.
Chapter 8
Conceptual Problems
8.1 b (4 × 6 = 24).
8.3 True (see definition).
8.5 No (cannot have a negative proportion).
8.7 No (reject when test statistic exceeds critical value).
8.9 d (as the difference between the observed and expected proportions increases, the chi-square test statistic increases, and, thus, we are more likely to reject).
8.11 a (chi-square goodness-of-fit test given there is only one variable and the goal is to determine if the proportions within the categories of that variable are the same).
Computational Problems
8.1 p = .75, z = 2.1898, critical values = −1.96 and +1.96, thus reject H0.
8.3 z = −.1644, critical values = −1.96 and +1.96, thus fail to reject H0.
8.5 Critical value = 9.48773, fail to reject H0 as the test statistic does not exceed the critical value.
8.7 χ2 = .404, critical value = 2.70554, thus fail to reject H0.
Chapter 9
Conceptual Problems
9.1 c (see Section 9.4).
9.3 Yes (cannot reject if sample variances are equal).
9.5 Yes (this is a right-tailed test and the sample variance is in the direction of the right tail).
9.7 No, not enough information (do not know hypothesized variance).
9.9 b (involves naturally occurring couples or pairs).
Computational Problems
9.1 (a) sample variance = 27.9292, χ2 = 5.5858, critical values = 7.2609 and 24.9958, thus reject H0. (b) (16.7603, 57.6978), thus reject H0 as the interval does not contain 75.
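The test statistic in 9.1 is the usual chi-square for a single variance, χ2 = (n − 1)s²/σ0². The sketch below (not part of the original text) assumes n = 16 and a hypothesized variance of 75, both inferred from the answer (the interval check against 75, and the arithmetic that recovers 5.5858):

```python
# Chi-square statistic for a single variance (Problem 9.1).
n, s2, sigma0_sq = 16, 27.9292, 75   # n and sigma0^2 are assumptions
chi2 = (n - 1) * s2 / sigma0_sq
print(round(chi2, 4))                # ≈ 5.5858
```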
9.3 t = 2.3474, critical values = −2.042 and +2.042, thus reject H0.
9.5 χ2 = 8.0, critical values = 9.59078 and 34.1696, thus reject H0.
9.7 t = −2.6178, critical values = −2.756 and +2.756, thus fail to reject H0.
Chapter 10
Conceptual Problems
10.1 d [2/(3)(2) = .3333].
10.3 c (weakest means correlation nearest to 0).
10.5 a (a linear relationship will fall into a reasonably linear scatterplot, although not necessarily a perfectly straight line).
10.7 False (the correlation will become smaller; see the correlation equation involving covariance).
10.9 Yes (a perfect relationship implies a perfect correlation, assuming linearity).
10.11 False (in negative relationships, the higher the score on one variable, the lower the score on the other variable).
10.13 False (a correlation simply means that two variables are related, not why they are related and not because there is definite causation).
10.15 False (the Pearson is most appropriate for interval/ratio variables, while the Spearman's rho or Kendall's τ are most appropriate for ordinal variables).
Computational Problems
10.1 (a) scatterplot shown in the following; (b) covariance = 3.250; (c) r = .631; (d) r = .400.
[Scatterplot not reproduced: Cards_balance (1–7) by Cards_owned (2–8)]
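The covariance and Pearson r in 10.1 can be computed from first principles. The sketch below (not part of the original text) uses made-up paired data, since the raw Cards_owned/Cards_balance scores are not reproduced in this answer key; the resulting numbers are for illustration only, not the .631 of the problem.

```python
# Sample covariance and Pearson r from their defining sums.
def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

cards_owned = [2, 3, 4, 5, 6, 7, 8]     # hypothetical X values
cards_balance = [1, 3, 2, 5, 4, 7, 6]   # hypothetical Y values
r = pearson_r(cards_owned, cards_balance)   # ≈ .893 for these made-up data
```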
10.3 t = 3.9686, critical values are approximately −2.048 and +2.048, reject H0.
10.5 (a) scatterplot shown in the following; (b) nonlinear relationship; (c) r = approximately zero.
[Scatterplot not reproduced: Bills (1–5) by Coins (2–7)]
10.7 (a) r = .78; (b) strong effect.
[Scatterplot not reproduced: Words read (0–40) by Letters written (9–21)]
Chapter 11
Conceptual Problems
11.1 a (if the sample means are all equal, then MSbetw is 0).
11.3 c (lose 1 df from each group; 63 − 3 = 60).
11.5 d (equals the dfbetw + dfwith = dftotal; 60 + 2 = 62).
11.7 d (null hypothesis does not consider SS values).
11.9 a (for between source = 5 − 1 = 4 and for within source = 250 − 5 = 245).
11.11 c (an F ratio of 1.0 implies that between- and within-groups variations are the same).
11.13 True (mean square is a variance estimate).
11.15 True (F ratio must be greater than or equal to 0).
11.17 No (rejecting the null hypothesis in ANOVA only indicates that there is some difference among the means, not that all of the means are different).
11.19 c (the more t tests conducted, the more likely a Type I error for the set of tests).
11.21 True (basically the definition of independence).
11.23 No (find a new statistician as a negative F value is not possible in this context).
Computational Problems
11.1 dfbetw = 3, dfwith = 60, dftotal = 63, SSwith = 9.00, MSbetw = 3.25, MSwith = 0.15, F = 21.6666, critical value = 2.76 (reject H0).
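Filling in a one-way ANOVA table like this is pure arithmetic on the givens. The sketch below (not part of the original text) assumes the problem's givens were J = 4 groups, N = 64, SSbetw = 9.75, and SSwith = 9.00, all inferred from the answer:

```python
# One-way ANOVA table arithmetic for Problem 11.1 (givens are assumptions).
J, N = 4, 64
ss_betw, ss_with = 9.75, 9.00
df_betw, df_with, df_total = J - 1, N - J, N - 1   # 3, 60, 63
ms_betw = ss_betw / df_betw                        # 3.25
ms_with = ss_with / df_with                        # 0.15
F = ms_betw / ms_with                              # ≈ 21.67
```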
11.3 SSbetw = 150, SStotal = 1110, dfbetw = 3, dfwith = 96, dftotal = 99, MSbetw = 50, MSwith = 10, critical value approximately 2.7 (reject H0).
11.5 SSbetw = 25.333, SSwith = 27.625, SStotal = 52.958, dfbetw = 2, dfwith = 21, dftotal = 23, MSbetw = 12.667, MSwith = 1.315, F = 9.629, critical value = 3.47 (reject H0).
Chapter 12
Conceptual Problems
12.1 False (requires equal n's and equal variances; we hope the means are different).
12.3 c (c is not legitimate as the contrast coefficients do not sum to 0).
12.5 a (see flowchart of MCPs in Figure 12.2).
12.7 False (use Dunnett procedure).
12.9 e (Scheffé is most flexible of all MCPs; can test simple and complex contrasts).
12.11 False (conducted to determine why null has been rejected).
12.13 True (see characteristics of Tukey HSD).
12.15 a (see Figure 12.2).
12.17 Yes (each contrast is orthogonal to the others as they rely on independent information).
12.19 d (see Figure 12.2).
12.21 No (do not know the values of the standard error, t, critical value, etc.).
Computational Problems
12.1 Contrast = −5, standard error = 1; t = −5, critical values are 5.10 and −5.10, fail to reject.
12.3 Standard error = √(60/20) = √3 = 1.7321:
 • q1 = (85 − 50)/1.7321 = 20.2073
 • q2 = (85 − 70)/1.7321 = 8.6603
 • q3 = (70 − 50)/1.7321 = 11.5470
 • Critical values approximately 3.39 and −3.39; all contrasts are statistically significant.
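The q statistics in 12.3 divide each pair of mean differences by the common standard error. The sketch below (not part of the original text) assumes MSwith = 60 and n = 20 per group, inferred from the √(60/20) in the answer, and group means of 85, 70, and 50:

```python
import math

# Studentized-range-style contrasts for Problem 12.3 (givens are assumptions).
se = math.sqrt(60 / 20)       # ≈ 1.7321
q1 = (85 - 50) / se           # ≈ 20.2073
q2 = (85 - 70) / se           # ≈ 8.6603
q3 = (70 - 50) / se           # ≈ 11.5470
```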
12.5 (a) μ1 − μ2, μ3 − μ4, (μ1 + μ2)/2 − (μ3 + μ4)/2; all of the Σcj are equal to 0.
 (b) No, as Σcj is not equal to 0.
 (c) H0: μ1 − [(μ2 + μ3 + μ4)/3]
Chapter 13
Conceptual Problems
13.1 c (a plot of the cell means reveals an interaction)
13.3 b (product of the number of degrees of freedom for each main effect; (J − 1)(K − 1) = (2)(2) = 4)
13.5 d (p less than alpha only for the interaction term)
13.7 c (c is one definition of an interaction)
13.9 b (interaction df = product of main effects df)
13.11 d (the effect of one factor depends on the second factor; see the definition of interaction as well as the example profile plots in Figure 13.1)
13.13 False (when the interaction is significant, this implies nothing about the main effects)
13.15 No (the numerator degrees of freedom for factor B can be anything)
13.17 e (3 levels of A, 2 levels of B, thus 6 cells)
13.19 a (check the F table for critical values; only reject the main effect for factor A)
13.21 b (as dftotal = 14, then total sample size = 15)
Computational Problems
13.1 SSwith = 225; dfA = 1; dfB = 2; dfAB = 2; dfwith = 150; dftotal = 155; MSA = 6.15; MSB = 5.30; MSAB = 4.55; MSwith = 1.50; FA = 4.10; FB = 3.5333; FAB = 3.0333; critical value for A is approximately 3.91, thus reject H0 for A; critical value for B and AB approximately 3.06, thus reject H0 for B and fail to reject H0 for AB
13.3 See the following completed table:
Source    SS      df  MS     F    Critical Value  Decision
A          14.06   1  14.06  .25  4.75            Fail to reject H0
B          39.06   1  39.06  .70  4.75            Fail to reject H0
AB          1.56   1   1.56  .03  4.75            Fail to reject H0
Within    668.75  12  55.73
Total     723.43  15
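Each F in the completed table is the effect mean square divided by the within (error) mean square, which can be verified directly from the table entries:

```python
# F ratios for the 13.3 summary table: effect MS over within MS.
# The small F values are why all three null hypotheses are retained.
ms_within = 668.75 / 12     # 55.73 when rounded
f_a  = 14.06 / ms_within    # about .25
f_b  = 39.06 / ms_within    # about .70
f_ab =  1.56 / ms_within    # about .03
```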
13.5 FA = 4.0541, FB = 210.1622, FC = 31.7838, FAB = 7.9459, FAC = 13.1351, FBC = 10.3784, FABC = 4.0541, all but ABC and A are significant
Chapter 14
Conceptual Problems
14.1 No (there is no covariate mentioned for which to control)
14.3 c (evidence of meeting the assumption of independence can be examined by a scatterplot of residuals by group or category of the independent variable; a random display of points suggests the assumption is met)
14.5 b (see discussion on homogeneity of regression slopes)
14.7 b (14 df per group, 3 groups, 42 df − 2 df for covariates = 40)
14.9 c (want covariate having a high correlation with the dependent variable)
14.11 c (the covariate and dependent variable need not be the same measure; could be pretest and posttest, but does not have to be)
14.13 b (an interaction indicates that the regression lines are not parallel across the groups)
14.15 c (a post hoc covariate typically results in an underestimate of the treatment effect, due to confounding or interference of the covariate)
14.17 No (if the correlation is substantial, then error variance will be reduced in ANCOVA regardless of its sign)
14.19 b (11 df per group, 6 groups, 66 df − 1 df for covariate = 65)
14.21 No (there will be no adjustment due to the covariate and one df will be lost from the error term)
Computational Problems
14.1 The adjusted group means are all equal to 150; this resulted because the adjustment moved the mean for Group 1 up to 150 and the mean for Group 3 down to 150.
14.3 ANOVA results: SSbetw = 4,763.275, SSwith = 9,636.7, dfbetw = 3, dfwith = 36, MSbetw = 1,587.758, MSwith = 267.686, F = 5.931, critical value approximately 2.88 (reject H0). Unadjusted means in order: 32.5, 60.4, 53.1, 39.9.
ANCOVA results: SSbetw = 5,402.046, SSwith = 3,880.115, dfbetw = 3, dfwith = 35, MSbetw = 1,800.682, MSwith = 110.8604, F = 16.24, critical value approximately 2.88 (reject H0), SScov = 5,117.815, Fcov = 46.164, critical value approximately 4.12 (reject H0).
Adjusted means in order: 30.7617, 61.2544, 53.1295, 40.7544.
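The jump from F = 5.931 in the ANOVA to F = 16.24 in the ANCOVA comes from the covariate absorbing most of the error variation (and costing one error df). A quick check of the ANCOVA F from the reported sums of squares:

```python
# ANCOVA F for answer 14.3: the covariate removes one df from the
# error term (dfwith drops from 36 to 35), and the much smaller
# error mean square is what pushes F from 5.931 up to about 16.24.
ms_betw = 5402.046 / 3       # 1800.682
ms_with = 3880.115 / 35      # about 110.8604
f_ancova = ms_betw / ms_with # about 16.24
```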
Chapter 15
Conceptual Problems
15.1 b (when there are both random and fixed factors, then the design is mixed)
15.3 c (gender is fixed, order is random, thus a mixed-effects model)
15.5 a (clinics were randomly selected from the population; thus, the one-factor random-effects model is appropriate)
15.7 False (the F ratio will be the same for both the one-factor random- and fixed-effects models)
15.9 Yes (the test of the interaction is exactly the same for both models, yielding the same F ratio)
15.11 Yes (SStotal is the same for both models; the total amount of variation is the same; it is just divided up in different ways; review the example dataset in this chapter)
15.13 c (see definition of design)
15.15 True (rarely is one interested in particular students; thus, students are usually random)
15.17 False (the F test is not very robust in this situation and we should be concerned about it)
Computational Problems
15.1 SSwith = 1.9, dfA = 2, dfB = 1, dfAB = 2, dfwith = 18, dftotal = 23, MSA = 1.82, MSB = .57, MSAB = 1.035, MSwith = .1056, FA = 1.7585, FB = 5.3977, FAB = 9.8011, critical value for AB = 6.01 (reject H0 for AB), critical value for B = 8.29 (fail to reject H0 for B), critical value for A = 99 (fail to reject H0 for A)
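The F ratios in 15.1 are consistent with mixed-model error terms: the factor tested against the AB interaction mean square, and the remaining effects tested against the within mean square. That denominator choice is inferred from the printed values rather than stated in the answer:

```python
# F ratios for answer 15.1.  Dividing MSA by MSAB (not MSwith)
# reproduces the printed FA = 1.7585, which is the usual error-term
# choice for a mixed-effects two-factor model (inferred here).
ms_a, ms_b, ms_ab, ms_with = 1.82, 0.57, 1.035, 0.1056
f_a  = ms_a / ms_ab      # about 1.7585
f_b  = ms_b / ms_with    # about 5.3977
f_ab = ms_ab / ms_with   # about 9.8011
```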
15.3 SStime = 126.094, SStime×program = 2.594, SSprogram = 3.781, MStime = 42.031, MStime×program = 0.865, MSprogram = 3.781, Ftime = 43.078 (p < .001), Ftime×program = 0.886 (p > .05), Fprogram = 0.978 (p > .05)
15.5 SStime = 691.467, SStime×mentor = 550.400, SSmentor = 1968.300, MStime = 345.733, MStime×mentor = 275.200, MSmentor = 1968.300, Ftime = 2.719 (p = .096), Ftime×mentor = 2.164 (p = .147), Fmentor = 7.073 (p < .001)
Chapter 16
Conceptual Problems
16.1 d (teachers are ranked according to a ratio blocking variable; a random sample of blocks is drawn; then teachers within the blocks are assigned to treatment)
16.3 a (children are randomly assigned to treatment based on ordinal SES value)
16.5 d (interactions only occur among factors that are crossed)
16.7 a (this is the notation for teachers nested within methods; see also Problem 16.2)
16.9 False (cannot be a nested design; must be a crossed design)
16.11 Yes (see the discussion on the types of blocking)
16.13 c (physician is nested within method)
16.15 Yes (age is an appropriate blocking factor here)
16.17 a (use of a covariate is best for large correlations)
16.19 a (see the summary of the blocking methods)
Computational Problems
16.1 (a) Yes; (b) at age 4, type 1 is most effective; at age 6, type 2 is most effective; and at age 8, type 2 is most effective
16.3 SStotal = 560, dfA = 2, dfB = 1, dfAB = 2, dfwith = 24, dftotal = 29, MSA = 100, MSB = 100, MSAB = 10, MSwith = 10, FA = 10, FB = 10, FAB = 1, critical value for B = 4.26 (reject H0 for B), critical value for A and AB = 3.40 (reject H0 for A and fail to reject H0 for AB)
16.5 Fsection = 44.385, p = .002; FGRE-Q = 61.000, p = .001; thus reject H0 for both effects; Bonferroni results: all but sections 1 and 2 are different, and all but blocks 1 and 2 are statistically different
Chapter 17
Conceptual Problems
17.1 c (see definition of intercept; a and b refer to the slope and d to the correlation)
17.3 c (the intercept is 37,000, which represents average salary when cumulative GPA is zero)
17.5 a (the predicted value is a constant mean value of 14 regardless of X; thus, the variance of the predicted values is 0)
17.7 d (linear relationships are best represented by a straight line, although all of the points need not fall on the line)
17.9 a (if the slope = 0, then the correlation = 0)
17.11 b (with the same predictor score, they will have the same predicted score; whether the residuals are the same will depend only on the observed Y)
17.13 d (see definition of homogeneity)
17.15 d (various pieces of evidence for normality can be assessed, including formal tests such as the Shapiro–Wilk test)
17.17 True (the value of X is irrelevant when the correlation = 0, so the mean of Y is the best prediction)
17.19 False (if the variables are positively correlated, then the slope would be positive and a low score on the pretest would predict a low score on the posttest)
17.21 No (the regression equation may generate any number of points on the regression line)
Computational Problems
17.1 (a) b (slope) = .8571, a (intercept) = 1.9716; (b) Y (outcome) = 7.1142
17.3 118
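Answer 17.1 uses the sample prediction equation Y′ = a + bX. The printed outcome of 7.1142 is consistent with a predictor value of X = 6; that X value is inferred here from the arithmetic, not stated in the answer:

```python
# Prediction with the regression equation from answer 17.1:
# Y' = a + bX.  With X = 6 (inferred), 1.9716 + .8571 * 6 = 7.1142.
a, b = 1.9716, 0.8571
x = 6
y_hat = a + b * x   # 7.1142
```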
Chapter 18
Conceptual Problems
18.1 b (partial correlations correlate two variables while holding constant a third)
18.3 c (perfect prediction when the standard error = 0)
18.5 False (adding an additional predictor can result in the same R²)
18.7 No (R² is higher when the predictors are uncorrelated)
18.9 c (given there is theoretical support, the best method of selection is hierarchical regression)
18.11 No (the purpose of the adjustment is to take the number of predictors into account; thus R²adj may actually be smaller for the model with the most predictors)
Computational Problems
18.1 Intercept = 28.0952, b1 = .0381, b2 = .8333, SSres = 21.4294, SSreg = 1128.5706, F = 105.3292 (reject at .01), s²res = 5.3574, s(b1) = .0058, s(b2) = .1545, t1 = 6.5343 (reject at .01), t2 = 5.3923 (reject at .01)
18.3 In order, the t values are 0.8 (not significant), 0.77 (not significant), −8.33 (significant)
18.5 r1(2.3) = −.0140
18.7 r12.3 = −.8412, r1(2.3) = −.5047
18.9 Intercept = −1.2360, b1 = .6737, b2 = .6184, SSres = 58.3275, SSreg = 106.6725, F = 15.5453 (reject at .05), s²res = 3.4310, s(b1) = .1611, s(b2) = .2030, t1 = 4.1819 (reject at .05), t2 = 3.0463 (reject at .05)
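The overall F in 18.9 is MSreg over MSres. The residual df of 17 (and hence n = 20 with two predictors) is inferred from s²res = SSres/dfres, since 58.3275/3.4310 is about 17:

```python
# Overall regression F for answer 18.9 with k = 2 predictors.
# df_res = 17 is inferred from s2_res = SS_res / df_res
# (58.3275 / 3.4310 is about 17), i.e., about n = 20 cases.
ss_reg, ss_res = 106.6725, 58.3275
k, df_res = 2, 17

f_overall = (ss_reg / k) / (ss_res / df_res)   # about 15.5453
```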
Chapter 19
Conceptual Problems
19.1 c (the measurement scale of the dependent variable)
19.3 a (employment status: employed; unemployed, not looking for work; unemployed, looking for work; there are more than two groups or categories)
19.5 a (True)
19.7 a (the log odds become larger as the odds increase from 1 to 100)
19.9 d (Wald test; assesses significance of individual predictors)
Computational Problems
19.1 −2LL = 7.558, bHSGPA = −.366, bathlete = 22.327, bconstant = .219, se(bHSGPA) = 1.309, se(bathlete) = 20,006.861, odds ratioHSGPA = .693, odds ratioathlete < .001, WaldHSGPA = .078, Waldathlete = .000
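In binary logistic regression the odds ratio for a predictor is exp(b), and the logit is the natural log of the odds (the point behind conceptual answer 19.7). A check against the HSGPA coefficient reported in 19.1:

```python
import math

# exp(b) reproduces the printed odds ratio for HSGPA in answer 19.1;
# the logit (log odds) grows as the odds grow.
b_hsgpa = -0.366
odds_ratio = math.exp(b_hsgpa)   # about .693

odds = 9.0                       # e.g., p = .9 gives odds of 9 to 1
logit = math.log(odds)           # positive, and larger for larger odds
```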
813
Subject Index
A
Additive�effects,�ANOVA,�380
Additive�model,�569
All�possible�subsets�regression,�678
Analysis�of�covariance�(ANCOVA)
adjusted�means�and�related�procedures,�
434–436
assumptions�and�violation�of�assumptions,�
436–441
characteristics,�428–431
example,�441–443
G*Power,�445–469
layout�of�data,�431
more�complex�models,�444
nonparametric�procedures,�444
one-factor�fixed-effects�model,�431–432
partitioning�the�sums�of�squares,�433
population�parameters,�431–432
SPSS,�445–469
summary�table,�432–433
template�and�APA-style�paragraph,�
469–471
without�randomization,�443–444
Analysis�of�variance�(ANOVA)
alternative�procedures
Brown–Forsythe�procedures,�313
James�procedures,�313
Kruskal–Wallis�test,�312–313
Welch�procedures,�313
vs��ANCOVA,�575–576
assumptions�and�violation�of�assumptions,�
309–312,�380–381
characteristics�of�one-factor�model,�292–296
effect�size�measures,�confidence�intervals,�
and�power,�303–304,�383–384
examples,�304–307,�384–389
factorial
SPSS,�395–417
template�and�APA-style�write-up,�
417–419
three-factor�and�higher-order,�390–393
two-factor�model,�372–390
with�unequal�n’s,�393–394
Friedman�test,�574
layout�of�data,�296
model,�302–309
multiple�comparison�procedures,�382–383
one-factor�fixed-effects�model,�291–336
one-factor�random-effects�model
assumptions�and�violation�
of assumptions,�482–483
characteristics,�479–480
hypotheses,�480
multiple�comparison�procedures,�483
population�parameters,�480
residual�error,�480
SPSS�and�G*Power,�508–513
summary�table�and�mean�squares,�
481–482
one-factor�repeated�measures�design
assumptions�and�violation�
of assumptions,�495–496
characteristics,�493–494
example,�499–500
Friedman�test,�498–499
hypotheses,�495
layout�of�data,�494
multiple�comparison�procedures,�498
population�parameters,�494
residual�error,�494
SPSS�and�G*Power,�515–524
summary�table�and�mean�squares,�
496–498
parameters�of�the�model,�302–303
partitioning�the�sums�of�squares,�299,�381
summary�table,�300–301,�381–382,�
391–392
template�and�APA-style�write-up,�548–551,�
603–605
theory,�296–302
three�factor�and�higher-order,�390–393
triple�interaction,�393
two-factor�hierarchical�model
characteristics,�559–561
example,�565–566
hypotheses,�562–563
layout�of�data,�561
multiple�comparison�procedures,�565
nested�factor,�562
population�parameters,�562
SPSS,�576–581
summary�table�and�mean�squares,�
563–565
814 Subject Index
two-factor�mixed-effects�model
assumptions�and�violation�
of assumptions,�492
characteristics,�488
hypotheses,�489–490
multiple�comparison�procedures,�492–493
population�parameters,�488
residual�error,�488
SPSS�and�G*Power,�514–515
summary�table�and�mean�squares,�
490–492
two-factor�model
assumptions�and�violations�
of assumptions,�380–381
characteristics,�373–374
effect�size�measures,�confidence�intervals,�
and�power,�383–384
examples,�384–389
expected�mean�squares,�389–390
layout�of�data,�374
main�effects�and�interaction�effects,�
377–380
multiple�comparison�procedures,�382–383
partitioning�the�sums�of�squares,�381
summary�tables,�381–382
two-factor�random-effects�model
assumptions�and�violation�
of assumptions,�487
characteristics,�483–484
hypotheses,�484–485
multiple�comparison�procedures,�487
population�parameters,�484
residual�error,�484
SPSS�and�G*Power,�513–514
summary�table�and�mean�squares,�485–487
two-factor�randomized�block�design�
for�n�>�1,�574
SPSS,�603
two-factor�randomized�block�design�
for�n�=�1
assumptions�and�violation�
of assumptions,�569–570
block�formation�methods,�571–572
characteristics,�567–568
example,�572–573
G*Power,�603
hypotheses,�569
layout�of�data,�568
multiple�comparison�procedures,�571
population�parameters,�568
SPSS,�589–563
summary�table�and�mean�squares,�
570–571
two-factor�split-plot/mixed�design
assumptions�and�violation�
of assumptions,�503
characteristics,�500
example,�506–508
hypotheses,�495
layout�of�data,�500–501
multiple�comparison�procedures,�505–506
population�parameters,�501–502
residual�error,�502
SPSS�and�G*Power,�526–548
summary�table�and�mean�squares,�503–505
unequal�n’s,�312
APA-style�paragraph
data�representation,�41–42
univariate�population�parameters,�69–70
A�priori�power,�137
Assumption�of�linearity,�269–270
Asymptotic�curve,�83–84
B
Backward�elimination,�676–677
Balanced�case,�296
Bar�graph,�23–24
Between-groups�variability,�298
Binomial�distribution,�proportion,�209
Bivariate�measures�of�association,�see�Measures�
of�association
Blockwise�regression,�678
Box,�33
Box-and-whisker�plot,�33
Brown–Forsythe�procedures,�249–251,�313
Bryant–Paulson�test,�780–782
C
Categorical�variable,�definition,�7
Causation,�correlation�coefficients,�270
Cell,�222
Central�limit�theorem,�116–117
Chi-square�distribution
goodness-of-fit�test,�218–221
percentage�points,�761
SPSS,�225–231
test�of�association,�221–224
Chunkwise�regression,�678
Coefficient�of�determination,�620–622,�
665–668
College�Entrance�Examination�Board�(CEEB)�
score,�86
Column�marginal,�222
Comparisons,�342
815Subject Index
Complete�factorial�design,�559
Completely�randomized�design,�296
Completely�randomized�factorial�design,�
ANOVA,�374
Complex�post�hoc�contrasts,�Scheffé�and�
Kaiser–Bowden�methods,�357–358
Compound�symmetry,�495,�569
Computational�formula,�299
Conditional�distribution,�628–629
Confidence�interval�(CI),�115–116,�133–134
Constant,�definition,�7
Contingency�table
chi-square�test�of�association,�221–222
proportion,�215–216
Continuous�variable,�definition,�8
Contrast-based�multiple�comparison�
procedures,�346
Contrasts,�343–345
Correlation�coefficients
assumption�of�linearity,�269–270
correlation�and�causality,�270
different�types,�275–276
Pearson�product-moment,�265–269
restriction�of�range,�271
Covariance�analysis,�relationship�among�
variables,�263–265
Covariate
definition,�429
independence�of,�438–439
measured�without�error,�439
Cramer’s�phi�type�correlation,�275
Crossed�design,�559
Cross�validation,�720
Cumulative�frequency�distribution,�22
Cumulative�frequency�polygon,�26–27
Cumulative�relative�frequency�
distribution,�23
Cumulative�relative�frequency�polygon,�27
D
Data�representation
APA-style�paragraph,�41–42
appropriate�techniques,�measurement�scale�
types,�42–43
graphical�display
bar�graph,�23–24
cumulative�frequency�polygon,�26–27
frequency�distribution�shapes,�27–28
frequency�polygon,�25–26
histogram,�25
relative�frequency�polygon,�26
stem-and-leaf�display,�28–29
percentiles
box-and-whisker�plot,�33
computing�formula,�29–31
definition,�29
quartiles,�31
ranks,�31–32
SPSS�procedures,�33–41
tabular�display
cumulative�frequency�distribution,�22
cumulative�relative�frequency�
distribution,�23
frequency�distribution,�19–22
relative�frequency�distribution,�
22–23
Decision�errors,�124–126
Decision-making
example�situation,�124–125
full�context,�134–136
overview�of�steps,�129–130
table,�125–126
Definitional�(conceptual)�formula,�299
Degrees�of�freedom�concept,�140
Dependent�proportions,�215–217
Dependent�samples,�164–165
Dependent�t�test
assumptions,�180
confidence�interval,�177
effect�size,�177
example,�177–179
recommendations,�180
standard�error,�176
Dependent�variable,�criterion,�292
Dependent�variance,�246–248
Descriptive�statistics,�definition,�6
Deviational�measures
deviation�score,�58–59
population�variance
characteristics,�61
computational�formula,�60–61
definitional�formula,�60
sample�variance,�62–64
standard�deviation
characteristics,�61
description,�61
and�population�variance,�61–62
and�sample�variance,�62–64
Deviation�score,�58–59
Dichotomous�variable,�definition,�8
Directional�alternative�hypothesis,�128
Discrete�variable,�definition,�8
Dummy�variable,�681,�711
Dunnett�test,�769–771
Dunn�(or�Bonferroni)�method,�772–775
816 Subject Index
E
Effect�size,�139,�267,�725
in�chi-square�test�of�association,�224
in�G*Power,�151
in�inferences�about�2�dependent�means,�177
in�inferences�about�2�independent�
means, 168–169
measures�of,�139,�303–304,�383–384
in�proportions�involving�chi-square�
distribution,�220–221
Equal�n’s,�296
Errors�of�estimate,�619
Exact�probability,�132
Expected�proportion,�218–219
Experiment-wise�type�I�error�rate,�293
Extrapolation,�value�of�X,�632
F
Factorial�analysis�of�variance
SPSS,�395–417
template�and�APA-style�write-up,�417–419
three-factor�and�higher-order
characteristics,�390–391
summary�table,�391–392
triple�interaction,�393
two-factor�model
assumptions�and�violations�
of assumptions,�380–381
characteristics,�373–374
effect�size�measures,�confidence�intervals,�
and�power,�383–384
examples,�384–389
expected�mean�squares,�389–390
layout�of�data,�374
main�effects�and�interaction�
effects, 377–380
multiple�comparison�procedures,�382–383
partitioning�the�sums�of�squares,�381
summary�tables,�381–382
with�unequal�n’s,�393–394
Factorial�design,�ANOVA,�373
Fail�to�reject,�125
False�negative�rate,�720
False�positive�rate,�720
Family�of�curves,�80
Family-wise�multiple�comparison�
procedures, 346
F�distribution,�243,�762–765
Fisher’s�Z�transformation,�268,�766–767
Fixed�independent�variable,�assumption�
in ANCOVA,�438
Fixed�X
assumptions�in�linear�regression,�632–633
assumptions�in�logistic�regression,�723
assumptions�in�multiple�regression,�674–675
Forced�stepwise�regression,�678
Forward�selection,�677
Frequency�distributions
shapes,�27–28
tabular�display,�19–22
Frequency�polygon,�25–26
Friedman�test
hierarchical�and�randomized�block�
ANOVA, 574
nonparametric�one-factor�repeated�measures�
ANOVA,�524–526
one-factor�repeated�measures�ANOVA,�
498–499
Fully�crossed�design,�ANOVA,�373
G
G*Power
ANCOVA�model,�445–469
chi-square�distribution,�233
dependent�t�test,�193–194
independent�t�test,�192–193
linear�regression,�647–650
logistic�regression,�746–748
measures�of�association,�283–285
multiple�regression,�698–701
one-factor�ANOVA,�313–334
one-factor�random-effects�model,�508–513
one-factor�repeated�measures�design,�515–524
testing�hypothesis,�149–154
two-factor�mixed-effects�model,�514–515
two-factor�random-effects�model,�513–514
two-factor�split-plot/mixed�design,�526–548
Grouped�frequency�distributions,�21
H
Hierarchical�design,�559
Hierarchical�regression,�678
logistic�regression,�726–727
Hinge,�33
Histogram,�25
Homogeneity�of�regression�slopes,�ANCOVA�
model,�440–441
Homogeneity�of�variance,�310–311
assumption�in�ANCOVA,�437
assumption�in�ANOVA,�310–311
assumptions�in�linear�regression,�628–629
assumptions�in�multiple�regression,�672
817Subject Index
Homogeneity�of�variances,�248
assumption�in�ANOVA,�249,�251–252
Homoscedasticity,�310
Hosmer–Lemeshow�goodness-of-fit�test,�717
H�spread,�58
Hypotheses
differences�between�two�means,�165–166
types,�122–124
Hypothesis�testing
confidence�intervals,�133–134
decision�errors,�124–126
decision-making
example�situation,�124–125
full�context,�134–136
overview�of�steps,�129–130
table,�125–126
G*Power,�149–154
level�of�significance,�127–129
power
determinants,�136–138
type�II�error�and,�134–136
SPSS,�145–149
statistical�vs��practical�significance,�138–139
template�and�APA-style�write-up,�155–156
type�II�error�(β),�134–138
types�of,�122–124
z�test,�130–133
I
Incomplete�factorial�design,�559
Independence
assumption�in�ANCOVA,�436–437
assumption�in�ANOVA,�309–310
assumptions�in�linear�regression,�628
assumptions�in�multiple�regression,�
671–674
random-and�mixed-effect�ANOVA�
assumptions,�542–544
two-factor�hierarchical�ANOVA�
assumptions,�589
two-factor�randomized�block�ANOVA�
assumptions,�601–602
Independence�of�errors,�723
Independent�proportion,�212–215
Independent�samples,�164–165
Independent�t�test
assumptions,�171–172
confidence�interval,�168
effect�size,�168–169
example,�169–171
measurement�scales,�167
recommendations,�174–175
standard�error,�167
Welch�t’�test,�172–174
Independent�variable
ANCOVA�model,�438–439
predictor,�612
Independent�variances,�248–252
Inferential�statistics,�definition,�6–7,�109
Intact�groups,�429,�443
Interaction�effect
ANOVA�model,�377–380
and�main�effects,�377–380
two-factor�ANOVA�model,�373
Interpolation,�value�of�X,�632
Interval�measurement�scale,�11–12
Intervals
in�data�sets,�20
midpoint,�19–21,�26
width,�21
Intuition�vs��probability,�108–109
K
Kendall’s�tau,�measures�of�association,�273–274
Kruskal–Wallis,�follow-up�tests�to,�361–362
Kurtosis,�89–91
nonzero,�630
L
Least�squares�criterion,�620
Leptokurtic�distribution,�89–90
Level�of�significance,�127–129
Likelihood�ratio�test,�716–717
Linearity
assumption�in�ANCOVA,�438
assumptions�in�linear�regression,�631–632
assumptions�in�logistic�regression,�722
assumptions�in�multiple�regression,�672,�674
Linear�regression
concepts,�612–614
G*Power,�647–650
population,�614–615
sample�model
assumptions�and�violation�
of assumptions,�627–633
coefficient�of�determination,�620–622
least�squares�criterion,�620
prediction�errors,�619–620
significance�tests�and�confidence�
intervals,�622–627
standardized�regression�model,�618
unstandardized�regression�model,�
615–617
818 Subject Index
SPSS, 634–647
template and APA-style write-up, 650–652
Linear relationship, 269
Logistic regression
assumptions, 722–723
conditions
lack of influential points, 724
nonseparation of data, 724
nonzero cell counts, 723–724
sample size, 724–725
description, 710–712
effect size, 725
equation
odds and logit, 713–715
probability, 712–713
estimation and model fit, 715–716
G*Power, 746–748
predictor entry methods
hierarchical regression, 726–727
simultaneous, 726
stepwise, 726
significance tests
logistic regression coefficients, 720–721
overall regression model, 716–720
SPSS, 727–746
template and APA-style write-up, 749–751
Logistic regression coefficients, 720–721
M
Main effect, ANOVA model, 377–380
Mean, 54–55
differences between two, 163–198
independent vs. dependent samples, 164–165
inferences about two dependent, 175–180
inferences about two independent, 166–175
sampling distribution of the differences, 166
standard error of the difference between two, 167
Mean squares term, 301
Measurement, definition, 8
Measures of association
correlations, 269–271
covariance, 263–265
Cramer's phi, 275
G*Power, 283–285
Kendall's tau, 273–274
Pearson product-moment correlation coefficient, 265–269
phi coefficient, 274–275
scatterplot, 260–263
Spearman's rho, 272–273
SPSS, 276–282
template and APA-style write-up, 286
Measures of central tendency
advantages, 55–56
disadvantages, 55–56
mean, 54–55
median, 53–54
mode, 51–53
Measures of dispersion
advantages, 64
deviational measures, 58–64
disadvantages, 64
H spread, 58
range, 56–58
Median, 53–54
Mesokurtic distribution, 89–90
Midpoint, intervals, 19–21, 26
Mixed design, 500
Mode, 51–53
Moments around the mean, 89
Multilevel model, 559
Multiple comparison procedure (MCP), 382–383
concepts of, 342–348
Dunn (or Bonferroni) and Dunn–Sidak methods, 355–357
Dunnett method, 354–355
flowchart, 366–367
follow-up tests to Kruskal–Wallis, 361–362
Games–Howell, Dunnett T3 and C tests, 361
selected, 348–362
SPSS, 362–365
template and APA-style write-up, 366
Tukey HSD, Tukey–Kramer, Fisher LSD, and Hayter tests, 358–361
Multiple linear regression
assumptions, 671–676
coefficient of multiple determination and correlation, 665–668
significance tests, 668–671
standardized regression model, 664–665
unstandardized regression model, 661–664
Multiple regression
categorical predictors, 680–681
G*Power, 698–701
interactions, 680
linear regression, 661–676
multiple predictor model
all possible subsets regression, 678
backward elimination, 676–677
forward selection, 677
hierarchical regression, 678
sequential regression, 676, 678–679
simultaneous regression, 676
stepwise selection, 677–678
variable selection procedures, 676
nonlinear relationships, 679–680
part correlation, 660–661
partial correlation, 659–660
semipartial correlation, 660–661
SPSS, 682–698
template and APA-style write-up, 701–703
N
Negatively skewed distribution, 28, 88–89
Nested design, 559
Nominal measurement scale, 9, 12
Noncollinearity
assumptions in logistic regression, 722
assumptions in multiple regression, 675
Nondirectional alternative hypothesis, 128
Nonlinear models, 632
Nonlinear relationship, 270, 679–680
Nonparametric tests, 171
Normal distribution, 27, 28
characteristics
area, 80–81
family of curves, 80
standard curve, 79
unit normal distribution, 80
history, 78–79
proportions involving, 206–217
standard scores and, 77–99
Normality
assumption in ANCOVA, 437–438
assumption in ANOVA, 311–312
assumptions in linear regression, 629–631
assumptions in multiple regression, 672
two-factor hierarchical ANOVA assumptions, 585–589
two-factor randomized block ANOVA assumptions, 598–601
two-factor split-plot ANOVA assumptions, 538–542
Null hypothesis, 122–123
Numerical variable, definition, 8
O
O'Brien procedure, 251–252
Observed proportions, 218
Odds ratio (OR), 725
Omnibus test, 294
One-tailed test of significance, 128
Ordinal measurement scale, 10–12
Orthogonal contrasts, 347–348
planned, 352–354
Orthogonal polynomials, 768
Outliers, 33, 629
Overall regression model
cross validation, 720
Hosmer–Lemeshow goodness-of-fit test, 717
likelihood ratio test, 716–717
predicted group membership, 719–720
pseudovariance explained, 718–719
P
Parameter, definition, 5–6
Parametric tests, 171
Part correlation, 660–661
Partial correlation, 659–660
Partially sequential approach, factorial ANOVA with unequal n's, 393
Pearson product-moment correlation coefficient
inference for a single sample, 267–268
inference for two independent samples, 268–269
Percentile rank, 31–32
Percentiles
box-and-whisker plot, 33
computing formula, 29–31
definition, 29
quartiles, 31
rank, 31–32
Phi type of correlation, 274–275
Planned analysis of trend, MCP, 349–352
Planned contrasts, 345–346
Dunn (or Bonferroni) and Dunn–Sidak methods, 355–357
orthogonal, 352–354
with reference group, Dunnett method, 354–355
SPSS, 364
Platykurtic distribution, 89–90
Points of inflection, 83–84
Population, definition, 5
Population parameters
definition, 5–6
estimation of
central limit theorem, 116–117
confidence interval, 115–116
sampling distribution of the mean, 112–113
standard error of the mean, 114–115
variance error of the mean, 113–114
univariate, 49–71
Population prediction model, 614
Population proportion, 207
Population regression model, 614
Population variance, proportions of, 208
Positively skewed distribution, 27–28, 88–89
Post hoc blocking method, 572
Post hoc contrasts, 346
SPSS, 363
Post hoc power, 137
Power
definition, 134, 575
determinants, 136–138
type II error and, 134–136
Practical significance, vs. statistical significance, 138–139
Precision, definition, 575
Predicted group membership, 719–720
Prediction errors, 619–620
Probability
definition of, 106–108
importance of, 106
intuition vs., 108–109
logistic regression equation, 712–713
sampling and estimation, 109–117
Profile plot, 377
Proof (prove), 126
Proportions
binomial distribution, 209
chi-square distribution, 217–224
definition, 205
dependent, 215–217
independent, 212–215
inferences, 205–235
normal distribution, 206–217
sampling distribution, 208
single, 210–212
standard error, 209
standard error of difference between two, 213
tests of, 206–207
variance error, 209
Pseudovariance explained, 718–719
Q
Quartiles, 31
Quasi-experimental designs, 429, 443
R
Randomization, definition, 443
Randomized block designs, 567
Range, 56–58
exclusive, 57
inclusive, 57
Ratio measurement scale, 12
Raw residuals, 628
Raw scores, 20
Real limits, in data sets, 20–21
Regression approach, factorial ANOVA with unequal n's, 393
Relative frequency distribution, 22–23
Relative frequency polygon, 26
Repeated factor, 478
Repeated-measures models, 295
Replacement, simple random sampling with and without, 110–111
Research hypothesis, 122–123
Restriction of range, 271
Row marginal, 222
S
Sample, definition, 6
Sampled range blocking method, 572
Sampled value blocking method, 572
Sample prediction model, 616
Sample proportion, 208
Sample regression model, 615
Sample size, 19–20, 724–725
Sample statistics
probability and, 77–91
univariate population parameters
APA-style paragraph, 69–70
appropriate descriptive statistics, 70–71
measures of central tendency, 51–56
measures of dispersion, 56–64
SPSS, 65–69
summation notation, 50–51
Sample variance, 62–64
Sampling distribution
difference between two means, 166
full decision-making context, 134–136
intelligence test case, 135–136
of the mean, 112–113
proportion, 208
variance, 242
Sampling error, 112–113
Scales of measurement, 8–12
Scatterplot
measures of association, 260–263
two-factor randomized block ANOVA assumptions, 601–602
two-factor split-plot ANOVA assumptions, 543
Scientific hypothesis, 122–123
Semipartial correlation, 660–661
Sensitivity, 719–720
Sequential approach, factorial ANOVA with unequal n's, 393
Sequential regression model, 676
commentary on, 678–679
Setwise regression, 678
Significance tests and confidence intervals, 622–627
Simple post hoc contrasts
Tukey HSD, Tukey–Kramer, Fisher LSD, and Hayter tests, 358–361
for unequal variances, Games–Howell, Dunnett T3 and C tests, 361
Simple random sampling
with replacement, 110
without replacement, 111
Simultaneous logistic regression, 726
Simultaneous regression model, 676
Single variances, 244–246
Skewed distribution, 88–89
Skewness
definition, 88
nonzero, 630
Spearman's rank correlation, 272–273
Specificity, 720
Sphericity, 495, 569
Split-plot design, 500
SPSS
ANCOVA model, 445–469
chi-square distribution, 225–231
data representation, 33–41
dependent t test, 188–192
factorial analysis of variance, 395–417
independent t test, 180–187
logistic regression, 727–746
measures of association, 276–282
multiple regression, 682–698
normal distribution and standard scores, 91–97
one-factor ANOVA, 313–334
one-factor random-effects model, 508–513
one-factor repeated measures design, 515–524
simple linear regression, 634–647
testing hypothesis, 145–149
two-factor mixed-effects model, 514–515
two-factor random-effects model, 513–514
two-factor randomized block design for n > 1, 603
two-factor randomized block design for n = 1, 589–563
two-factor split-plot/mixed design, 526–548
univariate population parameters, 65–69
variances, 252
Standard curve, 79
Standard deviation
constant relationship with, 82–83
sample variance, 62–64
Standard error
difference between two means, 167
difference between two proportions, 213
of the mean, 114–115
proportion, 209
Standard error of estimate, 624
Standardized regression model, 618, 664–665
Standardized residuals, 628
Standard scores
College Entrance Examination Board (CEEB) score, 86
IQ score, 86
normal distribution and, 77–99
T score, 86
z scores, 84–86
Standard unit normal distribution, 80, 757–759
Statistical hypothesis, 122–123
Statistical significance, vs. practical significance, 138–139
Statistic, definition, 6
Statistics
definitions, 5–7
history of, 4–5
scales of measurement, 8–12
value of, 3–4
variables, 7–8
Stem-and-leaf display, 28–29
Stepwise logistic regression, 726
Stepwise selection, 677–678
Studentized range test, 358, 776–779
Studentized residuals, 628
Summation notation, 50–51
Symmetric around the mean, 87
Symmetric distributions, 27, 88
T
t distribution, 140–142, 760
Template and APA-style write-up
ANCOVA model, 469–471
chi-square distribution, 234–235
dependent t test, 196–198
factorial analysis of variance, 417–419
independent t test, 195–196
linear regression, 650–652
logistic regression, 749–751
measures of association, 286
multiple regression, 701–703
normal distribution and standard scores, 98–99
one-factor ANOVA, 334–336
testing hypothesis, 155–156
variances, 253
Tetrad difference, ANOVA, 383
Tied ranks, 10
Transformations, 632
Trend analysis, 349
True experimental designs, 429
True experiments, 443
t test, 140, 142–145
correlated samples, 165
dependent, 176–180, 188–192, 196–198
dependent samples, 164–165
independent, 167–174, 180–187, 195–196
independent samples, 164–165
paired samples, 165
Welch, 313
Two-tailed test of significance, 128
Type II error (β), 134–138
U
Unbalanced case, 296, 312
Unequal n's, 296, 312
Ungrouped frequency distribution, 19, 21–22
Unit normal distribution
description, 80
transformation of, 82
Univariate analysis, 260; see also Univariate population parameters
Univariate population parameters
APA-style paragraph, 69–70
appropriate descriptive statistics, 70–71
measures of central tendency
advantages, 55–56
disadvantages, 55–56
mean, 54–55
median, 53–54
mode, 51–53
measures of dispersion
advantages, 64
deviational measures, 58–64
disadvantages, 64
H spread, 58
range, 56–58
SPSS, 65–69
summation notation, 50–51
Unstandardized regression model, 615–617, 661–664
Untied ranks, 10
V
Variables
definition, 7
types, 7–8
Variable selection procedures, 676
Variance error
of the mean, 113–114
proportion, 209
Variance error of estimate, 624
Variance of the residuals, 624
Variances
Brown–Forsythe procedure, 249–251, 313
F distribution, 243
homogeneity, 248, 310–311
independent, 248–252
O'Brien procedure, 251–252
sampling distribution, 242
single, 244–246
SPSS, 252
template and APA-style write-up, 253
traditional test, 248–249
two dependent, 246–248
Variance stabilizing transformations, 629
W
Welch t' test, 172–174, 293
Whiskers, 33
Within-groups variability, 298
Within-subjects design, 493
Z
z scores, 84–86
z test, 130–133
An Introduction to Statistical Concepts
Copyright
Contents
Preface
Acknowledgments
1. Introduction
1.1 What Is the Value of Statistics?
1.2 Brief Introduction to History of Statistics
1.3 General Statistical Definitions
1.4 Types of Variables
1.5 Scales of Measurement
1.6 Summary
Problems
2. Data Representation
2.1 Tabular Display of Distributions
2.2 Graphical Display of Distributions
2.3 Percentiles
2.4 SPSS
2.5 Templates for Research Questions and APA-Style Paragraph
2.6 Summary
Problems
3. Univariate Population Parameters and Sample Statistics
3.1 Summation Notation
3.2 Measures of Central Tendency
3.3 Measures of Dispersion
3.4 SPSS
3.5 Templates for Research Questions and APA-Style Paragraph
3.6 Summary
Problems
4. Normal Distribution and Standard Scores
4.1 Normal Distribution
4.2 Standard Scores
4.3 Skewness and Kurtosis Statistics
4.4 SPSS
4.5 Templates for Research Questions and APA-Style Paragraph
4.6 Summary
Problems
5. Introduction to Probability and Sample Statistics
5.1 Brief Introduction to Probability
5.2 Sampling and Estimation
5.3 Summary
Appendix: Probability That at Least Two Individuals Have the Same Birthday
Problems
6. Introduction to Hypothesis Testing: Inferences About a Single Mean
6.1 Types of Hypotheses
6.2 Types of Decision Errors
6.3 Level of Significance (α)
6.4 Overview of Steps in Decision-Making Process
6.5 Inferences About μ When σ Is Known
6.6 Type II Error (β) and Power (1 − β)
6.7 Statistical Versus Practical Significance
6.8 Inferences About μ When σ Is Unknown
6.9 SPSS
6.10 G*Power
6.11 Template and APA-Style Write-Up
6.12 Summary
Problems
7. Inferences About the Difference Between Two Means
7.1 New Concepts
7.2 Inferences About Two Independent Means
7.3 Inferences About Two Dependent Means
7.4 SPSS
7.5 G*Power
7.6 Template and APA-Style Write-Up
7.7 Summary
Problems
8. Inferences About Proportions
8.1 Inferences About Proportions Involving Normal Distribution
8.2 Inferences About Proportions Involving Chi-Square Distribution
8.3 SPSS
8.4 G*Power
8.5 Template and APA-Style Write-Up
8.6 Summary
Problems
9. Inferences About Variances
9.1 New Concepts
9.2 Inferences About Single Variance
9.3 Inferences About Two Dependent Variances
9.4 Inferences About Two or More Independent Variances (Homogeneity of Variance Tests)
9.5 SPSS
9.6 Template and APA-Style Write-Up
9.7 Summary
Problems
10. Bivariate Measures of Association
10.1 Scatterplot
10.2 Covariance
10.3 Pearson Product–Moment Correlation Coefficient
10.4 Inferences About Pearson Product–Moment Correlation Coefficient
10.5 Assumptions and Issues Regarding Correlations
10.6 Other Measures of Association
10.7 SPSS
10.8 G*Power
10.9 Template and APA-Style Write-Up
10.10 Summary
Problems
11. One-Factor Analysis of Variance: Fixed-Effects Model
11.1 Characteristics of One-Factor ANOVA Model
11.2 Layout of Data
11.3 ANOVA Theory
11.4 ANOVA Model
11.5 Assumptions and Violation of Assumptions
11.6 Unequal n’s or Unbalanced Procedure
11.7 Alternative ANOVA Procedures
11.8 SPSS and G*Power
11.9 Template and APA-Style Write-Up
11.10 Summary
Problems
12. Multiple Comparison Procedures
12.1 Concepts of Multiple Comparison Procedures
12.2 Selected Multiple Comparison Procedures
12.3 SPSS
12.4 Template and APA-Style Write-Up
12.5 Summary
Problems
13. Factorial Analysis of Variance: Fixed-Effects Model
13.1 Two-Factor ANOVA Model
13.2 Three-Factor and Higher-Order ANOVA
13.3 Factorial ANOVA With Unequal n’s
13.4 SPSS and G*Power
13.5 Template and APA-Style Write-Up
13.6 Summary
Problems
14. Introduction to Analysis of Covariance: One-Factor Fixed-Effects Model
14.1 Characteristics of the Model
14.2 Layout of Data
14.3 ANCOVA Model
14.4 ANCOVA Summary Table
14.5 Partitioning the Sums of Squares
14.6 Adjusted Means and Related Procedures
14.7 Assumptions and Violation of Assumptions
14.8 Example
14.9 ANCOVA Without Randomization
14.10 More Complex ANCOVA Models
14.11 Nonparametric ANCOVA Procedures
14.12 SPSS and G*Power
14.13 Template and APA-Style Paragraph
14.14 Summary
Problems
15. Random- and Mixed-Effects Analysis of Variance Models
15.1 One-Factor Random-Effects Model
15.2 Two-Factor Random-Effects Model
15.3 Two-Factor Mixed-Effects Model
15.4 One-Factor Repeated Measures Design
15.5 Two-Factor Split-Plot or Mixed Design
15.6 SPSS and G*Power
15.7 Template and APA-Style Write-Up
15.8 Summary
Problems
16. Hierarchical and Randomized Block Analysis of Variance Models
16.1 Two-Factor Hierarchical Model
16.2 Two-Factor Randomized Block Design for n = 1
16.3 Two-Factor Randomized Block Design for n > 1
16.4 Friedman Test
16.5 Comparison of Various ANOVA Models
16.6 SPSS
16.7 Template and APA-Style Write-Up
16.8 Summary
Problems
17. Simple Linear Regression
17.1 Concepts of Simple Linear Regression
17.2 Population Simple Linear Regression Model
17.3 Sample Simple Linear Regression Model
17.4 SPSS
17.5 G*Power
17.6 Template and APA-Style Write-Up
17.7 Summary
Problems
18. Multiple Regression
18.1 Partial and Semipartial Correlations
18.2 Multiple Linear Regression
18.3 Methods of Entering Predictors
18.4 Nonlinear Relationships
18.5 Interactions
18.6 Categorical Predictors
18.7 SPSS
18.8 G*Power
18.9 Template and APA-Style Write-Up
18.10 Summary
Problems
19. Logistic Regression
19.1 How Logistic Regression Works
19.2 Logistic Regression Equation
19.3 Estimation and Model Fit
19.4 Significance Tests
19.5 Assumptions and Conditions
19.6 Effect Size
19.7 Methods of Predictor Entry
19.8 SPSS
19.9 G*Power
19.10 Template and APA-Style Write-Up
19.11 What Is Next?
19.12 Summary
Problems
Appendix: Tables
References
Odd-Numbered Answers to Problems
Author Index
Subject Index