SOCW 6311 wk 4 Discussion: Selection of a Statistical Analysis Approach
300 to 500 words questions in bold then answers
Though data analysis occurs after the study has completed a data collection stage, the researcher needs to have in mind what type of analysis will allow the researcher to obtain an answer to a research question. The researcher must understand the purpose of each method of analysis, the characteristics that must be present in the study for the design to be appropriate and any weaknesses of the design that might limit the usefulness of the study results. Only then can the researcher select the appropriate design. Choosing the appropriate design enables the researcher to claim the data that is potential evidence that provides information about the relationship being studied. Notice that it is not the statistical test which tells us that research is valid, rather, it is the research design. Social workers must be aware of and adjust any limitations of their chosen design that may impact the validity of the study.
To prepare for this Discussion, review the handout, A Short Course in Statistics and pages 210–220 in your course text Social Work Evaluation: Enhancing What We Do. If necessary, locate and review online resources concerning internal validity and threats to internal validity. Then, review the “Social Work Research: Chi Square” case study located in this week’s resources. Consider the confounding variables, that is, factors that might explain the difference between those in the program and those waiting to enter the program.
Post an interpretation of the case study’s conclusion that “the vocational rehabilitation intervention program may be effective at promoting full-time employment.”
Describe the factors limiting the internal validity of this study, and explain why those factors limit the ability to draw conclusions regarding cause and effect relationships.
Resources
Dudley, J. R. (2014). Social work evaluation: Enhancing what we do. (2nd ed.) Chicago, IL: Lyceum Books.
Chapter 9, “Is the Intervention Effective?” (pp. 226–236: Read from “Determining a Causal Relationship” to “Outcome Evaluations for Practice”)
Plummer, S.-B., Makris, S., & Brocksen S. (Eds.). (2014b). Social work case studies: Concentration year. Baltimore, MD: Laureate International Universities Publishing. [Vital Source e-reader].
Read the following section: is uploaded
“Social Work Research: Chi Square” (pp. 63–65)
Trochim, W.M.K. (2006). Internal validity. Retrieved from
https://conjointly.com/kb/construct-validity/
Make sure to click on all the links in the narrative
Statistics for Social
Workers
J. Timothy Stocks
tatrstrrsrefers to a branch ot mathematics dealing ‘”‘th the direct de
tion of sample or population characteristics and the an.ll)’5i• of popula·
lion characteri>tics b)’ inference from samples. It co•·ers J wide range of
content, including th~ collection, organization, and interpretJtion of
data. It is divided into two broad categoric>: de;cnptive >lathrics
and
inferential >lJt ost ics.
Descriptive statistics involves the CQnlputation of statistics or pnr.1meters to describe a
sample’ or a popu lation _~ All t he data arc available and used in <.omputntlon o f t hese
aggregate characteristics. T his may involve reports of central tendency or v.~r i al>il i ty of
single variables (univariate statistics). ll also may involve enumeration of the I’Ciation-
sh ips between or among two or moo·e variables’ (bivariate or multivariJte stot istics}.
Descriptiw statistics arc used 10 provide information about a large m.b> of data in a form
that ma)’ be easily understood. The defining characteristic of descriptive ;tJtistks b that
the product is a report, not .on inference.
Inferential statisti<> imolvc’ the construction of a probable description of the charac· Descriptive Statistics
Measures of Central Tendency Arir!Jmeric .\1ea11. The arithmetic mean usually is simply called the mca11. It also is called 75 76 PA11 f I • OuANTifAllVi AffkOAGHU: fouHo~;noM Of Ot.r”‘ CO ltf(TIO’J
~, =l:: X , 1
where 11 represents the popu I at ion mean, X represems an individual score, and rr is t he The formula for the sample mean is the same except t hat the mean is represented by – l:;X X= –. Following are t he numbers of class periods skipped by 20 seventh-graders d uring l:;X 219 • II 2 0
Mode. The mode is the most frequently appearing score. It really is not so much a measure quency distribution and determining which score has t he greatest fre-
TABLE 6 . 1 Truancy Scores Score
20 18 7
1 6
IS 4
1 3
1 2
II
10 6 5
4
3 2 frequ ency
2 3 2 0 I 0 0 Because 17 is the most frequently appearing number, the mode (or Unlike the mean or median, a distribution o f scores can have more ,llfedinrr. lf we take all the scores in a set of scores, place t hem in o rder There a.re 20 scores in the previous example. The median would be Measures of Variabi li ty de
information on how “spread out” scores in a d istribution are.
J If R
:.aJ de c .. …nu 6 • STAnsnu t<~~ Soc&AL Wouta~ 77
to the maximum ( highest) score. h is obtained by subtracting the 111ini murn score flom Let us compute th.- rang.- for the following dJt.l ~ct:
/1, 6, 10, 14, 18,22/.
‘T’he n1inimum i!) 2, and tht.” tnJximum is 22:
Range = 22 – 2 20.
Sum ofSquaus. The sum of squares is a measure of the total amount of variability in” set The formulas for sample and population sums ot squares are the same except for sam- SS = I(X ~tl’
Using the dJtJ set fo r t11e range, the sum of squnres would be computed as i n
‘ldble6.2.
V.~rinuce. Another name for variance i~ mean square. This is short for mean of squared ss n whc1e cr2 is the syn>bol for populn tion variance, SS is the symbol fo r sum of squares, and The variance for the example we used to compute TAOLE 6.2 Computing the Sum of Squares
X X m
2 tO
6 6
10 ]
l 18 >6
12 10
NOTE, !X~ 72; n- 6; ~ • 12; l:(X – p)’ ~ 780
(X – m)’
100
36
4 2 280 6 The sample variJnce is not an unbi.as.ed estin1a1o1 ss II – I CHA,Ut 6 • Sr”n~nn HJa SOCIAl wouus 77
to the maximum (highc;t) score. h is obtained by subtracting the minimum scoo·c from let us compute the rnnge for the following data set:
12. 6, 10, 14, 18.221 .
The minimum is 2. and the maximum is 22:
Range 22-2 = 20.
Sum of8qo~t~res. The ,um of squares;, a measure of the total amoun t o f variability in a set The formulas for <.omple and popul.llion sums of squares are the ~arne except tor S ss l.(X -X)’
Usi ng the data set for the range, t he su m of squares would be computed ns i n \~rta11u. Another name for variance is mean square. This is short for menn of 51JIUtred ss n where o ‘ is th e symbol foo· population v•o·ia.nc.e, SS is t he symbol fo o· Slim o f squares. a11d The •-..ria nee for the example we used to compute TABu 6.2 Computing the Sum of Squares
X X-m
2 – 10
6 -6
10 -2
14 +2
18 +6
22 +10
HOT£: r.x- 72: n; ti; p = 12: l:lX Ill’= 250.
(X- m)’
100 j(,
4 J&
tOO
280 6 The snmple variance is uot Jn \Ulbiased estimalor ss r =-. 78 PAll I • QuAiuu.ot.nvt A”MACH(S.:. FouHDAIIOif”i Of O.AIA CoLLfcnow
The n – 1 i> a correction fac tor for this tendency to undcre>tima te. I t is c.1 lled .1 280 6 – 1 Sumdard Deviatron. Although the variance is a measure of average variability associJtc’ Using the same .ct of numbers as before, the population standard deviation would be
cr -/46.67 = 6.83 .
and the sample st.mdard deviation would be
s J56 = 7.’18.
For a normally d istribured set of scores, n ppwximately 68% of all ;cores will be within Measures of Relationship One can use ,·eg,·cssion procedures to dcrivr the line that best fo ts the data. This line is Y,_. = 3.555 t 1.279X,
where Yis fi-equ ency o f Slope is the ch•ngc in Y for a unit increase in X. So, the slope of 11.279 meam that”” The equation does not give the actual value of Y (called the obt.tined or obserwd – r iQUIO 6.1 8
Frequency ol Stre 0 Punishment
~ c . Y P’td; – 3.555 + 1.279X .. 3 til 2 0
0 0 Stressors
example, if X were 3 , rhen we would predi<.t t hal Y would be - 3.555 + 1.279(3) ~ - 3.5
55
+ 3.837 ~ 0.282.
Tuu 6 . 3 frequency of Sue-ssors Pun1.shm~nt
3 0
4 4 }
s 3 7 ~
8 6
7 q 8
1() 9
T he regression li ne is the line that predicts Y >UCh t hat t he error E= Y Y..,.. ..
\\~1en X= 4, there arc two obL1ined ”alues of Y: I and 2. The Y,,…t = – 3.555 I 1.279( 4) = – 3.555 + S. l l6 ~ 1.56 1. E – 2 – 1.561 = +0.<139fnr Y=2 .
If we square each error difference score and sum the squares.
then we get a quantity called the enor sum of sq.ure;., which i;. SSI: L( Y – Y,..,.,)’.
T he regressi011 line io !he o ne line that give> the sm.11lcst va lue 80 P~oar 1 • QUAtHnAnvE A ,ROACHES: FouNOAHO~r~~$ of DAtA Conte I!Otf
The SSE is a measure of the lOla I variability of obtained score values around their pre- The total sum of squ.m:s (SS1) i$ a measure of the total variabilit)’ of the obtained SST = L(Y-Y)’.
The remaining sum of squa 1·cs is coiled the regression sum of S SSR L( v, …. – Y)’.
The SSR is a measure of the tot.d variabil ity of the predicted score values around the An important and interesting feature of the>e three sums of squares is that the sum of SST SSR- SSE.
This leads us to three o ther imponnnt stat istics: t he proportion of variance explJined Proportion of \Iarin nee Expluir~ctl. T ht I’VE is a measure of how good Lhc rcs,·cssion line SSR SST
There also is a computational equation for the PVE. which is
where
PVE – ( SSXY )’ SSXY is the “co variance” ~um of ;qua res: l.(X – X)( Y – Y ), The procedure fo r computing these sums of squares is outlined in Table 6.4. explained by stressors experienced ;,
( 4 6L5)1 3782.25 (48.1)(825) 3968.25 TABLE 6.4 Computation of r2 (PVE)
y Y – y (Y- Y)’ X X x (X – X)’ (X X)( Y Y) 4 -2 3 5.29 -lS 12 .25 +80S
4 -23 529 2 -15 6 .25 < 5.
75
5 – Ll 1.69 3 1.5 2.25 • 1.95
6 -ol 0 .09 < -o5 0.25 0 IS
7 +0./ 0.49 5 ·10.5 0.25 035
8 + II 2.89 6 ; 1.5 2 .25 • 2.55
7 TO.! 0.49 7 12.5 6 .25 11.75
9 +27 7.29 R t3.5 12.25 -19. 45
10 +3 I 13 69 9 “‘5 20.25 16. 65
NOTE: Y – 6.3; SSY – 48. l; X = 4.5; S5X = 82.5; S5XY • •6 l S
The PVEsometimes is en lied th~ coefticient of determination and is represented by the Correlation Co~ffirirm. A correlation coellicient also is a 111easure of th e strength of rela- SSXY For our examph: data, the correlation coefficient would be
+61.5 ~ 61.5 +61.5 ./(18.1)(82.5) ¥’3968.25 62.994
Standard Error of Em mate. The standard error of estimate is the The fi rst s tep is to compute the variance error (s:.J:
..1 SSE Notice that the value for degrees of freedom is 11 2 rather than 11 – l. The reason why ‘l he standard error of estimate is the square root of the variance error:
Sf.= …j(ij.
The standard error of estimate tells us hO\v spread out scores are with respect to their We can calculate the standard error of estimate using the foUowing computing formula:
( n-1) ( I — r 2)(——-) , u-2 s,. is the standard deviation of Y, n is tl1e sample si7.c.
for the example dat..1, this would be
S£ = 2.3lli ((J — .953) :~ = D = 2.311 ((0.47)~) Inferential Statistics: Hypothesis Testing
The Null and Alternative Hypotheses
Classical ;tatistical hypothesis testing is based on the evaluation of two rival hypothescs: We try to dete<:t relationsh ips by identifying changes that are unl ikely to have occurred
simp!)• bccau~e of random fluctuat ions The null hypothesis is the hypotltcsis that there is no relationship between two vari- Statistical hypothesis tests arc carried out on samples. for example, in nn experi- Figure 6.2 C14Anu 6 • StAJtmu f the popula tion of all individuals as if they had received the i.ntervt•ntion. Th e control lf the intervention had no effect, then th e populations would be iden tical. However, it Statistical hypothC$is tests invoh·e e’-aluating evidence from .amples to make inler- or as
H, : ll = ~to = 0.
H, stands for the null hypothC$iS. It is J letter H with J ” ro subscript. It is a statement To <:>tablish that a relat ionship exists between th e in tervention (independent Vilfi:tble) Strictly speaking, we do not mak~ J decision as to whether the nul] hypoth eoi:. is vVhen we reject the n ull hypothesh and it is true, we ltJve committed a Type I error. By Tbe probability that we will fail to reje Situahon: NULL HYPOTH ESIS TRUE
Deas1on ACSlllt
Reject H, 1’ype I Error Fail to Reject H, Correct Decision
I a= the probability of not rejecttng the Nun Hypothesis wllcn
11 is true. 84 PAII t I • Qv.umr:.WI~ A PI’I\OACHH: Fourwt. lt Figure G.:Y The fol!pwing hypothesis would be evaluated by c<>mparing the difference between If’ we carried out multiple samples from populations with identical. n>eans (the null The mean d ifference for the total distribution of samp le means is 0, and the standard On the other hand, we would expect to fin d a difference more than 9.8 about 1 in 20 The probability that a relationship or a difference of a certain size would be seen in a 1 – u = .95
– 4 – 3 – 1 0 +1 +3 +4 – 20 – 15 – 10 – 5 0 +5 +10 +15 +20 CH..,tU 6 • Sr.r.nsnu •o• SoctAt Wo~·~ui 85
Rejecting the H0: We believe that it i~ likely that the relationship in the sample IS gcncr Not rejutmg the H,; We do not believe that we have >umcient e1•idence to draw infer- For the previous example, let us imagine that we ha-e set a= .OS. Al;o, imagine thJt we Some texts create the impression that the alternative (or research or experimental) When we are inter<5ted in an effect of a particular •ize. we use a specific altemnti1e
hypotbesil. that takes the following form:
H, : f.l 1 – ~,.,;:: id I,
where dis a difference of a particular size. If the test is a nondirectional I<'St, then the dif-
ference in the alternative hypothesis would be expressed as an absolute value, ldl, to ohnw
that either ,t positive or neg.tt tve differe~tct~ ;, involv~d.
lt is custo mary to exprc>S the mea11 d i ffere nce in an II , in units of standard deviat ion. Cohen (1988) groups effect sizes into small, medium , and large cntegorics. The criteda Small effect >iu (d ~ .2): It is appro:rimatcly the effect size for the average difference in Medium effect size (d • .5): It is ap proximately the effect size fo r t he average differc11ce Large cff l ntuit iv<:ly. it would se..-m t hat we wo uld want to detect even ve1y >mall effect si ~t·s in Because ‘cry large sample sizes require resources thdt might not be readily available, If we rejeCt t he null hypothesis, t hen we implicitly huvc decided that t he evidence >Up- 86 P1o11r I • Qt•MmTM •; e A ?PI\OAC HtS: Fou NDAti ON) o, 0.-.tA Contr’fiO’I
Figur• 6 .4
The Null Hypoth Decision Reject 1io
Fail to Reject
H•
Siluation: ALTERNATIVE HYPOTHESIS TRUE
Result
Correct 0 edslon
1 -13 a t he probabinty of rejecling tho Null Hypothesis when the Type II E n· or obability of not rejecling the Null Hypothesis w11e 11 the Beta(~) is t he probdbility o f committing a Type rr error. This probability is eStdblished We should decide on the power (I – (3) as well as the a level before we carry out a sta- Assumptio ns for Statisti cal Hypothesis Tests
Although assumptions arc diffc •·cm leu different tests, all tests of the uull hypo1 hcsis shn re T he randomness assum ption is t hnt sample members m ust be randomly selected from The mathematical models that underlie statistical hypothesis testing depend on ran- The independence asswnption t.\ that one member’s score • Again, the mathematical models are dependent on the independence of sample scores. Parametric and Nonpara metric Hypothesis Tests
Traditionally. hypothesis tests arc g rouped into parametric and nonp.trJntCt ric tests. T he Parametric tests are based on the assumption that t he populations from whkh the C HAJ>TEJI 6 • STATI \11(~ 1011: SOCIAl WO !U({I\S 87
assumption. T hus, a non parametric test can be carr ied out on a broader range of data When the populations from which we sample are nor mally distributed , and when all Specific Hypothesis Tests
\•Ve now investigate several frequently used hypothesis te.m and issues surrounding their Single-Sample Hypothesis Tests
These are tests i n which a single sample is drawn. Comparisons are made between sample For example, we might wish to gather evidence as to whether a particular population Typicrully, these tests are not used for experiments. T hey tend to be used to demonstrate Here, we investigate two single-sample test~:
L Single-sample rtest (interval or ratio scale)
2. x’ (chi-square) goodness of fit test (nominal scale)
TIJe Single-Srmrple t Test. This rest usually is used to sec whether a strotum of a population The null hypothesis for t his test is t hat the mean wages fo r a particular strntum where !lo is the mean wage fo r the population and ~t 1 is the mean wage fo r t he stratum. Randomness: Sample members must be randomly drawn from the pop ulation.
fndeptmdence: Sa mple (X) scores rnust be independent of each other.
Sct1liug:The dependent m~sure (X scores) must be interval or ratio.
Norma l distribr 88 PAIIT I • QUANnrAnVf At-nOA.t-H£s: Fo u i\OAnotn o• OA t A Cou.£CIION
These asswnprioos are li Violation of the assumption of,, normal distribution will introduce >Ome error into 42
Jctu ally will be a p value of .057). This is what is meant whe n some-one snys t ha t the t test T he tstatistic fo•· t he sing le sample t te;t is computed by subtr:ocr ing t he null hypotbe- T he fo rmu la for r…,, (pronOlii1Ced “t obr•ined”) is
As the absolute value of ‘·• get> larger, tht> more unlikely it is that such a difference T he critical value oft (the v.d ue t hat too. must equal or exceed to reject the null hypoth- Let us look at how to compute ‘”k tient rehabilitation p rogram r-or .o certain injury, X, is 46.6 d ays. We w ish to see whethe r We randomly sa mple 16 fil e< from the pa>t year, We review these c.1>cS anu dete•mine To determine thi>, we ne..’ Th e standard erro r of the lliCJn i> calculated by d ividing the standard deviation by t he s ;
_s_ = l 1.888 = l 1.888 =
2 72.
/ii Jl6 4
We take th e fo rmu la for t,,…, Joel p lug in our n umbers 10 obLain
29.875- 46.6
2.972
-1 6.725 8 We look up the tabled t val u e {I., ) at 15 degrees offreroom. This turns out to be 2. 131 than t”” = 2.131, so we reject the n ull hypothesis. The e-.-idencc suggests thot clicnls in o ur T he effect size index for a test o f means is d and is computed as follows fo r a single- d = ~~o . The effect size for our example would be as follows:
d = 29.875 – 46.6 which would be classifie d as a large effect.
-16.725 1he x’ Cootfne;s-of· Fit Test . Th e.%’ goodness- of-fit test is a single·sam pic test. lL is used in T he null hypothesis for !he x’ test is that the population from which the s.1mple has where P., is the proportion o r case~ •.vitbin category kin the null hypothesis population P01 is the proportion of cases within category k in the population from which the test The assumptio n> fo r thet’ goodness-of fit test arc as follows:
• Randomness: Sample members m ust be randornly drawn from the populnt i<)ll.
• Independence: Snmplc scores m ust be independent of each other. O ne im plication of
this is that categories must be mut ually cxclu;ive (no case may appear in more than • Scaling: The dependent measure (categories) m ust be nominal. and no more than 20% of the expected frequencies should be less than 5.
As “ith all tests of !he nuU hypothesis, the x’ test begins with the assumptions of ran · Mutually exclusive means t hat an individual may not be in more than one categot)’ per These assumpliom nrc listed more or less in o rder of i.n1portance. Violations of the first 90 PA~-r l • OVAinllAt•vt Al’tfiOo\CI!CS: FouNOo\TION They} goodness-of-fit test is basically a h>rgc-sam plc test. Whc11 the c·xpectcd frequen The usual pt·occdtu’c in this case is either to increase expc led frc<1ucncb b)' colbp, ing
adj.>ccnt C<>tcgorics (also called cells) <>r to u. The workers at the Interdenom ina tional Social Services Center in St. Win ifre d The workers randomly sampled 50 clients from those seen during the previous year. TABLE 6.5 Expected Frequencies for Religious Preferences
Expected Christi (In
J2
Jewish
5 t\i1uslim Other/No Preference Agnostic/ Atheist
4 7 2
Two (40%) of our expected frequencies (Muslim and agnostichlllteist) are less than 5. However, we could increase the sample size. To get a sample in which onl)• one (20%) 0.08 • 11 = 5
” = – 5- = 62.5 “‘ 6J. So, our sample size would need to be 63, givi11g us th e expected frcq ucncio.:> show11 in TABLE 6.6 New Expected Frequencies for Religious Prefere~ce; ‘ · < · ;. : •: •: •
. . ~ ' * •
Christian Jewish Muslim Other/No P(eference Agn ostic:/ Atheist
————————– frcq uc:nc;· TABLE 6.7 Observed and Expected Frequencies for Religious Preferences
Christian Jewish Muslim Other/ No Preference Agno$tic/ Ath~isl:
Expected 40.3L &.30 5.04 8 .82 2.52 Obse1·.-cd 49 2 2 9 The null hypothesis fo r this example is th;~ l the p roporlion of peo ple living in St. The null hypoth~sis expresses the expectation that observed and expected frequencies /v IJ&
X2 = “‘ (Jo – rd 0 0 1 L- fE .
T he form ula tells us to >U btract the e xpe The x\.,. is evaluated by comparing it to a cr-itical value For ax’ goodness of fit, the degrees of freedom are equal to the number of ,,ategories The critical value fo r X’ at C< = .05 an d df =4 is X' .," = 9.49. We have calculllted 7.'.,., as
23. 1295. Because X1<,1>1 is greater than X.~ena , we reject the null hypothesh:. The evidence .sug- Earlier, we discussed the use of t he effect size measure d for the t test. Jt is an appropri- 92 PAI\T I • Q UAIITI TA.Tivt A PPfiOAW £s: fou~OAliONS O f DATA Coll.ECTI OM
TABLE 6.8 Computation of x’ …
Observed (f ) Expected (f,) fo – fe lfc – f,)’ (f.- t,)’
f,
49 4032 +8.68 75.3424 17.4404
2 6-30 -4.30 18.4900 2.9349
2 5 04 – 3.04 9.24 16 1.8337
9 .8.82 – 0. 18 0.0324 0.0037
2.S2 – 1.52 2.310• 0.9!68
!’JOT!.: I 17,4404 + 2.9349 + I 8337 + 0.0037 + 0.9168= :t’,, = 23.1295. means. It compares frequencies (or proportions}. Therefore, a d ifferent effect size index is Small effe
Medium effect size: w ~ .30
Large effect size: w ~ .50
The effect size c.oefficient for a x! goodness-of-fi t test is computed according to the fol- where N = the total sample size. IV= J(23.! 295/ 63}- J(0.367l) = 0.6059,
which would be classiGed as a large effect.
Hypothesis Tests for Two Related Samples We· investigate three examples of two related sample tests in this section:
I. Dependent (matched, paired, correlated) samples t test (in terval or ratio scale)
2. Wilcoxon matched pairs, signed rank.~ test (ordinal scale)
3. McNemar change test ( nominal scale) C1MPH~ 6 • Sunsncs FOR Sot-IAt \’IOKKUlS 93
Difference Scores. The dependent r test and the Wilcoxon matched pairs, signed ranks test x; – X1 =X0 ,
X, is the first of a pair of scores,
x; is the second of a pair of scores. and 0 The null hypothesis for all these tests is that the samples came from popub tions in Tlte Dependenr. Samples t Test. This also is called the correlated, paired, or matched t test. where J.l.xo = the mean diffe rence between the populations from which the samples were )!00 “” the mean difference between the populations specified by the null hypothesis.
Because the null hypotnesis typically Sp
The t statistic for the dependent t test is the mean of the sample differences divided by Xo – l’oo XD
As the absolute va.lue of t. gets larger, the more unlikely it is that such a difference could The assumptions of the dependem t test are as follows:
Randomness: Sample members must be randomly d rawn from the population.
Tndependence: Xvscores must be independen t of each other. 0 No r·mal distribution: The population of X scores must be normally distributed . These a>sumptions a re list ed more or less in order of import>l 11cc. Viola tions of the t1 rsl Violation of th~ assu mption of a normal distribution “ill introduce some error into Still, cvm thoug h t he erwr is sli~;ht, the nonp Let us look at the proc<" Ten clienL~ were rand omly s~kcted r,·om clients seen fo r d ep ression problcn” a t a (l,un – ‘I he ne>.’t step is the cnmpntation of the ‘landard error ol tllc mean. Wedhdde the stan- .< XD = 1.'33/ V 10 - l .;l3j 3 .16 = 0 .•12.
\ Ve plug the value.< into the formula li>r t.,.:
XI> -‘xl’>
– 1 Fo1· a = .05 and rlf ~ 11 – I = 10 – I -9, r, … = 2.262 (sec a t<~nle of critical values for the
1 te,r, nondire.:tional, fo und in m ost stali>Li” texts). Because lt …. l – 2 .. l8 is greater !loan or The cff~ctsi/e index for tbiotc.,l i’ ll and is rom puled a; foUows:
; For the depr~ssion intervention cx,unplc,
-1-0 – 1 1.33 1.33
w hich wou ld be classifier! ns ” medium effect. CHAI’rER 6 • SI All~ucs Hl!t Socu .. l Woll.~Eas 95
lv’ilc&X011 Matched Pairs, Signed Ranks Test. The Wilcoxon matched pairs, signed ranks test The assumptions fo r t he Wilcoxon matched pairs, signed ranks test are as follows:
• Ratufomness: Sample members must be randomly drawn fro m the population. ferences must be converted to ranks).
Let us look at the procedure for computing the Wilcoxon matched pairs, signed ranks Ten clients were randomly selected from clients seen for deprcs.~ion problems at a com- o r
each indi,·idual. We assign a rank to each difference score based on irs closeness to 0. So, if we look at Table 6.9, we see that there is one difference score of 0 that goes TABLE 6.9 Computation of the Wilcoxon T .. ,
Signed Ranks
JD Number Pretcsl Postte.st Difference Rank Positive Negati ve
17 16 – 1 3 3
2 19 t8 -1 3 3
3 18 15 -3 9 9
4 18 17 -1 3 3
s 16 16 0 7 18 16 – 2 7 7
8 21 19 – 2 7 7
9 18 19 .+1 3 3
10 18 16 – 2 7 7
NOTE: Sum of ranks for less, frequent ~ign ~ 6: 9 6 t-‘11111 I • QUAWhlAII\11 Al•f’II(IA(tUI\: r t i UNOATI(Hn ()I I)AlA (.OU I CI101i
T he M We then determine which ,ign (JXl,ithe or neg.ttive) apJl<'ared 1.-s~ fre Th e IC> I. stati>l icis w iled ‘f.,1, . This is an 11 ppcrcase T a nd is not the >flllll’ as the >tatistic There are two other i 1. The Wilcoxon T…, is cvaluat<·d according to rhe ruombtr of nontcro differentc
~cores. So, we should subt ract I from the o rigina l 11 fo r each 2. Unlike most other t~>l &ratistic~. the Wilcoxon T,,, must be lrss tlta11 or equa l to t he We consult a table of critica l values for I he W ilcoxon T(scc t ahlc of .:ritical values for There is n o weD-accepted post h oc measure of effect sizt for Otd in:d tesL~ of rela ted The p1·ocedure bc:gins with compul ing the miniJuum and maximum ~cores for each of We count t he n umber of scores in both groups w ithin this mngc (including rhe end Cohe11 ( 1988) calcula tes equivalent< between U a nd d, which would imply the foUow·
ing definition> of strength of effect:
Small ct rect slzr
Uugc (‘tfect SIZC
d~ ~
d:.8
u- .IS f”Or the example da1~, the minimum scooc for th e prctC\l wa& 16, and the mnximum Of 20 total ‘>(\)1 e.,, 1 ~ f~U with in thi, 1werl.•1> r.onge. The p ru(‘ CHAnt~ 6 • STAT1srtcs rQR SQetAL Wcnrxus 97
.WcNmmr Change Test. The Mc:-icmar change test is used for pre- and post intervention The layout for the McJ-:emar change test is shown in Figure 6.5. Cell A cont.Un> the where P, is t he proportion of cases shifting from+ to- (decreasing) in the null hypothesis P is the proponion of ca,.,; shifting from – to + (increasing) in the ouU hypothesi’ The assumptions for the McNemar change test are sintilar to those for the X’ test:
Rrmrlomness: Sample members must be randomly drawn from the population.
Independence: Withi n-group sa111 plc sco•cs must be independent of each other (although Smling: The dependent measure (categol’ies) must be nomi nal.
F.xpected frequencies: No expected freq ue11cy within a category should be less than 5.
A special case of X’..,, b t he test >tatistic for the McNemar change test:
where t _ (If,. .fi,f – I ) 2 J. =the frequency in Cell A, and Th ·is is a test statistic with df = I , For rlf I , we need to include s·omcthiug called the Figure 6.5
McNemar Change Before +
After
A B 98 PART I • QuAutlfi~T•vt A PI’AOAC HlS! Fou~JDAfiONS OF Ot.rA CotUCliON
Let us imagine that we are interested in marijuana use among high school students. We TABLE 6.10 Observed and Expected Frequencies for the McNemar 2009
None Marijuana
2007
Marijvana 2 (Cell A) 21 (Cell S)
None 31 (Cell C) 11 (Cell 0)
Total 33 32
l’o! 23
42 Cell A repn-serm thMe studeitts who had used marijuaM in 2007 hut who had nOf used So, the sum of Cells A and D is the total number of students whose patterns of mari- In other words, of the I 3 individuals who c.ha11ged their pauern of marijuana usc, “e Tile calculation of the McNemar change test statistic is shown in Table 6. 1 L tistics texts). Because x ‘,., = 4.92, we would reject the null hypolhesis at u = .OS. We would TABLE 6.11 Computation of the McNemar Change Test Statistic
( JI~ – f01)-1
2 11 8
NOTE: 7~1 = 4.923.
64
(If. – f. l- 1 I’ 4 ,9230767 CHAot1U 6 e STATISTICS fO-. SOCI~l W O’-I(rll\ 99
The effect size coefficient for a M’:-lemar change test is wand is computed according For the high school survey,
w = J(4.923/65) “‘ Jo.o757 = 0.2752,
which wo uld be classified as a medium effect.
Hypothes is Tests fQr Two Ind e p e nde nt S amples
These are tests in ‘•hich a sam ple is randomly drawn and individ uals fro m the sample Jrc We investigate three examples of two independent samples tests:
I. Independent samples (group) /test (interval or ratio scale)
2. vV”dcoxonfMann-Whitney (WfM-W) test (ordinal scale)
3. ;(2 test of independence (2 X k) ( uominal scale)
l11depeudent Samples 1 Test. T his sometimes is CJIIcd the g roup t test. It is a test of mcJ.ns Following are the assum ptions of t he independent t rest:
Randomness: Sample members m usr be randomly drawn from the populotion and ran· ltrdepe11dence: Scores must be independent of e.1ch or her.
Scalitrg: The dependenr measure musr be inrervlll or ratio.
Normal distribution: T he populations from which tbe individuals in the samples were Homogeneity of variances (a,’- a ,’): ‘ f he samples must be drawn from populatious Equality of sample sizes ( “• = n,): ‘ I he samples m ust be of the same sir.e.
As before, these assumptions are listed more or less in o rder of imp o rtance. T he fir. r Violation o f the nonnaliry assumption will make for Jess accurate p val ues. However, The independent groups t tesr alw is fair!)’ robu>t .-ith respect to \•iolation of the 100 PAnl I • OUANntAuvt Art~AoAc.ul\~ Fou~~rooAT ION> o• 0″‘” Ct~ur If the ,maller variance •~ mthc “11allca >.~mple.then the probability of,, I ypc II ca ror ( 1101 If there is no ..tSsodarion lk·twt-en s.;1mplt”‘ Mit.’ ~lnd vari:wcc. then ”iol.l1ion of c:.u.h of T he t stat i~tic for the independent 1 lc x , – x2 Sx 1- … ~
Be«luse rwo sample mean• arc computed, 2 degrees of freedom are lost:
df 110 + n, – 2. “• = number of scores for the first group, and
11 = number of scores for the seco11d group.
Following is an example ot the ll>e o( the independent t test statistic. We whh to sec We c\’aluate tl1is with an independent 1 tc.\L The first step in calculating ‘·•• i, to com· $
.,;;; \/sl !.. II p te size (l/11). “ttdnda l’d crroJ’ for a sinr,l<.- ... amplt. Bccauo,r we have two sample:, in ,m iudcpcndt•nt
WOU(JS lCsi, the formula has to he Jitert·tf J bit.
Th~ first difference i\ in the (orrnuiJ for •he: va ria nce. TIH! variM1u: i’ the \Uill o l ss, ss1 CH.t.PHR 6 • Su.nsncs f OR SOC IAL W ORKERS 101
s; is the pooled estimate of the variance based on two groups, 1 SS, is the sum of squares for Group 2,
n is the number of scores in Group J, and
n, is the number of scores in Group 2.
Because there are two groups, we do not multiply s: times (1/n); rather, we multiply it S.\’1-Xl = , (I 1) s- – + -P IlL nz . us tq• computing t..,,.
TABLE 6.12 Group Statistics
Group Mean Sum of Squafcs ” Home care 21.36 17{)7. 16 \4
First, we compute the pooled standard error of the mean (also called the standard ss, + ssl 43:;0.40 + 1101.16 6037.56 From the estimate fo r the pooled vari s2 – +- = ( 1 I) I’ tll ll2, 2 15.63 (~ + ~) = ,128.88 = 5.37 16 14 001 27.88 – 21.36 6.52 5.37 5.37
For ex = .05 and df = 111 + 112 – 2 = I 6 + L4 – 2 = 28, Ia;, = 2.048. Because 1100,1 = 1.213 is 102 PAI!.l I • QuANtiTATIVE AI’P~OACHES: Fou … O-.liOM Of 0ATA co~UtliO’f
There are two post hoc effe<:t size measures for an independent t test. The 11m of these
(d) already has lxen di.cmsed:
Note dlatthe numerator is the difference between the two sample m eanl and that th e Sp = fs~ = V215.63 = 14.68.
The effect size for the example would be
d = 27.88 21 36 = 6.52 = 0.44 which would be classified .ts a 1mallto medium effect size. This is equivalent to the ‘quared point-biserial correlation coefficient and is computed by
2 2 if. We ”’ere com paring socinl nc tivity in c hild ren in after-school care vcrMJ> t hose in home r.,.,. was 1.2 13 with df • 28. Pu tting these numbers in t h e formu la, we obtain the l_ ( 1.213) ” – ( 1.213) 1.471 So, a litde less than 5% of the variability in social activity among the chlldren was Wilcoxon/Mann -Whiwey Test. Statistic> texts used t o reter to this te>t as t he Mann- Tbe W/M-W test is a nonp a1·ametric test th at involves initia lly t reating both samples as The assumptions of the W/M W test are as follows :
Randomness: Sample members must be randomly drawn fr<>m the popuiJtion of inter- C U AI’rtll 6 • S IAHSHCS FOR $o cu._t W ORKU$ 103
Independence: Scores m ust be independent of each othe r.
Scaling: The dependent measure must be ordinal (inter val or ratio scores must be con- ‘When the assumptions of the t test are met, the r test will be slightly more powel’ful let us look at the procedure for com puti ng t he W/M-W test statistic. We use the same The first step in carrying out the W/M· W test is to assign ranks to the scores without TABLE 6.13 Summed Ranks for the Wilcoxon/ Mann-Whitney Test
Summed ranks
After-School Care
n = 16
w,= 218
Home Care
n = 14
w;-= 247
The test statistic for the W/M-W test is u..,,. We begin by calculating U statistics for U 1 = 11J n;z. + lFV1 n2 + (n2 + 1) nt(nt + 1} u, = ,,, tiJ + 2 – w,
= ( 16)( 14) + ( l6)(~6 – I} 2 18 = 126
(] 2 = , J l’l:z. + -=-‘-=,…–‘- w, = ( 16}(14) + ( 14}( 14 – l) 182 2 We choose the smaller U as u;,.,. Ln this instance, u.,. = u, = 68.
247 u •• , m ust be less tlran or equal to the critical value to reject t he null h ypothesis. 104 Po\IU I • 0uAN11tAT!V( A1’1’110M.Ht~ : FOU’IDATIO.,.S or Oo\TA CouH.UO\’
U.,..: 142 is not less than or equal to the critical value, so we fail to rejtct the null hypothe- As before, t here is no well-established effect size measure fo r the W/M-W test. The U For o ur example data, the minimum and maximum fo r t he after -school care g roup X’ Test of lmlcpt!m/ence (2 x k). The assumption> fo r d1e x’ test of indCj>Crtdence are as /lat~dom/les.: Sample members must be rnndo mly dra”‘n from the 1>opulation.
/Jillependl’!lre: Sample scores m ust be independent of each other. O ne implication of Scaling: The dependent measure (categories) must be nominal.
Expmcd frequmcie$: No expected frequency within a category should be less than 1, As wit h all tests of t he null hypothesis. the x2 test begins with t he assumptions of ran- Mwunlly rtclusive meaJlS that nn individual may not be ill more thn n one category per let us imagine that we are interested in marijuana use among high school students and TABLE 6.14 Marijuana Use
None
MatiJuanil
l eta I
Grade
9th 12th
42 33
23 22
65 55
Toto!
75 1 ~0
A higher proport ion of 12th-g raders The usual test used to evaluate such C HAI’TEII: 6 • STATISTIC-S fOR. Soc•AL Wo~Kflt S 105
co uld have come from a po pulatio n in which no such relationship existed (call ed The null hypothesis for this example would be that the same proportion of 9th-graders Because 45 of 120 of the total sample (9th· and 12th-graders) used marijuana during The%’ test evaluates the likelihe>od of the observed frequency departing from the H,: P”‘- P,,= O,
where P , is the pro port ion of cases within category k in the null hypothesis population The X’.,, test statistic is
Degrees o f freedom for a x’ test of independence are computed by multiplying the df= (Row – I )(Colum ns- 1)
TABlE 6.15 Observed and Expected Frequencies for Marijuana Use
None Total
9th
42 (40.625)
23 (24.375}
65 N01’E: Expwcd frequencies are in parentheses.
Grade 33 (34.375)
22 (20.675)
55 For Ollr example, this would be
d/=(2 -1}(2 1)=(1)(1)=1
Re.::all from our dbcussion of the ;’.lcNemar change te:.t that we include the Yates cor X 1 ul>• /c
The form of the equ~tion tells us to suhtr.ltt the expected ;core from the observed The reader might have noticed that t he con ection for the McNemar c hange test wa,l Tnble 6.16 shows how 10 work out the ma rijuJna survey data. 0. 1 09. Bec As before, the effe.::t c measure is “;which is wmputed a• a post h oc measure by
w – Ji.x’/N).
~or a 2 >< 2 tab le, w;, eq ual to the absolute v.tlue of
T AILE 6.16 Compuution of x’ … 42 ~() 615 8/~
lJ 14 375 81~
23 ]4.375 .875
n 20 62~ 875 bbt
(If. – f, J- 0.5)’ (Jf.- f,l – 0 .5)’ 0.7651>2’> 0.019
0.76~6lS 0022
0765675 0.031
0.765625 0.037 CHAI’tfft 6 • Sr.t.nsncs FOil So C-I.t.l WOI\I(US 107
For our example,
w = /(O. J09/t 20) = Jo.ooo90S3 – oo3o i w’ = PVE – .0009.
This is an extremely smaU effect size. Hypothesis Tests fork > 2 Independent Samples
Irnaginc that we wert: in terested in ageist attitudes among sodal \\’Orkers. Specificall)’> we We cotdd conduct independent group tests among aU possible pair ings: hospital (a) with This gives us three tests. When we conduct o ne test at the ex= .05 levd, we have a As the number of comparisons incceases, t·he likelihood of rejecting the null hypothe- One way of dealing with capitalization on chance would be to use a stricter alpha However, there are tests that allow one to detect whether there are any differences If significant differences are detected, then further pair comparisons are conducted to lf we conduct our screening test at a ,. .OS, then we will carry out the pair comparisons We look at three examples of screen ing tests fork> 2 independent samples:
I. One-way analysis o f variance (ANOVA) (interval or ratio scale)
2. Kruskal· Wallis (K· W) test (ordinal scale)
3. X1 test of independence (k x k) (nominal scale) 108 ‘””‘ I • QUANTITATIVl AmtOA.CIILS : fOU”-DATIOJr.S Of DA’rA C.olUCltOh’
One· Way A011dysis of\’ariance. The At\OVA is a test of means. The null hypothesis is
where k is the number of population nocans being estimated. I 10 : &,. = 0.
The test statistic used in A..’\OVA is called F and is calculated as follows:
n_.; where the numerator is the variance of the sample means mu ltiplied by the sample size, The assumptions underlying o ne-way ANOVA are as follows:
Randomness: Sample members must be randomly drawn from the population and randomly Indepelltltllct: Scores must be independent of each other.
Scalir~g: The dependent measure must be interval or ratio.
Normnl distribution: The populations from which the individuals in the sam ples were Homoge11ciry of variances (oi = o~ = .. . = o~): The samples must be drawn from pop· &jualiry of sample sizes (n, = n, = … = 11,): The samples must be of the same size.
ANOVA involves taking the variability among scores and detumining which is vari· The totnl variability of scores is divided into one componenl representing the variability {X – X)~ (X -Xl +(X-X). The X with two bars represems the grand mean, which is the mean of all scores with· C.HAPlUt 6 a STATiiliGS roll: SOCIAl W Oill({fi S 109
This equation illustrates that tbe deviatio n of the particul ar score fro m t he grand mean There are k = 3 g ro ups, with each containing n = 4 scores. The total number of scores There are t hree types of sum of squares calculated in AN OVA. T he fo rm ulas fo r the ss, … ences, and add ing up (summing) the squared differences: ss,.”‘ = (X – Xl .
ss …. m is calculated by subtracting the group mean fro m each score within a group, – 2 – 2 – 2 s~.~ is calculated by subtracting t he g rand mea n from each group mean, squaring S~”‘””‘ = ” I (X – X)’.
The sums of squares arc as fo llow·s:
SS,.;,'”‘ = 20 + 20 + 20 = 60
s~ ….. ,”‘ (4) 18.667 = 74 .667 The to tal sum of squares (SS~,1 ) is t he sum of the within-g ro up su m of sq o r 134.667 = 60.00 + 74.667. 110 PAtH I a Q u AN11JA1 1V[ APPI\0A(H£S: FOUIIOAltO~S Of 0 AlA COlltCTIO!.’
Each of these sums o f squares is a component o f a d iffere nt variance. In ANOVA jar- Because the total sum o f squares (SS,.,1) involves t he varia bility o f all scores aro und BeCtJase :1 (/tlritlii<'Y:' (meoll sqa,?re) is,? Rllll of square> diviOed br degrees of freedom, Two mean squares are u::;ed to calcnlate the Fubt statistic: MS~·i!Jun and A-f~,wMn · Their There are k ~ 3 groups, so df,”””” = k – 1 = 3- 1 = 2. We may now compute and T here are a to tal of N = 12 scores within k = 3. so di,;,,;0 = 12- 3 = 9 and MS .. n ,h;, ~ 60/9 These are the two variances u~ed ro m ake up the F ratio (F •• ,): MS.., • …., and MS,.,,,,. MSt,.,w..,n l f we plug in t he values from o ur example, t hen we obtain
fo~x = MSb””‘”” = 37.333 = S.6s. This is a bit confusing when presented in bits aJ1d pieces. The ANOVA sununary table Once we have computed the Poht’ iL is compared to a critical F. Because two variances For our example, the n umerator degrees o f freedo m are df = 2 because 2 degr ees of C HJo i’IU 6 • S t ATISTIC.S fO ft S OCtAl 1N CIIUP.S 111
TABLE 6 . 17 ANOVA Summary Table
Source Sum of Squares Degrees of Fceedom Mean Squar~ F B~tween 74.667 3 – 1 – 2 74.67/2 = 37 333 37..333/6 667 = 5 65
Within
Total 60.00
134.667
12 – 3 – 9 60.00/ 9 = 6.667
12- 1 • 11
are df: 9 because 9 degrees of freedom were used in the calculation of MS . .,,h;, · The criti- Based on these findin gs, it is likely th at at least one pair of means come from d ifferent Group l versus Group 2
Group I versus Group 3
Group 2 versus Group 3
The individual pair comparisons may be carried out using any of a number of multi- where tt, is the nwnber of scores in Group i, and
tt, is the number of scores in Group J.
If the group TIS are equal, then this becomes
For our example, Sx;-.<_; = )(2}(6 .667)/4 = J3.333 = 0.557.
We now maycarry oul our comparisons evaluating tat df= N – k= 12 – 3 = 9 (Figure 6.6). I Figure 6 .6
Multiple Comparisons
Hospilal (Group I) vs t – 3 – 5 – 3466 df= 9,«= 05 Reject H.
Hosprtal (Group 1) vs. r.,. •• g;~ = 10399 Clf = 9, a- .05 Nursrng Home (Group 2) ‘-=~5~ = 6.932 Clf = 9,a ~ 05 vs. Adun Pro!ectrve la.= 2.262 T here are a number of measure> for effect size for ru’\0\’A. For the >.Ike of srmplicity, The J effect· size mca>ure is eq ual to Lhe stand ard deviatio n of th e sam ple means divided f = JnFobr·
11′ wa, discussed earlier and defined as a proportion of variance explarned. It is calcu- l S.’itwlwttn It also may be calcul.lled from art F.,.:
Cohen ( 1988) categorizes these effect si1-“s into small, medium, and large categories. Sm all cfYcct size: f :. .lO Using the exarn plr dJLa, 11′ is
11′ = .0 1
11′ ; .06
11′; . 14
z SSt.,,…. 74.667 ss,”‘·” t 34.667 CHArtfa 6 • Sr.c..nsTIC;.s fQI\ SociAL WoRKEss 113
which is a very large effect.
Kmskal-Wal!is Test. The K-W test is the k > 2 groups equivalent o f the W/M -W test. The assumptions of the K-W test are as follows:
Rat~donmess: Sample members must be randomly drawn from the population of inter- Independence: Scores must be independent of each other.
Scali?Jg: The dependent measure must be ordi nal (interval or ratio scores must be con- When the assumptions of ANOVA arc mer, the analysis of variance will be sligh tly The K-W test is a screening test. If th ere is no significant difference foun d, then we stop Our example involves the evaluation of three interven tion techniques being used with The proced ure for tlle K-W test is s imilar to that for the W/M-W test. We begin by The test statistic for the K-W test is H..,,, which is approximately distributed as X’ with TABlE 6.18 Summed Ranks for the Kruskal-Wallis Test
C roup 1 G(Oup 2 Group 3
Summed Rant, = w, • 122.S Summed Rank3 .;;: W3 ;; t 66.5 12 ( Wk) 2 N(N + I) II(
wh ere
W, is the sum of ranks for Group k,
111 is the n umber of individuals in Group k, and
N is the total number of individuals in all groups.
From our example, we obtain the following:
12 (89l ( 122.5) 2 ( 166.5)2
27(27 I I) • -9-+ 9 + 9 J(27 + I} 27(28) 9 = 756 . 9 84 = (0.0 159 . 5627.7222)- 84 = 89.3289 – 84 T h is is the test staU>tic if there are no t ied scores. However, if there are tied scores, then E(t’ t) N’ N
The letter 1 refers to the number of tied scores for a IY•nicular tied group of numbers. 1 he correction is calculated as foUow;:
C {2 -2)+(2 1 I – (8- 2) H8- 2) + (8 – 2) i-(8 -2) + (8 – 2) + (~- 2) f (8- 21 I (9- 3) + (9- 3) _,_6+6 6+6•6+6+6l6r6 s• 0.0027 0.9973
We divide H.,.. by the correct ion factor ( q to obtain the corre<.tcd test statistic H':
1-1′ = Hrobl 5.3289 3 . 973 1-f.., is app roxi mnrely d i,trib uted as x’ with k- I degre~s of freedom . Th e critical value Based o n these results, we would not carry o ut m ultip le pair compar isons. Because t he ;(” Test of fudepet1dmce (k X k). fhe test otatistic is the same for a k X k X’ test of indepen · Randomness: Sample members muM be randomly drawn from the population.
Jndeprndence: Sample score~ must be independent of each oth e r. 011e im plication of Scalirrg: Th e dependen t m easure (categor y) mmt be n om ina l.
£xpecwl freq11e11cits: No expected fr•”luency within a category should be less than I , Let us imagine that we >till are interested in marijuana usc among high school The null hypothesis for this example \>Ould be thattbe same proportions of lOth, lith, Table 6.20 show” the crOS!o tabulation \-‘ith the expe<"led frequencies.
Table 6.21 showo the p roc<"<< nre for colculating x:.~x·
For rlf ~ 2 a nd a ~ .05, the clitical value fo r z' is 5.99 (see a table of cri t ical value$ of x'
round in most statbtics texts). Our calculated value (x!,.) was 3.420. Because the obtained Tuu 6 . 19 Reported frequencies of Marijuilna Use
lOth lllh 12th Total
None 30 28 33 91
Marijuana 30 37 22 89
l ot• I 60 65 55 180 116 PAin I • OuAum ATIVE Atl’ltOA(.HfS: Fou NDATI ONS Of DArt. CottKllo:.r
TABLE 6.20 Observed and Expected Frequencies for Marijuana Use
Grade 10 th 11th 12 th Total
None 30 (30.33) 28 (32 8 6) 33 (27.8 1) 91
~;:1 arijuana 30 (29.67) 37 (32. 14) 1 2 (27.19) 89
Total 60 65 55 180
NOTE: Expected frequencies are in paret~thMcs..
TABLE 6.21 Computation x’ •••
Observed (f ) Expected (f,) (f. – f,) (f0 – f, )’ (f.- f,)’
f, 30 30.33 -{).33 0 .1089 0.00359050
28 32.86 -4.86 23.6196 0 .71879489
33 2 7.8 1 +5.19 26.9361 0 .96 857605
30 29.67 -0.33 0. 1089 0 .00367037
37 32. 14 +4.86 23.6196 0 .73489732
Z2 27.19 -5. 19 16.9361 0 .9906620 1
IIIOTt: 7.!.= 0.00359050 + 0.7 1879489 + 0 .96857605 + 0.00367037 + 0.734897 32 + 0.9906620 1 at 0< = .05. l~ecause the screening test results were not statistically significant at a = .OS, we
do not carry out the pair comparisons (lOth with 11th grades, lOth with 12th grades, and
II th with 12th grades).
Conclusion
This chapter· has discussed some of the m ore frequendy used statistical hypothesis tests CHAf’TfR 6 • STATISTICS EQI\ SOCJAI \’iQI\KU:t 117
many others. The reader who wishes to learn more should consult one of the recom- Similarly, the discussion of statistical power in this chapter was necessarily limited d11e Finally, the reader should recognize that statistical hypothesis rests provide evidence only Notes
I. A sample is a s ubgroup trom a popu1atioo. of variables include number of people living in a household. score on the Index of Family References
Cohen, J. ( 1988). Swtistiad power rmaly;sis for the bdwviornl sciences (2nd ed.). Hillsdale, NJ: Mann, H . B .• & \Vhit ney, 1). R. (1947). O n a 1est of w hether o ne of two ra ndom variables is s lO- VVikoxon> F. ( L945). Individual comparisons by ranking methods.JJiometriC$, /, 80-83.
Recommended Readings
Cohen> J.) & Cohen, P. (2003}. Applied multiple regression/correlnthm nnaly$iS for the bchaviornf sciences Siegel. S., & Castelbn, N. ). ( 1988). Nonpammetttc swtistics for tlte bchaviotal sdeno.t< (2nd ed.). New York:
M Stevens~ j . {2002). Applied nwltivnri(llt surtistks j’or tile soc:i”l sciences (4th ed.). tvhlll\V” hLLp:/ /statastics.corn/ http:/ /www.cochrane.org/ncws/workshops.htm http./ /www.siJIS.gla.ac.uk/steps/glossary/ DISCUSSION QuESTIONS
1. locate a recenlly publiShed research study in a SOCia wort !OUmal.and see •I you can’ •d where the 2. Ask your •nstructor to cxplarn why a measure of eftect sile should always accompany any report of 3 Suppose you measure a group of socraf work clients before and after t.hey recerve a specrfic socral – ©2014 Laureate Education, Inc. Page 1 of 5 Week 4: A Short Course in Statistics Handout This information was prepared to call your attention to some basic concepts underlying Statistical symbols: µ mu (population mean) 2 nominal or ordinal)) Descriptives: Descriptives are statistical tests that summarize a data set. They include calculations of measures of central tendency (mean, median, and mode), Note: The measures of central tendency depend on the measurement level of the You can only calculate a mean and standard deviation for interval or ratio scale For nominal or ordinal variables, you can examine the frequency of responses. For Often nominal data is recorded with numbers, e.g. male=1, female=2. Sometimes http://www.ats.ucla.edu/stat/mult_pkg/whatstat/nominal_ordinal_interval.htm © 2014 Laureate Education, Inc. Page 2 of 5 meaningless. Many questionnaires (even course evaluations) use a likert scale to Inferential Statistics: Statistical tests for analysis of differences or relationships are Inferential, All statistical tests have what are called assumptions. These are essentially rules that Other assumptions have to do with whether the variables are normally distributed. The Understanding Statistical Significance Regardless of what statistical test you use to test hypotheses, you will be looking to see Parametric Tests: Parametric tests are tests that require variables to be measured at interval or ratio © 2014 Laureate Education, Inc. Page 3 of 5 These tests compare the means between groups. That is why they require the data to The T test To compare mean from a sample group to a known mean from a population
To compare the mean between two samples two samples is: Is there a difference in scores between group 1 and group H0: µgroup1 = µgroup2 To compare pre- and post-test scores for one sample sample with pre and posttests is: Is there a difference in scores between H0: µpre = µpost
Example of the form for reporting results: The results of the test were not statistically An explanation: The t is a value calculated using means and standard deviations and a ANOVA (Analysis of variance) The hypotheses would be
H0: µgroup1 = µgroup2 = µgroup3 = µgroup4 © 2014 Laureate Education, Inc. Page 4 of 5 Correlation The coefficient can range from -1 to +1. An r of 1 is a perfect correlation. A + means that The research question for correlation is: “Is there a relationship between variable 1 and The hypotheses for a Pearson correlation: H0: ρ = 0 (there is no correlation) Non-parametric Tests Nonparametric tests are tests that do not require variable to be measured at Chi Square The research question for a chi square test for independence is: Is there a relationship The hypotheses are: H0 (The null hypothesis) There is no difference in the proportions in each category of Or: The frequency distribution for variable 2 has the same proportions for both categories of H1 (The alternative hypothesis) There is a difference in the proportions in each category The calculations are based on comparing the observed frequency in each category to © 2014 Laureate Education, Inc. Page 5 of 5 See the SOCW 6311: Week 4 Working With Data Assignment Handout to explore the Other non-parametric tests: Spearman rho: A correlation test for rank ordered (ordinal scale) variables.
Week4 Handout: Chi-Square Findings The chi square test for independence is used to determine whether there is a relationship between the two variables that are categorical in the level of measurement. In this case, the variables are: The research question for the study is: Is there a relationship between the independent variable, The hypotheses are: H0 (The null hypothesis): There is no difference in the proportions of individuals in the three
** It is the null hypothesis that is actually tested by the statistic. A chi square statistic
H1 (The alternative hypothesis): There is a difference in the proportions of individuals in the
** The alternative hypothesis states that there is a difference. It would allow us to say Assume that the data has been collected to answer the above research question. Someone has We provide professional writing services to help you score straight A’s by submitting custom written assignments that mirror your guidelines.
Get result-oriented writing and never worry about grades anymore. We follow the highest quality standards to make sure that you get perfect assignments. Our writers have experience in dealing with papers of every educational level. You can surely rely on the expertise of our qualified professionals. Your deadline is our threshold for success and we take it very seriously. We make sure you receive your papers before your predefined time. Someone from our customer support team is always here to respond to your questions. So, hit us up if you have got any ambiguity or concern. Sit back and relax while we help you out with writing your papers. We have an ultimate policy for keeping your personal and order-related details a secret. We assure you that your document will be thoroughly checked for plagiarism and grammatical errors as we use highly authentic and licit sources. Still reluctant about placing an order? Our 100% Moneyback Guarantee backs you up on rare occasions where you aren’t satisfied with the writing. You don’t have to wait for an update for hours; you can track the progress of your order any time you want. We share the status after each step. Although you can leverage our expertise for any writing task, we have a knack for creating flawless papers for the following document types. Although you can leverage our expertise for any writing task, we have a knack for creating flawless papers for the following document types. From brainstorming your paper's outline to perfecting its grammar, we perform every step carefully to make your paper worthy of A grade. Hire your preferred writer anytime. Simply specify if you want your preferred expert to write your paper and we’ll make that happen. Get an elaborate and authentic grammar check report with your work to have the grammar goodness sealed in your document. You can purchase this feature if you want our writers to sum up your paper in the form of a concise and well-articulated summary. You don’t have to worry about plagiarism anymore. Get a plagiarism report to certify the uniqueness of your work. Join us for the best experience while seeking writing assistance in your college life. A good grade is all you need to boost up your academic excellence and we are all about it.
We create perfect papers according to the guidelines.
We seamlessly edit out errors from your papers.
We thoroughly read your final draft to identify errors.
Work with ultimate peace of mind because we ensure that your academic work is our responsibility and your grades are a top concern for us! Dedication. Quality. Commitment. Punctuality Here is what we have achieved so far. These numbers are evidence that we go the extra mile to make your college journey successful. We have the most intuitive and minimalistic process so that you can easily place an order. Just follow a few steps to unlock success. We understand your guidelines first before delivering any writing service. You can discuss your writing needs and we will have them evaluated by our dedicated team. We write your papers in a standardized way. We complete your work in such a way that it turns out to be a perfect description of your guidelines. We promise you excellent grades and academic excellence that you always longed for. Our writers stay in touch with you via email.
teristics of a population b•sed on s.unple data. We compute statistics from .1 pJrtial;et of
the population data (a samplt) to estimate the population parameters. Thrse t
m:uics tO provide evidence for the exi
Measures of central tenden’)’ are individual numbers that typify the tot.tl set of ~cores.
The three most frequently used mca>urcs of centraltendenq are the arithmetic mean, the
mode, and the median.
the m-erage. It is computed b)’ adding up all of a set of scores and dwidmg by the number
of scores in the set. The algebraic representation of this is
1
number of scores being adde(l.
the variable lener with a bar above it:
II
I week: {1, 6,2,6, 15,2(),3,20, 17, 11, 15, 18,8,3, 17, 16, 14, 17,0, 101. Wecomputethe
mean by adding up the class periods missed and dh•iding by 20:
J.l = — = – = 10.9o.
of centrality as it is a measure of typicalness. It is found by o rganizing scores int o a fre-
quency. Table 6. 1 displays the truancy scores arranged in a frequency
distribution.
19
1
1
9
8
7
1
0
0
1
1
I
0
l
0
1
2
0
0
2
modal number) of class periods skipped is 17.
than one mode.
from least to greatest, and count in to the middle, then the score in the
middle is the median. This is easy enough if there is an odd number of
scores. However, if there is an even number of scores, then there is no
single score in the middle. In this case, t he two middle scores are
selected, and their average is the median.
the a”erage of the lOth and lith scores. We usc t he frequency table to
find these scores, which are 14 and J 5. T hus, the median is 14.5.
Whereas measures of central tendency are used to estimate a typical
score in a dimibution, measures of variability may be thought of ns a
way in which to measure departu re from typic<~lness. They pro"i
10
13
lhe maximum ~cor~.
of scores. Jts na me tells how to wmpute it. Smu ofsqunres is short (or sum ofsqumed dc1ti
til ion scores. It is represented by the S)’lnbol SS.
ple and populat•on mean symbob:
devintron score<. 1l1is is obtained by dividi ng the sum of squares by the number of scores
(11). It is a me,tsure of the average amount of variabilit y associated with each score in a set
of scores. The population variance fOI'mu la is
a2= -.
11 st,uJds for th e number of scores in the population.
sum of squares would be
4
36
100
(J –= 46.67.
of thf population variance. If we compute the vari
anccs for these samples using the SS/11 formula, then
the- san1ple vadn nccs wil1 average o ut smaller than
the population val’iance. For th is rc:~son, the sample
variance is computed differently froru the population
variance:
sl = – – .
the maximum score.
of score~. It> name tells how to compute it. Sum of 51Jo.arcs is short for ;um of squared dco•i-
atiou scores. It is reprewnt<>tl by the symlxll SS.
T.,b)e 6.2.
devontw11 scores. This os obtained by dividing the sum of squares by the number of ><.ores
(n). It is a measure of t he averoge ••m ount of var iability associated w ith each score in a set
of scores. T he popula tio n variance for m11ln is
¢ =- .
11 stands for the numbet of scores in the population.
sum of squar~s would be
4
cr2 =
~ 46.67.
o f’ t he population variance. Jf we com pute t he vari-
ances for these samples using th” SShr formu la, then
the sample variances will average out smaller than
thc population ••ariance. For this reJson, the sample
Vllriance is computed differe ntly from the population
variance:
n – J
degree• of freedon1. If
> =–
280 6
5 = 5.
age squared deviation from the mean. To get ” me
ll •tanrlard deviation of 1 he mean.
T.1ble 6.3 shows the relat iortship between number of >treSsors experien
rcfel’l’ed to as a regression line (or line of best ii 1 o r prediction I inc). Su ch a line bas been
.CJiculated for the example plot. It has a Y ime,·cept of – 3.555 t11id a slope of + 1.279. T his
gives us the prediction equation of
increase in stres.ors (X) of 1 will be accomp.ulicd by an increase in predicted frequency of
~orporal punishment (I’) of + 1.279 incidents per week. If the slope were a negati’e
number, then an increase in X would be accompanied by a pred ictcd decrease in Y.
score); rather, it giv~s a prediction of the value of Y for a certain value of X. Fo r
Cu,”na 6 • SrAliSnc
6 0
” 5 0 r:r
e …
c 4 ..
E
.r:
·;:
” Q.
0 1 2 3 4 5 6 7 8 9
Sttessors and Use of
Corporal Punishment
6 4
of p redictio n is minim ized. Error is d efined as the d ifference
between the predicted score and the obtaine
p redicted value of Y is
rhe error of prediction i~ E =I – 1.561 = -0.561 fu r Y = I, and
r~presented b)•
fo r SSt.
dicted values. There are two other ;un” of squares !hat are important to undcr>tanding
correlation and regri’SSion.
score values around the mean oft he obtained scores. The SST is represented by
mean of the obtained scores.
the SSR and SSE is equal to the SS1:
(I’VE) , the correlation coefficient, ond the standard error of estim ate.
p red icts obtained scores. The values of PV£ 1·ange fro m 0 ( no p red ictive value) to I ( pre-
diction with perfect accurJLy). The cqunt ion fo r PV£ is
J>vE – – ·
SSX • SSY’
SSX is t he sum of squares for vn rinble X: IlX – XJ’, and
SSYis the sum of squares for varinblc Y: 2:( Y – Y)’.
The proportion of v.triance in the freque ncy of corporal punishment thnl may be
l’VE = – = = 0 .953.
3 -33 10 .89 0 -4 5 20 .2 5 +1405
symbol r’.
tionship between two variables. The correlation cocfficicnt is represented by the letter r
and can take on values between – 1 and + I inclu~ivc. The correlation coefficient always has
the same sign a.< the slope. If one squares a correlation coefficient, then
r = -vr.;;S50sx””•””S;;;S;;o;Y
R — = = = -0.976 .
‘E
n-2
we subtract 2 in this instance is that variance error (and standard Cfi’Or of c:stimatc) is a
statistic describing characteristics of two variables. T hey deal with the error involved in
the prediction of Y (one variable) from X {the other v.triable) .
predicted values. If the error· scores ( E = Y- Y,.o~> are normally distributed around the
prediction line, then about 68% of actual scores will foil between ±I :;,; of their predicted
values.
where
r is the correlation coefficient fo r X and Y, and
= 2.311J0.053 = (0.230)(0.727) = 0 .167.
the null hypothesis and the alrermltive hypothesis.
ables. This implies that if the null hypothesis is true, then any apparent relationship in
Mmples i> the resuh of random flu ctuations in the dependent meas ure or sampling error.
ment!// two-gro11p posttcst-only design, there would be a sample whose members
received an intervention and a sample whose members did not. Both of these would be
probability samples from a larger population. The interven tion >ample would reprcse>11
The Null Hypothesis
and Type I Error
sample would be repre
would be unlikely that two samples from two ident ical popula tions would he ident ical. So,
although the sample mea ns would be diffe rent, they would not rcpre>CtH any effect of t he
independent variable. The apparent difference would be due to sampling error.
ences about populations. II is for this reason that the null hypothe>i> is a statement about
population parameters. For example, o ne null hypothe>iS for I he previous design cou ld be
stated as
t.ha t the m~ans of the experime ntal ( Mean I) and cont rol ( Mean 2) popultnio’ls arc eq ual.
and the outcome (measure o f the dependent variable), we must collect eviden
correct. \Ve evaluate the evidence to determine the ext<·nL to which it •cncls to confirn"' or
disconfi rm the null hypothesis. If the evide nce wct·e suc.h that it is unlikely that an
observed relationship would have ocwrrcd as the re.ult of sampling e r ror, then we would
reject the null hypothesis. If the eviden«: were more ambiguous, then we would f.1il to
reject the null hypothesis. The terms re;err and fail to rrjm carry the implicit under
setting certain statistic•! criteria beforehand, we can ~”tablish the prombiliry that we “•ill
commit a ‘JYpe l error. \\’c decide what proportion of the time we arc willing to commit a
Type l error. This proportion ( proba bility) is called a l1>ha (o:). If we n1e willing to reject
the null hypothesis when it is true onl)• I in 20 times, thc11 we set our a level at .05. If’ on ly I
in 100 time>, then we set it at .0 I.
deci;ion) ts 1 – a (Figure 6.2).
ex • the probability or rejecting the Null Hypo thes is when it is true
The Nu ll Hypothesis
and u Level
sample means:
hypothesis was true), then we would find that most of the vallles for the differences
between the sample means wou ld not be 0. Figure 6.3 represents a distribm ion of the dif·
fercn ces between sample means drawn from identical populations.
deviation is 5. I f the differences are normally distributed, then approximately 68% of
lhese differences will be between – 5 (z = – 1) and +5 (z= +l). Fully 95% of the differences
in the distribution will fall between the range of -9.8 ( z =-1 .96} and +9.8 (z = +1 .96). If
we drew a random sa mple from each population, it ‘~ould not be unusual to find a di ffer-
ence between sample means of as mnch as 9 .8, even though the population means were
the same.
times. If we set our criterion fo r rejecting the null hypothesis such that a mean difference
must be greater than +9.8 or less than – 9.8, tben we would commit a Type I error only 1
in 20 times (.OS) on average. O ur (J. level ( the probability of committing a Type l error)
would be set at .05.
sample if the nuU hypothesis were true is represented by p. To reject the null hypothesis,
p mu~t be less than or equal to
z
X, -x2
a = .05
alizablc to the population.
ences about the populat ion.
obtained a difference betwt-en the sample me.ms of 10. The probability that we would
obtain a difference of +10 or – 10 would be equivalent to the probability of a z ~core
g reater than +2.0 plus the probabilit y of a z ~core less th.111 – 2.0 o r .0228 + .0228 = .0156.
This is o ur p value; p = .0456. Because p
hypothes~ b simply tbc opposite of the null hypothesis. In fact, sometimes d1is nail·c
alternative h)pothesis is used. However, it generally is not particularly useful to
researchers. Usually. we nrc inrertsted i n defecting an in lcrvention effccl of a particu l :~r
size. On certnin measu,·c,, we would be interested in .mwll effects (<:.g., death rate),
whereas on others, o nly l~rger effects would be of interest.
Such scores are called zsco,·es. T he diffe(ence is called an effect size. Effect sizes frequently
are used in meta-analyse> of outcome studies to compare the relati\c cllicacy of different
t )’Pes of intencntioos acrOS> ‘tudies.
for each arc al follows:
height (i.e., 0.5 inches and < = 2.1) between 15- and 16 year-old girls.
in heigh t ( i.e., 1.0 inches and s~ 2.0) bNwccn 14- aud 18· year-old g ir ls.
our research. llo1Vever, t here is a practicdl trade-off involved. All o ther things being equal.
the consistcllt detection of unaU effect >izc’ requires very large (1l > 200) sample size,,
they might not be practical for all studies. Furthermore. there are c~rtail1 outcome vari-
ables for which we would not be part icuia l’l y in terested in small effec t>.
ports the alternative hypothesis. If the alttrnative hypothc
Alternative
Hypothesis is true. The power ot a test.
I}~ the p r
Altornatlvo
Hypothesis is true.
when we set our criterion for rejecting the null hypothesis. The probdbility of a correct
decision (I – f3) is an importdnt probability. It is so important that it has a nJmc~power.
Power refers to the probability t h.u “e will detect an eff«t of the size we have sckctcd.
tistical test. just as with Type 1 error, we should decide beforehand how often we are will-
ing to make a Type 11 error (fail to detect a certain effect size). This is our f3 level. The
procedure for making such determinat ions is discussed in Cohen ( 1988).
two related assumptions: randomness nud independence.
the populatio n being evaluate d. If the sample is being divided into groups (e.g., trc:>tment
and control), then assignment to gro ups al.
rlom selection and random fWigmnem.
dom sampling. If the samples Jre not random. then •
l f t he scores are not independent, t hen the probability (p) is, as before. >i mply n number
t h•t has little to do with the p ro babilit)’ of a Type I erro r.
names are misleading given th at one class of test has no more or less to do with popula-
tion parameters than t he other. T he difference between t he two tests lies in the mathe
matical assumptions used to compute the likelihood of a Type I error.
samples are drown are norm.•lly di~t rihuted. Non parametric tests do not have this rigid
than can a parametric test. Nonparametric lests remain serviceable even in circumstances
where parametric procedures collapse.
the other assumptions of t he parametric test are met, parametric test~ are slightly more
powerful than non parametr ic tests. However, when the parametr ic assu mptions are not
met, nonparametric tests are more powerful.
appropria te use. Where appropriate, parametric and nonparametric tes ts are presented
together for ead1 type of design.
values and population parameters to see whether the sample differs in a statistically sig-
nificant way fro m the parent populnt.ion. Occasionally, these tests are used to determine
~
was normally distributed. We would take a randon1 sample from this population and com·
pare the
that certain strata within populations differ from t he population as a whole.
is different on average from the population as a whole (e.g., are the mean wages received
by social workers in Lansing different from the mean for aU social workers in M ichigaJJ?) .
(l ansing social workers) of the population and the population as a whole ( Michigan
social wor kers) will be the same:
The assumptions of the single-sample t test are as follows:
assumptions can introduce major error into the compmation of p value~.
the computation of p vJiues. Unless the population distribution is markedly different
fro m a normal distribution, rhe erro” will tend to be slight (e.g., a re ported p v.tlue of.0
is a <•robust" test.
• is (popula tion) mean from t h e s”mple mean and dividing by th e sta ndard error of th e
n1ean.
could occur if the null hypothc>sis is true. At a certain point, tht’ probabilit)’ (p) of obtam-
ing a t so large becomes sufficiently small (rt’acbt’S the a. level) that we rcjt’
esis) depends o n the degrees of freedom. For a single-sample rtest,the degree> of freedom
ure df= n – I , whe re” is the s.omp k >itt’.
v.re know from a statewide SUI’VC)’ I hat the average time taken to complete an outpa-
clients seen at o u r clinic nrc taking longer o r ;horter than the state average.
the length of program for each of the clients in the sample. The mean n umber of days to
complete rehabilitation a t our clinic is 19.875 days. This is lower than the populat ion
mean of 46.6 days. The question is whether this result is statistically significant. I> itlikel)’
that this sample could ha,·e been drawn from a population with a mean of 46.6?
square root of the sample size or
_
9
2.972 – 5.62
for a nondirectional test at (X .05 (sec • t•ble of the critical values for the ttt»t, non
clinic average fewer days in rehabilitation thon is t he case in the statewide population.
sample t test:
s
11.888
11.888 = 1.4069′
t he evaluation of 11ominal (categorical) variables. The test involves comparisons between
observed and expected frequencies wi thin strata in a sample. Expected freq uencies are
derived from either population v-alues or t heoretical values. Observed frequencie-s are
those derived from the sample.
been drawn will have !he same proportion of members in each category as the empirical
or theoretical null hypothesis population:
(expected), and
sample was drawn (observed).
one category).
• expected frequenck$: No exl’ected frequency within a category should be less !han I,
domness and independence. Deriving fr o m thc.~c assumptions is the requirement that the
categor ies in the cross-tabulation must be mutunlly exclusive and exhaustive.
variable. ExiJaustive means that all categories of int ere;t arc covered.
three assumptions are essentially “fatal” ones. Even slight violations of the first two
assumptions can introduce major errors into the computation of p values.
cies are small (expected frequency les.~ thnn I or atlc:1~t 20o,(, of expected ft·equ,•ncics less
than 5), the probabilities associated with the X’ t~St will be in accurate.
Township wanted to see whether they were servi ng people o f all fniths (and those of no
fit ith) equ:11l)’· The)’ had census 11gures indicating that religious preferences in the town>hip
were as follows: Ch risti~n (64%), Jewish (10%), Muslim (8%), other religionino preference
(14%). and agnostic/atheist ( 4%).
Befor• they drew the sample, they calculated the expected freq uency for each category. To
obtain rhe expected frequencies for the sample, the)’ converted the percentage for each
preference to a decimal proportion and multiplied ir by 50. Thus, the expected frequency
for Christians was 64% of 50 or .64 x 50 : 32, the Jewish category was 10% of 50 or
. 10 x 50 = 5, and so on. Table 6.5 depicts the expected frequencies.
fr(!q uency
Given that the maximum allowable is 20%, we are violating a test assumption . We can
remedy this by collapsing categories (merging two or more categories into one) Ot’ by
increasing the sample size. However, thet·e is no c.ategoq• that we could reasonably com·
bir1e with agnostic/atheist. lt would not work to combine this C<\tegory with any of the
other categol'ics because the latter ar• religious individuals, whereas atheists and agnostics
aJe not religious.
o f the expected frequencies was less than 5, we would need a sample large enough so that
8% ( percentage of the population identifying as Muslim ) of il would equal 5:
0.08
Table 6.6. On!)’ one of live (20%) of the expect«l frequencies is less I han 5, and nQne of
them is less tha n I, so the s:un ple size assumption is mel. The results of a random sample
of 63 cases were as found in Table 6.7.
~>:pecte.fl
~0.32 6.30 5.04 8 82 2 52
rr~(j ll CrtCy
frequency
Win ifred T<>wnship who identify 1vith each religious categor)’ will be the sam.: as the pro·
portion of people who have received services at the Interdenominational Services Center
in St. Winifred 1b w nship who identify wit·h each relig io us catt:gory.
will not be differem. Notice the similari ty ben~<.>en the nu ll hypothesis and the numerator
of the ,,, .•. test statistic:
(c) min us I or df = c- L In our case, we have five categories (Christian. Jewish, Muslim,
otherino prefere nce, and agnostic/athe;st), so df = 5- I = 4.
gests that people of all faiths (and those of no faith) are not being sec11 proportionately to
their representations in the township.
ale measure of eftect size: fO r a test of means. However, Lhc X2 test doc,~ not compare
0
(f, – f,)’
f,
used for the X’ test-w. This measure of effect size ranges from 0 to I . Cohen ( !988) clas-
sifi es these effect s izes into three categories:
lowing formula:
For the St. Winifred Township example,
These are Jests in which either a single sample is drawn and rneasLtremen ts are taken at
rwo times or two samples are drawn and members of the sample are individually matched
o n som e altribute. ~vfeasureJDeDts are taken fot each member of the matched groups.
evaluate d ifference scores. These may be differences between scores f
X
is the d ifference between the two.
which the expected differences are zero.
The nu ll hypothesis for this test is that the mean of the differences between the paired
scores is 0:
d rav.1n) and
the standard error of the mean difference or
lobt = 5= ·
occur if the nnll ll)’pothesis is true. AI a certain point, the p robability (p) of obtaining at so
large becomes sufficiently small (reaches the alpha level) that we reject the null hypothesis.
Sca ling: The Mpcndt’nt measure (X
scores) must be interval or ratio.
0
t hree asswup tions i1re essen t ially “dea th penalty” violation.. Eve n slight violation. “r the
(ht two assumpti011s can intr oduce majo r e rror in to th e comp ullll ion or p values. Sim i lady,
dilTnence scores computed fro1n ~””‘O sel!t of ordi nal data tnay inwrporate major error.
the computation of p values. However. Wllcss the population distribution is markedly dif
fcrent fi-om a normal di>tribu tion, the errors will tend to be slight (e.g., a reported p value
of .042 actually will be a p value of .057). Th is is what is ml·an t wh en someone ‘”YS thnt
the t test is a “‘robu~t .. test.
t~ irs,
sig ned ranks test (discussed in the next section} prob;,bly will yield a more accu rate test
when there are viulation~ of this normal dislribution as.su.mpliun.
dcpn:s~;un.
m unity cent~r. ‘I ‘hey were pretested (X,) with t he BDI, r<·cd ved I he treatment, ;,nd t he n
were posrtested (~)wi th t h e same inst ru111e n1.. The m ean of the d iffe rence scores (.k0 )
wa.s - L This means that tJ K· aven1ge: chtUl.gC' in BD f scnrefi fron1 pcelC'Sl tu pn:-:ttest was a
dtcrease of I poinl. The standard deviation of the ditlcrcnce s.:ort> \\’aS l.H .
dard deviation by the square rout of t he s.unpk siu: to get t he standard c·rror of th e mean:
r\”lobt = –
-~ – .1..38
0.42 ..
equal tn the critical \’;liuc, we reject the null hyp(llhcsis at a= .05.
d = = = – 0.752.
is a nonparametric test for the evalua tion of d ifference scores. The test involves ranking
d ifference scores as 10 how far they are from 0. The difference score closest to 0 receives
the rank of I, the next score receives the rank of 2, and so on. The ranks for diffe rence
scores below 0 are given a negative sign, whereas those above 0 are given a positive s ign.
T he null hypothesis is t hat the sample comes from a population of di fference scores in
“‘ hich the expected difference score is 0.
• independence: XD scores 111ust be independen t of each other.
• Scaling: T he dependent measure (XD scores) must be ordi nal (interval or ratio dif-
test statistic. We use the same example as for t he t test. The dependent measure is t he BDI,
a measure of depression. Scores on the BDI are not normally distributed, tending to be
positively skewed.
mun ity center. They were pretested w·ith the BDI~ received the treatment, and I hen were
posttested with t he same instrument. We c.ompute the difference scores (post -pre) f
Difference scores ofO do not receive a rank. Tied ranks receive the average nlllk for the tie.
unranked. There are five difference so::ores of eit her – 1 or +L These cover t he first five
ranks {I, 2, 3, 4, 5), giving an average rank of 3. T here are three difference scores of – 2
(and none of +2). T hese cover the next three ranks (6, 7, 8) , giving an average rank of 7.
The una! score is – 3, which is given the rank of 9.
6 16 17 +1 3 3
tivc or ncg.uivc.
~even tim~s for lhc ncg:.uivc sill.n)~ w~: add up I he rank~ in the pO$itivc column .lnd obtain
1>. rhi•” I he IC\1 \l3l”lic v~lue for the Wil
us<:d with the (lo'"erc.tse) I distribullon.
c ritical value to ,·eject the null hypothc>is.
Wilcoxon Tin any general swristics book) Jnd stt whether obe result (7.,.. = 6) was sig·
nificant at o. = .05. lle<:ause there wa. one differen ce score equal to 0, the corrected 11 = 9.
The critical value for the Wilcoxon 7"a t n=9 and a .05 is T.,. = 5. 1:,.. = 6 is not less than
or equ•lto the critic.ol value, so we fail to reject the nuU h)·polhesi> at o.- .05.
scores. One possib le measure would be proportion of nonoverlapping scores as a measure
of effect. Cohen ( 1988) brieOy discu~s this measure, called U.
the two related g roups. We choose the least maximum and the greatest minimum. Tbi>
establish es the end points for the overlap range.
JX>ints) and divide by the total number of scores. This gives a proportion of overlapping
score.o;. Subt ract t his number from I , and wr o btain the p ropottion of nunoverlapping
$Cores. T his indc.~ ranges from 0 to I. Lower proportions arc indicative of ~mallcr effects,
and higher on~> are indicative of larg<·r effects.
u- .33
u ~ 47
~core w;1~ 2 1. The poSit(!St miuimum and ua.tximllln -;cores wt:r~ 15 .md llJ. rc-‘>petti\•cly.
‘I h e grc•test minimum is 16 •• md lht lcastm.l.ximum is 19.
designs “‘here the variables in the anai)’Sis arc dichotomously scored (e.g., improved ~.
not impro,•ed, same,.,_ different, increase ‘s. decrease).
number of indh~dual.s who changed from+ to-. Cell B contains the number of individ-
uals who recei,ed +on both measu rement>. Cell C contains the number of individuals
who received – on both measurements. Cell D contains the number of individullh who
changed from – to +. The null hypot hesis is expressed “‘
population, and
0
population.
llerween-group scores [pre· ~nd poM1c~1 ~cores] will necessarily be dependent).
‘”” – f, + fn
fn =the freq uency in Cell D.
Yates correction for continuity in the equation. This is – I, which appears in the n ur.-‘1~ 1′”
tor of the test statistic.
Test layout
c 0
also are interested in change in marijuana ust over time. Jmagine that we collected survey
data on a random sample of ninth-graders in 2007.1n 2009, we surveyed the same sample
that had been in ninth grade in 2007. We fo und that 32 of 65 students said that they used
marijuana during the previous year, as compared 10 23 of 65 in 2009. The results are sum-
marized in Table 6. 10.
Change Test
65
it in 2009. Cell B shows the number of students who had used marijuana in both 2007 and
2009. CeU C shows the number of students who did not use marijuana either in 2007 or in
2009. Cell D shows the number of students who did not use marijuana in 2007 but who did
use it in 2009.
juano use changed. The nuU hypothesis fo r the McNemar change test is th at changing from
nonuse to use would be just as likely as changing from use to nonuse.
would expect half (6.5} to go from not using 10 using and the other half (6.5) to go from
using to not using if the null hypothesis were true.
!’or df ~ 1 and C/. ~ .05, x\,, = 3.84 (see a I
conclude that there was in fact aJl increase in marijuana use between 2007 and 2009.
f..,. + fl)
to the following formula:
rJ.ndomly assigned to one of two experimental conditions.
whose null hypothesis is fo r mally stated •• follows:
dom ly assigned to o ne of the ‘-“0 groups.
d r,own must be normally distribured.
whose variances are eq ual.
three assumptions are rbe ” fa tal” assum pt ion;.
unlc;.s Lhe population dist r iburion is markedly diiTerent from a normal d isrr iburion, the
errors will tend to be slight. Slill, e”en though the error is slight. the oonparamcrric W /M-
W test probably will be more accurate when the norma lit)• assum prion is violated.
homogeneiry of variances assumption and the equal sample size assumprion. A problem
may .orise when both of these assumptions are violated Jtthe same time.
deteaing an exi;,ting dilfcrcn
thc>e .~S»umptions is not partiCufMiy problem.uic. There may be fairly ,,ub>t.mtial di~
crrpJncies bctwet•n s. .. mplc si1C!’\ withnut much effect on Lhc dtc.ur~cy o i Ottr /’ cMim.lttl’!.
Similarly, if e- very other n~~nmption i!) mel, 1hcu a slight difference in v11riam:c:. will not
h ave a fa rge effect on probability estimates.
lut-·1 —
where
2
wl1ethf:r there is a difference i11 ((•vel of soci.al act iv ity in children depending 011 whether
they are in after-school care <>r h0111c (.(ltc . Because more childre11 attendc
(Group I ) and 14 childien in home care (Group 2) was drawn. The dcpcnclcnt meJsure
v,•as a score on a socir1 l activity ).CJ )e in whk h lower scores represent less soc ial aclivity and
higher scores represent more social activity.
pule the sample mean for each group. The next step is to compute the stJndard error of
the mean. Howe•·er,the pl'()(cdure for doing thi< i~ a little different from that u«<< before.
A> lou might recall. the standard error of the mean is the standard dcvi,ation d” aded by
the square root oi the sample ‘ire:
This also is equivalent to the squ:HC •·oot· o f the variance times the inverse of the,., , .
Unf{‘trtunately) we c:u•not u~t..· lhis IOI’tnuln for t+ae standa rd error o f lhc mean. It is I he
..qual’l.”> divided b)’ the deg~C·c~ of lrct’dom. ll•s tht same he…- eX(Cpt that we have two
‘oms of squan:s (one for Group I and one for Group 2). and o u1 degree< of freedom Jr('
11 1 rt. 2. Thi• gives "' the folfowint: cquJtion:
” ‘ I II• 2’
55
is the sum of squares fo r Group I ,
1
by i lin,+ I In,). We take the square root of this and obtain the pooled standard error of
the mean:
The means and sums of squares for our example are presented in Table 6.1 2. Now, let
27.8B <1330.40 16
error of the mean difference). We begin by calculating the pooled variance:
28 = 215.63 . = n, + n2 – 2 16+14-2
Wt calculate 1
:
lobt = = — = 1.213 .
less than the critical value, we fa il to reject the null hypothesis at a. = .OS.
denominator is the pooled c>ti mate oft he standard deviation. The pooh.’!! •t andard de,•i-
ation is t he square root of the pooled variance that we calculated earlier:
14 .68 14.68 ,
The other measure is Tl • (eta-.quare). n’ is the proportion of variance explained ( Pifl:) .
/
/Obi + d
ca re. Children in after-sdtool cure sCC)rcd h igher on social activity than d id c hild ren in
home care. T he differe nce was not statistically s ignificant for <> ur chosen ex = .05.
following:
1
2 + 28
29.47 1 = 0’0499′
potentially explained by whether they were in after-school care or home cJre.
~Vhitney test. Recent ly, th e name of Wilcoxon has been added to it. The reason t hat
Wilcoxon’s n ame has been added is t hat he developed the test first and published it first
( Wilcoxon, 1945). Unfortunately, m OI’e fo lks noticed the art ide publishtd by Mann a nd
I•Vhitn ey ( 1947) 2 years later.
one group and ranking scores from lcn;t to most. After this is done, the freq ue ncies of low
and high ranks between groups arc compared.
est and randomly a>Signed to one of the two groups.
verted to ranks).
than the W!M-W test. However, if the distr ibution of population scores is even slightly
d iffe rent from normal, t hen theW /M • W test may be t he more powerful test.
exam ple as we d id fo r t he independent r test. We evaluated level of social activity in
children in arter-school ca re and in home care. T he dependent measure was a score o n a
social activity scale in which lower scores represent Jess social activity and higher scores
represent more social activity.
respect to which g roup individuals ‘”ere in. The rank of I goes to the highest score, t he
rank of2 to the next highest score, and so on . Tied ranks receive the average rank. We then
sum t he ran ks within each g roup. The summed ranks are called W1 for G rou p 1 and W,
for Group 2 and are fo und in Table 6.13.
1
1
each according to t he fol lo wing equations:
111 + ( 111 + l)
2
U2=11rnz+ 2 w,
112(n 2 + I}
2
2
= 224 +– 247 = 224 + 91 – 247 = 68.
The critical value for the W/M· W U at n, = 16 and at n, = 14, and o: = .OS is U”” = 64.
sis at CL: .05.
m easure of nonoverlap probably would be the best bet.
w ere 2 and 55. whereas they were 7 and 40 for the home care grout>· The greatest mini –
mum is 7, and the le”‘t ma.ximum is 40. All 14 .cores in the home ca re g roup are within
the overlap range, and 12 of l4 scores in the after-school care group are in t he overlap
range. This gi•es us a proportion of overlap of 26/30: .867. The proport•on of nonover-
lap is U I .867″‘ .133. This would be ,, small effect.
follows:
this is tha t categories must be mutually exclusi’e (no case m ay appear in more than
one c.1tegory ).
and no more d1an 20% of t he exp«tcd freq uencies sho uld be less t han 5.
d omness and independence. Deriving from t hese assumptions is the requirement that the
categories in the cross·L1 bulation be mulllnl/y exclusive and ex/u~ustive.
variable. Bxluwsti•-e means that all possible categories are covered.
sp<-cifically whether there are any diffcrcn= in sutb use between 9th and 12th-graders
in our school di>trict. We conduct • proportionate str atified samplt in which we ran-
domly s:~mplc oixt)’-five 9th-graders and fifty-five 12th-g raders from all Mudents in the
district. T he students are surveyed on t heir usc of ((rugs over the past ye.ar under condi-
tio ns guaranteeing co nfiden tiality of response. Table 6.14 depicts reported marijuana use
f o r t he s tudents in the sam ple o ver the past yenr.
than 9th-graders in t his sample used mar-
ijua na at least once during t he past year.
The question we are interested in is
whether it is likely that >uch a sample
could have come from a population in
which the proportion.1 of 9th- and 12th-
graders using mc:1rijuana were identicaL
data is the x: test of i ndepcndcnce. The X1
test evaluates the likelihood that a per·
ccived relationsg1ip between propor tions
in categories (called being dependent)
independence) .
as 12th-graders used marijuana during the past year. The null hypot hesis values for this
test are called the expected frequencies. These expected frequencies ior marijuana are cal-
culated so as to be proportionately equal for bot h 9th- and 12th -graders.
the past year, the proportion for t he total sample is 45f!20 = .375. The expected frequency
of marijuana use for the sixty-live 9th-graders would be .375(65) = 24 .375. T he expected
marijuana use fo rthe fifty-five 12th-graders would be .375(55) = 20.625. Table 6.15 shows
the expected frequencies in parentheses.
expected freq uency. T he null hypothesis is
0
(e.xpected; in this case, this is the expected proportion of students in each of the two gt·ade
levels [9th and 12th] who fell into o ne or t he other use category [marijuana use or no
marijuana usc)}; and P,~ is the proportion of cases wi thin categor y k drawn from the
actual population (observed; in this case, this is the obser ved [or obtaine.d] proportion of
students in eacb of t he two grade levels [9th and 12th] who fell into one or the other use
category [marijuana use or no marijuana use]).
number of rows minus I times the n umber of columns min us I or
Marijuana
12th
Total
75
45
120
rection for continuit)· in the formula ,,hen df l . The equation for the corrected test sta
tistic is as follows:
1 = I: (Vo- fr,l – 0.5)
>eore and take the ab:.olute value of the difference (make the difference positive). Then.
subtract O.S fro m the absolute difference (I/., f. I -0.5) and square t he result. Next. divide
by t he expected score. T his is re~1eated for ca
I.Q, whereas th e correct ion for the X’ test of independence (and the goodness-ol:fitiCit)
was 0.5. I will not go iuto an)’ detail beyond sa)’ing that this is be.::ause the McNemar
change test uses o nly half of the a••ailable cross-tabulation cells ( two of four) to computl’
its x.’..,., ••hereas all cells Jre used to compute ;c,.. in the independence and goodne~< of·
fit tl'sts.
For df= I and ex .05, the critical value fot· x’,,.,. is 3.84. Ou r c alculated value (X’,,,l was
CJb,crved (f0 ) Expected (1, ) (If.- f, J – 0.5)
NOTE: 7.’ = 0.01 9 + 0.02l + 0.031 + 0.037 ~ 0. 109.
f,
and
f’or 2 x k tabulation, we cannot convert tv to PVE.
are interested in whether there are any d ifferences in the magnitudes of ageist attitudes
among (a) hospital social workers. ( b) nursing home social workers, and (c) adult pro tee-
tive services social workers.
nursing home (b), hospital (a) with protective services (c), and nursing home (b) with pro-
tective services (c).
.05 chance of committing a Type I error (rejecting the null hypothesis when it is tr ue) and
a .95 chance of making a correct decision (not rejecting the null hypot~esis when it is
true). If 1ve conduct three tests at u = .05, our chance of commi tting at least one Type I
error increases to about .15 (the precise probability is . 142625). So, we actually are testing
at around 0′. = . 15.
sis \”rhen it is true increases. \oVe are ((capitalizing on chattce .’>
leveL f’o r three co mpa risons, we m ight cond uct our tests at u “‘ .05/3 “‘ .0 167.
Unfortunately, if we do th is, then we will reduce the po,ver ( I – ~) of o ur test to detect a
possible existing effect.
among groups wiLhout compromising power. This is done by siJnultaneously eva1U(lting
all groups for any differences. If no d ifferences are detected, then we fai l to reject the null
hypothesis and stop. No further tests are conducted because w e already have our ans11w.
The difference> among all gro ups are not sufficien tly large that we can reject the notion
that all of the samples come from the s ame population.
determine which pairs arc different. T he screening tests do not tell us whether only one
pair, two pairs, o r all pairs show statistically significant differences. Screening tests show
only that there are some differences among all possible comparisons.
when the null hypothesis is true 1 out of20 times (commit a Type I error). By conducting
the in itial overall screening in a single test, we protect against the compounding o f the
alpha level brought on by multiple comparisons.
If all of the means are equal, then it fo llows that the voriance of the means is 0 or
7
and the denominator is a pooled estimntc of the score variances within the samples.
assigned to one of the k groups.
drawn must be normally d istributed.
ulntions whose variances arc equal.
ability due to membership in a particular group (variability a.~sociated with group means
or between-group variance) and which is variability associated with unexplained fluctua·
tions (wi thin-group variance).
of treatment group means around an overall mean (sometimes called a grand mean) and
another component representing the variability of group scores around their own individ·
ual group means. The variability of group means around the grand mean is called between·
group variance. The variabiliry of individual scores around their own group means is called
within-group variance. This division is rep.–nted by the foUowing equation:
Total Within Between
out respect to which group they are in. X is a particular score, and the X with one bar is
the mean of the group to which that score belongs.
is the sLun of the deviation of the sco re fro m its g roup mean and the deviation of tbe
g ro up mean fro m t he g rand mean. T his might be a little dearer if we look at a simple data
set. Let us hlke the exam ple about ageist attit udes among hospital social workers (Group I),
nursing ho me social workers (Gro up 2), a11d adult protective services social workers
(Group 3). T be dependent measure quan tifies ageist attitudes (higher scores represent
n1ore ageist sentiment).
is N= 12. The group means are 3 (Gro up 1 ), 5 (G roup 2), and 9 (Grotlp 3), and the grand
mea n is 5.67.
sums of sq uares are derived fro m t he deviatio n score C
1
is calculated by subtracting the grand mean from each score, squaring the differ-
=2
squaring the differences, a nd adding up (summing) the squared differences fo r each
g ro up. This gives us t hree s ums of squares: sswoup I’ SSC.,>I>p , . and SS.;ooup>· These are added
up to give us ssv.·ilhin:
ssW”'” = r
the diffe rences, and adding up (summing) the squared differences. Then, we multiply the
to tal by the sample size. This is because this sum of squares needs to be weighted. Whereas
N = 12 scores ~~ent to make up SS10,.1, and ( k)(n) = (3)(4) = 12 scores went to m ake up
SS., … ,,,, o nly the k= 3 g roup means went to make upS~””‘”. We m ultiply by 11 = •l so that
S~~ will have t he same ” ‘eig ht as tlte o ther two sums of squares:
ss … ,, = 134.667.
gon, a variance is called a mean square. Each particular m ean square ( variance) has its
own degrees of freedom .
o ne grand mean, the degrees of freedom ar e N – l. The within-groups sum of squares
(SSw”””) involves the variability of all scores wit hin g roups around k g ro up m eans, where
k is the n umber o f g ro ups. So, the within-groups degrees o f freedo m are N- k. T he
between-groups sum of squares($\””””‘) involves the va riability of k gr o up m eans
around the grand mea n. So, the between-g roups degrees of freed om are k – J.
the fo rmu la fo r a m ean square would be MS ~ SSitlf
specific fo rm ulas are as follows:
A•f\””‘” = i 4.66712 = 3i.333
~ 6.667.
The fo rm LLla for F •• , is
MSwulUn .
MS,,;,hin 6.667
is a way of p resent ing t he information about the sums of squares, degrees of freedom,
mean squares, and F statistics in a more easily understood fashion. Table 6 . 17 uses the
example data.
were used to calculate o ur F •• ,. there are two types of degrees o f freedom asso ciated with
it: n umerator deg rees o f freedom (between g ro u ps) and de;w .minator d egrees of freedom
(within g roups). T hese are used either to look up values in a table o f the F distribution or
by computer programs to com pu te p values.
freedom were used in the calculation o f MS,””‘”” The d enominator d egrees of freedom
11111
cal value for Fat 2 and 9 degrees of freedom is .t~”‘ = 4.26. Because F..,,: 5.6 is greater than
the critical value, we reject the null hypothesis at«= .OS.
populations. Because we already have screened out other opportuni ties LO commit’I)’Pe 1
error, further testing would not be capi[aiizing on chance. Thus, we may carry out the fol-
lowing pair comparisons:
ple comparison tests. One of the more frequently used is the least significant difference
(LSD) test. The l.SD test is a variant on the t test. However, the standard error of the mean
is calculated from the within-groups mean square (variance) from the ANOVA:
In all three instances, we reject the rwll hypothesis at a = .OS.
Nursing Home (Group 2) “‘ – 0.577 – . / t!tl = 2.262
Adult Protective Services t .. , = 2.262
(Group 3) Reject H.
Services (Group 3) Rejecl H.
we d eal wit h rwo: Cohen’• (1988) J and 1{
by the pooled “ithin group standard devialion. It ranges from a min imum of 0 to an
rndetinitcly large upper limit. It m~) be estimated from F..,. by using the following for mula:
laled by the fo llowing formula:
1) =– – . ss,,,,.,
The critcri~ lor each are as folio” s:
Medium efYect size: f; .25
Large effect size: f .40
‘l = = 0.554.
The test involves iniliall y treating all samples as one gro up and ranking scores from
least to most. After this is done, the frequenc ies of low and high ranks among groups <1re
compared.
est and randomly assigned to one of the k groups.
verted to ranks).
more po<,•erful than the K -W test. However, if the distribution of population scores is not
normal and/or the population variances are not equal. then the K-W test might be the
more powerful test.
testing. If a significant difference is fo und, then we proceed to test ind ividual pairs with
the W/M -W test.
clients who wish to stop making negative self-statements: (a) self-disputation,
(b) thought stopping, and (c) identifying the source of the negative statement (insight). A
total o r 27 clients with this concern were randomly selected and assigned to one of the
three intervention conditions. On the 28th day of the intervention, each client counted
the n umber of negative self-statementS that he or she had made.
assigning ranks to the scores without regard to which group individuals were in. We then
sum the ranks within each group. The sununed ranks are called W, for Group I, W2 for
Group 2, and W, fo r Group 3 (Table 6 .18).
df'” k – 1- Jt is calcuhted according to the follow ing equation:
lfoo,. = – – • 2:— 3(N +I)
– _ 1_2- • 7921 -i 15006.25 + 27722.25 – 3( 28)
12 S0649.5
= 5.3289
the K-W test statistic has a corr~tion for tics, which is as follows:
C= 1-~–:–f
In our example, the score of 4 occurred twice, ‘0 t = 2 fo r this g roup. 1 he score of 5
occ u•-red three t ime>, so I= 3 for t his group. There were seven grou ps fo r which t = 2 and
two groups for which 1 = 3.
1
1
1
-2)+{2
21+(2′ l)r~~:-~!+(z’ l,r(Z’-2)-(3′ )j+(3′ 3)
19,683 27
—-.-q,65{, = ‘i9.65(, ~ I
c
-09 = 5. 434.
fo r ‘J.2 at df= 2 and Cl”‘ .05 is x:., = 5.99 (see a table of critical value. of X’ found in mo>t
statistics teru). fl •- 5.34 is not greater than or equal to the critical value, so we fail to
reject the null hypothesis at a. .05.
K • W test did not find anr significant differences among the three group~ retesting the
same n u ll hypothesis by a series of pair comparisons would not be justi fi ed.
dence as for a 2 x k test. The asoumptions are as follows:
this is t ha t categories must be mutually exclusive ( no case may appear in more than
one category).
and no more than 20% of the expected frequencies should he less than 5 ..
student>. We are mtere.sted m lhe marijuana u~c differences (if any) among lOth, lith,
a nd 12th graders in our school distr ic t. A p roponionatc stratified 1·a ndom ;ample w,,s
drawn of sixty l Oth g raders, ,[xty·five l l l h graders, and fifty-five 12th grader> from all
students in the district. The students “ere surve)ed on their use ot drugs over lhe past
year under conditions guaranteeing con fidentia lity of res ponse. Table 6. 19 s hows
reported m arijuana use for th e sampled studen ts.
and 12th graders used marijuana during the pa>l year. The null hypothesis values for this
test are the expected freq uencies. These exp ected freque ncies are culculated in t he same
•vay as fo • u 2 x kz’.
(calculated) value did not exceed the critical value, we would not rejc(l the null hypoth e>i>
0
= 3.42019114.
and their associated measures of effect size. Of course, there are m any other importan t
statistical hypothesis tests that were not discussed. T hese incl ude tests of correhllion coef-
fic ients, multiple regression analysis, and factor ial and block design ANOVAs, among
mended further readings at the end of the chapter.
to space constraints. I strongly urge the reader to become more deeply acquainted with
power analysis.
for relationships between independent and dependent variables. They do not provide evi-
dence that such relationships are fu nctional ones. This is the more difficult task of accounting
for or controlling e.’l.1raneous variables that is discussed in other chapters of this handbook.
2. A popula1 ion is all rhat there is of a particular thing.
3. A va dable is a characteristic that m ay assume mor<.' than one value. It varies. Some examples
Relations. length of time engaged in cooperative pia)’• and stlf-raling of anxiety.
l,a\.,l’ence Erlbaum.
chastk aJly larger than the other. t\mUlls of Mathematical Statistks, 18, 50-60.
(3rd ed. ). Mahwah, Nl: l awrenre Erlbaum.
Th1s is a wonde1fu l Web site hsting online classes you can take in a wide a nay of statistical topics, rang·
ing from the introductol)’ to the advanced.
Th1s Web site from the Cochrane Collaboration lis!$ a number of llarning opportunitJes related to design-
ing and conducting systematrc reviews of the emprncal research lite
This is a s11e for an online glossary of stallsllcs terms and tests.
authors exphcitfy state one Of more predictive hypotheses. Unde~ine these and bring the artocle to
class, so you can read the hypothesrs to your classmates Orscuss the qual rues of thrs hypothesrs. rn
terms of rts testability.
a statisncalty significant drfference. What do effect SIZe reportS add 10 s1mpiy reporting whether a
gillen dtffercnce exceeded chance expectallonsl
work 1ntervent1on. What type of stat istrcal te>t would be most approprrate, one for independent
samples or one for dependent samples’ Explain why.
statistical procedures and to illustrate what types of research questions can be
addressed by different statistical tests. You may not fully understand these tests without
further study. However, you are strongly encouraged to note distinctions related to type
of measurement used in gathering data and the choice of statistical tests. Feel free to
post questions in the “Contact the Instructor” section of the course.
α alpha (degree of error acceptable for incorrectly rejecting the null hypothesis,
probability that results are unlikely to occur by chance)
≠ (not equal)
≥ (greater than or equal to)
≤ less than or equal to)
ᴦ (sample correlation)
ρ rho (population correlation)
t r (t score)
z (standard score based on standard deviation)
χ
Chi square (statistical test for variables that are not interval or ratio scale, (i.e.
p (probability that results are due to chance)
and dispersion (e.g., standard deviation and range).
variable (nominal, ordinal, interval, or ratio). If you do not recall the definitions for these
levels of measurement, see
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/nominal_ordinal_interval.htm
variables.
example, you can calculate the percentage of participants who are male and female; or
the percentage of survey respondents who are in favor, against, or undecided.
people are tempted to calculate a mean using these coding numbers. But that would be
represent attitudes along a continuum (e.g. Strongly like … Strongly dislike). These too
are often assigned a number for data entry, e.g. 1–5. Suppose that most of the
responses were in the middle of a scale (3 on a scale of 1–5). A researcher could
observe that the mode is 3, but it would not be reasonable to say that the average
(mean) is 3 unless there were exact differences between 1 and 2, 2 and 3 etc. The
numbers on a scale such as this are ordered from low to high or high to low, but there is
no way to say that there is a quantifiably equal difference between each of the choices.
In other words, the responses are ordered, but not necessarily equal. Strongly agree is
not five times as large as strongly disagree. (See the textbook for differences between
ordinal and interval scale measures.)
allowing a researcher to infer relationships between variables.
indicate that the analysis is appropriate for the type of data. Two key types of
assumptions relate to whether the samples are random and the measurement levels.
determination of statistical significance is based on the assumption of the normal
distribution. A full course in statistics would be needed to explain this fully. The key point
for our purposes is that some statistical procedures require a normal distribution and
others do not.
whether the results are statistically significant. The statistic p is the probability that the
results of a study would occur simply by chance. Essentially, a p that is less than or
equal to a predetermined (α) alpha level (commonly .05) means that we can reject a null
hypothesis. A null hypothesis always states that there is no difference or no relationship
between the groups or variables. When we reject the null hypothesis, we conclude (but
don’t prove) that there is a difference or a relationship. This is what we generally want to
know.
scale and for the variables to be normally distributed.
be at an interval or ratio scale. They make use of the standard deviation to determine
whether the results are likely to occur or very unlikely in a normal distribution. If they are
very unlikely to occur, then they are considered statistically significant. This means that
the results are unlikely to occur simply by chance.
Common uses:
o The research question for a t test comparing the mean scores between
2? The hypotheses tested would be:
H1: µgroup1 ≠ µgroup2
o The research question for a t test comparing the mean scores for a
time 1 and time 2? The hypotheses tested would be :
H1: µpre ≠ µpost
significant, t (57) = .282, p = .779, thus the null hypothesis is not rejected. There is not a
difference in between pre and post scores for participants in terms of a measure of
knowledge (for example).
relationship to a normal distribution. If you calculated the t using a formula, you would
compare the obtained t to a table of t values that is based on one less than the number
of participants (n-1). n-1 represents the degrees of freedom. The obtained t must be
greater than a critical value of t in order to be significant. For example, if statistical
analysis software calculated that p = .779, this result is much greater than .05, the usual
alpha-level which most researchers use to establish significance. In order for the t test
to be significant, it would need to have a p ≤ .05.
Common uses: Similar to the t test. However, it can be used when there are more than
two groups.
H1: The means are not all equal (some may be equal)
Common use: to examine whether two variables are related, that is, they vary together.
The calculation of a correlation coefficient (r or rho) is based on means and standard
deviations. This requires that both (or all) variables are measured at an interval or ratio
level.
as one variable increases, so does the other. A – means that as one variable increases,
the other decreases.
one or more other variables?”
H1: ρ ≠ 0 (there is a real correlation)
interval or ratio scale and do not require the variables to be normally distributed.
Common uses: Chi square tests of independence and measures of association and
agreement for nominal and ordinal data.
between the independent variable and a dependent variable?
one variable between the groups (defined as categories of another variable).
variable 1.
of one variable between the groups (defined as categories of another variable).
what would be expected if the proportions were equal. (If the proportions between
observed and expected frequencies are equal, then there is no difference.)
Crosstabs procedure for chi square analysis.
employment level and treatment condition. It tests whether there is a difference between groups.
treatment, and the dependent variable, employment level? In other words, is there a difference in
the number of participants who are not employed, employed part-time and employed full-time in
the program and the control group (i.e., waitlist group)?
employment categories between the treatment group and the waitlist group. In other words, the
frequency distribution for variable 2 (employment) has the same proportions for both categories
of variable 1 (program participation).
that is found to be statistically significant, (e.g. p< .05) indicates that we can reject the
null hypothesis (understanding that there is less than a 5% chance that the relationship
between the variables is due to chance).
three employment categories between the treatment group and the waitlist group.
that it appears that the treatment (voc rehab program) is effective in increasing the
employment status of participants.
entered the data into SPSS. A chi-square test was conducted, and you were given the following
SPSS output data:
What Will You Get?
Premium Quality
Experienced Writers
On-Time Delivery
24/7 Customer Support
Complete Confidentiality
Authentic Sources
Moneyback Guarantee
Order Tracking
Areas of Expertise
Areas of Expertise
Trusted Partner of 9650+ Students for Writing
Preferred Writer
Grammar Check Report
One Page Summary
Plagiarism Report
Free Features $66FREE
Our Services
Academic Writing
Professional Editing
Thorough Proofreading
Delegate Your Challenging Writing Tasks to Experienced Professionals
Check Out Our Sample Work
It May Not Be Much, but It’s Honest Work!
0+
Happy Clients
0+
Words Written This Week
0+
Ongoing Orders
0%
Customer Satisfaction Rate
Process as Fine as Brewed Coffee
See How We Helped 9000+ Students Achieve Success
We Analyze Your Problem and Offer Customized Writing
We Mirror Your Guidelines to Deliver Quality Services
We Handle Your Writing Tasks to Ensure Excellent Grades