Nonparametric Tests

Description


Your work must reflect your understanding of the material.

Please see the attached document (hw4.docx) for information about this work; you must go through it carefully. Include screenshots or drawings as directed. Provide a DOC file, or the file from any other program you used (Stata preferred), to show your work.

Review the question for more information.

Complete the work and show a breakdown of each step.

Instructions: Solve each of the following problems. When you use Stata to arrive at your
solution, copy the supporting work (plots, etc.) into your solutions. Do not copy all the output
into your solution. Include only what is needed.
Chapter 15 Nonparametric Tests
Introduction
The most commonly used methods for inference about the means of quantitative
response variables assume that the distributions of sample means
are approximately Normal. This condition is satisfied when we have Normal
distributions in the population or populations from which we draw our data.
In practice, of course, no distribution is exactly Normal. Fortunately, our
usual methods for inference about population means (the one-sample and
two-sample t procedures and analysis of variance) are quite robust. That is,
the results of inference are not very sensitive to moderate lack of Normality,
especially when the samples are reasonably large. Some practical guidelines
for taking advantage of the robustness of these methods appear in Chapter 7
(page 423).
What can we do if plots suggest that the population distribution is clearly
not Normal, especially when we have only a few observations? This is not a
simple question. Here are the basic options:
1. If lack of Normality is due to outliers, it may be legitimate to remove the
outliers. An outlier is an observation that may not come from the same
population as the other observations. Equipment failure that produced
a bad measurement, for example, entitles you to remove the outlier and
analyze the remaining data.
2. Sometimes we can transform our data so that their distribution is more
nearly Normal. Transformations such as the logarithm that pull in the long
tail of right-skewed distributions are particularly helpful. Example 7.25
(page 470) illustrates use of the logarithm.
3. In some settings, other standard distributions replace the Normal
distributions as models for the overall pattern in the population. We
mentioned in Chapter 5 (page 305) that the Weibull and exponential
distributions are common models for the lifetimes in service of equipment
in statistical studies of reliability. Also, we studied the exponential
distributions (page 300) and the Poisson distributions (page 328) in Chapter 5.
There are inference procedures for the parameters of these distributions
that replace the t procedures when we use specific non-Normal models.
4. Modern bootstrap methods and permutation tests do not require
Normality or any other specific form of sampling distribution. Moreover,
you can base inference on resistant statistics such as the trimmed mean.
We recommend these methods unless the sample is so small that it may not
represent the population well. Chapter 16 gives a full discussion.
5. Finally, there are other nonparametric methods that do not require any
specific form for the distribution of the population. Unlike bootstrap and
permutation methods, common nonparametric methods do not make use
of the actual values of the observations. We have already discussed the
sign test (page 473) which works with counts of observations. This chapter
presents rank tests based on the rank (place in order) of each observation
in the set of all the data.
The methods of this chapter are designed to replace the t tests and one-way
analysis of variance (ANOVA) when the Normality conditions for those tests
are not met. Figure 15.1 presents an outline of the standard tests (based on
Normal distributions) and the rank tests that compete with them.
The rank tests we will study concern the center of a population or populations.
When a population has at least roughly a Normal distribution, we
describe its center by the mean. The “Normal tests” in Figure 15.1 test hypotheses
about population means. When distributions are strongly skewed, we
often prefer the median to the mean as a measure of center. In simplest form,
the hypotheses for rank tests just replace the mean by the median.
FIGURE 15.1 Comparison of tests based on Normal distributions with nonparametric tests for similar settings.

Setting                       Normal test                         Rank test
One sample                    One-sample t test (Section 7.1)     Wilcoxon signed rank test (Section 15.2)
Matched pairs                 Apply one-sample test to differences within pairs
Two independent samples       Two-sample t test (Section 7.2)     Wilcoxon rank sum test (Section 15.1)
Several independent samples   One-way ANOVA F test (Chapter 12)   Kruskal-Wallis test (Section 15.3)

We devote a section of this chapter to each of the rank procedures. Section
15.1, which discusses the most common of these tests, also contains general
information about rank tests. The big idea of using ranks, the kind of assumptions
required, the nature of the hypotheses tested, and the contrast between using
exact distributions for small samples and approximate distributions for larger
samples are common to all rank tests. Sections 15.2 and 15.3 more briefly
describe other rank tests.
15.1 The Wilcoxon Rank Sum Test
When you complete this section, you will be able to:
● Find the rank transformation for a set of data.
● Compute the Wilcoxon rank sum statistic for the comparison of two
populations.
● State the null and alternative hypotheses that are used for the analysis
of data using the Wilcoxon rank sum test.
● Use the two sample sizes to find the mean and the standard deviation
of the sampling distribution of the Wilcoxon rank sum statistic under the
null hypothesis.
● Find the P-value for the Wilcoxon rank sum significance test using the
Normal approximation with the continuity correction.
● For the Wilcoxon rank sum test, use computer output to determine the
results of the significance test.
Two-sample problems (see Section 7.2) are among the most common in
statistics. The most useful nonparametric significance test compares two
distributions. Here is an example of this setting.
EXAMPLE 15.1
HITS
Does the American League get more hits? In 1973, the American League
adopted the designated-hitter rule, which allows a substitute player to take
the place of the pitcher when it is the pitcher’s turn to bat. Because pitchers
typically do not hit as well as other players, it was expected that the rule
would produce more hits and, therefore, more excitement for the fans. The
National League has not adopted this rule. Let’s look at some data to see if
we can detect a difference in hits between the American League and the
National League. Here are the number of hits for eight games played on the
same spring day, four from each league.
League     Hits
American   21  18  24  20
National   19   7  11  13
The samples are too small to assess Normality adequately or to rely on the
robustness of the t test, although the first entry for the National League
suggests that there may be some skewness. Let’s use a test that does not
require Normality.
We recommend always using either the exact distribution (from software
or tables) or the continuity correction for the rank sum statistic W. The
exact distribution is safer for small samples. As Example 15.4 illustrates,
however, the Normal approximation with the continuity correction is often
adequate.
The rank transformation
We first rank all eight observations together. To do this, arrange them in order
from smallest to largest:
7  11  13  18*  19  20*  21*  24*
The entries marked with an asterisk are the hits for the American League. The idea
of rank tests is to look just at position in this ordered list. To do this, replace
each observation by its order, from 1 (smallest) to 8 (largest). These numbers
are the ranks:
Hits   7  11  13  18  19  20  21  24
Rank   1   2   3   4   5   6   7   8
RANKS
To rank observations, first arrange them in order from smallest to largest.
The rank of each observation is its position in this ordered list, starting
with rank 1 for the smallest observation.
It would not be unusual in the baseball example to have sampled from a
day where more than one game had the same number of hits. We will discuss
how to handle ties later in this section.
Moving from the original observations to their ranks is a transformation
of the data, like moving from the observations to their logarithms. The rank
transformation retains only the ordering of the observations and makes
no other use of their numerical values. Working with ranks allows us to
dispense with specific assumptions about the shape of the distribution, such
as Normality.
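The rank transformation described above can be sketched in a few lines of Python. This is only an illustration using the Example 15.1 data; the function name `rank_all` is a hypothetical helper, and the sketch assumes there are no tied values.

```python
# Hits per game from Example 15.1 (four games per league).
american = [21, 18, 24, 20]
national = [19, 7, 11, 13]


def rank_all(values):
    """Return the rank (1 = smallest) of each value; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


# Rank all eight observations together, then split the ranks by league.
ranks = rank_all(american + national)
american_ranks = ranks[:len(american)]
national_ranks = ranks[len(american):]

print(american_ranks)  # [7, 4, 8, 6]
print(national_ranks)  # [5, 1, 2, 3]
```

Only the ordering survives the transformation: replacing 24 by 240 would leave every rank unchanged.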
USE YOUR KNOWLEDGE
HOTELS
15.1 Numbers of rooms in top meeting hotels. Cvent ranks meeting
hotels in the United States and lists the top 100 with characteristics of
each hotel.1 We let Group A be the 25 top-ranked hotels and let Group B
be the hotels ranked 26 to 50. A simple random sample (SRS) of size 5
was taken from each group, and the number of rooms in each selected
hotel was recorded. Here are the data:
Group A   1628   1622   2019   1260   1996
Group B   1544    736   3933   1214   1096
Rank all the observations together and make a list of the ranks for
Group A and Group B.
HOTEL2
15.2 The effect of Caesars Palace on the result. Refer to the previous
exercise. Caesars Palace in Las Vegas, with 3933 rooms, was the third
hotel selected in Group B. Suppose, instead, a different hotel, with
1600 rooms, less than half as many, had been selected. Replace the
observation 3933 in Group B by 1600. Use the modified data to make
a list of the ranks for Groups A and B combined. What changes?
The Wilcoxon rank sum test
If the American League games tend to have more hits than the National
League, we expect the ranks of the American League games to be higher than
those for the National League games. Let’s compare the sums of the ranks
from the two leagues:
League     Sum of ranks
American   25
National   11
These sums compare the hits of the American League with those of the
National League. In fact, the sum of the ranks from 1 to 8 is always equal to
36, so it is enough to report the sum for one of the two groups.
Because the sum of the ranks for the American League is 25, the sum of the ranks
for the National League must be 11, because 25 + 11 = 36. If there were no
difference between the leagues, we would expect the sum of the ranks for each
league to be 18 (half of the total sum of 36). Here are the facts we need in a
more general form that takes account of the fact that our two samples need
not be the same size.
THE WILCOXON RANK SUM TEST
Draw an SRS of size n1 from one population and draw an independent
SRS of size n2 from a second population. There are N observations in all,
where N = n1 + n2. Rank all N observations. The sum W of the ranks for
the first sample is the Wilcoxon rank sum statistic. If the two populations
have the same continuous distribution, then W has mean
$$\mu_W = \frac{n_1(N + 1)}{2}$$
and standard deviation
$$\sigma_W = \sqrt{\frac{n_1 n_2 (N + 1)}{12}}$$
The Wilcoxon rank sum test rejects the hypothesis that the two populations
have identical distributions when the rank sum W is far from its
mean.* This test is also called the Mann-Whitney test.
*This test was invented by Frank Wilcoxon (1892–1965) in 1945. Wilcoxon was a chemist who
encountered statistical problems in his work at the research laboratories of American Cyanamid
Company.
For the baseball question of Example 15.1, we want to test
H0: the number of hits in an American League game is the
    same as the number of hits in a National League game
against the one-sided alternative
Ha: the number of hits in American League games is greater
    than the number of hits in National League games
Our test statistic is the rank sum W = 25 for the American League games.
USE YOUR KNOWLEDGE
HOTELS
15.3 Hypotheses and test statistic for top hotels. Refer to Exercise 15.1.
State appropriate null and alternative hypotheses for this setting, and
calculate the value of W, the test statistic.
HOTEL2
15.4 The effect of Caesars Palace on the test statistic. Refer to Exercise 15.2.
Using the altered data, state appropriate null and alternative hypotheses, and calculate the value of W, the test statistic.
EXAMPLE 15.2
Perform the significance test. In Example 15.1, n1 = 4, n2 = 4, and there are
N = 8 observations in all. The sum of ranks for the American League games
has mean
$$\mu_W = \frac{n_1(N + 1)}{2} = \frac{(4)(9)}{2} = 18$$
and standard deviation
$$\sigma_W = \sqrt{\frac{n_1 n_2 (N + 1)}{12}} = \sqrt{\frac{(4)(4)(9)}{12}} = \sqrt{12} = 3.464$$
The observed sum of the ranks, W = 25, is higher than the mean,
about 2 standard deviations higher: (25 − 18)/3.464. It appears that the
data support our idea that American League games have more hits than
National League games. The P-value for our one-sided alternative is
P(W ≥ 25), the probability that W is at least as large as the value for our
data when H0 is true.
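The arithmetic in this example is easy to reproduce. The following is a minimal sketch, assuming no ties; the helper name `rank_sum_moments` is hypothetical and is not taken from any statistical package.

```python
import math


def rank_sum_moments(n1, n2):
    """Mean and standard deviation of W under H0 (no ties assumed)."""
    total = n1 + n2
    mean = n1 * (total + 1) / 2
    sd = math.sqrt(n1 * n2 * (total + 1) / 12)
    return mean, sd


mu_w, sigma_w = rank_sum_moments(4, 4)
w = 25  # observed rank sum for the American League
z = (w - mu_w) / sigma_w

print(mu_w, round(sigma_w, 3), round(z, 2))  # 18.0 3.464 2.02
```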
To calculate the P-value P(W ≥ 25), we need to know the sampling
distribution of the rank sum W when the null hypothesis is true. This
distribution depends on the two sample sizes n1 and n2. Tables are, therefore,
a bit unwieldy, though you can find them in handbooks of statistical tables.
Most statistical software will give you P-values, as well as carry out the
ranking and calculate W. However, some software gives only approximate
P-values.
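For samples as small as those in the baseball example, the exact null distribution can be found by brute force. The sketch below relies on the fact that, when the two distributions are identical and there are no ties, every choice of n1 ranks out of 1, ..., N is equally likely for the first sample. The function name `exact_upper_tail` is hypothetical.

```python
from itertools import combinations


def exact_upper_tail(n1, n2, w_obs):
    """Exact P(W >= w_obs) under H0, assuming no ties."""
    total = n1 + n2
    # Every set of n1 ranks is equally likely for the first sample under H0.
    sums = [sum(c) for c in combinations(range(1, total + 1), n1)]
    return sum(1 for s in sums if s >= w_obs) / len(sums)


p_exact = exact_upper_tail(4, 4, 25)
print(round(p_exact, 4))  # 0.0286
```

Only 2 of the 70 possible rank assignments give a rank sum of 25 or more, so the exact one-sided P-value is 2/70. Enumeration like this becomes impractical quickly as the sample sizes grow, which is one reason software falls back on approximations.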
The Normal approximation
The rank sum statistic W becomes approximately Normal as the two sample
sizes increase. We can then form yet another z statistic by standardizing W:
$$z = \frac{W - \mu_W}{\sigma_W} = \frac{W - n_1(N + 1)/2}{\sqrt{n_1 n_2 (N + 1)/12}}$$
Use standard Normal probability calculations to find P-values for this statistic.
Because W takes only whole-number values, the continuity correction
improves the accuracy of the approximation.
EXAMPLE 15.3
The continuity correction. The standardized rank sum statistic W in our
baseball example is
$$z = \frac{W - \mu_W}{\sigma_W} = \frac{25 - 18}{3.464} = 2.02$$
We expect W to be larger when the alternative hypothesis is true, so the
approximate P-value is
$$P(Z \ge 2.02) = 0.0217$$
The continuity correction acts as if the whole number 25 occupies
the entire interval from 24.5 to 25.5. We calculate the P-value P(W ≥ 25) as
P(W ≥ 24.5) because the value 25 is included in the range whose probability
we want. Here is the calculation:
$$P(W \ge 24.5) = P\left(\frac{W - \mu_W}{\sigma_W} \ge \frac{24.5 - 18}{3.464}\right) = P(Z \ge 1.876) = 0.0303$$
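The two Normal calculations above can be checked with the Python standard library. This is a sketch only; `normal_upper_tail` is a hypothetical helper that evaluates the standard Normal upper tail through the complementary error function.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


mu_w = 18.0
sigma_w = math.sqrt(12)

# Without the continuity correction: P(Z >= (25 - 18) / sigma_w).
p_plain = normal_upper_tail((25 - mu_w) / sigma_w)
# With the continuity correction: treat W >= 25 as W >= 24.5.
p_corrected = normal_upper_tail((24.5 - mu_w) / sigma_w)

print(round(p_plain, 4), round(p_corrected, 4))
```

The two printed values should agree with the 0.0217 and 0.0303 reported in the example. The corrected value is the larger of the two because moving the cutoff from 25 to 24.5 widens the tail region.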
EXAMPLE 15.4
Software output. Figure 15.2 shows the output from JMP. The sum of
the ranks for the American League is given as W = 25. The value for the
National League is W = 11. Dividing these sums by the sample sizes, both 4
in this example, gives the means displayed in the Score Means column in the
output. JMP uses the continuity correction for its calculations. A z statistic
is given for each league. These have the same values but with opposite
signs. The P-value for the two-sided alternative is 0.0606. For the one-sided
alternative that American League games have more hits than the National
League games, we divide the P-value by 2, giving P = 0.0303.
It is worth noting that the two-sample t test for the one-sided alternative
gives essentially the same result as the Wilcoxon test in Example 15.3 (t = 2.95,
P = 0.016).
FIGURE 15.2 Output from
JMP for the baseball hit data,
Example 15.4.
USE YOUR KNOWLEDGE
HOTELS
15.5 The P-value for top hotels. Refer to Exercises 15.1 and 15.3 (pages
15-4 and 15-6). Find μW, σW, and the standardized rank sum statistic.
Then give an approximate P-value using the Normal approximation.
What do you conclude?
HOTEL2
15.6 The effect of Caesars Palace on the P-value. Refer to Exercises 15.2
and 15.4 (pages 15-5 and 15-6). Perform the same analysis steps as in
Exercise 15.5 using the altered data.
FIGURE 15.3 Output from (a) Minitab and (b) SPSS for the data in Example 15.1. (a) Minitab uses the Normal approximation for the distribution of W. (b) SPSS gives the exact value for the two-sided alternative.
What hypotheses does Wilcoxon test?
Our null hypothesis is that the distribution of hits per game is the same in the
two leagues. Our alternative hypothesis is that there are more hits per game in
the American League than in the National League. If we are willing to assume
that hits are Normally distributed, or if we have reasonably large samples, we
use the two-sample t test for means. Our hypotheses then become
H0: μ1 = μ2
Ha: μ1 > μ2
When the distributions may not be Normal, we might restate the hypotheses
in terms of population medians rather than means:
H0: median1 = median2
Ha: median1 > median2
The Wilcoxon rank sum test does test hypotheses about population medians, but
only if an additional assumption is met: both populations must have distributions
of the same shape and spread. That is, the density curve for hits per game in the
American League must look exactly like that for the National League except
that it may be shifted to the left or to the right. The Minitab output in Figure
15.3(a) states the hypotheses in terms of population medians and also gives a
confidence interval for the difference between the two population medians.
The same-shape assumption is too strict to be reasonable in practice. Recall
that our preferred version of the two-sample t test does not require that the two
populations have the same standard deviation; that is, it does not make a same-shape assumption. Fortunately, the Wilcoxon test also applies in a much more
general and more useful setting. It tests hypotheses that we can state in words as
H0: The two distributions are the same.
Ha: One distribution has values that are systematically larger.
Here is a more exact statement of the systematically larger alternative
hypothesis. Take X1 to be hits in an American League game and X2 to be hits
in a National League game. These hits are random variables. That is, for each
game in the American League, the number of hits is a value of the variable X1.
The probability that the number of hits is more than, say, 15 is P(X1 > 15).
Similarly, P(X2 > 15) is the corresponding probability for the National League.
If the number of American League hits is “systematically larger” than the
number of National League hits, getting more hits than 15 should be more
likely in the American League. That is, we should have
P(X1 > 15) > P(X2 > 15)
The alternative hypothesis says that this inequality holds not just for 15 hits
but for any number of hits.2
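To get a feel for what “systematically larger” means, one can compare sample proportions at every cutoff. The sketch below is purely descriptive: the hypothesis concerns the population probabilities, whereas this code looks only at the eight observed games. It uses a weak inequality (at least as large) so that cutoffs where both proportions are 0 do not count against the pattern. The function names are hypothetical.

```python
american = [21, 18, 24, 20]
national = [19, 7, 11, 13]


def fraction_above(sample, cutoff):
    """Proportion of the sample strictly greater than the cutoff."""
    return sum(1 for x in sample if x > cutoff) / len(sample)


def systematically_larger(sample1, sample2):
    """True if sample1's proportion above every cutoff is at least sample2's."""
    # The sample proportions change only at observed values,
    # so checking those cutoffs is enough.
    cutoffs = sorted(set(sample1) | set(sample2))
    return all(
        fraction_above(sample1, c) >= fraction_above(sample2, c)
        for c in cutoffs
    )


print(systematically_larger(american, national))  # True
print(systematically_larger(national, american))  # False
```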
This exact statement of the hypotheses we are testing is a bit awkward.
The hypotheses really are “nonparametric” because they do not involve any
specific parameter such as the mean or median. If the two distributions do
have the same shape, the general hypotheses reduce to comparing medians.
Many texts and computer outputs state the hypotheses in terms of medians,
sometimes ignoring the same-shape requirement. We recommend that you
express the hypotheses in words rather than symbols. “The number of American
League hits per game is systematically higher than the number of National
League hits per game” is easy to understand and is a good statement of the
effect that the Wilcoxon test looks for.
Ties
The exact distribution for the Wilcoxon rank sum is obtained assuming that
all observations in both samples take different values. This allows us to rank
them all. In practice, however, we often find observations tied at the same
value. What shall we do? The usual practice is to assign all tied values the
average of the ranks they occupy. Here is an example:
EXAMPLE 15.6
Does the American League get more hits? In Example 15.1 (page 15-3), we
examined data that could be used to address this question. There were no
ties in the data, but it would not be unlikely to see ties in data such as these.
Let’s change the data so that the first entry for the National League is 20
instead of 19. Here is the resulting table:
League     Hits
American   21  18  24  20
National   20   7  11  13
Here are the ranked data, with the American League hits marked by an asterisk.
Because the two 20s are tied, either one can carry the mark:
7  11  13  18*  20  20*  21*  24*
As before, replace each observation by its position in this ordered list, from
1 (smallest) to 8 (largest), but give tied observations the average of the
positions they occupy. These numbers are the ranks:
Hits   7  11  13  18   20   20  21  24
Rank   1   2   3   4  5.5  5.5   7   8
Notice that the two entries with 20 hits now share the rank 5.5.
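The average-rank rule can be sketched as follows, using the modified data of Example 15.6. The helper `average_ranks` is hypothetical; it favors clarity over speed and is intended only for small data sets.

```python
def average_ranks(values):
    """Rank values from smallest to largest, averaging the ranks of ties."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # lowest rank occupied by this value
        count = ordered.count(v)       # number of observations tied at v
        # Average of the ranks first, first + 1, ..., first + count - 1.
        ranks.append(first + (count - 1) / 2)
    return ranks


american = [21, 18, 24, 20]
national = [20, 7, 11, 13]

ranks = average_ranks(american + national)
print(ranks[:4])       # [7.0, 4.0, 8.0, 5.5]
print(ranks[4:])       # [5.5, 1.0, 2.0, 3.0]
print(sum(ranks[:4]))  # 24.5
```

With the tie, the American League rank sum drops from 25 to 24.5, and the total of all ranks is still 36.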
The exact distribution for the Wilcoxon rank sum W changes if the data
contain ties. Moreover, the standard deviation σW must be adjusted if ties are
present. The Normal approximation can be used after the standard deviation is
adjusted. Statistical software will detect ties, make the necessary adjustment,
and switch to the Normal approximation. In practice, software is required if
you want to use rank tests when the data contain tied values.
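The text does not state the adjustment itself. One commonly used form is given here as an assumption about what software typically computes, not as a formula taken from this chapter. It reduces the variance according to the sizes of the tied groups: if the combined sample contains groups of tied observations of sizes t_1, t_2, ..., t_k, then

```latex
\sigma_W = \sqrt{\frac{n_1 n_2}{12}
  \left[(N + 1) - \frac{\sum_{i=1}^{k} \left(t_i^3 - t_i\right)}{N(N - 1)}\right]}
```

When there are no ties, every t_i equals 1, the sum vanishes, and the expression reduces to the standard deviation given earlier in this section.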
It is sometimes useful to use rank tests on data that have very many ties
because the scale of measurement has only a few values. Here is an example.
EXAMPLE 15.7
EXERG
            TV time (hours per day)
Exergamer   None   Some but less than two hours   Two hours or more
Yes            6                            160                 115
No            48                            616                 255
USE YOUR KNOWLEDGE
EXERG
15.7 Analyze as a two-way table. Analyze the exergaming data in Example 15.7 as a two-way table.
(a) Compute the percents in the three categories of TV watching for
the exergamers. Do the same for those who are not exergamers. Display the percents graphically and summarize the differences in the
two distributions.
(b) Perform the chi-square test for the counts in the two-way table.
Report the test statistic, the degrees of freedom, and the P-value.
Give a brief summary of what you can conclude from this significance test.
How do we approach the analysis of these data using the Wilcoxon test?
We start with the hypotheses. We have two distributions of TV viewing, one
for the exergamers and one for those who are not exergamers. The null hypothesis states that these two distributions are the same. The alternative hypothesis uses the fact that the responses are ordered from no TV to two hours
or more per day. It states that one of the two groups watches more TV
than the other.
H0: The amount of time spent viewing TV is the same for students
who are exergamers and students who are not.
Ha: One of the two groups views more TV than the other.
The alternative hypothesis is two-sided. Because the responses can take
only three values, there are very many ties. All 54 students who watch no TV
are tied. Similarly, all students in each of the other two columns of the table
are tied. The graphical display that you prepared in Exercise 15.7 suggests that
the exergamers watch more TV than those who are not exergamers. Is this difference statistically significant?
EXAMPLE 15.8
EXERG
Software output. Look at Figure 15.4, which gives JMP output for the
Wilcoxon test. The rank sum for the exergamers (using average ranks for
ties) is W = 187,747.5 (Score Sum, rounded in the output). The expected
rank sum under the null hypothesis is 168,740.5 (Expected Score in the
output). So the exergamers have a higher rank sum than we would expect.
The Normal approximation test statistic is z = 4.46794, and the two-sided
P-value is reported as P < 0.0001. There is very strong evidence of a difference. Exergamers watch more TV than the students who are not exergamers.
FIGURE 15.4 Output from JMP for the exergaming data, Example 15.8.
We can use our framework of “systematically larger” (page 15-9) to
summarize these data. For the exergamers, 98% watch some TV and 41%
watch two or more hours per day. The corresponding percents for the
students who are not exergamers are 95% and 28%. The difference is
statistically significant (z = 4.47, P < 0.0001).
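The rank sum and z statistic quoted above can be rebuilt from the table of counts alone, because every student in a category shares the same average rank. The sketch below makes two assumptions that are not stated in the text: it uses a standard tie-adjusted standard deviation, and it applies the continuity correction by subtracting 0.5 from W minus its mean.

```python
import math

# Counts from the Example 15.7 table, in order:
# none, some but less than two hours, two hours or more.
exergamers = [6, 160, 115]
others = [48, 616, 255]

n1, n2 = sum(exergamers), sum(others)
total = n1 + n2

w = 0.0        # rank sum for the exergamers
tie_term = 0   # sum of t**3 - t over the tied groups
start = 1      # first rank occupied by the current category
for a, b in zip(exergamers, others):
    t = a + b                        # size of this tied group
    avg_rank = start + (t - 1) / 2   # average of the ranks the group occupies
    w += a * avg_rank
    tie_term += t**3 - t
    start += t

mean_w = n1 * (total + 1) / 2
# Tie-adjusted standard deviation (assumed form, see the lead-in).
sd_w = math.sqrt(n1 * n2 / 12 * ((total + 1) - tie_term / (total * (total - 1))))
# Continuity correction: W is above its mean, so subtract 0.5.
z = (w - mean_w - 0.5) / sd_w

print(w, mean_w, round(z, 2))  # 187747.5 168740.5 4.47
```

The rank sum and its expected value match the Score Sum and Expected Score described in Example 15.8, and z rounds to 4.47.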
In our discussion of TV viewing and exergaming, we have expressed
results in terms of the amount of TV watched. In fact, we do not have the
actual hours of TV watched by each student in the study. Only data with the
hours classified into three groups are available. Many government surveys
summarize quantitative data categorized into ranges of values. When
summarizing the analysis of data, it is very important to explain clearly how the
data are recorded. In this setting, we have chosen to use phrases such as
“watch more TV’’ because they express the findings based on the data
available.
Note that the two-sample t test would not be appropriate in this setting. If
we coded the TV-watching categories as 1, 2, and 3, the average of these coded
values would not be meaningful.
On the other hand, we frequently encounter variables measured in scales
such as “strongly agree,” “agree,” “neither agree nor disagree,” “disagree,” and
“strongly disagree.” In these circumstances, many would code the responses
with the integers 1 to 5 and then use standard methods such as a t test or
ANOVA. Whether to do this or not is a matter of judgment. Rank tests avoid
the dilemma because they use only the order of the responses, not their actual
values. Some statisticians use t procedures when there is not a fully meaningful
scale of measurement, but others avoid them.
Rank, t, and permutation tests
The two-sample t procedures are the most common method for comparing
the centers of two populations based on random samples from each. The
Wilcoxon rank sum test is a competing procedure that does not start from the
condition that the populations have Normal distributions. Permutation tests
(Chapter 16) also avoid the need for Normality. Tests based on Normality,
rank tests, and permutation tests apply in many other settings as well. How
do these three approaches compare in general?
First, let’s consider rank tests versus traditional tests based on Normal
distributions. Both are available in almost all statistical software.
● Moving from the actual data values to their ranks allows us to find an
exact sampling distribution for rank statistics such as the Wilcoxon rank
sum W when the null hypothesis is true. (Most software will do this only
if there are no ties and if the samples are quite small.) When our samples
are small, are truly random samples from the populations, and show non-Normal
distributions of the same shape, the Wilcoxon test is more reliable
than the two-sample t test. In practice, the robustness of t procedures
implies that we rarely encounter data that require nonparametric procedures
to obtain reasonably accurate P-values. The t and W tests gave very similar
results for the baseball hit data in Example 15.1, but we would not use a t
procedure for the exergame data in Example 15.7.
● Normal tests compare means and are accompanied by simple confidence
intervals for means or differences between means. When we use rank tests
to compare medians, we can also give confidence intervals for medians.
However, the usefulness of rank tests is clearest in settings when they do
not simply compare medians; see the discussion “What Hypotheses Does
Wilcoxon Test?” (page 15-9). Rank methods focus on significance tests, not
confidence intervals.
● Inference based on ranks is largely restricted to simple settings. Normal
inference extends to methods for use with complex experimental designs
and multiple regression, but nonparametric tests do not. We stress Normal
inference in part because it leads to more advanced statistics.
If you read Chapter 16 and use software that makes permutation tests
available to you, you will also want to compare rank tests with resampling
methods.
● Both rank and permutation tests are nonparametric. That is, they require
no assumptions about the shape of the population distribution. A two-sample
permutation test has the same null hypothesis as the Wilcoxon rank
sum test: that the two population distributions are identical. Calculation
of the sampling distribution under the null hypothesis is similar for both
tests but is simpler for rank tests because it depends only on the sizes of the
samples. As a result, software often gives exact P-values for rank tests but
not for permutation tests.
● Permutation tests have the advantage of flexibility. They allow wide
choice of the statistic used to compare two samples, an advantage over
both the t and Wilcoxon tests. In fact, we could apply the permutation test
method to sample means (imitating t) or to rank sums (imitating Wilcoxon),
as well as to other statistics such as the trimmed mean that we used in
Exercise 1.91 (page 51). Permutation tests are not available in some settings,
such as testing hypotheses about a single population, though bootstrap
confidence intervals do allow resampling tests in these settings. Permutation
tests are available for multiple regression and some other quite elaborate
settings.
● An important advantage of resampling methods over both Normal
and rank procedures is that we can get bootstrap confidence intervals
for the parameter corresponding to whatever statistic we choose for
the permutation test. If the samples are very small, however, bootstrap
confidence intervals may be unreliable because the samples don’t
represent the population well enough to provide a good basis for
bootstrapping.
In general, both Normal distribution methods and resampling methods
are more useful than rank tests. If you are familiar with resampling, we recommend
rank tests only for very small samples that are clearly non-Normal and,
even then, only if your software gives exact P-values for rank tests but not for
permutation tests.
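For comparison with the rank sum test, here is a minimal sketch of an exact two-sample permutation test on the baseball data of Example 15.1, using the difference in sample means as the statistic. It enumerates every way of splitting the eight games into two groups of four, which is feasible only for very small samples; software for larger samples typically uses random resampling instead. The function name is hypothetical.

```python
from itertools import combinations

american = [21, 18, 24, 20]
national = [19, 7, 11, 13]


def exact_permutation_p(sample1, sample2):
    """One-sided P-value for the difference in means over all regroupings."""
    pooled = sample1 + sample2
    n1 = len(sample1)
    observed = sum(sample1)
    # With fixed group sizes, a larger sum in group 1 is equivalent to a
    # larger difference in means, so comparing sums is enough.
    splits = list(combinations(range(len(pooled)), n1))
    extreme = sum(
        1 for idx in splits if sum(pooled[i] for i in idx) >= observed
    )
    return extreme / len(splits)


p_perm = exact_permutation_p(american, national)
print(round(p_perm, 4))  # 0.0286
```

For these data, the result happens to equal the exact rank sum P-value of 2/70, because only two of the 70 regroupings give the first group a total of at least 83 hits.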
SECTION 15.1 SUMMARY
● Nonparametric tests do not require any specific form for the distribution
of the population from which our samples come.
● Rank tests are nonparametric tests based on the ranks of observations,
their positions in the list ordered from smallest (rank 1) to largest. Tied
observations receive the average of their ranks.
● The Wilcoxon rank sum test compares two distributions to assess
whether one has systematically larger values than the other. The Wilcoxon
test is based on the Wilcoxon rank sum statistic W, which is the sum
of the ranks of one of the samples. The Wilcoxon test can replace the
two-sample t test.
● P-values for the Wilcoxon test are based on the sampling distribution
of the rank sum statistic W when the null hypothesis (no difference in
distributions) is true. You can find P-values from special tables, software,
or a Normal approximation (with continuity correction).
SECTION 15.1 EXERCISES
For Exercises 15.1 and 15.2, see pages 15-4, 15-5;
for Exercises 15.3 and 15.4, see page 15-6; for
Exercises 15.5 and 15.6, see page 15-8; and
for Exercise 15.7, see page 15-11.
15.8 Time spent studying. A sample of 11 students in a
large first-year college class were interviewed and were
asked how much time they spent studying on a typical
week night. Here are the responses, in minutes, for the
five female students in the sample:
STUDYT
110  70  190  120  310
Find the ranks for all 11 students and report the ranks
for the five female students.
15.9 Find the rank sum statistic. Refer to the previous
exercise. Here are the data for six men in the class:
STUDYT
80
80
30
130
0
200
Compute the value of the Wilcoxon statistic. Take the
first sample to be the women.
15.10 State the hypotheses. Refer to the previous
exercise. State appropriate null and alternative
hypotheses for this setting.
15.11 Find the mean and standard deviation of the
distribution of the statistic. The statistic W that you
calculated in Exercise 15.9 is a random variable with
a sampling distribution. What are the mean and the
standard deviation of this sampling distribution under
the null hypothesis?
15.12 Find the P-value. Refer to Exercises 15.8 through
15.11. Find the P-value using the Normal approximation
with the continuity correction and interpret the result of
the significance test.
15.13 Is civic engagement related to education? A
Pew Internet Poll of adults aged 18 and older examined
factors related to civic engagement. Participants were
asked whether or not they had participated in a civic
group or activity in the preceding 12 months. One
analysis looked at the relationship between this variable
and education. Here are the data:4
CIVIC
                                   Education
Civic participation   No high school   High school   Some college   College
Civic                       76             294            295          428
No civic                   155             424            273          298
Figure 15.5 gives the JMP output for analyzing these data
using the Wilcoxon rank sum procedure.
FIGURE 15.5 Output from JMP for the civic participation data, Exercise 15.13.
(a) Describe the relevant parts of the output and write a
short summary of the results.
(b) Apply the “systematically larger” framework that we
used in Example 15.8 (page 15-12) to these data. Is this
a useful way to describe the results of this analysis? Give
reasons for your answer.
15.14 Do women talk more? Conventional wisdom
suggests that women are more talkative than men. One
study designed to examine this stereotype collected data
on the speech of 10 men and 10 women in the United
States.5 The variable recorded is the number of words
per day. Here are the data:
TALK10
Men
23,871 5,180 9,951 12,460
17,155 10,344 9,811 12,387
29,920 21,791
Women
10,592 24,608 13,739 22,376
9,351 7,694 16,812 21,066
32,291 12,320
(a) Summarize the data for the two groups using numerical
and graphical methods. Describe the two distributions.
(b) Compare the words per day spoken by the men
with the words per day spoken by the women using the
Wilcoxon rank sum test. Summarize your results and
conclusion in a short paragraph.
15.15 More data for women and men talking. The
data in the previous exercise were a sample of the data
collected in a larger study of 42 men and 37 women. Use
the larger data set to answer the questions in the
previous exercise. Discuss the advisability of using the
Wilcoxon test versus the t test for this exercise and for
the previous one.
TALK
15.16 Learning math through subliminal messages.
A “subliminal’’ message is below our threshold of
awareness but may, nonetheless, influence us. Can
subliminal messages help students learn math? A group
of students who had failed the mathematics part of the
City University of New York Skills Assessment Test
agreed to participate in a study to find out. All received
a daily subliminal message, flashed on a screen too
rapidly to be consciously read. The treatment group of
10 students was exposed to “Each day I am getting
better in math.’’ The control group of eight students was
exposed to a neutral message, “People are walking on
the street.’’ All students participated in a summer
program designed to raise their math skills, and all took
the assessment test again at the end of the program.
Here are data on the subjects’ scores before and after the
program:6
SUBLIM
   Treatment group         Control group
Pretest   Posttest      Pretest   Posttest
  18         24           18         29
  18         25           24         29
  21         33           20         24
  18         29           18         26
  18         33           24         38
  20         36           22         27
  23         34           15         22
  23         36           19         31
  21         34
  17         27
(a) The study design was a randomized comparative
experiment. Outline this design.
(b) Compare the gain in scores in the two groups using a
graph and numerical descriptions. Does it appear that the
treatment group’s scores rose more than the scores for
the control group?
(c) Apply the Wilcoxon rank sum test to the gain in
scores. Note that there are some ties. What do you
conclude?
15.17 Storytelling and the use of language. A study of
early childhood education asked kindergarten students to
retell two fairy tales that had been read to them earlier in
the week. The 10 children in the study included five high-progress readers and five low-progress readers. Each child
told two stories. Story 1 had been read to them; Story 2
had been read and also illustrated with pictures. An expert
listened to a recording of each child and assigned a score
for certain uses of language. Here are the data:7
STORY
Child   Progress   Story 1 score   Story 2 score
1       high           0.55            0.80
2       high           0.57            0.82
3       high           0.72            0.54
4       high           0.70            0.79
5       high           0.84            0.89
6       low            0.40            0.77
7       low            0.72            0.49
8       low            0.00            0.66
9       low            0.36            0.28
10      low            0.55            0.38
Is there evidence that the scores of high-progress readers
are higher than those of low-progress readers when they
retell a story they have heard without pictures (Story 1)?
(a) Make Normal quantile plots for the five responses in
each group. Are any major deviations from Normality
apparent?
(b) Carry out a two-sample t test. State hypotheses and
give the two sample means, the t statistic and its P-value,
and your conclusion.
(c) Carry out the Wilcoxon rank sum test. State
hypotheses and give the rank sum W for high-progress
readers, its P-value, and your conclusion. Do the t and
Wilcoxon tests lead you to different conclusions?
15.18 Repeat the analysis for Story 2. Repeat the
analysis of Exercise 15.17 for the scores when children
retell a story they have heard and seen illustrated with
pictures (Story 2).
STORY
15.19 Do the calculations by hand. Use the data in
Exercise 15.17 for children telling Story 2 to carry out by
hand the steps in the Wilcoxon rank sum test.
STORY
(a) Arrange the 10 observations in order and assign
ranks. There are no ties.
(b) Find the rank sum W for the five high-progress
readers. What are the mean and standard deviation of
W under the null hypothesis that low-progress and high-progress readers do not differ?
(c) Standardize W to obtain a z statistic. Do a Normal
probability calculation with the continuity correction to
obtain a one-sided P-value.
(d) The data for Story 1 contain tied observations. What
ranks would you assign to the 10 scores for Story 1?
15.2 The Wilcoxon Signed Rank Test
When you complete this section, you will be able to:
● For a set of paired sample data, take the differences between the pairs,
take the absolute values of the differences, and put the absolute values
of the differences in order, from smallest to largest, with an indication of
which absolute differences were from positive differences.
● Compute the Wilcoxon signed rank statistic W+ from an ordered list of
differences with an indication of which absolute differences were from
positive differences.
● State the null and alternative hypotheses that are used for the analysis of
data using the Wilcoxon signed rank test.
● Using the sample size (that is, the number of pairs), find the mean and
the standard deviation of the sampling distribution of W+ under the
null hypothesis.
● Find the P-value for the Wilcoxon signed rank test using the Normal
approximation with the continuity correction.
● Use computer output to determine the results of the Wilcoxon signed
rank test.
● Test a hypothesis about the median of a distribution using the Wilcoxon
signed rank test.
We use the one-sample t procedures for inference about the mean of one
population or for inference about the mean difference in a matched pairs
setting. The matched pairs setting is more important because good studies are
generally comparative. We previously discussed the sign test for this setting.
We now meet a nonparametric test that uses ranks.
LOOK BACK: matched pairs, p. 182
EXAMPLE 15.9
Storytelling and reading. A study of early childhood education asked
kindergarten students to retell two fairy tales that had been read to them
earlier in the week. The first (Story 1) had been read to them, and the second
(Story 2) had been read but also illustrated with pictures. An expert listened
to recordings of the children retelling each story and assigned a score for
certain uses of language. Higher scores are better. Here are the data for five
“low-progress’’ readers in a pilot study:8
Child          1       2       3       4       5
Story 2      0.77    0.49    0.66    0.28    0.38
Story 1      0.40    0.72    0.00    0.36    0.55
Difference   0.37   -0.23    0.66   -0.08   -0.17
We wonder if illustrations improve how the children retell a story. We would
like to test the hypotheses
H0: Scores have the same distribution for both stories.
Ha: Scores are systematically higher for Story 2.
Because this is a matched pairs design, we base our inference on the
differences. The matched pairs t test gives t = 0.635 with one-sided P-value
P = 0.280. Displays of the data (Figure 15.6) suggest some lack of Normality.
Therefore, we prefer to use a rank test.
FIGURE 15.6 Normal quantile plot (differences versus Normal score) and histogram for the five differences in story scores,
Example 15.9.
Positive differences in Example 15.9 indicate that the child performed
better telling Story 2. If scores are generally higher with illustrations, the
positive differences should be farther from zero in the positive direction than
the negative differences are in the negative direction. We, therefore, compare
the absolute values of the differences—that is, their magnitudes without a
sign. Here they are, with an asterisk marking the values that came from
positive differences:
0.37*   0.23   0.66*   0.08   0.17
Arrange these in increasing order and assign ranks, keeping track of which
values were originally positive. Tied values receive the average of their ranks.
If there are cases with zero differences, discard them before ranking.
Absolute value   0.08   0.17   0.23   0.37*   0.66*
Rank             1      2      3      4       5
(* marks an absolute value that came from a positive difference.)
The test statistic is the sum of the ranks of the positive differences. (We could
equally well use the sum of the ranks of the negative differences.) This is the
Wilcoxon signed rank statistic. Its value here is W+ = 4 + 5 = 9.
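The ranking steps of Example 15.9 can be reproduced with a short Python sketch. The helper names below are illustrative, not part of any library, and the code simply follows the rules stated above: drop zero differences, rank the absolute values with average ranks for ties, and add the ranks that belong to positive differences.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def signed_rank_statistic(differences):
    """Return (W+, n), where n counts the nonzero differences that were ranked."""
    nonzero = [d for d in differences if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    return w_plus, len(nonzero)


# Differences (Story 2 minus Story 1) for the five low-progress readers.
story_differences = [0.37, -0.23, 0.66, -0.08, -0.17]
w_plus, n = signed_rank_statistic(story_differences)
print(w_plus, n)
```

The two positive differences, 0.37 and 0.66, have the two largest absolute values, so they contribute ranks 4 and 5.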
THE WILCOXON SIGNED RANK TEST FOR MATCHED PAIRS
Draw an SRS of size n from a population for a matched pairs study and
take the differences in responses within pairs. Rank the absolute values of
these differences. The sum W+ of the ranks for the positive differences is
the Wilcoxon signed rank statistic. If the distribution of the responses is
not affected by the different treatments within pairs and there are no ties,
then W+ has mean
$$\mu_{W^+} = \frac{n(n+1)}{4}$$
and standard deviation
$$\sigma_{W^+} = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
The Wilcoxon signed rank test rejects the hypothesis that there are
no systematic differences within pairs when the rank sum W+ is far from
its mean.
USE YOUR KNOWLEDGE
GEPARTS
OILFREE
15.20 The effect of altering a software parameter. Example 7.7 (page 419)
describes a study in which researchers studied sensor software used
in the measurement of complex machine parts. They were interested
in the possibility of improving productivity by unchecking one particular software option. They measured 51 parts both with and without the option. Use the data to investigate the effect of the option.
Formulate this question in terms of null and alternative hypotheses.
Then compute the differences and find the value of the Wilcoxon
signed rank statistic, W+.
15.21 Oil-free deep fryer. Exercise 7.10 (page 422) discusses a study
where food experts compared food made with hot oil and their new
oil-free fryer. Five experts rated the taste of hash browns prepared
with each method. Here are the data:
Expert       1    2    3    4    5
Hot oil:    78   84   62   73   63
Oil free:   75   85   67   75   66
Examine whether there is a difference in the taste of hash browns
prepared in hot oil and in an oil-free fryer using the Wilcoxon signed rank
procedure.
EXAMPLE 15.10
STORY
Software output. In the storytelling study of Example 15.9, n = 5. If the null
hypothesis (no systematic effect of illustrations) is true, the mean of the
signed rank statistic is
$$\mu_{W^+} = \frac{n(n+1)}{4} = \frac{(5)(6)}{4} = 7.5$$
Our observed value W+ = 9 is only slightly larger than this mean. The
one-sided P-value is P(W+ ≥ 9).
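For a sample this small, P(W+ ≥ 9) can be found exactly by brute force. The sketch below assumes there are no ties and no zero differences, so that under the null hypothesis each of the ranks 1 to n is equally likely to belong to a positive or a negative difference, and all 2^n sign patterns are equally likely. The function name is hypothetical.

```python
from itertools import product


def exact_upper_tail(n, observed):
    """P(W+ >= observed) under the null hypothesis, by enumerating sign patterns.

    Assumes no ties and no zero differences, so the ranks are 1..n and every
    one of the 2**n assignments of signs to ranks is equally likely.
    """
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, positive in zip(range(1, n + 1), signs) if positive)
        if w >= observed:
            count += 1
    return count / 2 ** n


p_exact = exact_upper_tail(5, 9)
print(p_exact)
```

With n = 5 there are only 32 sign patterns, and 13 of them give a rank sum of 9 or more, so the exact one-sided P-value is 13/32. Enumeration doubles in cost with every added pair, which is one reason tables and approximations are used for larger n.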
Most statistical software uses the differences between the two variables,
with the signs, as input. Alternatively, the differences can sometimes be
calculated within the software. Figure 15.7 displays the output from three
statistical programs. Each does things a little differently. The JMP output in
Figure 15.7(a) gives the one-sided (Prob > t in the Signed-Rank column)
P = 0.4063. The Minitab output in Figure 15.7(b) gives P = 0.394 for the
one-sided Wilcoxon signed rank test with n = 5 observations and W+ = 9.0. The
SPSS output in Figure 15.7(c) gives P = 0.686 for testing the two-sided
alternative. We divide this by 2, 0.686/2 = 0.343, to obtain the P-value for
the one-sided alternative.
FIGURE 15.7 Output from (a) JMP, (b) Minitab, and (c) SPSS for the storytelling
data, Example 15.10.
Results reported in the three outputs lead us to the same qualitative
conclusion: the data do not provide evidence to support the idea that the Story 2
scores are higher than (or not equal to) the Story 1 scores. Different methods
and approximations are used to compute the P-values. With larger sample
sizes, we would not expect so much variation in the P-values. Note that the t
test result reported by JMP also gives the same conclusion, P = 0.5599.
When the sampling distribution of a test statistic is symmetric, we can use
output that gives a P-value for a two-sided alternative to compute a P-value
for a one-sided alternative. Check that the effect is in the direction specified
by the one-sided alternative and then divide the P-value by 2.
The normal approximation
The distribution of the signed rank statistic when the null hypothesis (no difference) is true becomes approximately Normal as the sample size becomes
large. We can then use Normal probability calculations (with the continuity
correction) to obtain approximate P-values for W+. Let’s see how this works
in the storytelling example, even though n = 5 is certainly not a large sample.
EXAMPLE 15.11
The normal approximation. For n = 5 observations, we saw in Example 15.10
that $\mu_{W^+} = 7.5$. The standard deviation of W+ under the null hypothesis is
$$\sigma_{W^+} = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{(5)(6)(11)}{24}} = \sqrt{13.75} = 3.708$$
The continuity correction calculates the P-value P(W+ ≥ 9) as P(W+ ≥ 8.5),
treating the value W+ = 9 as occupying the interval from 8.5 to 9.5. We find
the Normal approximation for the P-value by standardizing and using the
standard Normal table:
$$P(W^+ \ge 8.5) = P\left(\frac{W^+ - 7.5}{3.708} \ge \frac{8.5 - 7.5}{3.708}\right) = P(Z \ge 0.27) = 0.394$$
Despite the small sample size, the Normal approximation gives a result quite
close to the exact value P = 0.4062.
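The calculation in Example 15.11 can be checked with a few lines of Python. This is a sketch under the assumptions of the example (no ties, one-sided upper-tail alternative); the function names are illustrative, and the Normal tail area is computed from the standard library's complementary error function rather than read from a table.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def signed_rank_normal_p(w_plus, n):
    """One-sided (upper-tail) Normal approximation with continuity correction."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - 0.5 - mean) / sd  # W+ = 9 is treated as starting at 8.5
    return mean, sd, z, normal_upper_tail(z)


mean, sd, z, p_value = signed_rank_normal_p(9, 5)
print(mean, round(sd, 3), round(z, 2), round(p_value, 3))
```

Carrying z to full precision instead of rounding it to 0.27 changes the tail area only in the fourth decimal place.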
USE YOUR KNOWLEDGE
GEPARTS
OILFREE
15.22 Significance test for altering a software parameter. Refer to
Exercise 15.20 (page 15-19). Find $\mu_{W^+}$, $\sigma_{W^+}$, and the Normal approximation for the P-value for the Wilcoxon signed rank test.
15.23 Significance test for the oil-free fryer. Refer to Exercise 15.21
(page 15-19). Find $\mu_{W^+}$, $\sigma_{W^+}$, and the Normal approximation for the
P-value for the Wilcoxon signed rank test.
Ties
Ties among the absolute differences are handled by assigning average ranks. A
tie within a pair creates a difference of zero. Because these are neither positive
nor negative, the usual procedure simply drops such pairs from the sample. This
amounts to dropping observations that favor the null hypothesis (no difference).
If there are many ties, the test may be biased in favor of the alternative hypothesis.
As in the case of the Wilcoxon rank sum, ties between nonzero absolute
differences complicate finding a P-value. Most software no longer provides an
exact distribution for the signed rank statistic W+, and the standard deviation
$\sigma_{W^+}$ must be adjusted for the ties before we can use the Normal approximation. Software will do this. Here is an example.
EXAMPLE 15.12
GOLF
Player        1    2    3    4    5    6    7    8    9   10   11   12
Round 2      94   85   89   89   81   76  107   89   87   91   88   80
Round 1      89   90   87   95   86   81  102  105   83   88   91   79
Difference    5   -5    2   -6   -5   -5    5  -16    4    3   -3    1
Negative differences indicate better (lower) scores on the second round. We
see that six of the 12 golfers improved their scores. We would like to test the
hypotheses that in a large population of collegiate women golfers
H0: Scores have the same distribution in Rounds 1 and 2.
Ha: Scores are systematically lower or higher in Round 2.
A Normal quantile plot of the differences (Figure 15.8) shows some
irregularity and a low outlier. We will use the Wilcoxon signed rank test.
FIGURE 15.8 Normal quantile plot of the difference in scores for two rounds of a
golf tournament (difference in golf score versus Normal score), Example 15.12.
The absolute values of the differences, with an asterisk marking those that
came from negative differences, are
5   5*   2   6*   5*   5*   5   16*   4   3   3*   1
Arrange these in increasing order and assign ranks, keeping track of which
values were originally negative. Tied values receive the average of their ranks.
Absolute value   1   2   3     3*    4   5   5   5*   5*   5*   6*   16*
Rank             1   2   3.5   3.5   5   8   8   8    8    8    11   12
(* marks an absolute value that came from a negative difference.)
The Wilcoxon signed rank statistic is the sum of the ranks of the negative
differences. (We could equally well use the sum of the ranks of the positive
differences.) Its value is W+ = 50.5.
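The golf example shows how average ranks enter the statistic. The following Python sketch recomputes both rank sums from the two rounds of scores; the helper is illustrative rather than taken from any library, and it follows the text in summing the ranks of the negative differences.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


round2 = [94, 85, 89, 89, 81, 76, 107, 89, 87, 91, 88, 80]
round1 = [89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79]
differences = [second - first for first, second in zip(round1, round2)]

ranks = average_ranks([abs(d) for d in differences])
negative_sum = sum(r for d, r in zip(differences, ranks) if d < 0)
positive_sum = sum(r for d, r in zip(differences, ranks) if d > 0)
print(negative_sum, positive_sum)
```

The two sums always add to n(n + 1)/2, here 78, which is why either one can serve as the test statistic.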
EXAMPLE 15.13
GOLF
Software output. Here are the two-sided P-values for the Wilcoxon signed
rank test for the golf score data from three statistical programs:
Program    P-value
JMP        P = 0.388
Minitab    P = 0.388
SPSS       P = 0.363
All lead to the same practical conclusion: these data give no evidence
for a systematic change in scores between rounds. However, the P-value
reported by SPSS differs a bit from the other two. The reason for the variation is that the programs use slightly different versions of the approximate
calculations needed when ties are present. The reported P-value depends on
which version is used.
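To see how the choice of approximation moves the P-value, the Python sketch below computes two versions for the golf data. Both are assumptions made for illustration: version A applies the continuity correction with the unadjusted standard deviation, and version B drops the continuity correction but reduces the variance by the usual tie adjustment, the sum of (t^3 - t)/48 over groups of t tied absolute differences. The source does not say which calculation each program performs, so no pairing with a particular program is claimed.

```python
import math
from collections import Counter


def two_sided_normal_p(z):
    """Two-sided tail area of the standard Normal distribution beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


abs_differences = [5, 5, 2, 6, 5, 5, 5, 16, 4, 3, 3, 1]  # golf data
w_observed = 50.5                                          # rank sum from the text
n = len(abs_differences)

mean_w = n * (n + 1) / 4                      # 39.0
var_no_ties = n * (n + 1) * (2 * n + 1) / 24  # 162.5

# Assumed tie adjustment: subtract sum(t^3 - t) / 48 over groups of t tied values.
tie_term = sum(t ** 3 - t for t in Counter(abs_differences).values()) / 48
var_ties = var_no_ties - tie_term             # 159.875

# Version A: continuity correction, unadjusted standard deviation.
p_a = two_sided_normal_p((w_observed - 0.5 - mean_w) / math.sqrt(var_no_ties))
# Version B: no continuity correction, tie-adjusted standard deviation.
p_b = two_sided_normal_p((w_observed - mean_w) / math.sqrt(var_ties))

print(round(p_a, 3), round(p_b, 3))
```

The two versions differ by about 0.025, the same size as the spread in the table, and neither comes close to changing the conclusion.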
For the golf data, the matched pairs t test gives t = 0.9314 with P = 0.3716.
Once again, t and W+ lead to the same conclusion.
Testing a hypothesis about the median of a distribution
Let’s take another look at how the Wilcoxon signed rank test works. We have
data for a pair of variables measured on the same individuals. The analysis
starts with the differences between the two variables. These differences are
what we input to statistical software.
At this stage, we can think of our data as consisting of a single variable. The
Wilcoxon signed rank test tests the null hypothesis that the population median
of the differences is zero. The alternative is that the median is not zero.
Think about starting the analysis at the stage where we have a single
variable and we are interested in testing a hypothesis about the median.
The hypothesized value of the median does not need to be zero. If you can’t
specify a value other than zero with your software, you can simply subtract
that value from each observation before you start the analysis. Exercise 15.30
is an example.
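The subtraction step can be sketched as follows. The readings and the hypothesized median of 10 are hypothetical values invented for this illustration, and the function names are not from any library. Observations equal to the hypothesized median give zero differences and are dropped before ranking.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def signed_rank_about(values, hypothesized_median):
    """Return (W+, n) for the differences between values and a hypothesized median."""
    diffs = [v - hypothesized_median for v in values]
    nonzero = [d for d in diffs if d != 0]          # zero differences are dropped
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    return w_plus, len(nonzero)


readings = [12, 9, 15, 10, 7]   # hypothetical observations
w_plus, n = signed_rank_about(readings, 10)
print(w_plus, n)
```

Here the differences are 2, -1, 5, 0, and -3. The zero is discarded, leaving n = 4, and the positive differences 2 and 5 hold ranks 2 and 4.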
SECTION 15.2 SUMMARY
● The Wilcoxon signed rank test applies to matched pairs studies. It tests
the null hypothesis that there is no systematic difference within pairs against
alternatives that assert a systematic difference (either one-sided or two-sided).
● The test is based on the Wilcoxon signed rank statistic W+, which is the
sum of the ranks of the positive (or negative) differences when we rank the
absolute values of the differences. The matched pairs t test and the sign
test are alternative tests in this setting.
● P-values for the signed rank test are based on the sampling distribution
of W+ when the null hypothesis is true. You can find P-values from special
tables, software, or a Normal approximation (with continuity correction).
SECTION 15.2 EXERCISES
For Exercises 15.20 and 15.21, see page 15-19; and for
Exercises 15.22 and 15.23, see page 15-21.
15.24 Fuel efficiency. Computers in some vehicles
calculate various quantities related to performance. One
of these is the fuel efficiency, or gas mileage, usually
expressed as miles per gallon (mpg). For one vehicle
equipped in this way, the mpg were recorded each time
the gas tank was filled, and the computer was then reset.
In addition to the computer calculating mpg, the driver
also recorded the mpg by dividing the miles driven by the
number of gallons at fill-up.9 The driver wants to
determine if these calculations are different.
MPG8
Fill-up       1      2      3      4      5      6      7      8
Computer    41.5   50.7   36.6   37.3   34.2   45.0   48.0   43.2
Driver      36.5   44.2   37.2   35.6   30.5   40.5   40.0   41.0
(a) For each of the eight fill-ups, find the difference
between the computer mpg and the driver mpg.
(b) Find the absolute values of the differences you found
in part (a).
(c) Order the absolute values of the differences that you
found in part (b) from smallest to largest, and underline
those absolute differences that came from positive
differences in part (a).
FIGURE 15.9 Minitab output
for the fuel efficiency data,
Exercise 15.29.
15.25 Find the mean and the standard deviation.
Refer to the previous exercise. Use the sample size to
find the mean and the standard deviation of the sampling
distribution of the Wilcoxon signed rank statistic W+
under the null hypothesis.
15.26 State the hypotheses. Refer to Exercise 15.24.
State the null hypothesis and the alternative hypothesis
for this setting.
15.27 Find the Wilcoxon signed rank statistic. Using
the work that you performed in Exercise 15.24, find
the value of the Wilcoxon signed rank statistic W+.
15.28 Find the P-value. Refer to Exercises 15.24
through 15.27. Find the P-value for the Wilcoxon signed
rank statistic using the Normal approximation with the
continuity correction.
15.29 Read the output. The data in Exercise 15.24
are a subset of a larger set of data. Figure 15.9 gives
Minitab output for the analysis of this larger set of
data.
MPGCOMP
(a) How many pairs of observations are in the larger
data set?
(b) What is the value of the Wilcoxon signed rank
statistic W+?
(c) Report the P-value for the significance test and give a
brief statement of your conclusion.
(d) The output reports an estimated median. Explain how
this statistic is calculated from the data.
15.30 Number of friends on Facebook. Facebook
recently examined all active Facebook users (more than
10% of the global population) and determined that the
average user has 190 friends. This distribution takes only
integer values, so it is certainly not Normal. It is also
highly skewed to the right, with a median of 100
friends.10 Consider the following SRS of n = 30 Facebook
users from your large university.
FACEFR
594
31
85
60
325
165
417
52
288
120
63
65
132
537
57
176
27
81
516
368
257
319
11
24
734
12
297
8
190
148
(a) Use the Wilcoxon signed rank procedure to test the null
hypothesis that the median number of Facebook friends
for Facebook users at your university is 190. Describe the
steps in the procedure and summarize the results.
(b) Analyze these data using the t procedure and compare
the results with those that you found in part (a).
15.31 The full moon and behavior. Can the full moon
influence behavior? A study observed 15 nursing-home
patients with dementia. The number of incidents of
aggressive behavior was recorded each day for 12 weeks.
Call a day a “moon day’’ if it is the day of a full moon or
the day before or after a full moon. Here are the average
numbers of aggressive incidents for moon days and other
days for each subject:11
MOON
Patient   Moon days   Other days
1           3.33         0.27
2           3.67         0.59
3           2.67         0.32
4           3.33         0.19
5           3.33         1.26
6           3.67         0.11
7           4.67         0.30
8           2.67         0.40
9           6.00         1.59
10          4.33         0.60
11          3.33         0.65
12          0.67         0.69
13          1.33         1.26
14          0.33         0.23
15          2.00         0.38
The matched pairs t test gives P < 0.000015, and a
permutation test (Chapter 16) gives P = 0.0001. Does the
Wilcoxon signed rank test, based on ranks rather than
means, agree that there is strong evidence that there are
more aggressive incidents on moon days?
15.32 Comparison of two energy drinks. Consider the
following study to compare two popular energy drinks.
For each subject, a coin was flipped to determine which
drink to rate first. Each drink was rated on a 0 to 100
scale, with 100 being the highest rating.
ENERDR6
Subject    1    2    3    4    5    6
Drink A   43   83   66   87   78   67
Drink B   45   78   64   79   71   62
(a) Inspect the data. Is there a tendency for these subjects
to prefer one of the two energy drinks?
(b) Use the matched pairs t test of Chapter 7 (page 419)
to compare the two drinks.
(c) Use the Wilcoxon signed rank test to compare the two
drinks.
(d) Write a summary of your results and explain why the
two tests give different conclusions.
15.33 Comparison of two energy drinks with an
additional subject. Refer to the previous exercise. Let’s
suppose that there is an additional subject who expresses
a strong preference for energy drink A. Here is the new
data set:
ENERDR7
Subject    1    2    3    4    5    6    7
Drink A   43   83   66   87   78   67   90
Drink B   45   78   64   79   71   62   60
Answer the questions given in the previous exercise.
Write a summary comparing this exercise with the
previous one. Include a discussion of what you have
learned regarding the choice of the t test versus the
Wilcoxon signed rank test for different sets of data.
15.34 A summer language institute for teachers. A
matched pairs study of the effect of a summer language
institute on the ability of teachers to comprehend spoken
French had these improvements in scores between the
pretest and the posttest for 20 teachers:
SUMLANG
2   6   0   6   6   3   6   0   3   1
3   1   2   0   3   2  -6   3   6   3
(a) Show how you rank these data.
(b) Calculate the signed rank statistic W+. Be sure to
show your work. Remember that zeros are dropped
from the data before ranking so that n is the number of
nonzero differences within pairs.
(c) Perform the significance test and write a short
summary of your conclusions.
15.35 Radon detectors. How accurate are radon detectors
of a type sold to homeowners? To answer this question,
university researchers placed 12 detectors in a chamber
that exposed them to 105 picocuries per liter (pCi/l) of
radon.12 The detector readings are as follows:
RADON
91.9   103.8   97.8    99.6    111.4   96.6
122.3  119.3   105.4   104.8   95.0    101.7
We wonder if the median reading differs significantly
from the true value 105.
(a) Graph the data, and comment on skewness and
outliers. A rank test is appropriate.
(b) We would like to test hypotheses about the median
reading from home radon detectors:
H0: median = 105
Ha: median ≠ 105
To do this, apply the Wilcoxon signed rank statistic to
the differences between the observations and 105. (This
is the one-sample version of the test.) What do you
conclude?
15.36 Vitamin C in wheat-soy blend. The U.S. Agency
for International Development provides large quantities
of wheat-soy blend (WSB) for development programs and
emergency relief in countries throughout the world. One
study collected data on the vitamin C content of five bags
of WSB at the factory and five months later in Haiti.13
Here are the data:
WSBVITC
Sample    1    2    3    4    5
Before   73   79   86   88   78
After    20   27   29   36   17
We want to know if vitamin C has been lost during
transportation and storage. Describe what the data show
about this question. Then use a rank test to see whether
there has been a significant loss.
15.3 The Kruskal-Wallis Test*
When you complete this section, you will be able to:
● Describe the setting where the Kruskal-Wallis test can be used.
● Specify the null and alternative hypotheses for the Kruskal-Wallis test.
● Use computer output to determine the results of the Kruskal-Wallis
significance test.
We have now considered alternatives to the matched pairs and two-sample t
tests for comparing the magnitude of responses to two treatments. To compare
more than two treatments, we use one-way analysis of variance (ANOVA) if
the distributions of the responses to each treatment are at least roughly
Normal and have similar spreads. What can we do when these distribution
requirements are violated?
*Because this test is an alternative to the one-way analysis of variance F test, you should first
read Chapter 12.
EXAMPLE 15.14
WEEDS
Weeds and corn yield. Lamb’s-quarter is a common weed that interferes
with the growth of corn. A researcher planted corn at the same rate in 16
small plots of ground and then randomly assigned the plots to four groups.
He weeded the plots by hand to allow a fixed number of lamb’s-quarter
plants to grow in each meter of corn row. These numbers were 0, 1, 3, and
9 in the four groups of plots. No other weeds were allowed to grow, and all
plots received identical treatment except for the weeds. Here are the yields
of corn (bushels per acre) in each of the plots:14
Weeds per meter   Corn yield
0                 166.7   172.2   165.0   176.9
1                 166.2   157.3   166.7   161.1
3                 158.6   176.4   153.1   156.0
9                 162.8   142.4   162.7   162.4
The summary statistics are
Weeds   n   Mean      Std. dev.
0       4   170.200    5.422
1       4   162.825    4.469
3       4   161.025   10.493
9       4   157.575   10.118
LOOK BACK: rule for examining standard deviations in ANOVA, p. 654
The sample standard deviations do not satisfy our rule of thumb that
for safe use of ANOVA the largest should not exceed twice the smallest. A
careful look at the data suggests that there may be some outliers. These are
the correct yields for their plots, so we have no justification for removing
them. Let’s use a rank test that is not sensitive to outliers.
Hypotheses and assumptions
The ANOVA F test concerns the means of the several populations represented
by our samples. For Example 15.14, the ANOVA hypotheses are
H0: μ0 = μ1 = μ3 = μ9
Ha: not all four means are equal
Here, μ0 is the mean yield in the population of all corn planted under the
conditions of the experiment with no weeds present. The data should consist
of four independent random samples from the four populations, all Normally
distributed with the same standard deviation.
The Kruskal-Wallis test is a rank test that can replace the one-way ANOVA
F test. The assumption about data production (independent random samples
from each population) remains important, but we can relax the Normality
assumption. We assume only that the response has a continuous distribution in
each population. The hypotheses tested in our example are
H0: Yields have the same distribution in all groups.
Ha: Yields are systematically higher in some groups than in others.
If all the population distributions have the same shape (Normal or not), these
hypotheses take a simpler form. The null hypothesis is that all four
populations have the same median yield. The alternative hypothesis is that not all
four median yields are equal.
The Kruskal-Wallis test
Recall the analysis of variance idea: we write the total observed variation in
the responses as the sum of two parts, one measuring variation among the
groups (sum of squares for groups, SSG) and one measuring variation among
individual observations within the same group (sum of squares for error,
SSE). The ANOVA F test rejects the null hypothesis that the mean responses
are equal in all groups if SSG is large relative to SSE.
The idea of the Kruskal-Wallis rank test is to rank all the responses
from all groups together and then apply one-way ANOVA to the ranks
rather than to the original observations. If there are N observations in
all, the ranks are always the whole numbers from 1 to N. The total sum
of squares for the ranks is, therefore, a fixed number no matter what the
data are. So we do not need to look at both SSG and SSE. Although it isn’t
obvious without some unpleasant algebra, the Kruskal-Wallis test statistic
is essentially just SSG for the ranks. We give the formula, but you should
rely on software to do the arithmetic. When SSG is large, that is evidence
that the groups differ.
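The claim that the total sum of squares for the ranks is fixed, so that SSG alone carries the information, can be checked numerically. The following is a minimal Python sketch, not part of the original text: the three groups and their values are made up for illustration and contain no ties. It ranks all responses together, computes the total sum of squares (SST) and SSG for the ranks, and compares (N − 1)·SSG/SST, a standard way of writing the Kruskal-Wallis statistic, with the rank-sum form of H.

```python
# Hypothetical responses for three groups (illustrative values, no ties).
groups = {
    "A": [12.1, 15.3, 9.8, 14.0],
    "B": [16.2, 18.9, 17.5],
    "C": [8.4, 11.0, 10.2, 13.7],
}

# Rank all N responses together: smallest gets rank 1, largest gets rank N.
pooled = sorted(v for values in groups.values() for v in values)
rank_of = {v: i + 1 for i, v in enumerate(pooled)}
ranks = {g: [rank_of[v] for v in values] for g, values in groups.items()}

N = len(pooled)
grand_mean = (N + 1) / 2  # mean of the whole numbers 1..N

# Total sum of squares of the ranks: depends only on N when there are no ties.
sst = sum((r - grand_mean) ** 2 for rs in ranks.values() for r in rs)

# Sum of squares for groups (SSG), computed on the ranks.
ssg = sum(len(rs) * (sum(rs) / len(rs) - grand_mean) ** 2 for rs in ranks.values())

# The Kruskal-Wallis statistic written two ways.
h_from_ssg = (N - 1) * ssg / sst
h_from_rank_sums = (
    12 / (N * (N + 1)) * sum(sum(rs) ** 2 / len(rs) for rs in ranks.values())
    - 3 * (N + 1)
)

print(sst, N * (N * N - 1) / 12)
print(h_from_ssg, h_from_rank_sums)
```

With no ties, the total sum of squares equals N(N² − 1)/12 whatever the data are, and the two versions of the statistic agree.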
THE KRUSKAL-WALLIS TEST
Draw independent SRSs of sizes n_1, n_2, . . . , n_I from I populations. There
are N observations in all. Rank all N observations and let R_i be the sum of
the ranks for the ith sample. The Kruskal-Wallis statistic is
$$H = \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1)$$
When the sample sizes n_i are large, there are no ties, and all I populations
have the same continuous distribution, H has approximately the chi-square
distribution with I − 1 degrees of freedom.
The Kruskal-Wallis test rejects the null hypothesis that all populations
have the same distribution when H is large.
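The statistic in the box can be computed directly from its definition. The sketch below is one way to do this in Python. The function names `average_ranks` and `kruskal_wallis_h` and the three samples are our own illustrative assumptions, and the code applies the formula exactly as written, with average ranks for ties but no further tie correction.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)."""
    pooled = [x for sample in samples for x in sample]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    position = 0
    for sample in samples:
        n_i = len(sample)
        rank_sum = sum(ranks[position:position + n_i])  # R_i for this sample
        total += rank_sum ** 2 / n_i
        position += n_i
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical samples from I = 3 populations (illustrative values only).
samples = [
    [2.9, 3.0, 2.5, 2.6, 3.2],
    [3.8, 2.7, 4.0, 2.4],
    [2.8, 3.4, 3.7, 2.2, 2.0],
]
h = kruskal_wallis_h(samples)
print(round(h, 3))
```

Large values of `h` point toward rejecting the null hypothesis. How large is large enough depends on the reference distribution used for H.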
We now see that, like the Wilcoxon rank sum statistic, the Kruskal-Wallis
statistic is based on the sums of the ranks for the groups we are comparing.
The more different these sums are, the stronger is the evidence that responses
are systematically larger in some groups than in others.
The exact distribution of the Kruskal-Wallis statistic H under the null
hypothesis depends on all the sample sizes n_1 to n_I, so tables are awkward.
The calculation of the exact distribution is so time-consuming for all
but the smallest problems that even most statistical software uses a chi-square
approximation to obtain P-values. As usual, there is no usable exact
distribution when there are ties among the responses. We again assign average
ranks to tied observations.
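To illustrate how a chi-square approximation turns H into a P-value, here is a small sketch of the upper-tail probability of a chi-square distribution with a whole number of degrees of freedom. The function name `chi2_sf` and the example statistic are assumptions of ours, and statistical software will generally use its own routines. The sketch relies only on Python's standard `math` module and the recurrence Q(a + 1, z) = Q(a, z) + z^a e^(−z)/Γ(a + 1) for the upper regularized gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    z = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # df = 2: Q(1, z) = e^(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # df = 1: Q(1/2, z) = erfc(sqrt(z))
    # Step a up to df / 2 with Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Gamma(a + 1).
    while a < df / 2:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


# Hypothetical statistic from comparing I = 4 groups, so df = I - 1 = 3.
h_example = 7.0
p_example = chi2_sf(h_example, df=4 - 1)
print(round(p_example, 3))
```

The approximation is only as good as the conditions behind it (large samples, continuous responses), so with very small samples an exact P-value from software is preferable when one is available.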
EXAMPLE 15.15
WEEDS
Perform the significance test. In Example 15.14, there are I = 4 populations
and N = 16 observations. The sample sizes are equal, n_i = 4. The 16 observations arranged in increasing order, with their ranks, are
Yield   142.4   153.1   156.0   157.3   158.6   161.1   162.4   162.7
Rank    1       2       3       4       5       6       7       8

Yield   162.8   165.0   166.2   166.7   166.7   172.2   176.4   176.9
Rank    9       10      11      12.5    12.5    14      15      16
There is one pair of tied observations. The ranks for each of the four treatments are
Weeds   Ranks                       Rank sum
0       10     12.5   14     16     52.5
1       4      6      11     12.5   33.5
3       2      3      5      15     25.0
9       1      7      8      9      25.0
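The Kruskal-Wallis statistic is, therefore,
$$
\begin{aligned}
H &= \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1) \\
  &= \frac{12}{(16)(17)} \left( \frac{52.5^2}{4} + \frac{33.5^2}{4} + \frac{25^2}{4} + \frac{25^2}{4} \right) - (3)(17) \\
  &= \frac{12}{272}(1282.125) - 51 \\
  &= 5.56
\end{aligned}
$$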
Referring to the table of chi-square critical points (Table F) with df = 3,
we find that the P-value lies in the interval 0.10 < P < 0.15. This small
experiment suggests that more weeds decrease yield but does not provide
convincing evidence that weeds have an effect.
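The arithmetic in Example 15.15 can be reproduced in a few lines. This sketch starts from the ranks listed for each weed level rather than from the raw yields. The variable names are our own, and the two critical values for df = 3 (5.32 for upper-tail probability 0.15 and 6.25 for 0.10) are the usual chi-square table entries rounded to two decimals.

```python
# Ranks for each weed level, copied from the table in Example 15.15.
ranks_by_weeds = {
    0: [10, 12.5, 14, 16],
    1: [4, 6, 11, 12.5],
    3: [2, 3, 5, 15],
    9: [1, 7, 8, 9],
}

n_total = sum(len(r) for r in ranks_by_weeds.values())  # N = 16
rank_sums = {w: sum(r) for w, r in ranks_by_weeds.items()}

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
weighted = sum(rank_sums[w] ** 2 / len(r) for w, r in ranks_by_weeds.items())
h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Upper-tail chi-square critical values for df = 3 (standard table values).
crit_p15, crit_p10 = 5.32, 6.25

print(rank_sums)
print(weighted, round(h, 2))
print(crit_p15 < h < crit_p10)  # True means 0.10 < P < 0.15
```

Because H falls between the two critical values, the P-value lies between 0.10 and 0.15, in agreement with the table lookup in the example.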
In Example 15.15, we concluded that the data did not provide convincing evidence
that more weeds decrease the yield of corn. Here is an
example of a study where the analysis does provide evidence for us to reject
the null hypothesis. In this situation, we will include a multiple comparisons
procedure to tell us which pairs of levels of the factor differ significantly.
EXAMPLE 15.16
ORGANIC
Organic foods and morals? Organic foods are often marketed with moral
terms such as honesty and purity. Is this just a marketing strategy, or is there
a conceptual link between organic food and morality? In one experiment,
62 undergraduates were randomly assigned to one of three food conditions (organic, comfort, and control).15 First, each participant was given a
packet of four food types from the assigned condition and told to rate the
desirability of each food on a seven-point scale. Then, each was presented
with a list of six moral transgressions and asked to rate each on a seven-point
scale ranging from 1 = not at all morally wrong to 7 = very morally
wrong.
Exercises 12.23 and 12.24 (page 669) lead you through the steps required
to analyze these data using a one-way ANOVA. Note that the data are
discrete with possible values of 1 through 7. We expect that our results should
be reasonable because the sample sizes are large enough for us to expect
that the sample means are approximately Normal. Let’s check the results
using the Kruskal-Wallis test.
The output from JMP is given in Figure 15.10. This software uses a
chi-square approximation to test the null hypothesis. We reject the null
hypothesis (X² = 12.41, df = 2, P = 0.002) and conclude that scores (moral
judgments) depend upon the type of food shown to the students. The
multiple comparisons procedure indicates that, on the basis of the moral
transgression scale, we can distinguish organic from comfort and organic
from control, but control and comfort are not distinguishable.
FIGURE 15.10 Output from JMP for the Kruskal-Wallis test applied to the organic
food data, Example 15.16.
SECTION 15.3 SUMMARY
● The Kruskal-Wallis test compares several populations on the basis of
independent random samples from each population. This is the one-way
analysis of variance (ANOVA) setting.
● The null hypothesis for the Kruskal-Wallis test is that the distribution of the
response variable is the same in all the populations. The alternative hypothesis
is that responses are systematically larger in some populations than in others.
● The Kruskal-Wallis statistic H can be viewed in two ways. It is essentially
the result of applying one-way ANOVA to the ranks of the observations. It is
also a comparison of the sums of the ranks for the several samples.
● When the sample sizes are large and the null hypothesis is true, H for
comparing I populations has approximately the chi-square distribution with
I − 1 degrees of freedom. Software often uses this approximate distribution
to obtain P-values.
SECTION 15.3 EXERCISES
15.37 Number of Facebook friends. An experiment was
run to examine the relationship between the number of
Facebook friends and the user’s perceived social
attractiveness.16 A total of 134 undergraduate participants
were randomly assigned to observe one of five Facebook
profiles. Everything about the profile was the same except
the number of friends, which appeared on the profile as
102, 302, 502, 702, or 902. After viewing the profile, each
participant was asked to fill out a questionnaire on the
physical and social attractiveness of the profile user. Each
attractiveness score is an average of several seven-point
questionnaire items, ranging from 1 (strongly disagree) to
7 (strongly agree). In Example 12.3 (page 648), we
analyzed these data using a one-way ANOVA. Describe the
setting for this problem. Include the number of groups to
be compared, assumptions about independence, and the
distribution of the attractiveness scores.
FRIENDS
15.38 What are the hypotheses? Refer to the previous
exercise. What are the null hypothesis and the alternative
hypothesis? Explain why a nonparametric procedure
would be appropriate in this setting.
15.39 Read the output. Figure 15.11 gives JMP output
for the analysis of the data described in Exercise 15.37.
Describe the results given in the output and write
a short summary of your conclusions from the
analysis.
FIGURE 15.11 Output from JMP for the Kruskal-Wallis test applied to the Facebook data, Exercise 15.39.
15.40 Do we experience emotions differently? In
Exercise 12.37 (page 686) you analyzed data related to
the way people from different cultures experience
emotions. The study subjects were 416 college students
from five different cultures. They were asked to record,
on a 1 (never) to 7 (always) scale, how much of the
time they typically felt eight specific emotions. These
were averaged to produce the global emotion score
for each participant. Analyze the data using the Kruskal-Wallis
test and write a summary of your analysis
and conclusions. Be sure to include your assumptions,
hypotheses, and the results of the significance
test.
EMOTION
15.41 Do isoflavones increase bone mineral density?
In Exercise 12.45 (page 688) you investigated the effects
of isoflavones from kudzu on bone mineral density
(BMD). The experiment randomized rats to three diets:
control, low isoflavones, and high isoflavones. Here are
the data:
KUDZU
Treatment   BMD (g/cm²)
Control     0.228 0.207 0.234 0.220 0.217 0.228 0.209 0.221
            0.204 0.220 0.203 0.219 0.218 0.245 0.210
Low dose    0.211 0.220 0.211 0.233 0.219 0.233 0.226 0.228
            0.216 0.225 0.200 0.208 0.198 0.208 0.203
High dose   0.250 0.237 0.217 0.206 0.247 0.228 0.245 0.232
            0.267 0.261 0.221 0.219 0.232 0.209 0.255
(a) Use the Kruskal-Wallis test to compare the three
diets.
(b) How do these results compare with what you find
using the ANOVA F statistic?
15.42 Vitamins in bread. Does bread lose its vitamins
when stored? Here are data on the vitamin C content
(milligrams per 100 grams of flour) in bread baked from
the same recipe and stored for one, three, five, or seven
days.17 The 10 observations are from 10 different loaves
of bread.
BREAD
Condition                   Vitamin C (mg/100 g)
Immediately after baking    47.62   49.79
One day after baking        40.45   43.46
Three days after baking     21.25   22.34
Five days after baking      13.18   11.65
Seven days after baking     8.51    8.13
The loss of vitamin C over time is clear, but with only two
loaves of bread for each storage time, we wonder if the
differences among the groups are significant.
(a) Use the Kruskal-Wallis test to assess significance and
then write a brief summary of what the data show.
(b) Because there are only two observations per group,
we suspect that the common chi-square approximation
to the distribution of the Kruskal-Wallis statistic may not
be accurate. The exact P-value (from SAS software) is
P = 0.0011. Compare this with your P-value from part (a).
Is the difference large enough to affect your conclusion?
15.43 Jumping and strong bones. In Exercise 12.47
(page 688), you studied the effects of jumping on the
bones of rats. Ten rats were assigned to each of three
treatments: a 60-centimeter “high jump,” a 30-centimeter
“low jump,” and a control group with no jumping.18 Here
are the bone densities (in milligrams per cubic centimeter)
after eight weeks of 10 jumps per day:
JUMP
Group       Bone density (mg/cm³)
Control     611 653 621 600 614 554 593 603 593 569
Low jump    635 632 605 631 638 588 594 607 599 596
High jump   650 622 622 643 626 674 626 643 631 650
(a) The study was a randomized comparative experiment.
Outline the design of this experiment.
(b) Make side-by-side stemplots for the three groups,
with the stems lined up for easy comparison. The
distributions are a bit irregular but not strongly non-Normal.
We would usually use analysis of variance to
assess the significance of the difference in group means.
(c) Do the Kruskal-Wallis test. Explain the distinction
between the hypotheses tested by Kruskal-Wallis and
ANOVA.
(d) Write a brief statement of your findings. Include a
numerical comparison of the groups as well as your test
result.
15.44 Do poets die young? In Exercise 12.64 (page 693),
you analyzed the age at death for female writers. They
were classified as novelists, poets, and nonfiction writers.
The data are given in Table 12.1 (page 693).
POETS
(a) Use the Kruskal-Wallis test to compare the three
groups of female writers.
(b) Compare these results with what you find using the
ANOVA F statistic.
CHAPTER 15 EXERCISES
15.45 Plants and hummingbirds. Different
varieties of the tropical flower Heliconia are
fertilized by different species of hummingbirds. Over
time, the lengths of the flowers and the forms of the
hummingbirds’ beaks have evolved to match each
other. Here are data on the lengths in millimeters of
three varieties of these flowers on the island of
Dominica:19
HBIRDS
H. bihai
47.12 46.44 50.12 46.75 46.64 46.34 46.81 48.07
46.94 47.12 48.34 48.36 46.67 48.15 47.43 50.26
H. caribaea red
41.90 39.78 39.16 38.79 42.01 40.57 37.40 38.23
41.93 39.63 38.20 38.87 43.09 42.18 38.07 37.78
41.47 40.66 38.10 38.01 41.69 37.87 37.97
H. caribaea yellow
36.78 38.13 36.03 37.02 37.10 34.57 36.52 35.17
34.63 36.11 36.82 36.03 36.66 35.45 35.68
Do a complete analysis that includes description of the
data and a rank test for the significance of the differences
in lengths among the three species.
15.46 Time spent studying. In Exercise 1.159
(page 76), you compared the time spent studying by
men and women. The students in a large first-year
college class were asked how many minutes they
studied on a typical weeknight. Here are the responses
of random samples of 30 women and 30 men from
the class:
STIME
Women
170 120 150 200 120  90 120 180 120 150
 60 240 180 120 180 180 120 180 360 240
180 150 180 115 240 170 150 180 180 120
Men
 80  90 150 240  30   0 120  45 120  60
230 200  30  30  60 120 120 120  90 120
240  60  95 120 200  75 300  30 150 180
(a) Summarize the data numerically and graphically.
(b) Use the Wilcoxon rank sum test to compare the
men and women. Write a short summary of your
results.
(c) Use a two-sample t test to compare the men and
women. Write a short summary of your results.
(d) Which procedure is more appropriate for these data?
Give reasons for your answer.
15.47 Response times for telephone repair calls.
A study examined the time required for the telephone
company Verizon to respond to repair calls from its
own customers and from customers of a CLEC, another
phone company that pays Verizon to use its local lines.
Here are the data, which are rounded to the nearest
hour:
TREPAIR
Verizon
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 5
6 15 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
2 2 2 2 2 3 3 3 4 5 8 1 1 1 1 1 1 1 1
CLEC
5 5 5 1 1 5 5 5 1 5
(a) Does Verizon appear to give CLEC customers the
same level of service as its own customers? Compare the
data using graphs and descriptive measures and express
your opinion.
(b) We would like to see if times are significantly longer
for CLEC customers than for Verizon customers. Why
would you hesitate to use a t test for this purpose? Carry
out a rank test. What can you conclude?
(c) Explain why a nonparametric procedure is
appropriate in this setting.
Iron-deficiency anemia is the most common form of
malnutrition in developing countries. Does the type of
cooking pot affect the iron content of food? We have data
from a study in Ethiopia that measured the iron content
(milligrams per 100 grams of food) for three types of food
cooked in each of three types of pots:20
COOK
Type of pot   Iron content
Meat
Aluminum      1.77   2.36   1.96   2.14
Clay          2.27   1.28   2.48   2.68
Iron          5.27   5.17   4.06   4.22
Legumes
Aluminum      2.40   2.17   2.41   2.34
Clay          2.41   2.43   2.57   2.48
Iron          3.69   3.43   3.84   3.72
Vegetables
Aluminum      1.03   1.53   1.07   1.30
Clay          1.55   0.79   1.68   1.82
Iron          2.45   2.99   2.80   2.92
Exercises 15.48, 15.49, and 15.50 use these data.
15.48 Cooking vegetables in different pots. Does the
vegetable dish vary in iron content when cooked in
aluminum, clay, and iron pots?
COOK
(a) What do the data appear to show? Check the
conditions for one-way ANOVA. Which requirements are
a bit dubious in this setting?
(b) Instead of ANOVA, do a rank test. Summarize your
conclusions about the effect of pot material on the iron
content of the vegetable dish.
15.49 Cooking meat and legumes in aluminum and
clay pots. There appears to be little difference between
the iron content of food cooked in aluminum pots and
food cooked in clay pots. Is there a significant difference
between the iron content of meat cooked in aluminum
and clay? Is the difference between aluminum and clay
significant for legumes? Use rank tests.
COOK
15.50 Iron in food cooked in iron pots. The data show
that food cooked in iron pots has the highest iron
content. They also suggest that the three types of food
differ in iron content. Is there significant evidence that
the three types of food differ in iron content when all are
cooked in iron pots?
COOK
15.51 Multiple comparisons for plants and
hummingbirds. As in ANOVA, we often want to
carry out a multiple-comparisons procedure following a
Kruskal-Wallis test to tell us which groups differ
significantly.21 The Bonferroni method (page 679) is a
simple method: if we carry out k tests at fixed significance
level 0.05/k, the probability of any false rejection among
the k tests is always no greater than 0.05. That is, to get
overall significance level 0.05 for all of k comparisons, do
each individual comparison at the 0.05/k level. In
Exercise 15.45, you found a significant difference among
the lengths of three varieties of the flower Heliconia. Now
we will explore multiple comparisons.
HBIRDS
(a) Write down all the pairwise comparisons we can
make, for example, bihai versus caribaea red. There are
three possible pairwise comparisons.
(b) Carry out three Wilcoxon rank sum tests, one for
each of the three pairs of flower varieties. What are the
three two-sided P-values?
(c) For purposes of multiple comparisons, any of these
three tests is significant if its P-value is no greater than
0.05/3 = 0.0167. Which pairs differ significantly at the
overall 0.05 level?
15.52 Multiple comparisons for cooking pots.
The previous exercise outlines how to use the
Wilcoxon rank sum test several times for multiple
comparisons with overall significance level 0.05
for all comparisons together. Apply this procedure
to the data used in each of Exercises 15.48, 15.49,
and 15.50.
COOK
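The Bonferroni bookkeeping described in Exercise 15.51 can be sketched as follows. The variety names come from Exercise 15.45, but the three P-values are hypothetical placeholders chosen only to show the comparison against 0.05/k; they are not results from the data.

```python
from itertools import combinations

varieties = ["bihai", "caribaea red", "caribaea yellow"]
overall_level = 0.05

# All pairwise comparisons among the groups.
pairs = list(combinations(varieties, 2))
k = len(pairs)

# Bonferroni: run each of the k tests at level 0.05 / k.
per_test_level = overall_level / k

# Hypothetical two-sided P-values, one per pair (placeholders, not real results).
p_values = dict(zip(pairs, [0.001, 0.030, 0.012]))

# A pair differs significantly at the overall level if its P-value is <= 0.05 / k.
significant = [pair for pair, p in p_values.items() if p <= per_test_level]

print(k, round(per_test_level, 4))
print(significant)
```

In practice each placeholder would be replaced by the two-sided P-value from a Wilcoxon rank sum test on the corresponding pair of groups.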
CHAPTER 15 NOTES AND DATA SOURCES
1. Cvent’s 2014 Top 100 Meeting Hotels, see
cvent.com/rfp/2014-top-100-us-meeting-hotels­
f002743686ec45749cf28b9ae19ec3df.aspx.
2. For purists, here is the precise definition: X1 is
stochastically larger than X2 if
P(X1 > a) ≥ P(X2 > a)
for all a, with strict inequality for at least one a.
The Wilcoxon rank sum test is effective against this
alternative in the sense that the power of the test
approaches 1 (that is, the test becomes more certain to
reject the null hypothesis) as the number of observations
increases.
3. Erin K. O’Loughlin et al., “Prevalence and correlates
of exergaming in youth,” Pediatrics, 130 (2012)
pp. 806–814.
4. From the PEW Internet & American Life website,
pewinternet.org/Reports/2013/Civic-Engagement.aspx.
5. From Matthias R. Mehl et al., “Are women really more
talkative than men?” Science, 317, no 5834 (2007), p. 82.
The raw data were provided by Matthias Mehl.
6. Data provided by Warren Page, New York City Technical
College, from a study done by John Hudesman.
7. Data provided by Susan Stadler, Purdue University.
8. Ibid.
9. The vehicle is a 2002 Toyota Prius owned by the third
author.
10. Statistics regarding Facebook usage can be
found at facebook.com/notes/facebook-data-team/
anatomy-of-facebook/10150388519243859.
11. These data were collected as part of a larger study
of dementia patients conducted by Nancy Edwards,
School of Nursing, and Alan Beck, School of Veterinary
Medicine, Purdue University.
12. Data provided by Diana Schellenberg, Purdue
University School of Health Sciences.
13. These data are from “Results report on the
vitamin C pilot program,” prepared by SUSTAIN
(Sharing United States Technology to Aid in the
Improvement of Nutrition) for the U.S. Agency
for International Development. The report was used
by the Committee on International Nutrition of the
National Academy of Sciences/Institute of Medicine
to make recommendations on whether or not the
vitamin C content of food commodities used in
U.S. food aid programs should be increased. The
program was directed by Peter Ranum and Françoise
Chomé. The second author was a member of the
committee.
14. Data provided by Sam Phillips, Purdue University.
15. Kendall J. Eskine, “Wholesome foods and
wholesome morals? Organic foods reduce prosocial
behavior and harshen moral judgments,” Social
Psychological and Personality Science, 2012, doi:
10.1177/1948550612447114.
16. See item 10.
17. Data provided by Helen Park. See H. Park et al.,
“Fortifying bread with each of three antioxidants,” Cereal
Chemistry, 74 (1997), pp. 202–206.
18. Data provided by Jo Welch, Purdue University
Department of Foods and Nutrition.
19. We thank Ethan J. Temeles of Amherst College
for providing the data. His work is described in
Ethan J. Temeles and W. John Kress, “Adaptation in a
plant-hummingbird association,” Science, 300 (2003),
pp. 630–633.
20. Based on A. A. Adish et al., “Effect of consumption
of food cooked in iron pots on iron status and growth
of young children: A randomised trial,” The Lancet, 353
(1999), pp. 712–716.
21. For more details on multiple comparisons, see
M. Hollander and D. A. Wolfe, Nonparametric Statistical
Methods, 2nd ed., Wiley, 1999. This book is a useful
reference on applied aspects of nonparametric inference
in general.
