Description

Your submission must reflect your understanding of the material.

Please see the attached document (hw4.docx) for information about this work; you must read through it carefully. Include screenshots or drawings where directed, and provide a DOC file or the file from whichever program you used (Stata preferred) to show your work.

Review the questions for more information.

Complete each problem and show a breakdown of your work.

Instructions: Solve each of the following problems. When you use Stata to arrive at your

solution, copy the supporting work (plots, etc.) into your solutions. Do not copy all the output

into your solution. Include only what is needed.

Nonparametric Tests


Introduction

The most commonly used methods for inference about the means of quantitative

response variables assume that the distributions of sample means

are approximately Normal. This condition is satisfied when we have Normal

distributions in the population or populations from which we draw our data.

In practice, of course, no distribution is exactly Normal. Fortunately, our

usual methods for inference about population means (the one-sample and

two-sample t procedures and analysis of variance) are quite robust. That is,

the results of inference are not very sensitive to moderate lack of Normality,

especially when the samples are reasonably large. Some practical guidelines

for taking advantage of the robustness of these methods appear in Chapter 7

(page 423).

What can we do if plots suggest that the population distribution is clearly

not Normal, especially when we have only a few observations? This is not a

simple question. Here are the basic options:

1. If lack of Normality is due to outliers, it may be legitimate to remove the

outliers. An outlier is an observation that may not come from the same

population as the other observations. Equipment failure that produced

a bad measurement, for example, entitles you to remove the outlier and

analyze the remaining data.

Chapter outline: 15.1 The Wilcoxon Rank Sum Test; 15.2 The Wilcoxon Signed Rank Test; 15.3 The Kruskal-Wallis Test

Chapter 15 Nonparametric Tests

(Look back: transformations, p. 91.)

2. Sometimes we can transform our data so that their distribution is more

nearly Normal. Transformations such as the logarithm that pull in the long

tail of right-skewed distributions are particularly helpful. Example 7.25

(page 470) illustrates use of the logarithm.

3. In some settings, other standard distributions replace the Normal

distributions as models for the overall pattern in the population. We

mentioned in Chapter 5 (page 305) that the Weibull and exponential

distributions are common models for the lifetimes in service of equipment

in statistical studies of reliability. Also, we studied the exponential

distributions (page 300) and the Poisson distributions (page 328) in Chapter 5.

There are inference procedures for the parameters of these distributions

that replace the t procedures when we use specific non-Normal models.


4. Modern bootstrap methods and permutation tests do not require

Normality or any other specific form of sampling distribution. Moreover,

you can base inference on resistant statistics such as the trimmed mean.

We recommend these methods unless the sample is so small that it may not

represent the population well. Chapter 16 gives a full discussion.


5. Finally, there are other nonparametric methods that do not require any

specific form for the distribution of the population. Unlike bootstrap and

permutation methods, common nonparametric methods do not make use

of the actual values of the observations. We have already discussed the

sign test (page 473) which works with counts of observations. This chapter

presents rank tests based on the rank (place in order) of each observation

in the set of all the data.


The methods of this chapter are designed to replace the t tests and one-way

analysis of variance (ANOVA) when the Normality conditions for those tests

are not met. Figure 15.1 presents an outline of the standard tests (based on

Normal distributions) and the rank tests that compete with them.

The rank tests we will study concern the center of a population or populations.

When a population has at least roughly a Normal distribution, we

describe its center by the mean. The "Normal tests" in Figure 15.1 test hypotheses

about population means. When distributions are strongly skewed, we

often prefer the median to the mean as a measure of center. In simplest form,

the hypotheses for rank tests just replace the mean by the median.

FIGURE 15.1 Comparison of tests based on Normal distributions with nonparametric tests for similar settings.

Setting                       Normal test                         Rank test
One sample                    One-sample t test (Section 7.1)     Wilcoxon signed rank test (Section 15.2)
Matched pairs                 Apply one-sample test to differences within pairs
Two independent samples       Two-sample t test (Section 7.2)     Wilcoxon rank sum test (Section 15.1)
Several independent samples   One-way ANOVA F test (Chapter 12)   Kruskal-Wallis test (Section 15.3)


We devote a section of this chapter to each of the rank procedures. Section

15.1, which discusses the most common of these tests, also contains general

information about rank tests. The big idea of using ranks, the kind of assumptions

required, the nature of the hypotheses tested, and the contrast between using

exact distributions for small samples and approximate distributions for larger

samples are common to all rank tests. Sections 15.2 and 15.3 more briefly

describe other rank tests.

15.1 The Wilcoxon Rank Sum Test

When you complete this section, you will be able to:

● Find the rank transformation for a set of data.

● Compute the Wilcoxon rank sum statistic for the comparison of two populations.

● State the null and alternative hypotheses that are used for the analysis of data using the Wilcoxon rank sum test.

● Use the two sample sizes to find the mean and the standard deviation of the sampling distribution of the Wilcoxon rank sum statistic under the null hypothesis.

● Find the P-value for the Wilcoxon rank sum significance test using the Normal approximation with the continuity correction.

● For the Wilcoxon rank sum test, use computer output to determine the results of the significance test.

Two-sample problems (see Section 7.2) are among the most common in

statistics. The most useful nonparametric significance test compares two

distributions. Here is an example of this setting.

(Look back: two-sample problems, p. 433.)

EXAMPLE 15.1

HITS

Does the American League get more hits? In 1973, the American League

adopted the designated-hitter rule, which allows a substitute player to take

the place of the pitcher when it is the pitcher's turn to bat. Because pitchers

typically do not hit as well as other players, it was expected that the rule

would produce more hits and, therefore, more excitement for the fans. The

National League has not adopted this rule. Let's look at some data to see if

we can detect a difference in hits between the American League and the

National League. Here are the number of hits for eight games played on the

same spring day, four from each league.

League     Hits
American   21  18  24  20
National   19   7  11  13

The samples are too small to assess Normality adequately or to rely on the

robustness of the t test, although the first entry for the National League

suggests that there may be some skewness. Let's use a test that does not

require Normality.


We recommend always using either the exact distribution (from software

or tables) or the continuity correction for the rank sum statistic W. The

exact distribution is safer for small samples. As Example 15.4 illustrates,

however, the Normal approximation with the continuity correction is often

adequate.

The rank transformation

We first rank all eight observations together. To do this, arrange them in order

from smallest to largest:

7  11  13  18*  19  20*  21*  24*

The entries marked with an asterisk (boldface in the original) are the hits for the American League. The idea

of rank tests is to look just at position in this ordered list. To do this, replace

each observation by its order, from 1 (smallest) to 8 (largest). These numbers

are the ranks:

Hits   7  11  13  18  19  20  21  24
Rank   1   2   3   4   5   6   7   8

RANKS

To rank observations, first arrange them in order from smallest to largest.

The rank of each observation is its position in this ordered list, starting

with rank 1 for the smallest observation.

It would not be unusual in the baseball example to have sampled from a

day where more than one game had the same number of hits. We will discuss

how to handle ties later in this section.

Moving from the original observations to their ranks is a transformation

of the data, like moving from the observations to their logarithms. The rank

transformation retains only the ordering of the observations and makes

no other use of their numerical values. Working with ranks allows us to

dispense with specific assumptions about the shape of the distribution, such

as Normality.
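The rank transformation described above can be sketched in a few lines of Python. This is only an illustration using the Example 15.1 data; the helper name `simple_ranks` is our own, and the sketch assumes there are no tied values (ties are discussed later in this section).

```python
# Hits per game from Example 15.1 (four games per league).
american = [21, 18, 24, 20]
national = [19, 7, 11, 13]


def simple_ranks(values):
    """Map each distinct value to its rank, 1 for the smallest value.

    Hypothetical helper for illustration; it refuses tied values because
    ties need the average-rank rule described later in the section.
    """
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    ordered = sorted(values)
    return {value: position for position, value in enumerate(ordered, start=1)}


# Rank all eight observations together, then split the ranks by league.
rank_of = simple_ranks(american + national)
american_ranks = [rank_of[h] for h in american]
national_ranks = [rank_of[h] for h in national]

print(american_ranks)  # [7, 4, 8, 6]
print(national_ranks)  # [5, 1, 2, 3]
```

Only the ordering of the hits matters here: replacing 24 by 240 would leave every rank unchanged.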

USE YOUR KNOWLEDGE

HOTELS

15.1 Numbers of rooms in top meeting hotels. Cvent ranks meeting

hotels in the United States and lists the top 100 with characteristics of

each hotel.1 We let Group A be the 25 top-ranked hotels and let Group B

be the hotels ranked 26 to 50. A simple random sample (SRS) of size 5

was taken from each group, and the number of rooms in each selected

hotel was recorded. Here are the data:

Group A   1628  1622  2019  1260  1996
Group B   1544   736  3933  1214  1096

Rank all the observations together and make a list of the ranks for

Group A and Group B.

HOTEL2

15.2 The effect of Caesars Palace on the result. Refer to the previous

exercise. Caesars Palace in Las Vegas, with 3933 rooms, was the third

hotel selected in Group B. Suppose, instead, a different hotel, with

1600 rooms, less than half as many, had been selected. Replace the

observation 3933 in Group B by 1600. Use the modified data to make

a list of the ranks for Groups A and B combined. What changes?

The Wilcoxon rank sum test

If the American League games tend to have more hits than the National

League, we expect the ranks of the American League games to be higher than

those for the National League games. Let's compare the sums of the ranks

from the two leagues:

League     Sum of ranks
American   25
National   11

These sums compare the hits of the American League with those of the

National League. In fact, the sum of the ranks from 1 to 8 is always equal to

36, so it is enough to report the sum for one of the two groups.

Because the sum of the ranks for the American League is 25, the sum of the ranks

for the National League must be 11 because 25 + 11 = 36. If there were no

difference between the leagues, we would expect the sum of the ranks for each

league to be 18 (half of the total sum of 36). Here are the facts we need in a

more general form that takes account of the fact that our two samples need

not be the same size.

THE WILCOXON RANK SUM TEST

Draw an SRS of size $n_1$ from one population and draw an independent SRS of size $n_2$ from a second population. There are $N$ observations in all, where $N = n_1 + n_2$. Rank all $N$ observations. The sum $W$ of the ranks for the first sample is the Wilcoxon rank sum statistic. If the two populations have the same continuous distribution, then $W$ has mean

$$\mu_W = \frac{n_1(N + 1)}{2}$$

and standard deviation

$$\sigma_W = \sqrt{\frac{n_1 n_2 (N + 1)}{12}}$$

The Wilcoxon rank sum test rejects the hypothesis that the two populations have identical distributions when the rank sum $W$ is far from its mean.* This test is also called the Mann-Whitney test.

*This test was invented by Frank Wilcoxon (1892–1965) in 1945. Wilcoxon was a chemist who

encountered statistical problems in his work at the research laboratories of American Cyanamid

Company.


For the baseball question of Example 15.1, we want to test

H0: the number of hits in an American League game is the

same as the number of hits in a National League game

against the one-sided alternative

Ha: the number of hits in American League games is greater

than the number of hits in National League games

Our test statistic is the rank sum W = 25 for the American League games.

USE YOUR KNOWLEDGE

HOTELS

15.3 Hypotheses and test statistic for top hotels. Refer to Exercise 15.1. State appropriate null and alternative hypotheses for this setting, and calculate the value of W, the test statistic.

HOTEL2

15.4 The effect of Caesars Palace on the test statistic. Refer to Exercise 15.2. Using the altered data, state appropriate null and alternative hypotheses, and calculate the value of W, the test statistic.

EXAMPLE 15.2

Perform the significance test. In Example 15.1, $n_1 = 4$, $n_2 = 4$, and there are $N = 8$ observations in all. The sum of ranks for the American League games has mean

$$\mu_W = \frac{n_1(N + 1)}{2} = \frac{(4)(9)}{2} = 18$$

and standard deviation

$$\sigma_W = \sqrt{\frac{n_1 n_2 (N + 1)}{12}} = \sqrt{\frac{(4)(4)(9)}{12}} = \sqrt{12} = 3.464$$

The observed sum of the ranks, W = 25, is higher than the mean, about 2 standard deviations higher, (25 − 18)/3.464. It appears that the data support our idea that American League games have more hits than National League games. The P-value for our one-sided alternative is $P(W \ge 25)$, the probability that W is at least as large as the value for our data when $H_0$ is true.
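As a check on the arithmetic above, here is a minimal Python sketch of the rank sum, its null mean, and its null standard deviation. The function name `rank_sum_moments` is our own and not from the text; the formulas are the ones in the box for the Wilcoxon rank sum test, which assume no ties.

```python
import math


def rank_sum_moments(n1, n2):
    """Null mean and standard deviation of W for sample sizes n1 and n2.

    Hypothetical helper: mean = n1(N+1)/2 and sd = sqrt(n1*n2*(N+1)/12),
    valid when there are no tied observations.
    """
    N = n1 + n2
    mean = n1 * (N + 1) / 2
    sd = math.sqrt(n1 * n2 * (N + 1) / 12)
    return mean, sd


# Ranks of the four American League games among all eight games.
american_ranks = [7, 4, 8, 6]
W = sum(american_ranks)

mu_w, sigma_w = rank_sum_moments(4, 4)
z = (W - mu_w) / sigma_w  # how many standard deviations W lies above its mean

print(W, mu_w, round(sigma_w, 3), round(z, 2))  # 25 18.0 3.464 2.02
```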

To calculate the P-value $P(W \ge 25)$, we need to know the sampling

distribution of the rank sum W when the null hypothesis is true. This

distribution depends on the two sample sizes n1 and n2. Tables are, therefore,

a bit unwieldy, though you can find them in handbooks of statistical tables.

Most statistical software will give you P-values, as well as carry out the

ranking and calculate W. However, some software gives only approximate

P-values.
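For samples this small, the exact null distribution can be enumerated directly rather than looked up. The sketch below is one way to do it, under the assumption that there are no ties: when the null hypothesis is true, every choice of $n_1$ ranks out of $1, \dots, N$ is equally likely for the first sample. The function name `exact_upper_tail` is hypothetical.

```python
from itertools import combinations


def exact_upper_tail(n1, n2, w_observed):
    """Count rank assignments with rank sum >= w_observed (no ties assumed).

    Returns (number of assignments at least as large, total assignments).
    """
    N = n1 + n2
    sums = [sum(subset) for subset in combinations(range(1, N + 1), n1)]
    at_least = sum(1 for s in sums if s >= w_observed)
    return at_least, len(sums)


count, total = exact_upper_tail(4, 4, 25)
p_exact = count / total

print(count, total, round(p_exact, 4))  # 2 70 0.0286
```

Only two of the 70 possible sets of four ranks, {5, 6, 7, 8} and {4, 6, 7, 8}, give a rank sum of 25 or more, so the exact one-sided P-value in this sketch is 2/70. Enumeration grows quickly with sample size, which is one reason approximations are used for larger samples.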


The Normal approximation

The rank sum statistic W becomes approximately Normal as the two sample sizes increase. We can then form yet another z statistic by standardizing W:

$$z = \frac{W - \mu_W}{\sigma_W} = \frac{W - n_1(N + 1)/2}{\sqrt{n_1 n_2 (N + 1)/12}}$$

Use standard Normal probability calculations to find P-values for this statistic. Because W takes only whole-number values, the continuity correction improves the accuracy of the approximation. (Look back: continuity correction, p. 325.)

EXAMPLE 15.3

The continuity correction. The standardized rank sum statistic W in our baseball example is

$$z = \frac{W - \mu_W}{\sigma_W} = \frac{25 - 18}{3.464} = 2.02$$

We expect W to be larger when the alternative hypothesis is true, so the approximate P-value is

$$P(Z \ge 2.02) = 0.0217$$

The continuity correction acts as if the whole number 25 occupies the entire interval from 24.5 to 25.5. We calculate the P-value $P(W \ge 25)$ as $P(W \ge 24.5)$ because the value 25 is included in the range whose probability we want. Here is the calculation:

$$P(W \ge 24.5) = P\left(\frac{W - \mu_W}{\sigma_W} \ge \frac{24.5 - 18}{3.464}\right) = P(Z \ge 1.876) = 0.0303$$
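The two Normal calculations above can be reproduced with the standard library alone. In this sketch the helper `normal_upper_tail` is our own name; it uses the identity $P(Z \ge z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$ for the standard Normal distribution.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


W = 25
mu_w = 18.0
sigma_w = math.sqrt(12)  # 3.464 for n1 = n2 = 4

# Without the continuity correction: standardize W itself.
p_plain = normal_upper_tail((W - mu_w) / sigma_w)

# With the continuity correction: P(W >= 25) is treated as P(W >= 24.5).
p_corrected = normal_upper_tail((W - 0.5 - mu_w) / sigma_w)

print(f"{p_plain:.3f} {p_corrected:.3f}")  # 0.022 0.030
```

The corrected value, about 0.030, is closer to the exact one-sided probability for these data than the uncorrected value of about 0.022, which is the point of the correction.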

EXAMPLE 15.4

Software output. Figure 15.2 shows the output from JMP. The sum of the ranks for the American League is given as W = 25. The value for the National League is W = 11. Dividing these sums by the sample sizes, both 4 in this example, gives the means displayed in the Score Means column in the output. JMP uses the continuity correction for its calculations. A z statistic is given for each league. These have the same values but with opposite signs. The P-value for the two-sided alternative is 0.0606. For the one-sided alternative that American League games have more hits than National League games, we divide the P-value by 2, giving P = 0.0303.

It is worth noting that the two-sample t test for the one-sided alternative gives essentially the same result as the Wilcoxon test in Example 15.3 (t = 2.95, P = 0.016). (Look back: two-sample t test, p. 440.)
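The t statistic quoted above can be checked with a short sketch. We assume the unpooled (Welch) form of the two-sample t statistic, which is our assumption about the version used; with equal sample sizes the pooled form gives the same t. The function name `welch_t` is hypothetical, and the P-value itself is not computed here because it requires a t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

american = [21, 18, 24, 20]
national = [19, 7, 11, 13]


def welch_t(x, y):
    """Unpooled two-sample t statistic and its approximate degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error contribution of x
    vy = variance(y) / len(y)  # squared standard error contribution of y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t, df = welch_t(american, national)
print(round(t, 2), round(df, 1))  # 2.95 4.4
```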


FIGURE 15.2 Output from

JMP for the baseball hit data,

Example 15.4.

USE YOUR KNOWLEDGE

HOTELS

15.5 The P-value for top hotels. Refer to Exercises 15.1 and 15.3 (pages 15-4 and 15-6). Find $\mu_W$, $\sigma_W$, and the standardized rank sum statistic. Then give an approximate P-value using the Normal approximation. What do you conclude?

HOTEL2

15.6 The effect of Caesars Palace on the P-value. Refer to Exercises 15.2 and 15.4 (pages 15-5 and 15-6). Perform the same analysis steps as in Exercise 15.5 using the altered data.

FIGURE 15.3 Output from (a) Minitab and (b) SPSS for the data in Example 15.1. (a) Minitab uses the Normal approximation for the distribution of W. (b) SPSS gives the exact value for the two-sided alternative.

What hypotheses does Wilcoxon test?

Our null hypothesis is that the distribution of hits per game is the same in the

two leagues. Our alternative hypothesis is that there are more hits per game in

the American League than in the National League. If we are willing to assume

that hits are Normally distributed, or if we have reasonably large samples, we

use the two-sample t test for means. Our hypotheses then become

$H_0: \mu_1 = \mu_2$

$H_a: \mu_1 > \mu_2$

When the distributions may not be Normal, we might restate the hypotheses in terms of population medians rather than means:

$H_0: \text{median}_1 = \text{median}_2$

$H_a: \text{median}_1 > \text{median}_2$

The Wilcoxon rank sum test does test hypotheses about population medians, but

only if an additional assumption is met: both populations must have distributions

of the same shape and spread. That is, the density curve for hits per game in the

American League must look exactly like that for the National League except

that it may be shifted to the left or to the right. The Minitab output in Figure

15.3(a) states the hypotheses in terms of population medians and also gives a

confidence interval for the difference between the two population medians.

The same-shape assumption is too strict to be reasonable in practice. Recall

that our preferred version of the two-sample t test does not require that the two

populations have the same standard deviation; that is, it does not make a same-shape assumption. Fortunately, the Wilcoxon test also applies in a much more

general and more useful setting. It tests hypotheses that we can state in words as

H0: The two distributions are the same.

Ha: One distribution has values that are systematically larger.


Here is a more exact statement of the systematically larger alternative

hypothesis. Take X1 to be hits in an American League game and X2 to be hits

in a National League game. These hits are random variables. That is, for each

game in the American League, the number of hits is a value of the variable X1.

The probability that the number of hits is more than, say, 15 is $P(X_1 > 15)$. Similarly, $P(X_2 > 15)$ is the corresponding probability for the National League. If the number of American League hits is "systematically larger" than the number of National League hits, getting more hits than 15 should be more likely in the American League. That is, we should have

$$P(X_1 > 15) > P(X_2 > 15)$$

The alternative hypothesis says that this inequality holds not just for 15 hits but for any number of hits.2
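Written compactly, the alternative can be sketched as follows. The cutoff $c$ is our own notation rather than the text's, and the form shown (a weak inequality for every cutoff, strict for at least one) is the usual way of formalizing one distribution being stochastically larger than another; treat it as an assumption about how the footnoted detail is normally handled.

```latex
H_a:\quad P(X_1 > c) \ge P(X_2 > c) \ \text{for every } c,
\quad \text{with strict inequality for at least one } c.
```

Setting $c = 15$ recovers the comparison made above for 15 hits.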

This exact statement of the hypotheses we are testing is a bit awkward.

The hypotheses really are "nonparametric" because they do not involve any

specific parameter such as the mean or median. If the two distributions do

have the same shape, the general hypotheses reduce to comparing medians.

Many texts and computer outputs state the hypotheses in terms of medians,

sometimes ignoring the same-shape requirement. We recommend that you

express the hypotheses in words rather than symbols. "The number of American

League hits per game is systematically higher than the number of National

League hits per game" is easy to understand and is a good statement of the

effect that the Wilcoxon test looks for.

Ties

The exact distribution for the Wilcoxon rank sum is obtained assuming that all observations in both samples take different values. This allows us to rank them all. In practice, however, we often find observations tied at the same value. What shall we do? The usual practice is to assign all tied values the average of the ranks they occupy. Here is an example:

EXAMPLE 15.6

Does the American League get more hits? In Example 15.1 (page 15-3), we examined data that could be used to address this question. There were no ties in the data, but it would not be unlikely to see ties in data such as this. Let's change the data so that the first entry for the National League is 20 instead of 19. Here is the resulting table:

League     Hits
American   21  18  24  20
National   20   7  11  13

Here are the data in order, with the American League hits marked with an asterisk (they appear in boldface in the original).

7  11  13  18*  20  20*  21*  24*

One of the two tied values of 20 belongs to each league. The

idea of rank tests is to look just at position in this ordered list. To do this,

replace each observation by its order, from 1 (smallest) to 8 (largest). These

numbers are the ranks:

Hits   7  11  13  18   20   20  21  24
Rank   1   2   3   4  5.5  5.5   7   8

Notice that the two entries with 20 hits now share the rank 5.5.
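The average-rank rule can be sketched as follows. The helper name `average_ranks` is our own; it assigns each value the mean of the positions it occupies in the sorted list, which is the practice described above.

```python
def average_ranks(values):
    """Return the rank of each value, giving tied values the average of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


# Example 15.6 data: the first National League entry changed from 19 to 20.
american = [21, 18, 24, 20]
national = [20, 7, 11, 13]

ranks = average_ranks(american + national)
american_ranks = ranks[: len(american)]
national_ranks = ranks[len(american):]

print(american_ranks)       # [7.0, 4.0, 8.0, 5.5]
print(national_ranks)       # [5.5, 1.0, 2.0, 3.0]
print(sum(american_ranks))  # 24.5
```

With the tie, the American League rank sum becomes 24.5 instead of 25, and the two rank sums still add to 36.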


The exact distribution for the Wilcoxon rank sum W changes if the data

contain ties. Moreover, the standard deviation $\sigma_W$ must be adjusted if ties are

present. The Normal approximation can be used after the standard deviation is

adjusted. Statistical software will detect ties, make the necessary adjustment,

and switch to the Normal approximation. In practice, software is required if

you want to use rank tests when the data contain tied values.

It is sometimes useful to use rank tests on data that have very many ties

because the scale of measurement has only a few values. Here is an example.

EXAMPLE 15.7

EXERG

                     TV time (hours per day)
Exergamer   None   Some but less than two hours   Two hours or more
Yes            6                            160                 115
No            48                            616                 255

USE YOUR KNOWLEDGE

EXERG

(Look back: chi-square test, p. 535.)

15.7 Analyze as a two-way table. Analyze the exergaming data in Example 15.7 as a two-way table.

(a) Compute the percents in the three categories of TV watching for

the exergamers. Do the same for those who are not exergamers. Display the percents graphically and summarize the differences in the

two distributions.

(b) Perform the chi-square test for the counts in the two-way table.

Report the test statistic, the degrees of freedom, and the P-value.

Give a brief summary of what you can conclude from this significance test.

How do we approach the analysis of these data using the Wilcoxon test?

We start with the hypotheses. We have two distributions of TV viewing, one

for the exergamers and one for those who are not exergamers. The null hypothesis states that these two distributions are the same. The alternative hypothesis uses the fact that the responses are ordered from no TV to two hours

or more per day. It states that one of the exerciser groups watches more TV

than the other.

H0: The amount of time spent viewing TV is the same for students

who are exergamers and students who are not.

Ha: One of the two groups views more TV than the other.


The alternative hypothesis is two-sided. Because the responses can take

only three values, there are very many ties. All 54 students who watch no TV

are tied. Similarly, all students in each of the other two columns of the table

are tied. The graphical display that you prepared in Exercise 15.7 suggests that

the exergamers watch more TV than those who are not exergamers. Is this difference statistically significant?

EXAMPLE 15.8

EXERG

Software output. Look at Figure 15.4, which gives JMP output for the Wilcoxon test. The rank sum for the exergamers (using average ranks for ties) is W = 187,747.5 (Score Sum, rounded in the output). The expected rank sum under the null hypothesis is 168,740.5 (Expected Score in the output). So the exergamers have a higher rank sum than we would expect. The Normal approximation test statistic is z = 4.46794, and the two-sided P-value is reported as P < 0.0001. There is very strong evidence of a difference. Exergamers watch more TV than the students who are not exergamers.
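The quantities reported in the output can be reproduced from the table of counts. The sketch below is an illustration, not a description of JMP's internals. It assumes the standard tie adjustment to the variance, $\frac{n_1 n_2}{12}\left[(N+1) - \frac{\sum (t^3 - t)}{N(N-1)}\right]$ with $t$ the size of each group of tied values (a formula the text does not state), and a continuity correction of 0.5. The function name `tied_rank_sum_z` is hypothetical.

```python
import math

# Counts by TV category (none, some but less than two hours, two hours or more).
exergamers = [6, 160, 115]
non_exergamers = [48, 616, 255]


def tied_rank_sum_z(first, second, continuity=0.5):
    """Rank sum for `first`, its null mean, and a tie-adjusted z statistic.

    Each category is one block of tied observations that shares an average rank.
    """
    n1, n2 = sum(first), sum(second)
    N = n1 + n2
    W = 0.0
    tie_term = 0
    next_rank = 1
    for a, b in zip(first, second):
        t = a + b                          # observations tied in this category
        avg_rank = next_rank + (t - 1) / 2  # average of the ranks they occupy
        W += a * avg_rank
        tie_term += t ** 3 - t
        next_rank += t
    mean = n1 * (N + 1) / 2
    var = n1 * n2 / 12 * ((N + 1) - tie_term / (N * (N - 1)))
    diff = W - mean
    z = (diff - math.copysign(continuity, diff)) / math.sqrt(var)
    return W, mean, z


W, mean_w, z = tied_rank_sum_z(exergamers, non_exergamers)
print(W, mean_w, round(z, 3))  # 187747.5 168740.5 4.468
```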

FIGURE 15.4 Output from JMP for the exergaming data, Example 15.8.

We can use our framework of "systematically larger" (page 15-9) to summarize these data. For the exergamers, 98% watch some TV and 41% watch two or more hours per day. The corresponding percents for the students who are not exergamers are 95% and 28%. The difference is statistically significant (z = 4.47, P < 0.0001).
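The percents in this summary come straight from the counts in the exergaming table. A minimal sketch, with a helper name (`percent_at_or_above`) of our own choosing:

```python
# Counts by TV category (none, some but less than two hours, two hours or more).
counts = {
    "exergamers": [6, 160, 115],
    "non-exergamers": [48, 616, 255],
}


def percent_at_or_above(category_counts):
    """Percent watching some TV and percent watching two or more hours, rounded."""
    total = sum(category_counts)
    some_tv = 100 * sum(category_counts[1:]) / total
    two_or_more = 100 * category_counts[2] / total
    return round(some_tv), round(two_or_more)


summary = {group: percent_at_or_above(c) for group, c in counts.items()}
print(summary)  # {'exergamers': (98, 41), 'non-exergamers': (95, 28)}
```

At both cutoffs the exergamers have the larger percent, which is what "systematically larger" means for data recorded in ordered categories.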

In our discussion of TV viewing and exergaming, we have expressed

results in terms of the amount of TV watched. In fact, we do not have the

actual hours of TV watched by each student in the study. Only data with the

hours classified into three groups are available. Many government surveys

summarize quantitative data categorized into ranges of values. When

summarizing the analysis of data, it is very important to explain clearly how the

data are recorded. In this setting, we have chosen to use phrases such as

"watch more TV" because they express the findings based on the data

available.


Note that the two-sample t test would not be appropriate in this setting. If

we coded the TV-watching categories as 1, 2, and 3, the average of these coded

values would not be meaningful.

On the other hand, we frequently encounter variables measured in scales

such as "strongly agree," "agree," "neither agree nor disagree," "disagree," and

"strongly disagree." In these circumstances, many would code the responses

with the integers 1 to 5 and then use standard methods such as a t test or

ANOVA. Whether to do this or not is a matter of judgment. Rank tests avoid

the dilemma because they use only the order of the responses, not their actual

values. Some statisticians use t procedures when there is not a fully meaningful

scale of measurement, but others avoid them.

Rank, t, and permutation tests

The two-sample t procedures are the most common method for comparing

the centers of two populations based on random samples from each. The

Wilcoxon rank sum test is a competing procedure that does not start from the

condition that the populations have Normal distributions. Permutation tests

(Chapter 16) also avoid the need for Normality. Tests based on Normality,

rank tests, and permutation tests apply in many other settings as well. How

do these three approaches compare in general?

First, let's consider rank tests versus traditional tests based on Normal

distributions. Both are available in almost all statistical software.

● Moving from the actual data values to their ranks allows us to find an

exact sampling distribution for rank statistics such as the Wilcoxon rank

sum W when the null hypothesis is true. (Most software will do this only

if there are no ties and if the samples are quite small.) When our samples

are small, are truly random samples from the populations, and show non-Normal distributions of the same shape, the Wilcoxon test is more reliable

than the two-sample t test. In practice, the robustness of t procedures

implies that we rarely encounter data that require nonparametric procedures

to obtain reasonably accurate P-values. The t and W tests gave very similar

results for the baseball hit data in Example 15.1, but we would not use a t

procedure for the exergame data in Example 15.7.

● Normal tests compare means and are accompanied by simple confidence

intervals for means or differences between means. When we use rank tests

to compare medians, we can also give confidence intervals for medians.

However, the usefulness of rank tests is clearest in settings when they do

not simply compare medians; see the discussion "What Hypotheses Does

Wilcoxon Test?" (page 15-9). Rank methods focus on significance tests, not

confidence intervals.

● Inference based on ranks is largely restricted to simple settings. Normal

inference extends to methods for use with complex experimental designs

and multiple regression, but nonparametric tests do not. We stress Normal

inference in part because it leads to more advanced statistics.

If you read Chapter 16 and use software that makes permutation tests

available to you, you will also want to compare rank tests with resampling

methods.


● Both rank and permutation tests are nonparametric. That is, they require

no assumptions about the shape of the population distribution. A two-sample permutation test has the same null hypothesis as the Wilcoxon rank

sum test: that the two population distributions are identical. Calculation

of the sampling distribution under the null hypothesis is similar for both

tests but is simpler for rank tests because it depends only on the sizes of the

samples. As a result, software often gives exact P-values for rank tests but

not for permutation tests.

(Look back: trimmed mean, p. 51.)

● Permutation tests have the advantage of flexibility. They allow wide

choice of the statistic used to compare two samples, an advantage over

both the t and Wilcoxon tests. In fact, we could apply the permutation test

method to sample means (imitating t) or to rank sums (imitating Wilcoxon),

as well as to other statistics such as the trimmed mean that we used in

Exercise 1.91 (page 51). Permutation tests are not available in some settings,

such as testing hypotheses about a single population, though bootstrap

confidence intervals do allow resampling tests in these settings. Permutation

tests are available for multiple regression and some other quite elaborate

settings.

● An important advantage of resampling methods over both Normal

and rank procedures is that we can get bootstrap confidence intervals

for the parameter corresponding to whatever statistic we choose for

the permutation test. If the samples are very small, however, bootstrap

confidence intervals may be unreliable because the samples don't

represent the population well enough to provide a good basis for

bootstrapping.

In general, both Normal distribution methods and resampling methods

are more useful than rank tests. If you are familiar with resampling, we recommend

rank tests only for very small samples that are clearly non-Normal and,

even then, only if your software gives exact P-values for rank tests but not for

permutation tests.
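To make the comparison with permutation tests concrete, here is a minimal sketch of an exact two-sample permutation test on the baseball data of Example 15.1. It uses the sum of the first group's observed values as the statistic, which for fixed group sizes orders the regroupings exactly as the difference in means does. The function name `permutation_upper_tail` is hypothetical, and full enumeration is only practical for very small samples; Chapter 16 is the text's own treatment of the method.

```python
from itertools import combinations

american = [21, 18, 24, 20]
national = [19, 7, 11, 13]


def permutation_upper_tail(first, second):
    """Count regroupings whose first-group total is at least the observed total."""
    pooled = first + second
    observed = sum(first)
    # Choose index positions, not values, so tied values are handled correctly.
    splits = list(combinations(range(len(pooled)), len(first)))
    as_extreme = sum(
        1 for idx in splits if sum(pooled[i] for i in idx) >= observed
    )
    return as_extreme, len(splits)


count, total = permutation_upper_tail(american, national)
print(count, total, round(count / total, 4))  # 2 70 0.0286
```

For these data the permutation test on the actual hits and the exact rank sum calculation count the same two extreme regroupings out of 70, although in general the two tests need not agree.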

SECTION 15.1 SUMMARY

● Nonparametric tests do not require any specific form for the distribution

of the population from which our samples come.

● Rank tests are nonparametric tests based on the ranks of observations,

their positions in the list ordered from smallest (rank 1) to largest. Tied

observations receive the average of their ranks.

● The Wilcoxon rank sum test compares two distributions to assess

whether one has systematically larger values than the other. The Wilcoxon

test is based on the Wilcoxon rank sum statistic W, which is the sum

of the ranks of one of the samples. The Wilcoxon test can replace the

two-sample t test.

● P-values for the Wilcoxon test are based on the sampling distribution

of the rank sum statistic W when the null hypothesis (no difference in

distributions) is true. You can find P-values from special tables, software,

or a Normal approximation (with continuity correction).

â—
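The ranking step in this summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two small samples are hypothetical (they do not come from the text), the helper names `average_ranks` and `rank_sum` are made up for the sketch, and the mean and standard deviation use the standard no-ties results for the rank sum statistic, with n1 the size of the first sample and N the total number of observations.

```python
from math import sqrt

def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1      # first 1-based position held by v
        count = ordered.count(v)          # number of observations tied at v
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def rank_sum(sample1, sample2):
    """Wilcoxon rank sum W: sum of the ranks of the first sample."""
    ranks = average_ranks(list(sample1) + list(sample2))
    return sum(ranks[:len(sample1)])

# Hypothetical data, chosen only to show a three-way tie at 15.
group_a = [12, 15, 15, 20]
group_b = [9, 11, 15, 18, 25]

w = rank_sum(group_a, group_b)
n1, n2 = len(group_a), len(group_b)
total = n1 + n2
mean_w = n1 * (total + 1) / 2               # mean of W under the null hypothesis
sd_w = sqrt(n1 * n2 * (total + 1) / 12)     # standard deviation of W, no ties
print(w, mean_w, round(sd_w, 3))
```

The three observations tied at 15 occupy positions 4, 5, and 6 in the ordered list, so each receives rank 5. With ties present, the no-ties standard deviation is only an approximation.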


SECTION 15.1 EXERCISES

For Exercises 15.1 and 15.2, see pages 15-4, 15-5;

for Exercises 15.3 and 15.4, see page 15-6; for

Exercises 15.5 and 15.6, see page 15-8; and

for Exercise 15.7, see page 15-11.

15.8 Time spent studying. A sample of 11 students in a

large first-year college class were interviewed and were

asked how much time they spent studying on a typical

week night. Here are the responses, in minutes, for the

five female students in the sample:

STUDYT

110   70   190   120   310

Find the ranks for all 11 students and report the ranks

for the five female students.

15.9 Find the rank sum statistic. Refer to the previous

exercise. Here are the data for six men in the class:

STUDYT

80   80   30   130   0   200

Compute the value of the Wilcoxon statistic. Take the

first sample to be the women.

15.10 State the hypotheses. Refer to the previous

exercise. State appropriate null and alternative

hypotheses for this setting.

15.11 Find the mean and standard deviation of the

distribution of the statistic. The statistic W that you

calculated in Exercise 15.9 is a random variable with

a sampling distribution. What are the mean and the

standard deviation of this sampling distribution under

the null hypothesis?

15.12 Find the P-value. Refer to Exercises 15.8 through

15.11. Find the P-value using the Normal approximation

with the continuity correction and interpret the result of

the significance test.

15.13 Is civic engagement related to education? A Pew Internet Poll of adults aged 18 and older examined factors related to civic engagement. Participants were asked whether or not they had participated in a civic group or activity in the preceding 12 months. One analysis looked at the relationship between this variable and education. Here are the data:4

CIVIC

                              Education
Civic            No high    High      Some
participation    school     school    college    College
Civic               76       294       295        428
No civic           155       424       273        298

FIGURE 15.5 Output from JMP for the civic participation data, Exercise 15.13.

Figure 15.5 gives the JMP output for analyzing these data

using the Wilcoxon rank sum procedure.

(a) Describe the relevant parts of the output and write a

short summary of the results.

(b) Apply the “systematically larger” framework that we

used in Example 15.8 (page 15-12) to these data. Is this

a useful way to describe the results of this analysis? Give

reasons for your answer.

15.14 Do women talk more? Conventional wisdom

suggests that women are more talkative than men. One

study designed to examine this stereotype collected data

on the speech of 10 men and 10 women in the United

States.5 The variable recorded is the number of words

per day. Here are the data:

TALK10

Men

23,871 5,180 9,951 12,460

17,155 10,344 9,811 12,387

29,920 21,791

Women

10,592 24,608 13,739 22,376

9,351 7,694 16,812 21,066

32,291 12,320

(a) Summarize the data for the two groups using numerical

and graphical methods. Describe the two distributions.


(b) Compare the words per day spoken by the men

with the words per day spoken by the women using the

Wilcoxon rank sum test. Summarize your results and

conclusion in a short paragraph.

15.15 More data for women and men talking. The

data in the previous exercise were a sample of the data

collected in a larger study of 42 men and 37 women. Use

the larger data set to answer the questions in the

previous exercise. Discuss the advisability of using the

Wilcoxon test versus the t test for this exercise and for

the previous one.

TALK

15.16 Learning math through subliminal messages.

A “subliminal” message is below our threshold of

awareness but may, nonetheless, influence us. Can

subliminal messages help students learn math? A group

of students who had failed the mathematics part of the

City University of New York Skills Assessment Test

agreed to participate in a study to find out. All received

a daily subliminal message, flashed on a screen too

rapidly to be consciously read. The treatment group of

10 students was exposed to “Each day I am getting

better in math.” The control group of eight students was

exposed to a neutral message, “People are walking on

the street.” All students participated in a summer

program designed to raise their math skills, and all took

the assessment test again at the end of the program.

Here are data on the subjects’ scores before and after the program:6

SUBLIM

Treatment group         Control group
Pretest   Posttest      Pretest   Posttest
  18        24            18        29
  18        25            24        29
  21        33            20        24
  18        29            18        26
  18        33            24        38
  20        36            22        27
  23        34            15        22
  23        36            19        31
  21        34
  17        27

(a) The study design was a randomized comparative

experiment. Outline this design.

(b) Compare the gain in scores in the two groups using a

graph and numerical descriptions. Does it appear that the

treatment group’s scores rose more than the scores for

the control group?

(c) Apply the Wilcoxon rank sum test to the gain in

scores. Note that there are some ties. What do you

conclude?

15.17 Storytelling and the use of language. A study of

early childhood education asked kindergarten students to

retell two fairy tales that had been read to them earlier in

the week. The 10 children in the study included five high-progress readers and five low-progress readers. Each child

told two stories. Story 1 had been read to them; Story 2

had been read and also illustrated with pictures. An expert

listened to a recording of each child and assigned a score

for certain uses of language. Here are the data:7

STORY

                    Story 1   Story 2                       Story 1   Story 2
Child   Progress    score     score      Child   Progress   score     score
1       high        0.55      0.80       6       low        0.40      0.77
2       high        0.57      0.82       7       low        0.72      0.49
3       high        0.72      0.54       8       low        0.00      0.66
4       high        0.70      0.79       9       low        0.36      0.28
5       high        0.84      0.89       10      low        0.55      0.38

Is there evidence that the scores of high-progress readers

are higher than those of low-progress readers when they

retell a story they have heard without pictures (Story 1)?

(a) Make Normal quantile plots for the five responses in

each group. Are any major deviations from Normality

apparent?

(b) Carry out a two-sample t test. State hypotheses and

give the two sample means, the t statistic and its P-value,

and your conclusion.

(c) Carry out the Wilcoxon rank sum test. State

hypotheses and give the rank sum W for high-progress

readers, its P-value, and your conclusion. Do the t and

Wilcoxon tests lead you to different conclusions?

15.18 Repeat the analysis for Story 2. Repeat the

analysis of Exercise 15.17 for the scores when children

retell a story they have heard and seen illustrated with

pictures (Story 2).

STORY

15.19 Do the calculations by hand. Use the data in

Exercise 15.17 for children telling Story 2 to carry out by

hand the steps in the Wilcoxon rank sum test.

STORY

(a) Arrange the 10 observations in order and assign

ranks. There are no ties.

(b) Find the rank sum W for the five high-progress

readers. What are the mean and standard deviation of

W under the null hypothesis that low-progress and high-progress readers do not differ?

(c) Standardize W to obtain a z statistic. Do a Normal

probability calculation with the continuity correction to

obtain a one-sided P-value.

(d) The data for Story 1 contain tied observations. What

ranks would you assign to the 10 scores for Story 1?


15.2 The Wilcoxon Signed Rank Test

When you complete this section, you will be able to:

● For a set of paired sample data, take the differences between the pairs, take the absolute values of the differences, and put the absolute values of the differences in order, from smallest to largest, with an indication of which absolute differences were from positive differences.

● Compute the Wilcoxon signed rank statistic $W^+$ from an ordered list of differences with an indication of which absolute differences were from positive differences.

● State the null and alternative hypotheses that are used for the analysis of data using the Wilcoxon signed rank test.

● Using the sample size (that is, the number of pairs), find the mean and the standard deviation of the sampling distribution of $W^+$ under the null hypothesis.

● Find the P-value for the Wilcoxon signed rank test using the Normal approximation with the continuity correction.

● Use computer output to determine the results of the Wilcoxon signed rank test.

● Test a hypothesis about the median of a distribution using the Wilcoxon signed rank test.

We use the one-sample t procedures for inference about the mean of one population or for inference about the mean difference in a matched pairs setting. The matched pairs setting is more important because good studies are generally comparative. We previously discussed the sign test for this setting. We now meet a nonparametric test that uses ranks.

LOOK BACK: matched pairs, p. 182

EXAMPLE 15.9

Storytelling and reading. A study of early childhood education asked

kindergarten students to retell two fairy tales that had been read to them

earlier in the week. The first (Story 1) had been read to them, and the second

(Story 2) had been read but also illustrated with pictures. An expert listened

to recordings of the children retelling each story and assigned a score for

certain uses of language. Higher scores are better. Here are the data for five “low-progress” readers in a pilot study:8

Child           1       2       3       4       5
Story 2       0.77    0.49    0.66    0.28    0.38
Story 1       0.40    0.72    0.00    0.36    0.55
Difference    0.37   -0.23    0.66   -0.08   -0.17

We wonder if illustrations improve how the children retell a story. We would

like to test the hypotheses

H0: Scores have the same distribution for both stories.

Ha: Scores are systematically higher for Story 2.


STORY

Because this is a matched pairs design, we base our inference on the

differences. The matched pairs t test gives t = 0.635 with one-sided P-value

P = 0.280. Displays of the data (Figure 15.6) suggest some lack of Normality.

Therefore, we prefer to use a rank test.

FIGURE 15.6 Normal quantile plot and histogram for the five differences in story scores, Example 15.9. [Figure not reproduced: the Normal quantile plot shows the differences against Normal scores; the histogram shows the distribution of the differences.]

absolute value

Positive differences in Example 15.9 indicate that the child performed

better telling Story 2. If scores are generally higher with illustrations, the

positive differences should be farther from zero in the positive direction than

the negative differences are in the negative direction. We, therefore, compare

the absolute values of the differences, that is, their magnitudes without a

sign. Here they are, with an asterisk marking the values that came from positive differences:

0.37*   0.23   0.66*   0.08   0.17

Arrange these in increasing order and assign ranks, keeping track of which

values were originally positive. Tied values receive the average of their ranks.

If there are cases with zero differences, discard them before ranking.

Absolute value        0.08   0.17   0.23   0.37   0.66
Rank                     1      2      3      4      5
Sign of difference       -      -      -      +      +

The test statistic is the sum of the ranks of the positive differences. (We could

equally well use the sum of the ranks of the negative differences.) This is the

Wilcoxon signed rank statistic. Its value here is $W^+ = 4 + 5 = 9$.
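The steps of Example 15.9 can be retraced with a short Python sketch. It is an illustration rather than a general routine: it assumes there are no ties among the absolute differences (true for these five children), and the variable names are chosen for this sketch only.

```python
# Scores for the five low-progress readers in Example 15.9.
story2 = [0.77, 0.49, 0.66, 0.28, 0.38]
story1 = [0.40, 0.72, 0.00, 0.36, 0.55]

# Differences Story 2 - Story 1, rounded to suppress floating-point noise.
diffs = [round(s2 - s1, 2) for s1, s2 in zip(story1, story2)]

# Discard zero differences, then order by absolute value (rank 1 = smallest).
nonzero = [d for d in diffs if d != 0]
ordered = sorted(nonzero, key=abs)

# W+ is the sum of the ranks that belong to positive differences.
w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
print(diffs, w_plus)
```

Because ranks are assigned by position in the sorted list, this sketch would need average ranks before it could be used on data with tied absolute differences.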

THE WILCOXON SIGNED RANK TEST FOR MATCHED PAIRS

Draw an SRS of size n from a population for a matched pairs study and take the differences in responses within pairs. Rank the absolute values of these differences. The sum $W^+$ of the ranks for the positive differences is the Wilcoxon signed rank statistic. If the distribution of the responses is not affected by the different treatments within pairs and there are no ties, then $W^+$ has mean

$$\mu_{W^+} = \frac{n(n+1)}{4}$$

and standard deviation

$$\sigma_{W^+} = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

The Wilcoxon signed rank test rejects the hypothesis that there are no systematic differences within pairs when the rank sum $W^+$ is far from its mean.
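The two formulas in the box translate directly into code. The following Python sketch uses a hypothetical helper name, `signed_rank_null_moments`, and assumes, as the box does, that there are no ties and that n counts only the nonzero differences.

```python
from math import sqrt

def signed_rank_null_moments(n):
    """Mean and standard deviation of W+ under the null hypothesis (no ties)."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return mean, sd

# n = 5 pairs, as in the storytelling example.
mean5, sd5 = signed_rank_null_moments(5)
print(mean5, round(sd5, 3))
```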

USE YOUR KNOWLEDGE

GEPARTS

OILFREE

15.20 The effect of altering a software parameter. Example 7.7 (page 419)

describes a study in which researchers studied sensor software used

in the measurement of complex machine parts. They were interested

in the possibility of improving productivity by unchecking one particular software option. They measured 51 parts both with and without the option. Use the data to investigate the effect of the option.

Formulate this question in terms of null and alternative hypotheses.

Then compute the differences and find the value of the Wilcoxon

signed rank statistic, $W^+$.

15.21 Oil-free deep fryer. Exercise 7.10 (page 422) discusses a study

where food experts compared food made with hot oil and their new

oil-free fryer. Five experts rated the taste of hash browns prepared

with each method. Here are the data:

Expert        1     2     3     4     5
Hot oil:     78    84    62    73    63
Oil free:    75    85    67    75    66

Examine whether or not there is a difference in taste of hash browns

prepared in hot oil or an oil-free fryer using the Wilcoxon signed rank

procedure.

EXAMPLE 15.10

STORY

Software output. In the storytelling study of Example 15.9, n = 5. If the null hypothesis (no systematic effect of illustrations) is true, the mean of the signed rank statistic is

$$\mu_{W^+} = \frac{n(n+1)}{4} = \frac{(5)(6)}{4} = 7.5$$

Our observed value $W^+ = 9$ is only slightly larger than this mean. The one-sided P-value is $P(W^+ \ge 9)$.
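For a sample this small, the one-sided P-value can be found by brute force. The sketch below rests on a standard fact that is an assumption as far as this passage is concerned: under the null hypothesis with no ties, each rank is equally likely to belong to a positive or a negative difference, so all 2^n sign patterns are equally likely. The function name `exact_upper_p` is hypothetical.

```python
from itertools import product

def exact_upper_p(n, w_obs):
    """Exact P(W+ >= w_obs) under the null hypothesis, assuming no ties."""
    count = 0
    # Each pattern says, for ranks 1..n, whether that rank is positive (1) or not (0).
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if w >= w_obs:
            count += 1
    return count / 2 ** n

p_exact = exact_upper_p(5, 9)
print(p_exact)
```

Enumerating every pattern is practical only for small n, since the number of patterns doubles with each additional pair.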


Most statistical software uses the differences between the two variables, with the signs, as input. Alternatively, the differences can sometimes be calculated within the software. Figure 15.7 displays the output from three statistical programs. Each does things a little differently. The JMP output in Figure 15.7(a) gives the one-sided P-value (Prob > t in the Signed-Rank column) as P = 0.4063. The Minitab output in Figure 15.7(b) gives P = 0.394 for the one-sided Wilcoxon signed rank test with n = 5 observations and $W^+ = 9.0$. The SPSS output in Figure 15.7(c) gives P = 0.686 for testing the two-sided alternative. We divide this by 2, 0.686/2 = 0.343, to obtain the P-value for the one-sided alternative.

FIGURE 15.7 Output from (a) JMP, (b) Minitab, and (c) SPSS for the storytelling data, Example 15.10.

Results reported in the three outputs lead us to the same qualitative

conclusion: the data do not provide evidence to support the idea that the Story 2

scores are higher than (or not equal to) the Story 1 scores. Different methods

and approximations are used to compute the P-values. With larger sample

sizes, we would not expect so much variation in the P-values. Note that the t

test results reported by JMP also give the same conclusion, P = 0.5599.

When the sampling distribution of a test statistic is symmetric, we can use

output that gives a P-value for a two-sided alternative to compute a P-value

for a one-sided alternative. Check that the effect is in the direction specified

by the one-sided alternative and then divide the P-value by 2.

The Normal approximation

The distribution of the signed rank statistic when the null hypothesis (no difference) is true becomes approximately Normal as the sample size becomes

large. We can then use Normal probability calculations (with the continuity

correction) to obtain approximate P-values for $W^+$. Let’s see how this works

in the storytelling example, even though n = 5 is certainly not a large sample.

EXAMPLE 15.11

The Normal approximation. For n = 5 observations, we saw in Example 15.10 that $\mu_{W^+} = 7.5$. The standard deviation of $W^+$ under the null hypothesis is

$$\sigma_{W^+} = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{(5)(6)(11)}{24}} = \sqrt{13.75} = 3.708$$

The continuity correction calculates the P-value $P(W^+ \ge 9)$ as $P(W^+ \ge 8.5)$, treating the value $W^+ = 9$ as occupying the interval from 8.5 to 9.5. We find the Normal approximation for the P-value by standardizing and using the standard Normal table:

$$P(W^+ \ge 8.5) = P\left(\frac{W^+ - 7.5}{3.708} \ge \frac{8.5 - 7.5}{3.708}\right) = P(Z \ge 0.27) = 0.394$$

Despite the small sample size, the Normal approximation gives a result quite close to the exact value P = 0.4062.
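The calculation in Example 15.11 can be checked numerically. The Python sketch below assumes an upper-tail (one-sided) alternative and no ties; it replaces the Normal table with the complementary error function from the standard library, and the function names are hypothetical.

```python
from math import sqrt, erfc

def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))

def signed_rank_normal_p(n, w_obs):
    """One-sided Normal approximation to P(W+ >= w_obs) with continuity correction."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_obs - 0.5 - mean) / sd   # 9 is treated as starting at 8.5
    return z, normal_upper_tail(z)

z, p = signed_rank_normal_p(5, 9)
print(round(z, 2), round(p, 3))
```

The small difference from a table lookup arises because the table rounds z to 0.27 before finding the probability, while the code keeps the unrounded value.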

USE YOUR KNOWLEDGE

GEPARTS

OILFREE

15.22 Significance test for altering a software parameter. Refer to

Exercise 15.20 (page 15-19). Find $\mu_{W^+}$, $\sigma_{W^+}$, and the Normal approximation for the P-value for the Wilcoxon signed rank test.

15.23 Significance test for the oil-free fryer. Refer to Exercise 15.21

(page 15-19). Find $\mu_{W^+}$, $\sigma_{W^+}$, and the Normal approximation for the

P-value for the Wilcoxon signed rank test.


Ties

Ties among the absolute differences are handled by assigning average ranks. A

tie within a pair creates a difference of zero. Because these are neither positive

nor negative, the usual procedure simply drops such pairs from the sample. This

amounts to dropping observations that favor the null hypothesis (no difference).

If there are many ties, the test may be biased in favor of the alternative hypothesis.

As in the case of the Wilcoxon rank sum, ties between nonzero absolute

differences complicate finding a P-value. Most software no longer provides an

exact distribution for the signed rank statistic $W^+$, and the standard deviation

$\sigma_{W^+}$ must be adjusted for the ties before we can use the Normal approximation. Software will do this. Here is an example.

EXAMPLE 15.12

GOLF

Golf scores in two rounds, with Difference = Round 2 - Round 1:

Player         1     2     3     4     5     6     7     8     9    10    11    12
Round 2       94    85    89    89    81    76   107    89    87    91    88    80
Round 1       89    90    87    95    86    81   102   105    83    88    91    79
Difference     5    -5     2    -6    -5    -5     5   -16     4     3    -3     1

Negative differences indicate better (lower) scores on the second round. We

see that six of the 12 golfers improved their scores. We would like to test the

hypotheses that in a large population of collegiate women golfers

H0: Scores have the same distribution in Rounds 1 and 2.

Ha: Scores are systematically lower or higher in Round 2.

A Normal quantile plot of the differences (Figure 15.8) shows some

irregularity and a low outlier. We will use the Wilcoxon signed rank test.

FIGURE 15.8 Normal quantile plot of the difference in scores for two rounds of a golf tournament, Example 15.12. [Figure not reproduced: the difference in golf score is plotted against Normal scores.]


The absolute values of the differences, with an asterisk marking those that came from negative differences, are

5   5*   2   6*   5*   5*   5   16*   4   3   3*   1

Arrange these in increasing order and assign ranks, keeping track of which

values were originally negative. Tied values receive the average of their ranks.

Absolute value          1     2     3     3     4     5     5     5     5     5     6    16
Rank                    1     2   3.5   3.5     5     8     8     8     8     8    11    12
Sign of difference      +     +     +     -     +     +     +     -     -     -     -     -

The Wilcoxon signed rank statistic is the sum of the ranks of the negative

differences. (We could equally well use the sum of the ranks of the positive

differences.) Its value is $W^+ = 3.5 + 8 + 8 + 8 + 11 + 12 = 50.5$.
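The ranking with ties can be reproduced with a small Python sketch. The helper `average_ranks` is a hypothetical name for this illustration; it gives tied values the average of the positions they occupy, as described above.

```python
# Golf scores for the 12 players in Example 15.12.
round2 = [94, 85, 89, 89, 81, 76, 107, 89, 87, 91, 88, 80]
round1 = [89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79]

def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1      # first 1-based position held by v
        count = ordered.count(v)          # number of observations tied at v
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

# Differences Round 2 - Round 1; zero differences would be dropped here.
diffs = [r2 - r1 for r1, r2 in zip(round1, round2) if r2 != r1]
ranks = average_ranks([abs(d) for d in diffs])

w_negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
w_positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
print(w_negative, w_positive)
```

The two sums always add to n(n + 1)/2, here 78, which is why either one can serve as the test statistic.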

EXAMPLE 15.13

GOLF

Software output. Here are the two-sided P-values for the Wilcoxon signed rank test for the golf score data from three statistical programs:

Program    P-value
JMP        P = 0.388
Minitab    P = 0.388
SPSS       P = 0.363

All lead to the same practical conclusion: these data give no evidence

for a systematic change in scores between rounds. However, the P-value

reported by SPSS differs a bit from the other two. The reason for the variation is that the programs use slightly different versions of the approximate

calculations needed when ties are present. The reported P-value depends on

which version is used.
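To see how much the choice of approximation can matter, the Python sketch below computes two common variants of the Normal approximation for the golf data. These are assumptions made for illustration, not documentation of what JMP, Minitab, or SPSS actually do: variant A ignores the ties but applies the continuity correction, and variant B subtracts the usual tie adjustment, the sum of (t^3 - t)/48 over groups of t tied absolute differences, from the variance and omits the continuity correction.

```python
from collections import Counter
from math import sqrt, erfc

abs_diffs = [5, 5, 2, 6, 5, 5, 5, 16, 4, 3, 3, 1]   # golf data, Example 15.12
n = len(abs_diffs)
w = 50.5                                            # signed rank statistic from the text
mean = n * (n + 1) / 4

def two_sided_p(z):
    """Two-sided Normal P-value, 2 * P(Z >= |z|)."""
    return erfc(abs(z) / sqrt(2))

# Variant A: variance without tie adjustment, with continuity correction.
var_no_ties = n * (n + 1) * (2 * n + 1) / 24
p_continuity = two_sided_p((abs(w - mean) - 0.5) / sqrt(var_no_ties))

# Variant B: tie-adjusted variance, no continuity correction.
tie_term = sum(t ** 3 - t for t in Counter(abs_diffs).values()) / 48
var_ties = var_no_ties - tie_term
p_tie_adjusted = two_sided_p((w - mean) / sqrt(var_ties))

print(round(p_continuity, 3), round(p_tie_adjusted, 3))
```

In this sketch the two variants give about 0.388 and 0.363. That agreement with the reported values is suggestive only; the programs may arrive at their P-values in other ways.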

For the golf data, the matched pairs t test gives t = 0.9314 with P = 0.3716.

Once again, t and $W^+$ lead to the same conclusion.

Testing a hypothesis about the median of a distribution

Let’s take another look at how the Wilcoxon signed rank test works. We have

data for a pair of variables measured on the same individuals. The analysis

starts with the differences between the two variables. These differences are

what we input to statistical software.

At this stage, we can think of our data as consisting of a single variable. The

Wilcoxon signed rank test tests the null hypothesis that the population median

of the differences is zero. The alternative is that the median is not zero.

Think about starting the analysis at the stage where we have a single

variable and we are interested in testing a hypothesis about the median.

The hypothesized median does not need to be zero. If you can’t specify

a value other than zero with your software, you can simply subtract that

value from each observation before you start the analysis. Exercise 15.30 is

an example.
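The subtraction step can be sketched as follows. The observations and the hypothesized median of 10 are hypothetical values chosen for this illustration (with no ties among the nonzero absolute differences), and `shifted_differences` is a made-up helper name.

```python
def shifted_differences(observations, median0):
    """Subtract the hypothesized median and drop observations equal to it."""
    return [x - median0 for x in observations if x != median0]

# Hypothetical sample; the null hypothesis is that the population median is 10.
readings = [12, 7, 10, 15, 9, 10, 18]
diffs = shifted_differences(readings, 10)

# Rank the absolute differences (no ties here) and sum the ranks of the positive ones.
ordered = sorted(diffs, key=abs)
w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
n = len(diffs)   # sample size after dropping zero differences
print(diffs, n, w_plus)
```

From here the analysis proceeds exactly as in the matched pairs case, with n equal to the number of nonzero differences.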


SECTION 15.2 SUMMARY

● The Wilcoxon signed rank test applies to matched pairs studies. It tests the null hypothesis that there is no systematic difference within pairs against alternatives that assert a systematic difference (either one-sided or two-sided).

● The test is based on the Wilcoxon signed rank statistic $W^+$, which is the sum of the ranks of the positive (or negative) differences when we rank the absolute values of the differences. The matched pairs t test and the sign test are alternative tests in this setting.

● P-values for the signed rank test are based on the sampling distribution of $W^+$ when the null hypothesis is true. You can find P-values from special tables, software, or a Normal approximation (with continuity correction).

SECTION 15.2 EXERCISES

For Exercises 15.20 and 15.21, see page 15-19; and for

Exercises 15.22 and 15.23, see page 15-21.

15.24 Fuel efficiency. Computers in some vehicles

calculate various quantities related to performance. One

of these is the fuel efficiency, or gas mileage, usually

expressed as miles per gallon (mpg). For one vehicle

equipped in this way, the mpg were recorded each time

the gas tank was filled, and the computer was then reset.

In addition to the computer calculating mpg, the driver

also recorded the mpg by dividing the miles driven by the

number of gallons at fill-up.9 The driver wants to

determine if these calculations are different.

MPG8

Fill-up       1      2      3      4      5      6      7      8
Computer    41.5   50.7   36.6   37.3   34.2   45.0   48.0   43.2
Driver      36.5   44.2   37.2   35.6   30.5   40.5   40.0   41.0

(a) For each of the eight fill-ups, find the difference

between the computer mpg and the driver mpg.

(b) Find the absolute values of the differences you found

in part (a).

(c) Order the absolute values of the differences that you

found in part (b) from smallest to largest, and underline

those absolute differences that came from positive

differences in part (a).

FIGURE 15.9 Minitab output

for the fuel efficiency data,

Exercise 15.29.

15.25 Find the mean and the standard deviation.

Refer to the previous exercise. Use the sample size to

find the mean and the standard deviation of the sampling

distribution of the Wilcoxon signed rank statistic $W^+$

under the null hypothesis.

15.26 State the hypotheses. Refer to Exercise 15.24.

State the null hypothesis and the alternative hypothesis

for this setting.

15.27 Find the Wilcoxon signed rank statistic. Using

the work that you performed in Exercise 15.24, find

the value of the Wilcoxon signed rank statistic $W^+$.

15.28 Find the P-value. Refer to Exercises 15.24

through 15.27. Find the P-value for the Wilcoxon signed

rank statistic using the Normal approximation with the

continuity correction.

15.29 Read the output. The data in Exercise 15.24

are a subset of a larger set of data. Figure 15.9 gives

Minitab output for the analysis of this larger set of

data.

MPGCOMP

(a) How many pairs of observations are in the larger

data set?

(b) What is the value of the Wilcoxon signed rank

statistic $W^+$?

(c) Report the P-value for the significance test and give a

brief statement of your conclusion.


(d) The output reports an estimated median. Explain how

this statistic is calculated from the data.

15.30 Number of friends on Facebook. Facebook

recently examined all active Facebook users (more than

10% of the global population) and determined that the

average user has 190 friends. This distribution takes only

integer values, so it is certainly not Normal. It is also

highly skewed to the right, with a median of 100

friends.10 Consider the following SRS of n = 30 Facebook

users from your large university.

FACEFR

594    31    85    60   325   165   417    52   288   120
 63    65   132   537    57   176    27    81   516   368
257   319    11    24   734    12   297     8   190   148

(a) Use the Wilcoxon signed rank procedure to test the null

hypothesis that the median number of Facebook friends

for Facebook users at your university is 190. Describe the

steps in the procedure and summarize the results.

(b) Analyze these data using the t procedure and compare

the results with those that you found in part (a).

15.31 The full moon and behavior. Can the full moon

influence behavior? A study observed 15 nursing-home

patients with dementia. The number of incidents of

aggressive behavior was recorded each day for 12 weeks.

Call a day a “moon day” if it is the day of a full moon or

the day before or after a full moon. Here are the average

numbers of aggressive incidents for moon days and other

days for each subject:11

MOON

Patient    Moon days    Other days
1            3.33         0.27
2            3.67         0.59
3            2.67         0.32
4            3.33         0.19
5            3.33         1.26
6            3.67         0.11
7            4.67         0.30
8            2.67         0.40
9            6.00         1.59
10           4.33         0.60
11           3.33         0.65
12           0.67         0.69
13           1.33         1.26
14           0.33         0.23
15           2.00         0.38

The matched pairs t test gives P < 0.000015, and a

permutation test (Chapter 16) gives P = 0.0001. Does the

Wilcoxon signed rank test, based on ranks rather than

means, agree that there is strong evidence that there are

more aggressive incidents on moon days?


15.32 Comparison of two energy drinks. Consider the

following study to compare two popular energy drinks.

For each subject, a coin was flipped to determine which

drink to rate first. Each drink was rated on a 0 to 100

scale, with 100 being the highest rating.

ENERDR6

Subject     1     2     3     4     5     6
Drink A    43    83    66    87    78    67
Drink B    45    78    64    79    71    62

(a) Inspect the data. Is there a tendency for these subjects

to prefer one of the two energy drinks?

(b) Use the matched pairs t test of Chapter 7 (page 419)

to compare the two drinks.

(c) Use the Wilcoxon signed rank test to compare the two

drinks.

(d) Write a summary of your results and explain why the

two tests give different conclusions.

15.33 Comparison of two energy drinks with an

additional subject. Refer to the previous exercise. Let’s

suppose that there is an additional subject who expresses

a strong preference for energy drink A. Here is the new

data set:

ENERDR7

Subject     1     2     3     4     5     6     7
Drink A    43    83    66    87    78    67    90
Drink B    45    78    64    79    71    62    60

Answer the questions given in the previous exercise.

Write a summary comparing this exercise with the

previous one. Include a discussion of what you have

learned regarding the choice of the t test versus the

Wilcoxon signed rank test for different sets of data.

15.34 A summer language institute for teachers. A

matched pairs study of the effect of a summer language

institute on the ability of teachers to comprehend spoken

French had these improvements in scores between the

pretest and the posttest for 20 teachers:

SUMLANG

2    6    0    6    6    3    6    0    3    1
3    1    2    0    3    2   -6    3    6    3

(a) Show how you rank these data.

(b) Calculate the signed rank statistic $W^+$. Be sure to

show your work. Remember that zeros are dropped

from the data before ranking so that n is the number of

nonzero differences within pairs.


(c) Perform the significance test and write a short

summary of your conclusions.

15.35 Radon detectors. How accurate are radon detectors of a type sold to homeowners? To answer this question, university researchers placed 12 detectors in a chamber that exposed them to 105 picocuries per liter (pCi/l) of radon.12 The detector readings are as follows:

RADON

 91.9   103.8    97.8    99.6   111.4    96.6
122.3   119.3   105.4   104.8    95.0   101.7

We wonder if the median reading differs significantly from the true value 105.

(a) Graph the data, and comment on skewness and outliers. A rank test is appropriate.

(b) We would like to test hypotheses about the median reading from home radon detectors:

H0: median = 105
Ha: median ≠ 105

To do this, apply the Wilcoxon signed rank statistic to the differences between the observations and 105. (This is the one-sample version of the test.) What do you conclude?

15.36 Vitamin C in wheat-soy blend. The U.S. Agency for International Development provides large quantities of wheat-soy blend (WSB) for development programs and emergency relief in countries throughout the world. One study collected data on the vitamin C content of five bags of WSB at the factory and five months later in Haiti.13 Here are the data:

WSBVITC

Sample     1     2     3     4     5
Before    73    79    86    88    78
After     20    27    29    36    17

We want to know if vitamin C has been lost during transportation and storage. Describe what the data show about this question. Then use a rank test to see whether there has been a significant loss.

15.3 The Kruskal-Wallis Test*

When you complete this section, you will be able to:

● Describe the setting where the Kruskal-Wallis test can be used.

● Specify the null and alternative hypotheses for the Kruskal-Wallis test.

● Use computer output to determine the results of the Kruskal-Wallis significance test.

We have now considered alternatives to the matched pairs and two-sample t

tests for comparing the magnitude of responses to two treatments. To compare

more than two treatments, we use one-way analysis of variance (ANOVA) if

the distributions of the responses to each treatment are at least roughly

Normal and have similar spreads. What can we do when these distribution

requirements are violated?

*Because this test is an alternative to the one-way analysis of variance F test, you should first read Chapter 12.

EXAMPLE 15.14

WEEDS

Weeds and corn yield. Lamb’s-quarter is a common weed that interferes with the growth of corn. A researcher planted corn at the same rate in 16 small plots of ground and then randomly assigned the plots to four groups. He weeded the plots by hand to allow a fixed number of lamb’s-quarter plants to grow in each meter of corn row. These numbers were 0, 1, 3, and 9 in the four groups of plots. No other weeds were allowed to grow, and all plots received identical treatment except for the weeds. Here are the yields of corn (bushels per acre) in each of the plots:14

Weeds per meter    Corn yield
0                  166.7   172.2   165.0   176.9
1                  166.2   157.3   166.7   161.1
3                  158.6   176.4   153.1   156.0
9                  162.8   142.4   162.7   162.4

The summary statistics are

Weeds    n    Mean       Std. dev.
0        4    170.200     5.422
1        4    162.825     4.469
3        4    161.025    10.493
9        4    157.575    10.118

LOOK BACK: rule for examining standard deviations in ANOVA, p. 654

The sample standard deviations do not satisfy our rule of thumb that

for safe use of ANOVA the largest should not exceed twice the smallest. A

careful look at the data suggests that there may be some outliers. These are

the correct yields for their plots, so we have no justification for removing

them. Let’s use a rank test that is not sensitive to outliers.
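The summary statistics and the rule-of-thumb check can be reproduced with a few lines of code. The following is a minimal Python sketch (the text itself relies on other software); the variable names `yields`, `sds`, and `sd_ratio` are illustrative choices, not part of the original example.

```python
from statistics import mean, stdev

# Corn yields (bushels per acre) from Example 15.14, keyed by weeds per meter.
yields = {
    0: [166.7, 172.2, 165.0, 176.9],
    1: [166.2, 157.3, 166.7, 161.1],
    3: [158.6, 176.4, 153.1, 156.0],
    9: [162.8, 142.4, 162.7, 162.4],
}

# Sample standard deviation (n - 1 divisor) for each group.
sds = {w: stdev(v) for w, v in yields.items()}
for w, v in yields.items():
    print(f"weeds={w}: n={len(v)}, mean={mean(v):.3f}, sd={sds[w]:.3f}")

# Rule of thumb for ANOVA: the largest SD should not exceed twice the smallest.
sd_ratio = max(sds.values()) / min(sds.values())
print(f"largest/smallest SD = {sd_ratio:.2f}")
```

A ratio above 2 signals that the equal-spread condition for ANOVA is doubtful, which is the situation described above.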

Hypotheses and assumptions

The ANOVA F test concerns the means of the several populations represented

by our samples. For Example 15.14, the ANOVA hypotheses are

$H_0\colon \mu_0 = \mu_1 = \mu_3 = \mu_9$

$H_a$: not all four means are equal

Here, $\mu_0$ is the mean yield in the population of all corn planted under the

conditions of the experiment with no weeds present. The data should consist

of four independent random samples from the four populations, all Normally

distributed with the same standard deviation.

The Kruskal-Wallis test is a rank test that can replace the one-way ANOVA

F test. The assumption about data production (independent random samples

from each population) remains important, but we can relax the Normality

assumption. We assume only that the response has a continuous distribution in

each population. The hypotheses tested in our example are

H0: Yields have the same distribution in all groups.

Ha: Yields are systematically higher in some groups than in others.


If all the population distributions have the same shape (Normal or not), these

hypotheses take a simpler form. The null hypothesis is that all four populations

have the same median yield. The alternative hypothesis is that not all

four median yields are equal.

The Kruskal-Wallis test

Recall the analysis of variance idea: we write the total observed variation in

the responses as the sum of two parts, one measuring variation among the

groups (sum of squares for groups, SSG) and one measuring variation among

individual observations within the same group (sum of squares for error,

SSE). The ANOVA F test rejects the null hypothesis that the mean responses

are equal in all groups if SSG is large relative to SSE.

The idea of the Kruskal-Wallis rank test is to rank all the responses

from all groups together and then apply one-way ANOVA to the ranks

rather than to the original observations. If there are N observations in

all, the ranks are always the whole numbers from 1 to N. The total sum

of squares for the ranks is, therefore, a fixed number no matter what the

data are. So we do not need to look at both SSG and SSE. Although it isn’t

obvious without some unpleasant algebra, the Kruskal-Wallis test statistic

is essentially just SSG for the ranks. We give the formula, but you should

rely on software to do the arithmetic. When SSG is large, that is evidence

that the groups differ.
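The link between the rank sums and SSG can be checked numerically. The sketch below is written in Python as an illustration only; it uses the ranks listed for the weed data in Example 15.15 and assumes the scaling factor 12/(N(N + 1)), which follows from expanding SSG around the mean rank (N + 1)/2. The names `ssg`, `h_from_ssg`, and `h_from_sums` are hypothetical.

```python
# Ranks by treatment group, copied from the rank table in Example 15.15.
ranks = {
    0: [10, 12.5, 14, 16],
    1: [4, 6, 11, 12.5],
    3: [2, 3, 5, 15],
    9: [1, 7, 8, 9],
}

N = sum(len(r) for r in ranks.values())
grand_mean = (N + 1) / 2  # mean of the ranks, with or without ties

# Sum of squares for groups, computed on the ranks instead of the yields.
ssg = sum(len(r) * (sum(r) / len(r) - grand_mean) ** 2 for r in ranks.values())

# Rescaled SSG versus the rank-sum formula for H.
h_from_ssg = 12 / (N * (N + 1)) * ssg
h_from_sums = (
    12 / (N * (N + 1)) * sum(sum(r) ** 2 / len(r) for r in ranks.values())
    - 3 * (N + 1)
)
print(round(h_from_ssg, 2), round(h_from_sums, 2))
```

Both routes give the same number, which is why only SSG for the ranks needs to be examined.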

THE KRUSKAL-WALLIS TEST

Draw independent SRSs of sizes $n_1, n_2, \ldots, n_I$ from $I$ populations. There are $N$ observations in all. Rank all $N$ observations and let $R_i$ be the sum of the ranks for the $i$th sample. The Kruskal-Wallis statistic is

$$H = \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1)$$

When the sample sizes $n_i$ are large, there are no ties, and all $I$ populations have the same continuous distribution, $H$ has approximately the chi-square distribution with $I - 1$ degrees of freedom.

The Kruskal-Wallis test rejects the null hypothesis that all populations

have the same distribution when H is large.

We now see that, like the Wilcoxon rank sum statistic, the Kruskal-Wallis

statistic is based on the sums of the ranks for the groups we are comparing.

The more different these sums are, the stronger is the evidence that responses

are systematically larger in some groups than in others.

The exact distribution of the Kruskal-Wallis statistic H under the null

hypothesis depends on all the sample sizes n1 to nI, so tables are awkward.

The calculation of the exact distribution is so time-consuming for all

but the smallest problems that even most statistical software uses a chi-square approximation to obtain P-values. As usual, there is no usable exact

distribution when there are ties among the responses. We again assign average

ranks to tied observations.
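Assigning average ranks to ties is easy to get wrong by hand, so a small sketch may help. The helper `average_ranks` below is a hypothetical Python function written for illustration, not a routine from any particular package; it is applied to the 16 corn yields of Example 15.14.

```python
def average_ranks(values):
    """Return ranks 1..N, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + 1 + end + 1) / 2  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


# Yields in plot order: weeds = 0, 1, 3, 9 (four plots each).
yields = [166.7, 172.2, 165.0, 176.9, 166.2, 157.3, 166.7, 161.1,
          158.6, 176.4, 153.1, 156.0, 162.8, 142.4, 162.7, 162.4]
ranks = average_ranks(yields)
for y, r in sorted(zip(yields, ranks)):
    print(y, r)
```

The two plots with yield 166.7 would occupy ranks 12 and 13, so each receives 12.5, and the ranks still sum to N(N + 1)/2.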

EXAMPLE 15.15

WEEDS

Perform the significance test. In Example 15.14, there are $I = 4$ populations

and $N = 16$ observations. The sample sizes are equal, $n_i = 4$. The 16 observations arranged in increasing order, with their ranks, are

Yield   142.4   153.1   156.0   157.3   158.6   161.1   162.4   162.7
Rank        1       2       3       4       5       6       7       8

Yield   162.8   165.0   166.2   166.7   166.7   172.2   176.4   176.9
Rank        9      10      11    12.5    12.5      14      15      16

There is one pair of tied observations. The ranks for each of the four treatments are

Weeds    Ranks                        Rank sums
0        10      12.5    14    16     52.5
1         4       6      11    12.5   33.5
3         2       3       5    15     25.0
9         1       7       8     9     25.0

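The Kruskal-Wallis statistic is, therefore,

$$
\begin{aligned}
H &= \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1) \\
  &= \frac{12}{(16)(17)} \left( \frac{52.5^2}{4} + \frac{33.5^2}{4} + \frac{25^2}{4} + \frac{25^2}{4} \right) - (3)(17) \\
  &= \frac{12}{272}(1282.125) - 51 \\
  &= 5.56
\end{aligned}
$$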

Referring to the table of chi-square critical points (Table F) with df = 3,

we find that the P-value lies in the interval 0.10 < P < 0.15. This small

experiment suggests that more weeds decrease yield but does not provide

convincing evidence that weeds have an effect.
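The whole calculation in this example can be sketched in code. The Python below is an illustration under stated assumptions: `kruskal_wallis_h` and `chi2_sf_df3` are hypothetical helper names, the statistic follows the formula in the text with no tie correction, and the P-value uses the closed-form upper tail of the chi-square distribution with 3 degrees of freedom. Statistical packages often apply a tie correction, so their reported H may differ slightly.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from a list of samples, using average ranks for ties
    and no tie correction (the formula given in the text)."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    # Average rank of each distinct value: mean of its first and last positions.
    first, last = {}, {}
    for pos, x in enumerate(pooled, start=1):
        first.setdefault(x, pos)
        last[x] = pos
    rank = {x: (first[x] + last[x]) / 2 for x in first}
    rank_sums = [sum(rank[x] for x in g) for g in groups]
    total = sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, rank_sums


def chi2_sf_df3(x):
    """Upper-tail probability of the chi-square distribution with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Corn yields for 0, 1, 3, and 9 weeds per meter (Example 15.14).
groups = [
    [166.7, 172.2, 165.0, 176.9],
    [166.2, 157.3, 166.7, 161.1],
    [158.6, 176.4, 153.1, 156.0],
    [162.8, 142.4, 162.7, 162.4],
]
h, rank_sums = kruskal_wallis_h(groups)
p = chi2_sf_df3(h)
print(rank_sums, round(h, 2), round(p, 3))
```

The rank sums and H should agree with the hand calculation, and the approximate P-value should fall in the interval read from Table F.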

LOOK BACK

matched pairs,

p. 182

EXAMPLE 15.16

ORGANIC

In Example 15.15, we concluded that the data did not provide evidence in

support of the idea that more weeds decrease the yield of corn. Here is an

example of a study where the analysis does provide evidence for us to reject

the null hypothesis. In this situation, we will include a multiple comparisons

procedure to tell us which pairs of levels of the factor differ significantly.

Organic foods and morals? Organic foods are often marketed with moral

terms such as honesty and purity. Is this just a marketing strategy, or is there

a conceptual link between organic food and morality? In one experiment,

62 undergraduates were randomly assigned to one of three food conditions (organic, comfort, and control).15 First, each participant was given a

packet of four food types from the assigned condition and told to rate the


desirability of each food on a seven-point scale. Then, each was presented

with a list of six moral transgressions and asked to rate each on a seven-point scale ranging from 1 = not at all morally wrong to 7 = very morally

wrong.

Exercises 12.23 and 12.24 (page 669) lead you through the steps required

to analyze these data using a one-way ANOVA. Note that the data are discrete

with possible values of 1 through 7. We expect that our results should

be reasonable because the sample sizes are large enough for us to expect

that the sample means are approximately Normal. Let’s check the results

using the Kruskal-Wallis test.

The output from JMP is given in Figure 15.10. This software uses a

chi-square approximation to test the null hypothesis. We reject the null

hypothesis ($X^2 = 12.41$, df = 2, P = 0.002) and conclude that scores (moral

judgments) depend upon the type of food shown to the students. The

multiple comparisons procedure indicates that, on the basis of the moral

transgression scale, we can distinguish organic from comfort and organic

from control, but control and comfort are not distinguishable.
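The reported P-value can be checked by hand because the chi-square distribution with 2 degrees of freedom has a simple upper tail, exp(-x/2). The short Python sketch below is only a check of the quoted numbers; the function name `chi2_sf_df2` is a hypothetical helper, not part of JMP or any other package.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of the chi-square distribution with 2 df."""
    return math.exp(-x / 2)


# Test statistic reported in the JMP output for Example 15.16.
p_value = chi2_sf_df2(12.41)
print(round(p_value, 3))
```

The result rounds to the P = 0.002 quoted above.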

FIGURE 15.10 Output from JMP for the Kruskal-Wallis test applied to the organic

food data, Example 15.16.

SECTION 15.3 SUMMARY

● The Kruskal-Wallis test compares several populations on the basis of independent random samples from each population. This is the one-way analysis of variance (ANOVA) setting.

● The null hypothesis for the Kruskal-Wallis test is that the distribution of the response variable is the same in all the populations. The alternative hypothesis is that responses are systematically larger in some populations than in others.

● The Kruskal-Wallis statistic H can be viewed in two ways. It is essentially the result of applying one-way ANOVA to the ranks of the observations. It is also a comparison of the sums of the ranks for the several samples.

● When the sample sizes are large and the null hypothesis is true, H for comparing $I$ populations has approximately the chi-square distribution with $I - 1$ degrees of freedom. Software often uses this approximate distribution to obtain P-values.

SECTION 15.3 EXERCISES

15.37 Number of Facebook friends. An experiment was

run to examine the relationship between the number of

Facebook friends and the user’s perceived social

attractiveness.16 A total of 134 undergraduate participants

were randomly assigned to observe one of five Facebook

profiles. Everything about the profile was the same except

the number of friends, which appeared on the profile as

102, 302, 502, 702, or 902. After viewing the profile, each

participant was asked to fill out a questionnaire on the

physical and social attractiveness of the profile user. Each

attractiveness score is an average of several seven-point

questionnaire items, ranging from 1 (strongly disagree) to

7 (strongly agree). In Example 12.3 (page 648), we

analyzed these data using a one-way ANOVA. Describe the

setting for this problem. Include the number of groups to

be compared, assumptions about independence, and the

distribution of the attractiveness scores.

FRIENDS

15.38 What are the hypotheses? Refer to the previous

exercise. What are the null hypothesis and the alternative

hypothesis? Explain why a nonparametric procedure

would be appropriate in this setting.

15.39 Read the output. Figure 15.11 gives JMP output for the analysis of the data described in Exercise 15.37. Describe the results given in the output and write a short summary of your conclusions from the analysis.

FIGURE 15.11 Output from JMP for the Kruskal-Wallis test applied to the Facebook data, Exercise 15.39.

15.40 Do we experience emotions differently? In Exercise 12.37 (page 686) you analyzed data related to the way people from different cultures experience emotions. The study subjects were 416 college students from five different cultures. They were asked to record, on a 1 (never) to 7 (always) scale, how much of the time they typically felt eight specific emotions. These were averaged to produce the global emotion score for each participant. Analyze the data using the Kruskal-Wallis test and write a summary of your analysis and conclusions. Be sure to include your assumptions, hypotheses, and the results of the significance test.

EMOTION

15.41 Do isoflavones increase bone mineral density? In Exercise 12.45 (page 688) you investigated the effects of isoflavones from kudzu on bone mineral density (BMD). The experiment randomized rats to three diets: control, low isoflavones, and high isoflavones. Here are the data:

KUDZU

Treatment    BMD (g/cm²)
Control      0.228 0.207 0.234 0.220 0.217 0.228 0.209 0.221
             0.204 0.220 0.203 0.219 0.218 0.245 0.210
Low dose     0.211 0.220 0.211 0.233 0.219 0.233 0.226 0.228
             0.216 0.225 0.200 0.208 0.198 0.208 0.203
High dose    0.250 0.237 0.217 0.206 0.247 0.228 0.245 0.232
             0.267 0.261 0.221 0.219 0.232 0.209 0.255

(a) Use the Kruskal-Wallis test to compare the three diets.

(b) How do these results compare with what you find using the ANOVA F statistic?

15.42 Vitamins in bread. Does bread lose its vitamins when stored? Here are data on the vitamin C content (milligrams per 100 grams of flour) in bread baked from the same recipe and stored for one, three, five, or seven days.17 The 10 observations are from 10 different loaves of bread.

BREAD

Condition                   Vitamin C (mg/100 g)
Immediately after baking    47.62    49.79
One day after baking        40.45    43.46
Three days after baking     21.25    22.34
Five days after baking      13.18    11.65
Seven days after baking      8.51     8.13

The loss of vitamin C over time is clear, but with only two loaves of bread for each storage time, we wonder if the differences among the groups are significant.

(a) Use the Kruskal-Wallis test to assess significance and then write a brief summary of what the data show.

(b) Because there are only two observations per group, we suspect that the common chi-square approximation to the distribution of the Kruskal-Wallis statistic may not be accurate. The exact P-value (from SAS software) is P = 0.0011. Compare this with your P-value from part (a). Is the difference large enough to affect your conclusion?

15.43 Jumping and strong bones. In Exercise 12.47 (page 688), you studied the effects of jumping on the bones of rats. Ten rats were assigned to each of three treatments: a 60-centimeter “high jump,” a 30-centimeter “low jump,” and a control group with no jumping.18 Here are the bone densities (in milligrams per cubic centimeter) after eight weeks of 10 jumps per day:

JUMP

Group        Bone density (mg/cm³)
Control      611  653  621  600  614  554  593  603  593  569
Low jump     635  632  605  631  638  588  594  607  599  596
High jump    650  622  622  643  626  674  626  643  631  650

(a) The study was a randomized comparative experiment. Outline the design of this experiment.

(b) Make side-by-side stemplots for the three groups, with the stems lined up for easy comparison. The distributions are a bit irregular but not strongly non-Normal. We would usually use analysis of variance to assess the significance of the difference in group means.

(c) Do the Kruskal-Wallis test. Explain the distinction between the hypotheses tested by Kruskal-Wallis and ANOVA.

(d) Write a brief statement of your findings. Include a numerical comparison of the groups as well as your test result.

15.44 Do poets die young? In Exercise 12.64 (page 693),

you analyzed the age at death for female writers. They

were classified as novelists, poets, and nonfiction writers.

The data are given in Table 12.1 (page 693).

POETS

(a) Use the Kruskal-Wallis test to compare the three

groups of female writers.

(b) Compare these results with what you find using the

ANOVA F statistic.

CHAPTER 15 EXERCISES

15.45 Plants and hummingbirds. Different

varieties of the tropical flower Heliconia are

fertilized by different species of hummingbirds. Over

time, the lengths of the flowers and the forms of the

hummingbirds’ beaks have evolved to match each

other. Here are data on the lengths in millimeters of

three varieties of these flowers on the island of

Dominica:19

HBIRDS

(c) Use a two-sample t test to compare the men and

women. Write a short summary of your results.

(d) Which procedure is more appropriate for these data?

Give reasons for your answer.

15.47 Response times for telephone repair calls.

A study examined the time required for the telephone

company Verizon to respond to repair calls from its

own customers and from customers of a CLEC, another

phone company that pays Verizon to use its local lines.

Here are the data, which are rounded to the nearest

hour:

TREPAIR

H. bihai

47.12

46.44

50.12

46.75

46.64

46.34

46.81

48.07

46.94

47.12

48.34

48.36

46.67

48.15

47.43

50.26

Verizon

H. caribaea red

41.90

39.78

39.16

38.79

42.01

40.57

37.40

38.23

41.93

39.63

38.20

38.87

43.09

42.18

38.07

37.78

41.47

40.66

38.10

38.01

41.69

37.87

37.97

36.03

36.66

35.45

35.68

1

1

1

1

1

1

1

1

H. caribaea yellow

36.78

38.13

36.03

37.02

37.10

34.57

36.52

35.17

34.63

36.11

36.82

Do a complete analysis that includes description of the

data and a rank test for the significance of the differences

in lengths among the three species.

15.46 Time spent studying. In Exercise 1.159

(page 76), you compared the time spent studying by

men and women. The students in a large first-year

college class were asked how many minutes they

studied on a typical weeknight. Here are the responses

of random samples of 30 women and 30 men from

the class:

STIME

170

120

150

200

120

90

120

180

120

150

60

240

180

120

180

180

120

180

Men

360

240

180

150

180

115

240

170

150

180

180

120

80

90

150

240

30

0

120

45

120

60

230

200

30

30

60

120

120

120

90

120

240

60

95

120

(a) Summarize the data numerically and graphically.

(b) Use the Wilcoxon rank sum test to compare the

men and women. Write a short summary of your

results.

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

2

2

2

2

2

2

2

2

2

2

3

3

3

5

6

15

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

2

5

5

5

2

2

2

2

2

2

2

2

2

3

3

3

4

5

8

CLEC

1

Women

1

1

1

1

1

1

1

1

200

75

300

30

150

180

1

5

5

5

1

5

(a) Does Verizon appear to give CLEC customers the

same level of service as its own customers? Compare the

data using graphs and descriptive measures and express

your opinion.

(b) We would like to see if times are significantly longer

for CLEC customers than for Verizon customers. Why

would you hesitate to use a t test for this purpose? Carry

out a rank test. What can you conclude?

(c) Explain why a nonparametric procedure is

appropriate in this setting.

Iron-deficiency anemia is the most common form of malnutrition in developing countries. Does the type of cooking pot affect the iron content of food? We have data from a study in Ethiopia that measured the iron content (milligrams per 100 grams of food) for three types of food cooked in each of three types of pots:20

COOK

Type of pot    Iron content (mg/100 g)

Meat
Aluminum       1.77    2.36    1.96    2.14
Clay           2.27    1.28    2.48    2.68
Iron           5.27    5.17    4.06    4.22

Legumes
Aluminum       2.40    2.17    2.41    2.34
Clay           2.41    2.43    2.57    2.48
Iron           3.69    3.43    3.84    3.72

Vegetables
Aluminum       1.03    1.53    1.07    1.30
Clay           1.55    0.79    1.68    1.82
Iron           2.45    2.99    2.80    2.92

Exercises 15.48, 15.49, and 15.50 use these data.

15.48 Cooking vegetables in different pots. Does the vegetable dish vary in iron content when cooked in aluminum, clay, and iron pots?

COOK

(a) What do the data appear to show? Check the conditions for one-way ANOVA. Which requirements are a bit dubious in this setting?

(b) Instead of ANOVA, do a rank test. Summarize your conclusions about the effect of pot material on the iron content of the vegetable dish.

15.49 Cooking meat and legumes in aluminum and clay pots. There appears to be little difference between the iron content of food cooked in aluminum pots and food cooked in clay pots. Is there a significant difference between the iron content of meat cooked in aluminum and clay? Is the difference between aluminum and clay significant for legumes? Use rank tests.

COOK

15.50 Iron in food cooked in iron pots. The data show that food cooked in iron pots has the highest iron content. They also suggest that the three types of food differ in iron content. Is there significant evidence that the three types of food differ in iron content when all are cooked in iron pots?

COOK

15.51 Multiple comparisons for plants and hummingbirds. As in ANOVA, we often want to carry out a multiple-comparisons procedure following a Kruskal-Wallis test to tell us which groups differ significantly.21 The Bonferroni method (page 679) is a simple method: if we carry out k tests at fixed significance level 0.05/k, the probability of any false rejection among the k tests is always no greater than 0.05. That is, to get overall significance level 0.05 for all of k comparisons, do each individual comparison at the 0.05/k level. In Exercise 15.45, you found a significant difference among the lengths of three varieties of the flower Heliconia. Now we will explore multiple comparisons.

HBIRDS

(a) Write down all the pairwise comparisons we can make, for example, bihai versus caribaea red. There are three possible pairwise comparisons.

(b) Carry out three Wilcoxon rank sum tests, one for each of the three pairs of flower varieties. What are the three two-sided P-values?

(c) For purposes of multiple comparisons, any of these three tests is significant if its P-value is no greater than 0.05/3 = 0.0167. Which pairs differ significantly at the overall 0.05 level?

15.52 Multiple comparisons for cooking pots. The previous exercise outlines how to use the Wilcoxon rank sum test several times for multiple comparisons with overall significance level 0.05 for all comparisons together. Apply this procedure to the data used in each of Exercises 15.48, 15.49, and 15.50.

COOK
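The Bonferroni rule described in Exercise 15.51 amounts to comparing each P-value with 0.05/k. The Python sketch below illustrates only that bookkeeping; the function name `bonferroni_significant` is hypothetical, and the P-values are made-up placeholders, not results for any exercise.

```python
def bonferroni_significant(p_values, overall_alpha=0.05):
    """Flag each comparison whose P-value is at or below overall_alpha / k."""
    k = len(p_values)
    threshold = overall_alpha / k
    flags = {name: p <= threshold for name, p in p_values.items()}
    return threshold, flags


# Hypothetical P-values for illustration only; not results from any exercise.
example_p = {"A vs B": 0.004, "A vs C": 0.030, "B vs C": 0.200}
threshold, flags = bonferroni_significant(example_p)
print(round(threshold, 4), flags)
```

With three comparisons the threshold is 0.05/3, so a P-value of 0.030 would count as significant in a single test but not under the overall 0.05 level.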

CHAPTER 15 NOTES AND DATA SOURCES

1. Cvent’s 2014 Top 100 Meeting Hotels, see cvent.com/rfp/2014-top-100-us-meeting-hotels-f002743686ec45749cf28b9ae19ec3df.aspx.

2. For purists, here is the precise definition: X1 is stochastically larger than X2 if

P(X1 > a) ≥ P(X2 > a)

for all a, with strict inequality for at least one a. The Wilcoxon rank sum test is effective against this alternative in the sense that the power of the test approaches 1 (that is, the test becomes more certain to reject the null hypothesis) as the number of observations increases.

3. Erin K. O’Loughlin et al., “Prevalence and correlates of exergaming in youth,” Pediatrics, 130 (2012), pp. 806–814.

4. From the PEW Internet & American Life website, pewinternet.org/Reports/2013/Civic-Engagement.aspx.

5. From Matthias R. Mehl et al., “Are women really more talkative than men?” Science, 317, no. 5834 (2007), p. 82. The raw data were provided by Matthias Mehl.

6. Data provided by Warren Page, New York City Technical College, from a study done by John Hudesman.

7. Data provided by Susan Stadler, Purdue University.

8. Ibid.

9. The vehicle is a 2002 Toyota Prius owned by the third author.

10. Statistics regarding Facebook usage can be found at facebook.com/notes/facebook-data-team/anatomy-of-facebook/10150388519243859.

11. These data were collected as part of a larger study of dementia patients conducted by Nancy Edwards, School of Nursing, and Alan Beck, School of Veterinary Medicine, Purdue University.

12. Data provided by Diana Schellenberg, Purdue University School of Health Sciences.

13. These data are from “Results report on the vitamin C pilot program,” prepared by SUSTAIN (Sharing United States Technology to Aid in the Improvement of Nutrition) for the U.S. Agency for International Development. The report was used by the Committee on International Nutrition of the National Academy of Sciences/Institute of Medicine to make recommendations on whether or not the vitamin C content of food commodities used in U.S. food aid programs should be increased. The program was directed by Peter Ranum and Françoise Chomé. The second author was a member of the committee.

14. Data provided by Sam Phillips, Purdue University.

15. Kendall J. Eskine, “Wholesome foods and wholesome morals? Organic foods reduce prosocial behavior and harshen moral judgments,” Social Psychological and Personality Science, 2012, doi: 10.1177/1948550612447114.

16. See item 10.

17. Data provided by Helen Park. See H. Park et al., “Fortifying bread with each of three antioxidants,” Cereal Chemistry, 74 (1997), pp. 202–206.

18. Data provided by Jo Welch, Purdue University Department of Foods and Nutrition.

19. We thank Ethan J. Temeles of Amherst College for providing the data. His work is described in Ethan J. Temeles and W. John Kress, “Adaptation in a plant-hummingbird association,” Science, 300 (2003), pp. 630–633.

20. Based on A. A. Adish et al., “Effect of consumption of food cooked in iron pots on iron status and growth of young children: A randomised trial,” The Lancet, 353 (1999), pp. 712–716.

21. For more details on multiple comparisons, see M. Hollander and D. A. Wolfe, Nonparametric Statistical Methods, 2nd ed., Wiley, 1999. This book is a useful reference on applied aspects of nonparametric inference in general.
