The complete Introduction to Statistics course is currently offered by Stanford through the Coursera platform.

Introduction to Statistics Coursera Week 1 Quiz Answers!
Introduction and Descriptive Statistics for Exploring Data
Q1. What is an appropriate way to visualize a list of the
eye colors of 120 people? Select all that apply.
- pie chart
- box plot
- dot plot
Q2. According to the histogram of travel times to work from
the US 2000 census (Page 6 of “Journey to Work: 2000”), roughly what percentage
of commuters travel more than 45 minutes?
- 75
Q3. According to the histogram of travel times to work from
the US 2000 census (Page 6 of “Journey to Work: 2000”), approximately what is
the median travel time, in minutes (i.e. 50% of commuters have at most that
travel time, 50% have at least that travel time)?
- 50
Q4. You want to investigate whether households in California
tend to have a higher income than households in Massachusetts. Which summary
measure would you use to compare the two states?
- 3rd quartile of household income
- median household income
- mean household income
Q5. Suppose all household incomes in California increase by
5%. How does that change the mean household income?
- the mean household income goes up by 5%
- the mean household income doesn’t change
- cannot be determined from the information given
Q6. Suppose all household incomes in California increase by
5%. How does that change the median household income?
- cannot be determined from the information given
- the median household income goes up by 5%
- the median household income doesn’t change
Q7. Suppose all household incomes in California increase by
5%. How does that change the standard deviation of the household incomes?
- the standard deviation of the household incomes goes up by 5%
- the standard deviation of the household incomes doesn’t change
- cannot be determined from the information given
Q8. Suppose all household incomes in California increase by
5%. How does that change the interquartile range of household incomes?
- the interquartile range of the household incomes doesn’t change
- cannot be determined from the information given
- the interquartile range of the household incomes goes up by 5%
Q9. Suppose all household incomes in California increase by
$5,000. How does that change the mean household income?
- cannot be determined from the information given
- the mean household income doesn’t change
- the mean household income goes up by $5,000
Q10. Suppose all household incomes in California increase by
$5,000. How does that change the median household income?
- the median household income goes up by $5,000
- cannot be determined from the information given
- the median household income doesn’t change
Q11. Suppose all household incomes in California increase by
$5,000. How does that change the standard deviation of the household incomes?
- the standard deviation of the household incomes doesn’t change
- cannot be determined from the information given
- the standard deviation of the household incomes goes up by $5,000
Q12. Suppose all household incomes in California increase by
$5,000. How does that change the interquartile range of household incomes?
- the interquartile range of the household incomes goes up by $5,000
- the interquartile range of the household incomes doesn’t change
- cannot be determined from the information given
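Q5 to Q12 all ask how a summary statistic reacts when every value is multiplied by a constant or shifted by a constant. A quick way to check your intuition is to try it on a small list. The sketch below uses made-up incomes (the `incomes` list and the helper names are assumptions, not part of the quiz) and prints each statistic before the change, after a 5% raise, and after a $5,000 raise.

```python
import statistics

def iqr(values):
    # Interquartile range: third quartile minus first quartile.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def summarize(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.pstdev(values),
        "iqr": iqr(values),
    }

# Hypothetical household incomes in dollars (not real data).
incomes = [28_000, 41_000, 52_000, 67_000, 75_000, 98_000, 150_000]

base = summarize(incomes)
scaled = summarize([x * 1.05 for x in incomes])    # every income up 5%
shifted = summarize([x + 5_000 for x in incomes])  # every income up $5,000

for name in base:
    print(name, base[name], scaled[name], shifted[name])
```

In this sketch, multiplying by 1.05 scales all four statistics by 1.05, while adding $5,000 moves the mean and median by $5,000 and leaves the standard deviation and interquartile range unchanged.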
Q13. The median sales price for houses in a certain county
during the last year was $342,000. What can we say about the percentage of
sales represented by the houses that sold for more than $342,000?
- the houses that sold for more than $342,000 represent more than 50% of all
sales
- the houses that sold for more than $342,000 represent exactly 50% of all sales
- the houses that sold for more than $342,000 represent less than 50% of all
sales
Introduction to Statistics Coursera Week 2 Quiz Answers!
Producing Data and Sampling
Q1. A news company located next to Times Square in New York
wants to get a sense of how people feel about a proposed law on immigration. A
reporter steps out of the building and randomly selects 100 people walking
there and asks them about the proposed law. What can we say about this sampling
plan? Single correct answer.
- it leads to voluntary response bias
- it leads to non-response bias
- it leads to selection bias
- it represents a simple random sampling
Q2. A car company wants to get a sense of how satisfied the owners
of its new car model are with the quality of that car. It randomly selects 250
numbers from all the vehicle registration numbers that have been issued for
this model and contacts the owners of that model. What can we say about this
sampling plan?
- it represents a simple random sampling
- it leads to selection bias
- it leads to non-response bias
- it leads to voluntary response bias
Q3. An airline wants to do a customer survey in order to
improve its service. For one month, it sends an email to a random sample of
customers who flew with the airline on the previous day (no customer will be
contacted more than once). The email states that the airline would like the
customer to fill out a 10-minute survey in order to help the airline improve
its service. What can we say about this sampling plan? Single correct answer.
- it represents a simple random sampling
- it leads to selection bias
- it leads to non-response bias
- it leads to voluntary response bias
Q4. As in the previous question, an airline wants to do a
customer survey in order to improve its service. For one month, it sends an
email to a random sample of customers who flew with the airline on the
previous day (no customer will be contacted more than once). Again, the email
states that the airline would like the customer to fill out a 10-minute survey
in order to help the airline improve its service, but this time it states in
addition that every respondent will receive a gift card worth $100. What can we
say about this sampling plan?
- it represents a simple random sampling
- it leads to selection bias
- it leads to non-response bias
- it leads to voluntary response bias
Q5. Some years ago, there were many news reports about the
“Paleo diet”. It was claimed that the Paleo Diet would result in weight loss as
well as prevention and control of many “diseases of civilization”.
A news channel decides to check this out. It recruits people
who have followed the diet for the past year and selects 100 at random. It also
recruits people who have not followed the diet and selects 100 at random. It
finds that there is more weight loss in the diet group, and that this result is
‘statistically significant’.
Which of the following statements are true?
- This is a randomized controlled experiment.
- It is possible that the difference in weight loss is due to the placebo
effect.
- If a future carefully run randomized controlled experiment reveals that the
paleo diet does not result in weight loss, then we can conclude that the
weight loss observed above must be due to the placebo effect.
Q6. A number of competitive female cross country runners
suffer from bone loss due to low estrogen levels. Some medical experts
conjecture that this can be prevented by taking oral contraceptives, as those
contain estrogen. This conjecture is to be tested with an experiment. The goal
of the experiment is to find out whether taking an oral contraceptive prevents
bone loss in female cross country runners. Which of the following subjects
should be recruited in order to do a good experiment? (Pick one of the three.)
- A group of women who are competitive runners and another group of women who
are not competitive athletes.
- A group of female runners who are taking oral contraceptives and another
group of female runners who are not taking oral contraceptives.
- A group of female runners who are not taking oral contraceptives, but who
are willing to take them if asked by the organizers of the experiment to
do so.
Probability
Q1. A fair coin is tossed 5 times. What is the probability
of getting at most 4 tails?
- 1 - (1/2)^5 = 0.96875
Q2. When you roll a pair of dice, a double is when both dice
show the same number, e.g. both show ‘1’ or both show ‘4’. What is the chance
of a double when you roll a pair of dice?
- 6/15
- 1/6
- 1/12
- 1/36
Q3. The game Monopoly is played by rolling a pair of dice.
If you land in jail, then to get out, you must roll a double on any one of your
next three turns, or else pay a fine. What are the chances that you get out of
jail without paying a fine?
- 1 - (5/6)^3 = 0.421296
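The answers to Q1 to Q3 above all use the complement rule: compute the chance that the event does not happen and subtract it from 1. Here is a small sketch with exact fractions; the variable names are only for illustration.

```python
from fractions import Fraction

# Q1: at most 4 tails in 5 tosses = 1 - P(all 5 tosses are tails).
p_at_most_4_tails = 1 - Fraction(1, 2) ** 5

# Q2: 6 of the 36 equally likely outcomes of two dice are doubles.
p_double = Fraction(6, 36)

# Q3: out of jail = 1 - P(no double on any of the three turns).
p_out_of_jail = 1 - (1 - p_double) ** 3

print(float(p_at_most_4_tails))  # 0.96875
print(p_double)                  # 1/6
print(float(p_out_of_jail))      # about 0.4213
```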
Q4. 3% of all applicants to the Stanford Medical School are
admitted. 70% of all applicants have a GPA of 3.6 or above. Of those who are
admitted, 95% have a GPA of 3.6 or above.
What are the chances of being admitted for an applicant
whose GPA is 3.6 or above?
- (0.95)(0.03) / (0.7)
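The expression in Q4 is Bayes' rule: P(admitted | GPA ≥ 3.6) = P(GPA ≥ 3.6 | admitted) × P(admitted) / P(GPA ≥ 3.6). A minimal sketch that plugs in the numbers from the question (variable names are mine):

```python
p_admit = 0.03            # P(admitted)
p_gpa = 0.70              # P(GPA >= 3.6)
p_gpa_given_admit = 0.95  # P(GPA >= 3.6 | admitted)

# Bayes' rule
p_admit_given_gpa = p_gpa_given_admit * p_admit / p_gpa

print(round(p_admit_given_gpa, 4))  # roughly 0.0407, i.e. about 4%
```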
Q5. A multiple-choice exam has 10 questions. Each question
has 3 possible answers, of which one is correct. A student knows the correct
answers to 4 questions and guesses the answers to the other 6 questions.
It turns out that the student answered the first question
correctly. What are the chances that the student was merely guessing?
Q6. There are three boxes on the table: The first box
contains 2 quarters, the second box contains 2 nickels, and the last box
contains 1 quarter and 1 nickel. You choose a box at random, then you pick a
coin at random from the chosen box.
If the coin you picked is a quarter, what’s the chance that
the other coin in the box is also a quarter?
Introduction to Statistics Coursera Week 3 Quiz Answers!
The Normal Approximation for Data and the Binomial
Distribution
Q1. Scores on a certain test follow the normal curve with an
average of 1350 and a standard deviation of 120.
What percentage of the test-takers score below 1230? (Use
the empirical rule.)
- 16%
- 34%
- 68%
- 18%
Q2. As in the previous question, scores on a certain test
follow the normal curve with an average of 1350 and a standard deviation of
120.
In order to qualify for a certain job, a candidate needs to
score in the top 2.5%. What score does she need?
- 1710
- 1470
- 1650
- 1590
Q3. Recall that the main object in a boxplot is a box that
is bounded by the first and the third quartiles. So the length of the box is
the difference between the third and the first quartile, which is called the
interquartile range. This is a measure of the spread of the data; it is
sometimes used as an alternative to the standard deviation.
If the data follow the normal curve, then the interquartile
range equals how many standard deviations? (You may use the fact that the
z-value of the third quartile is 0.7.)
- 0.7
- 1
- 1.4
- 2
Q4. A multiple-choice exam has 5 questions. Each question
has 4 possible answers, of which one is correct. If a student guesses the
answers to all five questions, what are the chances that he gets 2 correct?
Q5. A fair coin is tossed 6 times. What are the chances of
getting 2 tails in each of the first 3 and the last 3 tosses?
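Q4 and Q5 can both be worked out with the binomial formula, C(n, k) p^k (1-p)^(n-k). The sketch below is one way to do the arithmetic. It assumes Q5 means exactly 2 tails in the first three tosses and exactly 2 tails in the last three, and that the two halves are independent; `binomial_pmf` is a helper defined here, not a library function.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Q4: exactly 2 correct guesses out of 5, each with chance 1/4.
p_two_of_five = binomial_pmf(2, 5, Fraction(1, 4))

# Q5: exactly 2 tails in 3 tosses, required in both halves.
p_two_of_three = binomial_pmf(2, 3, Fraction(1, 2))
p_both_halves = p_two_of_three ** 2

print(p_two_of_five, float(p_two_of_five))
print(p_both_halves, float(p_both_halves))
```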
Q6. A fair coin is tossed 400 times. Approximately what are
the chances to get more than 210 tails? (Use the empirical rule and the normal
approximation to the binomial distribution.)
- 32%
- 16%
- 5%
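For Q6, the number of tails in 400 tosses has mean 200 and standard deviation 10, so 210 is one standard deviation above the mean. The sketch below compares the plain normal approximation with the exact binomial tail, using only the standard library. Expect the two to be close but not identical, since the plain approximation ignores the continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.5
mean = n * p                 # expected number of tails
sd = sqrt(n * p * (1 - p))   # standard deviation of the count

# Normal approximation: area to the right of 210.
approx = 1 - NormalDist(mean, sd).cdf(210)

# Exact binomial tail: P(more than 210 tails).
exact = sum(comb(n, k) for k in range(211, n + 1)) / 2**n

print(mean, sd)
print(round(approx, 3), round(exact, 3))
```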
Sampling Distributions and the Central Limit Theorem
Q1. A town has 10,000 registered voters, of whom 6,000 are
voting for the Democratic party. A survey organization is taking a sample of
100 registered voters (assume sampling with replacement). The percentage of
Democratic voters in the sample will be around _____, give or take ____. (You
may use the fact that the standard deviation of 6,000 1s and 4,000 0s is about
0.5)
- 60%, give or take 5%
- 40%, give or take 5%
- 60%, give or take 0.5%
- 40%, give or take 0.5%
Q2. You solicit 100 pledges for a charitable organization.
Each pledge is equally likely to be $10, $50, or $100. You may use the fact
that the standard deviation of the three amounts $10, $50 and $100 is $37.
What is the expected value of the sum of the 100 pledges?
- $5333
- $533
- $3700
- $370
Q3. You solicit 100 pledges for a charitable organization.
Each pledge is equally likely to be $10, $50, or $100. You may use the fact
that the standard deviation of the three amounts $10, $50 and $100 is $37.
What are the chances that the 100 pledges total more than
$5,700?
- 16%
- 32%
- 5%
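For Q2 and Q3, the expected value of the sum is n times the average of the three amounts, and the standard error of the sum is sqrt(n) times their standard deviation. The sketch below does that arithmetic and then uses a normal curve for the chance of exceeding $5,700. It is an approximation based on the central limit theorem, not an exact calculation.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

amounts = [10, 50, 100]  # each equally likely
n = 100                  # number of pledges

expected_sum = n * mean(amounts)      # about 5333
se_sum = sqrt(n) * pstdev(amounts)    # about 370

# Normal approximation for P(sum > 5700).
p_over_5700 = 1 - NormalDist(expected_sum, se_sum).cdf(5700)

print(round(expected_sum), round(se_sum))
print(round(p_over_5700, 2))
```

Since $5,700 is about one standard error above the expected sum, the empirical rule gives roughly 16%.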
Q4. There are two candidates running for governor in CA and
they are said to have roughly equal support from the voters. To get a better
idea who is ahead, a company polls 400 of the 20 million registered voters in
California. Likewise, there are two candidates running for mayor in Palo Alto
who are said to have roughly equal support, and the company polls 400 out of
the 20,000 registered voters in Palo Alto. Will the first poll be more accurate,
equally accurate, or less accurate than the second poll?
- more accurate
- equally accurate
- less accurate
Q5. The average taxable income reported on tax returns for
the year 2016 is $45,000, and the standard deviation of the taxable income is
$23,000.
Which of the following two statements are true? Both?
- The percentage of taxable incomes that fall below $30,000 can be computed
from the above information using a normal approximation.
- The chances that the sum of 100 randomly selected taxable incomes exceeds $4
million can be computed from the above information using the normal
approximation.
Q6. Questions (a)-(d) below relate to the following
situation: Someone tosses a fair coin 100 times.
Question (a): How many tails can she expect to get?
- 50
Q7. Question (b): What is the “give or take” number for the
result from Question (a)?
- 5
Q8. Question (c): What are the chances that she gets between
40 and 60 tails?
- 16%
- 68%
- 95%
- 99.7%
Q9. A large group of people gets together and everyone
tosses a coin 100 times.
Question (d): About what percentage of people will get
between 40 and 60 tails?
- 16%
- 68%
- 95%
- 99.7%
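Questions (a) to (d) can be checked numerically. The expected number of tails is n/2, the give-or-take number is sqrt(n) × 0.5, and 40 to 60 is two standard errors either side of 50. The sketch below also computes the exact binomial probability for comparison, so you can see how close the empirical rule gets.

```python
from math import comb, sqrt

n = 100
expected_tails = n * 0.5   # Question (a)
se_tails = sqrt(n) * 0.5   # Question (b): SD of a 0/1 fair coin is 0.5

# Questions (c) and (d): exact chance of 40 to 60 tails inclusive.
p_40_to_60 = sum(comb(n, k) for k in range(40, 61)) / 2**n

print(expected_tails, se_tails)
print(round(p_40_to_60, 3))
```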
Introduction to Statistics Coursera Week 4 Quiz Answers!
Regression
Q1. Some people believe that musical activity (e.g. playing
an instrument) enhances mathematical ability. 100 high school students were
selected at random. For each student, musical activity was recorded in hours
per week, and mathematical ability was assessed by a test. The correlation
coefficient was found to be 0.85.
Does the large correlation coefficient prove that musical
activity enhances mathematical ability?
- yes
- no
Q2. What would your answer to the previous question be if
you learned that all students in the study came from the same grade?
- yes
- no
Q3. For a group of commuters commuting to work on a given
day, the correlation coefficient between a) time spent waiting at traffic
signals, and b) total commuting time, was found to be 0.4. Which of the
following statements about the correlation coefficient are true?
- If a commuter’s total commuting time increases by 10 minutes, then he will
spend an additional 4 minutes waiting at traffic signals, on average.
- The average commuter spent 40% of the commuting time waiting at traffic
signals.
- The more time a commuter spends commuting to work, the more time he spends
waiting at traffic signals, on average.
- The more time a commuter spends waiting at traffic signals, the longer the
total commuting time, on average.
Q4. A study followed 1,000 children over time. The scatter
plot of heights at age 1 vs. heights at age 2 looks football-shaped with a
correlation coefficient r=0.8. Alice’s height at age 1 is at the 80th
percentile.
Would you predict her height at age 2 to be below, at, or
above the 80th percentile?
- below
- at
- above
Q5. In the previous question we learned that in a study of
children’s height, the correlation coefficient between height at age 1 vs.
height at age 2 is r=0.8.
Predict the z-score of Alice’s height at age 2. (You may use
the fact that the z-score of the 80th percentile is z=0.85.)
- (0.8)(0.85) = 0.68
- 0.85/0.8 = 1.0625
- not enough information
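For a football-shaped scatter plot, the regression prediction in standard units is simply r times the z-score of the predictor. The sketch below applies that to Alice in Q4 and Q5 and converts the predicted z-score back to a percentile with a normal curve. The percentile step assumes heights at age 2 are roughly normal.

```python
from statistics import NormalDist

r = 0.8        # correlation between height at age 1 and age 2
z_age1 = 0.85  # z-score of the 80th percentile (given in Q5)

# Regression prediction in standard units.
z_age2_pred = r * z_age1

# Percentile that corresponds to the predicted z-score.
percentile_pred = NormalDist().cdf(z_age2_pred) * 100

print(round(z_age2_pred, 2))
print(round(percentile_pred))
```

The predicted percentile comes out below 80, which is the regression effect asked about in Q4.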
Q6. Questions (a)-(d) below relate to the following
situation: In a biology class, both the midterm scores and the final exam
scores have an average of 50 and a standard deviation of 10. The scatterplot
looks football-shaped and the correlation coefficient is 0.6.
Claudia would like to know what score her friend Emily got
on the final.
Question (a): If you have no information on how Emily did on
the midterm, what is your prediction for her score on the final?
- 40
- 44
- 50
- 56
Q7. Question (b): What is the “give or take” number for your
prediction from Question (a)?
- 10
Q8. Now you learn that Emily got exactly the mean score of
50 on the midterm.
Question (c): Given this information, what is your
prediction for Emily’s score on the final?
- 40
- 44
- 50
- 56
Q9. Question (d): What is the “give or take” number for your
prediction from Question (c)?
- 10 \sqrt{1 - (0.6)^2} = 8
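Questions (a) to (d) about Emily's score come down to two formulas: the regression prediction, and the give-or-take number SD × sqrt(1 - r^2) once the midterm is known. A minimal sketch with the numbers from the question (the function and variable names are mine):

```python
from math import sqrt

mean_midterm, sd_midterm = 50, 10
mean_final, sd_final = 50, 10
r = 0.6

def predict_final(midterm_score):
    # Regression method: move r standard units for each standard unit of midterm.
    z_mid = (midterm_score - mean_midterm) / sd_midterm
    return mean_final + r * z_mid * sd_final

# Without midterm information: predict the average, give or take one SD.
give_or_take_without_midterm = sd_final

# With midterm information: the spread shrinks by sqrt(1 - r^2).
give_or_take_with_midterm = sd_final * sqrt(1 - r**2)

print(predict_final(50))
print(round(give_or_take_with_midterm, 6))
```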
Q10. A tutoring center advertises its services by stating
that students who sign up improve their GPA on tests by 0.5 points on average.
Is this indeed evidence that the tutoring helps or could
this be due to the regression effect?
- The improvement proves that the tutoring helps.
- The improvement could be due to the regression effect.
Q11. True or false: If an observation with large leverage
has a small residual, then it is not influential.
- True
- False
Introduction to Statistics Coursera Week 5 Quiz Answers!
Confidence Intervals
Q1. A random sample of 500 sales prices of recently
purchased homes in a county is taken. From that sample, a 90% confidence
interval for the average sales price of all homes in the county is computed to
be $215,000 +/- $35,000.
Is the following statement true or false?
“About 90% of all home sales in the county have a sales
price in the range $215,000 +/- $35,000.”
- true
- false
Q2. A random sample of 500 sales prices of recently
purchased homes in a county is taken. From that sample, a 90% confidence
interval for the average sales price of all homes in the county is computed to
be $215,000 +/- $35,000.
Is the following statement true or false?
“There is a 90% chance that the average sales price of all
homes in the county is in the range $215,000 +/- $35,000.”
- true
- false
Q3. Questions (a) and (b) below relate to the following:
Based on a sample of 500 salaries in a large city we want to find a confidence
interval for the average salary in that city.
Question (a): Is it possible to do this using the formula
“average +/- z SE”? (Keep in mind that the histogram of salaries is not normal
but quite skewed.)
- yes
- no
Q4. The margin of error for the confidence interval from
Question (a), which was based on 500 salaries, turns out to be $5,400. How many
salaries do we need to sample in order to shrink the margin of error to about
$2,000?
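For Q4, the margin of error is proportional to 1/sqrt(n), so shrinking it by a factor f takes f^2 times as many observations. The sketch below does that arithmetic with exact fractions. It assumes the sample standard deviation and the confidence level stay the same.

```python
from fractions import Fraction
from math import ceil

n_old = 500
margin_old = 5400
margin_target = 2000

# Margin scales like 1/sqrt(n), so n scales like (margin_old / margin_target)^2.
shrink_factor = Fraction(margin_old, margin_target)
n_needed = ceil(n_old * shrink_factor**2)

print(n_needed)  # 3645
```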
Q5. You are interested in what the current starting salary
for jobs in data science is. You solicit feedback on an online forum about data
science and you get 230 replies with salary numbers. Can you use the formula
“average +/- z SE” to find a confidence interval for the average starting
salary?
- yes
- no
Tests of Significance
Q1. Which of the following statements are true? (Select all
that apply.)
- The p-value depends on the data.
- If the p-value is smaller than 5%, then there is less than a 5%
chance that the null hypothesis is true.
- If the null hypothesis is true, then there is less than a 5% chance to get a
p-value that is smaller than 5%.
- If a data scientist does many tests, then even if all the null hypotheses are
true, a certain proportion will be rejected in error.
Q2. Read the first five paragraphs of the article “Online
daters do better in the marriage stakes” by Regina Nuzzo in Nature News, 2013.
[You can find it on the internet.] The main claim of the article is
that there is a statistically significant difference in marital outcomes
between couples that meet online and couples that meet in other ways. Is this
finding of practical relevance?
- yes
- no
Q3. A fair coin is tossed 100 times.
Which of the following statements are true? (Select all that
apply.)
- The standard error for the percentage of heads among the 100 tosses is 5%.
- The standard error for the percentage of tails among the 100 tosses is 5%.
- The standard error for the quantity “percentage of heads - percentage of
tails” is \sqrt{0.05^2 + 0.05^2} = 7\%.
Q4. Is there a relationship between age and insomnia? A
random sample of 184 people ages 18-29 was taken, and it was found that 26.1%
suffer from insomnia and 73.9% do not. A separate random sample of 811 people
ages 30 and over was taken, and it was found that 39.2% suffer from insomnia
and 60.8% do not.
Which of the following four test statistics are appropriate
for testing whether the prevalence of insomnia is different between the two age
groups? (Select all that are.)
Q5. You want to test whether plain M&Ms really contain
24% blue M&Ms as claimed on the manufacturer’s website. You sample 500
plain M&Ms at random and count the fraction of blue M&Ms.
Which of the following tests is appropriate to address this
question?
- z-test
- t-test
- 2-sample z-test
- sign test
- paired-difference test
Q6. A high school principal wants to find out whether the
average SAT score of this year’s graduating class is higher than last year’s.
She samples 13 students from this year’s graduating class at random and wants
to compare their average SAT score to the average SAT score from last year’s
graduating class.
- z-test
- t-test
- 2-sample z-test
- sign test
- paired-difference test
Q7. To investigate whether there is a difference in
scholastic abilities between first-borns and second-born siblings, 600 families
that have at least two children were randomly selected. The scholastic
abilities of the first-born and the second-born siblings were assessed with a
test and are to be compared.
- z-test
- t-test
- 2-sample z-test
- sign test
- paired-difference test
Introduction to Statistics Coursera Week 6 Quiz Answers!
Resampling
Q1. We want to use the Monte Carlo method to estimate the
probability of getting exactly one ace (one spot) in three rolls of a die.
Which of the following is a correct description for doing
this?
- To simulate the roll of a die, we draw a number at random (with replacement)
from 1,2,3,4,5,6. To simulate the probability in question with B=1000
Monte Carlo simulations, we simulate the roll of a die 3B=3000 times and
count the number of times an ace comes up. Then we divide this number by
3B. The resulting proportion is our Monte Carlo estimate.
- To simulate three rolls of a die, we draw three times a number at random
(with replacement) from 1,2,3,4,5,6. If we get the number ‘1’ exactly
once, then we label this trial to be a success. We repeat this B=1000
times. The proportion of successes in these 1000 trials is our Monte Carlo
estimate of the probability in question.
- To simulate three rolls of a die, we draw three times a number at random
(with replacement) from 1,2,3,4,5,6. We repeat this simulation many times
until we get the number ‘1’ exactly once, then we stop. The desired Monte
Carlo estimate is 1/(number of repetitions).
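The second option in Q1 (simulate three rolls, mark the trial a success if exactly one ‘1’ shows, repeat B times, take the proportion of successes) translates almost line by line into code. Below is a sketch of that procedure; the function names and the fixed seed are my own choices for reproducibility, and the exact probability is printed alongside for comparison.

```python
import random

def one_trial(rng):
    # Three rolls of a fair die; success means exactly one ace.
    rolls = [rng.randint(1, 6) for _ in range(3)]
    return rolls.count(1) == 1

def monte_carlo_estimate(B, rng):
    # Proportion of successful trials among B simulated trials.
    return sum(one_trial(rng) for _ in range(B)) / B

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = monte_carlo_estimate(1000, rng)

# Exact value for comparison: 3 * (1/6) * (5/6)^2 = 75/216.
exact = 3 * (1 / 6) * (5 / 6) ** 2

print(estimate, round(exact, 4))
```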
Q2. We want to use the Monte Carlo Method to approximate the
standard error of our estimate from Question 1.
Which of the following is a correct description for doing
this?
- We compute the standard deviation of all the numbers we simulated in
Question 1.
- In each of the B=1000 trials we simulated in Question 1, if the trial results
in a success (i.e. ‘1’ shows exactly once), then we give that trial the
label 1, otherwise the label 0. We compute the standard deviation of these
1000 labels.
- We repeat the whole Monte Carlo simulation done in Question 1 many times
(e.g. 2000 times). Each time we get an estimate of the probability in
question. We compute the standard deviation of these 2000 estimates.
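The last option in Q2 (repeat the whole simulation many times and take the standard deviation of the resulting estimates) can be sketched as follows. To keep the run short, this sketch uses 500 repetitions rather than the 2000 mentioned in the option; that number, the seed, and the function name are assumptions of mine. The formula sqrt(p(1-p)/B) is printed as a sanity check.

```python
import random
from statistics import pstdev

def estimate_once(B, rng):
    # One full Monte Carlo run: proportion of trials with exactly one ace.
    successes = 0
    for _ in range(B):
        rolls = [rng.randint(1, 6) for _ in range(3)]
        if rolls.count(1) == 1:
            successes += 1
    return successes / B

rng = random.Random(1)
repeats = 500  # fewer than the 2000 in the quiz text, to keep this quick
estimates = [estimate_once(1000, rng) for _ in range(repeats)]

# Monte Carlo standard error: SD of the repeated estimates.
se_monte_carlo = pstdev(estimates)

# Theoretical standard error of a proportion based on B=1000 trials.
p = 75 / 216
se_formula = (p * (1 - p) / 1000) ** 0.5

print(round(se_monte_carlo, 4), round(se_formula, 4))
```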
Analysis of Categorical Data
Q1. Questions (a)-(d) below relate to the following: Some
people suspect that childbirths may not be equally distributed over the seven
days of the week because hospital staff (who can influence the time of delivery
in some cases) may prefer to work on certain days of the week.
Question (a): Which of the following is the null hypothesis?
- childbirths are more likely on certain days of the week
- childbirths occur equally likely on the seven days of the week
Q2. To investigate, you note the day of the week of 300
births that were randomly selected from all births that occurred in New York
City last year.
Question (b): What test should you use to test the null
hypothesis?
- z-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- chi-square test of homogeneity
Q3. Question (d): What would be the answer to Question (b)
if you wanted to investigate a simpler question, namely whether the percentage
of births on weekends is lower than expected?
- z-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- chi-square test of homogeneity
Q4. This question and the next one are related to the
following context: A food delivery start-up decides to advertise its service by
placing ads on web pages. They wonder whether the percentage of viewers who
click on the ad changes depending on how often the viewers were shown the ad.
They randomly select 100 viewers from among those who were shown the ad once,
135 from among those who were shown the ad twice, and 150 from among those who
were shown the ad three times.
Which is the null hypothesis?
- the chance that the user clicks on the ad increases with the number of ads
shown
- the chance that the user clicks on the ad is the same for all three groups
Q5. In the previous question, which test is appropriate to
test the null hypothesis?
- z-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- chi-square test of homogeneity
Q6. A county wants to check whether the racial composition
of the teachers in the county corresponds to that of the population in the
county. It samples 500 teachers at random and wants to compare that sample with
the census numbers about the racial groups in that county.
Which test would be appropriate?
- z-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- chi-square test of homogeneity
- none of these
Q7. An airline wants to find out whether there is a
connection between the customer’s status in its frequent flyer program and the
class of tickets that the customer buys. It samples 1,000 ticket records at
random and for each ticket notes the status level (‘none’, ‘silver’, ‘gold’)
and the ticket class (‘economy’, ‘business’, ‘first’).
Which test would be appropriate?
- z-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- chi-square test of homogeneity
- none of these
Q8. The airline wants to find out whether there is a connection
between the customer’s status in its frequent flyer program and the amount that
the customer spends on tickets in the following year. It samples 1,000 ticket
records at random and for each ticket notes the status level (‘none’, ‘silver’,
‘gold’) and the amount spent on tickets in the following year.
Which test would be appropriate?
- z-test
- chi-square test for goodness-of-fit
- chi-square test of independence
- chi-square test of homogeneity
- none of these
Introduction to Statistics Coursera Week 7 Quiz Answers!
One-Way Analysis of Variance
Q1. An online retailer strongly suspects that customers
purchase more in the following month if they are shown a company ad more often.
To confirm that hunch they randomly select 50 customers who are then sent one
ad, 45 customers who are sent two ads, and 52 customers who are sent three ads.
Which is the null hypothesis?
- the spending means for the three groups are the same
- the spending means increase with the number of ads
Q2. Based on the description of the experiment in the
previous question and the boxplots below, do you think that the assumptions of
ANOVA are met?
- Yes
Q3. Based on the ANOVA table below and the boxplots, what is
the conclusion of the analysis?
- There is no statistically significant effect.
- There is sufficient evidence to conclude that the spending means increase
with the number of ads.
- There is sufficient evidence to conclude that the spending means are not equal,
but based on this analysis alone we cannot conclude that the spending
means increase with the number of ads.
Q4. Does eye color affect the type of vision correction that
patients choose? From a large dataset of patients having vision correction, 70
patients were chosen randomly from those having brown eyes, 70 from those
having green eyes, and 70 from those having blue eyes. For each patient, the
type of vision correction was coded as follows: glasses=1, contact lenses=2,
corrective surgery=3. Those numbers were used for an ANOVA, which resulted in a
p-value of 0.5%.
Does the p-value of 0.5% mean that there is strong evidence
that eye color has an effect on the type of vision correction that
patients choose?
- yes
- no
Q5. A clinical trial aims to discern whether twelve
interventions against high blood pressure have different effects. The study
randomizes 10,000 subjects into twelve groups. Each group is administered one
of the twelve interventions. After a month the change in blood pressure is
measured for each subject. The ANOVA table gives a p-value of 17%. The
investigators also perform pairwise two-sample t-tests for all pairs of
treatments and find that two pairs show a statistically significant difference.
Which of the following options describes a valid conclusion?
- There is not enough evidence to conclude that the twelve treatment means are
different.
- We can conclude that there are differences between the two pairs of
treatments that were found to be significant by the two-sample t-tests.
Introduction to Statistics Coursera Week 8 Quiz Answers!
Multiple Comparisons
Q1. Recall that a “discovery” occurs when a test rejects the
null hypothesis. In the medical literature a discovery is called a “positive
result”. So a “false positive” is a “false discovery”.
What is the false discovery proportion (FDP) of the
procedure that yielded the following results:
- 9/(9+36)
Q2. A medical study examines whether there is a significant
correlation between any of the 12 lifestyle choices and high blood pressure. It
doesn’t find any significant correlation, but upon further examination, the
researchers find a highly significant (p-value < 0.5%) correlation
between two of the lifestyle choices. This correlation seems not to have been
noticed before.
Which of the following three statements is an appropriate
summary of these findings? Select all that apply.
- The correlation between these two lifestyle choices is highly significant and
should be reported as such.
- The seemingly significant correlation was found as a consequence of data
snooping and therefore the p-value is not valid. The researchers
shouldn’t report anything.
- The seemingly significant correlation was found as a consequence of data
snooping and therefore the p-value is not valid. However, this
could potentially be a significant new finding. The researchers can report
it as such, pointing out that they cannot attach a valid p-value
to this finding. It can serve as a hypothesis for a future study with new
data, which would then allow for statistically valid conclusions.
Q3. 1,000 tests were evaluated with the Bonferroni
correction. 31 tests had corrected p-values smaller than 5%.
Which of the following three statements is an appropriate
conclusion?
- If we reject these 31 null hypotheses then we can expect that about 5% of
them are rejected in error.
- This is sufficient evidence to reject all of these 31 null hypotheses because
there is only a 5% chance that any of these 31 p-values would be
this small if the null hypotheses were true.
- There is a 95% probability that all of these 31 null hypotheses are false.
Q4. 1,000 tests were evaluated with the FDR at the 5% level,
which resulted in 31 discoveries.
Which of the following three statements is an appropriate
conclusion?
- There is a 95% probability that all of these 31 null hypotheses are false.
- This is sufficient evidence to reject all of these 31 null hypotheses, because
there is only a 5% chance that any of these 31 p-values would be
this small if the null hypothesis were true.
- If we reject these 31 null hypotheses then we can expect that about 5% of them are rejected in error.
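Q3 and Q4 contrast two ways of handling many tests. Bonferroni compares each p-value to alpha/m, which controls the chance of even one false rejection. An FDR procedure controls the expected proportion of false rejections among the discoveries. The quiz does not say which FDR procedure is meant, so the sketch below assumes the Benjamini-Hochberg step-up rule. The p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    # Reject when p <= alpha / m (same as corrected p-value m*p <= alpha).
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    # Assumed FDR procedure: find the largest rank k with p_(k) <= k*alpha/m,
    # then reject the k smallest p-values.
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject

# Hypothetical p-values from six tests (not from the quiz).
p_values = [0.001, 0.008, 0.012, 0.03, 0.2, 0.6]

bonferroni = bonferroni_reject(p_values)
fdr = benjamini_hochberg_reject(p_values)

print(sum(bonferroni), "rejections with Bonferroni")
print(sum(fdr), "rejections with Benjamini-Hochberg")
```

On this toy input the FDR rule rejects more hypotheses than Bonferroni, which reflects the weaker guarantee it offers.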