## Week 2

### Special case of age

1.
Question 1
Pablo requests the birth records for every individual in his region. He is told that the data set contains everyone’s date of birth so he will be able to calculate their age in days if he wishes. What sort of data will Pablo have:

1 point

• Continuous
• Integer
• Ordinal

2.
Question 2
When Pablo receives the data set he finds that in fact the version of the data set that he has been given contains age group rather than dates of birth. Each individual has been classified as <18 years, 18-44, 45-64 and 65+ years of age. What sort of data does Pablo actually have:

1 point

• Continuous
• Binary
• Ordinal

3.
Question 3
Meghan downloads the following death rate data for the population of England and Wales:

| Age | Number of deaths | Number of people | Crude death rate per 1000 people |
|---|---|---|---|
| “Young” (<65) | 80,971 | 47,863,700 | 1.69 |
| “Old” (65+) | 442,886 | 10,517,500 | 42.11 |

True or false:

The death rate for males aged 65 or older in England and Wales is 42.11.
1 point

• True
• False

4.
Question 4
The death rate in England and Wales remains constant at 42.11 deaths per 1000 people for ages 0 to 64.

1 point

• True
• False
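
As a check on the table in Question 3, a crude death rate per 1,000 is the number of deaths divided by the number of people, multiplied by 1,000. The sketch below assumes that the smaller figure in each row is the death count and the larger figure is the population; the vector names are illustrative only.

```r
# Crude death rate per 1,000 = deaths / population * 1000.
# Figures are taken from the table in Question 3 (all ages within each band,
# both sexes combined).
deaths     <- c(young = 80971, old = 442886)
population <- c(young = 47863700, old = 10517500)

crude_rate <- deaths / population * 1000
round(crude_rate, 2)
```

Each rate is a single average over a whole age band, so it says nothing about subgroups (such as one sex) or about how the rate changes within the band.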

### Well-behaved Distributions

1.
Question 1
Match this distribution to the plot:

Normal with mean 50 and standard deviation 12

Note: There are two correct answers

1 point

c. d.

2.
Question 2
Match this distribution to the plot:

Poisson with mean 4

Note: There are two correct answers

1 point

a. c.

3.
Question 3
Match this distribution to the plot:

Normal with mean 50 and standard deviation 4

Note: There are two correct answers

1 point

a. c.

4.
Question 4
Which of the following plots shows the distribution with the biggest standard deviation?

1 point

a.

5.
Question 5
What proportion of the data lies in the shaded area on the plot below?

1 point

68%

95%

50%

6.
Question 6
What proportion of the data lies in the shaded area on the plot below?

1 point

68%

95%

50%
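
The plots for Questions 5 and 6 are not reproduced here, so the shaded regions cannot be checked. As a hedged reminder of where the three percentages come from, this sketch assumes the data are normally distributed and computes the proportion lying within a given number of standard deviations of the mean. The helper name `within_k_sd` is made up for this example.

```r
# Proportion of a normal distribution within k standard deviations of the mean
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

round(within_k_sd(1), 3)   # within 1 SD: roughly 68%
round(within_k_sd(2), 3)   # within 2 SDs: roughly 95%

# Half of a normal distribution lies below the mean (and half above)
pnorm(0)
```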

7.
Question 7
A drug is given to 100 migraine sufferers to prevent the onset of new migraines. 40% experience a new migraine after taking the drug. What distribution does the outcome (new migraine) follow:

1 point

Binomial

Normal

Poisson

8.
Question 8
A new drug is given to 100 asthma sufferers to reduce the number of hospital admissions due to asthma attacks over a 12-month period. After 12 months, the mean number of hospital admissions is 2. What distribution does the outcome (hospital admissions) follow:

1 point

Binomial

Normal

Poisson

9.
Question 9
The normal distribution is a:

1 point

Discrete distribution

Continuous distribution

10.
Question 10
The Poisson distribution is a:

1 point

Discrete distribution

Continuous distribution

11.
Question 11
The Binomial distribution is a:

1 point

Discrete distribution

Continuous distribution

12.
Question 12
Which of these does not follow a Poisson distribution:

1 point

Asthma exacerbations over a 12-month period

Patients arriving at a hospital emergency department in a one hour time period

Number of patients in disease remission

Number of patient falls on a geriatric ward over a twelve-hour shift.
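
The three distributions in this quiz can be explored with R's built-in density and probability functions. This is a minimal sketch: the parameter values are borrowed from the questions above purely for illustration, and the variable names are assumptions.

```r
# Binomial: number of "yes" outcomes out of a fixed number of trials.
# P(exactly 40 of 100 patients have the outcome) when p = 0.4
p_40 <- dbinom(40, size = 100, prob = 0.4)

# Poisson: count of events in a fixed period, described only by its mean.
# P(0, 1, 2 or 3 events) when the mean is 2
p_counts <- dpois(0:3, lambda = 2)

# Both are discrete: probabilities over whole-number outcomes sum to 1
# (the Poisson sum is truncated at 100, where the remaining tail is negligible).
total_binom <- sum(dbinom(0:100, size = 100, prob = 0.4))
total_pois  <- sum(dpois(0:100, lambda = 2))

# Normal: continuous, so probabilities are areas under the density curve.
# Area between 38 and 62 for a normal with mean 50 and SD 12
area_1sd <- pnorm(62, mean = 50, sd = 12) - pnorm(38, mean = 50, sd = 12)

p_40; p_counts; total_binom; total_pois; area_1sd
```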

## Ways of Dealing with Weird Data

1.
Question 1
The video introduced the idea that data do not always fit well-behaved distributions. However, this matters to a greater or lesser extent depending on how you plan to use the data. The following will test your understanding of this and the potential solutions available to you when you have “weird” data.

Dev has collected information on the average number of times a month that people viewed a particular public health information website (he has no information on people who did not access the website at all). He plots the data and observes the following:

Dev wants to describe website access in his sample. What would be the best approach for him to do this?

1 point

Try transforming the data to see if it makes the distribution more normal and analyse as a normal distribution.

Dichotomise the data into high and low usage using a cut point such as 5 or more times a month on average and analyse as a binomial distribution.

Present a simple summary table of frequencies and proportion of people by average number of logins.

2.
Question 2
Ji-woo is conducting a study that is looking at the effects of a new drug on vision compared with a group that receive standard care. The vision outcome is measured by the ETDRS (a visual acuity scale), which has a range from 0-100 (complete sight loss to perfect vision). She collects the ETDRS at baseline before the drug/standard care is administered and 6 months later. At baseline, the sample contains patients with very poor vision, including some with complete vision loss. The literature shows that the baseline scores are likely to be positively skewed. Ji-woo wants to compare change scores on the ETDRS between baseline and 6 months across the two treatment groups. How should Ji-woo proceed?

In thinking about your answer, one of the things you should consider is how the doctor might most easily communicate the information to the patient.

1 point

Present the mean change scores by group.

Dichotomise the change scores so that the data follows a binomial distribution.

3.
Question 3
Nisha has data that contain each person’s average daily fruit and vegetable consumption over the course of a year for the last ten years. An extract is given in the table below.

A histogram of the data for year 1 is shown below:

She wants to draw a graph of the trend over this 10-year period. She decides she needs to get a summary measure for each year to compare over time. How can she best summarise the data per year to make a comparison over time:

1 point

Calculate the mean average daily fruit and vegetable consumption for each year.

Calculate the proportion per year that eat above the daily recommended amount.

### Sampling

1.
Question 1
Which one of the following defines the standard error of a mean?

1 point

The difference between the population mean and the sample mean

The average difference between the population mean and the sample mean

The average difference between the individual observations and the sample mean

2.
Question 2
Lucy takes a sample of BMI values across her class of 35 students. The sample mean and standard deviation are 23.2 and 2 respectively. What is the estimated standard error of Lucy’s sample:

1 point

0.06

0.34

3.92

3.
Question 3
Lucy wants to calculate the 95% confidence interval for the sample mean. What is Lucy’s estimated 95% confidence interval:

1 point

(22.53, 23.87)

(19.28, 27.12)

(21.20, 25.20)
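
The standard error and confidence interval in Questions 2 and 3 can be worked through in R. This sketch assumes the usual formulas, SE = SD / sqrt(n) and a normal-approximation interval of mean ± 1.96 × SE; the source does not say whether a t multiplier was intended instead.

```r
n      <- 35     # class size
mean_x <- 23.2   # sample mean BMI
sd_x   <- 2      # sample standard deviation

# Estimated standard error of the mean
se <- sd_x / sqrt(n)
round(se, 2)

# Approximate 95% confidence interval: mean +/- 1.96 * SE
ci <- mean_x + c(-1, 1) * 1.96 * se
round(ci, 2)

# The same interval using the SE rounded to 2 decimal places first
ci_rounded_se <- mean_x + c(-1, 1) * 1.96 * round(se, 2)
round(ci_rounded_se, 2)
```

Rounding the standard error before building the interval moves each limit by about 0.01, which is why hand calculations and software output can differ slightly in the second decimal place.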

## Week 3

### Distributions and Medians

1.
Question 1
Match the below plot with the correct distribution.

1 point

Poisson(4)

Uniform (0,100)

Normal (75, 10)

Binomial (100, 0.5)

2.
Question 2
Match the below plot with the correct distribution.

1 point

Poisson(4)

Normal (75, 10)

Uniform (0,100)

Binomial (100, 0.5)

3.
Question 3
Match the below plot with the correct distribution.

1 point

Poisson(4)

Normal (75, 10)

Binomial (100, 0.5)

Uniform (0,100)

4.
Question 4
Match the below plot with the correct distribution.

1 point

Poisson(4)

Binomial (100, 0.5)

Uniform (0,100)

Normal (75, 10)

5.
Question 5
For the sequence of numbers 3, 4, 5, 5, 7, 36, what is the Mean?

1 point

5

3

4

10

6

6.
Question 6
For the sequence of numbers 3, 4, 5, 5, 7, 36, what is the Median?

1 point

4

3

6

5

10

7.
Question 7
For the sequence of numbers 7, 7, 5, 3, 2, 12, what is the Mean?

1 point

6

10

4

5

3

8.
Question 8
For the sequence of numbers 7, 7, 5, 3, 2, 12, what is the Median?

1 point

5

4

10

6

3
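
The means and medians in Questions 5 to 8 can be checked directly in R. The vector names below are illustrative.

```r
x1 <- c(3, 4, 5, 5, 7, 36)   # sequence from Questions 5 and 6
x2 <- c(7, 7, 5, 3, 2, 12)   # sequence from Questions 7 and 8

mean(x1)     # sum of the values divided by how many there are
median(x1)   # middle of the sorted values (average of the middle two here)

mean(x2)
median(x2)
```

In the first sequence the single large value (36) pulls the mean well above the median, whereas in the second, more symmetric sequence the two measures coincide.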

## Week 4

### Results: Running a New Hypothesis Test

1.
Question 1
Suppose you want to compare the proportions of overweight and cancer. First, define your variables:

    cancer <- g$cancer
    overweight <- ifelse(g$bmi >= 25, 1, 0)

Have a look at your new variable to check everything makes sense:

    table(overweight)

    overweight
     0  1
    34 32

Next, perform a chi-squared test. As best practice, assign the explanatory variable to x and the dependent variable to y. The “dependent variable” is so named because we are hypothesising that its value depends at least partly on some other variable(s), called the “explanatory variable(s)”.

    chisq.test(x = overweight, y = cancer)

What did you get? What do you conclude?

Enter the p value in the box below (to 2 decimal places) and tick which of the given options for the conclusion you agree with.

1 point

 0.65

2.
Question 2
Tick which of the below given options for the conclusion you agree with.

1 point

Being overweight gives you cancer

Being overweight protects you from getting cancer

Being overweight does not give you cancer

There is no association between being overweight and cancer

There is good evidence of an association between being overweight and cancer

There is no evidence of an association between being overweight and cancer anywhere in the world

There is no evidence of an association between being overweight and cancer in this data set
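
The course data set `g` is not reproduced here, so the test above cannot be rerun as written. As a rough sketch of how to pull the p value out of the result, the example below uses a small table of hypothetical counts. Only the row totals (34 and 32) echo the `table(overweight)` output; the split by cancer status is invented, so the p value will not match the quiz answer.

```r
# HYPOTHETICAL counts for illustration only; not the course data.
tab <- matrix(c(20, 14,
                17, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(overweight = c("0", "1"),
                              cancer     = c("0", "1")))

# chisq.test() accepts a contingency table as well as two vectors.
# For a 2 x 2 table it applies Yates' continuity correction by default.
res <- chisq.test(tab)

res$p.value     # the p value to report
res$parameter   # degrees of freedom: (2 - 1) * (2 - 1) = 1
```

A large p value means only that this sample provides no evidence of an association; it does not show that no association exists.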

### Hypothesis Testing

1.
Question 1
In each of the following six questions, you’ll be asked to choose the single correct answer.

David takes 5 samples of 10 patients from the National Cancer Registry. He calculates mean BMI values for each of these 5 samples and obtains the following results – 24.3, 27.9, 25.2, 26.7, 26.4. Why are David’s sample means all different?

1 point

Sampling variation

Population variation

Measurement error

2.
Question 2
Charlotte wants to test the mean BMI value in the National Cancer Registry based on a sample of 100 patients. She hypothesizes that the mean BMI value in her sample will be 27. Before she conducts her experiment, her boss points out an error in her hypothesis. What is wrong with Charlotte’s statement?

1 point

The hypothesis should relate to the population value.

She hasn’t specified her alpha value.

27 is an unreasonable value for mean BMI.

3.
Question 3
Charlotte corrects her hypotheses and randomly selects her sample of 100 patients. She has decided to use a two-sided alpha value of 0.01 instead of the conventional value of 0.05 because she believes that this will decrease her risk of making the wrong conclusion. Will this lower value reduce her risk of concluding the mean population BMI is 27 when in fact it isn’t?

1 point

Yes

No

4.
Question 4
Charlotte’s colleague repeats her experiment but chooses a two-sided alpha value of 0.05. What happens to the chance area (or probability of making a type I error)?

1 point

Becomes larger

Stays the same

Becomes smaller

5.
Question 5
How many degrees of freedom will Charlotte’s test have?

1 point

99

100

0.01

0.05

6.
Question 6
Noah has the following data and wants to test whether age-group is associated with the presence or absence of cancer. He decides to perform a chi-squared test.

How many degrees of freedom does his test have?

1 point

919

4

920

10
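
The degrees-of-freedom rules behind Questions 5 and 6 can be written as two small helpers. This is a sketch under the standard textbook formulas; the function names `t_df` and `chisq_df` are made up, and Noah's table is not reproduced here, so it is not evaluated.

```r
# One-sample t-test: degrees of freedom = sample size - 1
t_df <- function(n) n - 1
t_df(100)

# Chi-squared test on an r x c contingency table:
# degrees of freedom = (rows - 1) * (columns - 1)
chisq_df <- function(n_rows, n_cols) (n_rows - 1) * (n_cols - 1)
chisq_df(2, 2)   # e.g. a 2 x 2 table

# Two-sided critical t values: a smaller alpha gives a larger critical value,
# i.e. a smaller rejection ("chance") area in the tails
crit_05 <- qt(1 - 0.05 / 2, df = t_df(100))
crit_01 <- qt(1 - 0.01 / 2, df = t_df(100))
c(alpha_0.05 = crit_05, alpha_0.01 = crit_01)
```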

## End-of-course Assessment

1.
Question 1
Part of the success of the UN’s Millennium Development Goals was due to the statistical monitoring of data on measures such as infant mortality and living in extreme poverty.

1 point

True

False

2.
Question 2
As long as a research question is interesting, it is scientifically testable as a hypothesis – the more interesting, the more testable.

1 point

true

false

3.
Question 3
In the study published in the Journal of the American College of Cardiology on the effect of taking supplements of vitamins and minerals that you read earlier in this course, they concluded that, in simple terms, there’s no health benefit in taking such supplements (with the exception of folic acid) and there might even be some risk.

1 point

true

false

4.
Question 4
That phrase that I wrote in the previous question, “there’s no health benefit in taking supplements”, uses precise enough language to be used in a hypothesis test.

1 point

true

false

5.
Question 5
The responsibility for accurate reporting of medical research always lies solely with the journalist. If there’s a misinterpretation of the results, the scientist is never to blame.

1 point

true

false

6.
Question 6
The next set of questions concern data types and exploratory analyses in R. A histogram is a useful but rough way to assess whether a variable is normally distributed.

1 point

true

false

7.
Question 7
When undertaking a t-test in R, it is fine to use “t.test” before “hist” and “summary”.

1 point

true

false

8.
Question 8
You want to see whether patients with cancer have different mean BMIs from those without, so you type t.test(cancer~bmi). Please select all that apply.

1 point

You have written “cancer” and “bmi” the wrong way round

BMI should be roughly normally distributed for a t-test to be valid

You should have done a chi-squared test instead

9.
Question 9
Your boss reminds you that BMI is often categorised, with underweight, normal weight etc as categories. Which of the following is/are correct?

1 point

Making categories from a normally distributed variable loses a lot of information, and it’s more efficient to compare means instead of proportions

If you did categorise BMI, you could do a chi-squared test using “chisq.test” in R

The chi-squared statistic that R gives you in the output is really useful and should always be reported

10.
Question 10
You decide to turn BMI into categories because they are of public interest, even though it loses information. Before running the above chi-squared test, you make the variable “bmi.group”. You should run these commands in R first and for the reason given…

1 point

table(cancer) in order to check how many values “cancer” has

table(bmi.group, exclude=NULL) to check your code for grouping BMI gives sensible results

hist(bmi) to check that BMI is roughly normally distributed

summary(bmi) to check that BMI is roughly normally distributed

table(bmi) in order to see how common each BMI value is
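
To illustrate what a grouped variable such as “bmi.group” and the check with `exclude=NULL` might look like, here is a minimal sketch. The BMI values are hypothetical, and the cut points and labels are an assumption based on commonly used BMI thresholds, not something stated in the course material.

```r
# HYPOTHETICAL BMI values (including one missing value) for illustration only
bmi <- c(17.9, 22.4, 24.9, 25.0, 27.3, 31.6, NA)

# ASSUMED cut points: <18.5, 18.5 to <25, 25 to <30, 30+
bmi.group <- cut(bmi,
                 breaks = c(0, 18.5, 25, 30, Inf),
                 labels = c("underweight", "normal", "overweight", "obese"),
                 right  = FALSE)   # intervals are [lower, upper)

# exclude = NULL keeps NA as its own category, so missing values are
# counted rather than silently dropped
group_counts <- table(bmi.group, exclude = NULL)
group_counts
```

Comparing these counts against the raw values is a quick way to confirm that boundary cases (such as a BMI of exactly 25) land in the intended group.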

11.
Question 11
The next set of questions concern the interpretation of official mortality figures from India. These figures are publicly available from https://data.gov.in/catalog/estimated-age-specific-death-rates-sex and in the reading before this test (pdf download) and give the rates of death per 1,000 population in each age-gender group.

datafile (est age-sex death rates 2006-11 in India).pdf

True or false?

These data were published in April 2014, but they only go up to 2011. Such delays in releasing official data are common in many countries.

1 point

True

False

12.
Question 12
In 2011 in India according to official statistics, the estimated death rate for girls aged under 1 was higher than that for every older age group until 75-79.

1 point

true

false

13.
Question 13
The lack of a 95% confidence interval for either of these estimates means that we can safely say that 49.7 is statistically significantly higher than 42.5.

1 point

true

false

14.
Question 14
To see whether these two rates (42.5 and 49.7 per 1,000) are statistically significantly different from one another, we would carry out a t-test and interpret its p value.

1 point

true

false

15.
Question 15
As these rates are in fact based on proportions (they’re proportions of the population of each age group that died that are then multiplied by 1,000 to make them easier to read), the appropriate test is a chi-squared test. We have enough information to carry out this test.

1 point

true

false