AI and fundamental stats

In my upcoming introductory statistics class I plan to include "critical evaluation of AI analyses" as a core learning objective. I have started exploring how ChatGPT 4.0 handles some of my favorite questions, and it turns out that it does rather well.


I flipped a coin 500 times and 300 times it came up heads. Is this substantial evidence that it is not a fair coin?

Visualization of the problem I posed to ChatGPT and that I introduce on the first day of class each year.

ChatGPT 4.0: To determine whether this is substantial evidence that the coin is not fair, we could conduct a hypothesis test.

The null hypothesis (H0) is that the coin is fair, meaning it has a 0.5 probability of landing heads. The alternative hypothesis (H1) is that the coin is not fair, meaning it has a probability of landing heads that is not 0.5.

If we model this as a binomial distribution, the expected value for 500 flips of a fair coin is 0.5*500 = 250 heads. You observed 300 heads, which is more than expected under H0.

However, to decide if this difference is statistically significant, you'd want to calculate a p-value, which is the probability of observing a result as extreme as you did (or more so) under the null hypothesis.

In general, a p-value of less than 0.05 is considered statistically significant in many fields.

In this case, you would find the standard deviation, or sqrt(n × p × (1 − p)), where n is the number of flips (500) and p is the probability of heads (0.5). This comes out to sqrt(500 × 0.5 × 0.5) ≈ 11.18.

The z-score, or (observed - expected)/standard deviation, is (300 - 250)/11.18 = 4.47. This z-score is quite large, indicating that your result is highly unlikely to have occurred by chance if the coin were fair. The corresponding p-value for a z-score of 4.47 is extremely small, way less than 0.05, providing strong evidence to reject the null hypothesis that the coin is fair.

Please note this calculation assumes a normal approximation of the binomial distribution, which is reasonable here due to the large number of flips. Also note that statistical significance doesn't imply practical significance.
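The calculation above can be sketched in a few lines of Python using only the standard library; the variable names are my own, not from the transcript:

```python
import math
from statistics import NormalDist

n, heads, p = 500, 300, 0.5
expected = n * p                          # 250 heads under H0
sd = math.sqrt(n * p * (1 - p))          # ~11.18
z = (heads - expected) / sd              # ~4.47
# Two-sided p-value from the normal approximation to the binomial
p_value = 2 * (1 - NormalDist().cdf(z))
print(f"z = {z:.2f}, p-value = {p_value:.2e}")
```

The p-value comes out far below 0.05, matching ChatGPT's conclusion; an exact binomial test (e.g. `scipy.stats.binomtest`) gives essentially the same answer here.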

Suppose that 10% of a population is infected with a disease. I have a test that is 90% accurate. If I test positive for the disease, what is the probability that I actually have the disease?

ChatGPT 4.0: To answer this question, we can use Bayes' theorem, which helps us update our beliefs based on new evidence. Here, the test being "90% accurate" means that it has both a 90% sensitivity (it correctly identifies 90% of infected individuals) and a 90% specificity (it correctly identifies 90% of non-infected individuals).

Let's denote:

  • P(D) = Probability of having the disease = 0.1 (10% of the population is infected)
  • P(~D) = Probability of not having the disease = 0.9
  • P(T|D) = Probability of testing positive given the disease = 0.9 (sensitivity)
  • P(T|~D) = Probability of testing positive given no disease = 0.1 (1 - specificity)

We want to find P(D|T), the probability of having the disease given a positive test. By Bayes' theorem:

P(D|T) = [P(T|D) × P(D)] / [P(T|D) × P(D) + P(T|~D) × P(~D)]

Plugging in the values:

P(D|T) = [0.9 × 0.1] / [(0.9 × 0.1) + (0.1 × 0.9)]

P(D|T) = 0.09 / (0.09 + 0.09)

P(D|T) = 0.09 / 0.18

P(D|T) = 0.5

So, if you test positive for the disease, the probability that you actually have the disease is 50%.
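The Bayes' theorem computation above is easy to verify numerically; this is a minimal sketch with variable names of my own choosing:

```python
p_d = 0.10                 # prior: P(D), 10% of the population infected
sens = 0.90                # P(T|D), sensitivity
spec = 0.90                # specificity, so P(T|~D) = 1 - spec = 0.10

# Total probability of a positive test (law of total probability)
p_pos = sens * p_d + (1 - spec) * (1 - p_d)

# Posterior probability of disease given a positive test (Bayes' theorem)
posterior = sens * p_d / p_pos
print(posterior)  # 0.5 (up to floating-point rounding)
```

Changing `p_d` makes the screening effect vivid: at a 1% prevalence, the same 90% accurate test yields a posterior of only about 8%.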

How can a positive result on a 90% accurate test imply only a 50% chance of actually having the disease? This illustrates the importance of understanding how general screening (with no corroborating evidence of positivity) skews the interpretation of test results.
