Table of contents

  • 1 Proportion
  • 2 Mean
  • 3 Welch's T-test

1 Proportion

Experiment: test the color of a button

  • Click through probability: N(users who clicked) / N(total users)
  • 1000 users in both control and treatment groups

Results:

  • Control group: 1.1% CTP
  • Treatment group: 2.3% CTP

Significance:

  • Practical significance boundary: 0.01
  • Significance level $\alpha$: 0.05

Make a decision:

  • Significant difference? Launch the “feature”?

Questions

1. Which hypothesis test to use?
2. What is the null hypothesis?
3. Is the result statistically significant?
4. Is the result practically significant?

  • Bernoulli population: either clicks or doesn’t click
  • Control group: n*p = 1000 * 1.1% = 11
  • Treatment group: n * p = 1000 * 2.3% = 23
  • In both groups, np and n(1-p) are larger than 10, so the samples can be treated as large and the test statistic approximately follows the Z-distribution.
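The large-sample check above can be sketched in a few lines. This is a minimal illustration, assuming the common rule of thumb used in these notes (both counts above 10); the function name `is_large_sample` is made up for this example.

```python
def is_large_sample(n: int, p: float, threshold: float = 10.0) -> bool:
    """Return True when both n*p and n*(1-p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold


# Values from the experiment: 1000 users per group, 1.1% and 2.3% CTP.
control_ok = is_large_sample(1000, 0.011)    # n*p = 11, n*(1-p) = 989
treatment_ok = is_large_sample(1000, 0.023)  # n*p = 23, n*(1-p) = 977
print(control_ok, treatment_ok)
```

If either check failed, the normal approximation would be questionable and a different test would be needed.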

What is the difference between a T-test and a Z-test?

  • https://zhuanlan.zhihu/p/120181558

Measurements

  • Users who clicked: $X_{ct}$, $X_{tr}$
  • Total number of users: $n_{ct}$, $n_{tr}$

$P_{ct} = X_{ct} / n_{ct} = 11 / 1000$
$P_{tr} = X_{tr} / n_{tr} = 23 / 1000$

What is the null hypothesis?

We want to measure the difference between $P_{tr}$ and $P_{ct}$.

$d = P_{tr} - P_{ct}$

Null hypothesis:

$H_{0}$: $P_{tr} = P_{ct}$, $d = 0$
$d \sim N(0, SE^{2})$

We don’t know the standard deviation of d, so we need to estimate it.

Test statistic:

$TS = (P_{tr} - P_{ct}) / SE$

Estimate a standard error:

  • Choose an SE that can represent both groups
  • “Pooled” standard error

Compute “pooled” SE

  • “Pooled” probability of a click, $P'$
  • Total probability across the 2 groups:

$P' = (X_{ct} + X_{tr}) / (n_{ct} + n_{tr}) = (11 + 23) / (1000 + 1000) = 0.017$

Test statistic:
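The notes use SE = 0.00578 without showing where it comes from. Assuming the standard pooled standard error for a difference of two proportions, it can be written in terms of $P'$ as:

```latex
SE = \sqrt{P'(1 - P')\left(\frac{1}{n_{ct}} + \frac{1}{n_{tr}}\right)}
   = \sqrt{0.017 \times 0.983 \times \left(\frac{1}{1000} + \frac{1}{1000}\right)}
   \approx 0.00578
```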

$TS = (P_{tr} - P_{ct}) / SE = 0.012 / 0.00578 = 2.076$
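The calculation can be reproduced with a short script. This is a sketch assuming the usual pooled standard error for two proportions; the function name `pooled_z_statistic` and its argument names are chosen for this example only.

```python
import math


def pooled_z_statistic(x_ct: int, n_ct: int, x_tr: int, n_tr: int) -> tuple[float, float, float]:
    """Return (d, pooled SE, test statistic) for two click-through proportions."""
    p_ct = x_ct / n_ct
    p_tr = x_tr / n_tr
    d = p_tr - p_ct
    # Pooled click probability across both groups.
    p_pool = (x_ct + x_tr) / (n_ct + n_tr)
    # Pooled standard error of the difference under the null hypothesis.
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ct + 1 / n_tr))
    return d, se, d / se


d, se, ts = pooled_z_statistic(x_ct=11, n_ct=1000, x_tr=23, n_tr=1000)
print(f"d = {d:.3f}, SE = {se:.5f}, TS = {ts:.3f}")
```

With the counts from the experiment this gives values that round to d = 0.012, SE = 0.00578 and TS = 2.076.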


Is the result statistically significant?

  • Critical z-score ($\alpha$ = 0.05, two-sided) = 1.96
  • If TS > 1.96 or TS < -1.96, reject the null hypothesis
  • In this example, TS = 2.076 > 1.96, so the result is statistically significant.
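The critical value does not have to come from a table. The sketch below uses `statistics.NormalDist` from the Python standard library to recover it and also computes a two-sided p-value, which the notes do not report; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
ts = 2.076  # test statistic from the worked example

std_normal = NormalDist()  # mean 0, standard deviation 1
# Two-sided test: put alpha / 2 in each tail.
z_crit = std_normal.inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - std_normal.cdf(abs(ts)))
reject_null = abs(ts) > z_crit
print(f"z_crit = {z_crit:.2f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

Comparing the p-value with $\alpha$ leads to the same decision as comparing TS with the critical z-score.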

Is the result practically significant?

  • Confidence interval of d

  • Center of C.I. = 0.012 (this is $P_{tr} - P_{ct}$)

  • Half-width of C.I. (margin of error), where $S_{pool}$ is the pooled standard error 0.00578

$m = Z \times S_{pool} = 1.96 \times 0.00578 = 0.0113$

CI of d: 0.012 ± 0.0113 = (0.0007, 0.0233)
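The interval can be checked numerically. This is a minimal sketch that plugs in the rounded values from the notes; `d_hat` and `se_pool` are names chosen for this example.

```python
from statistics import NormalDist

d_hat = 0.012      # observed difference P_tr - P_ct
se_pool = 0.00578  # pooled standard error from the notes
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
margin = z * se_pool
ci_low, ci_high = d_hat - margin, d_hat + margin
print(f"margin = {margin:.4f}, CI = ({ci_low:.4f}, {ci_high:.4f})")
```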


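Best guess: there is a practically significant change, because the point estimate 0.012 is above the practical significance boundary of 0.01.
However, the lower end of the CI (0.0007) is below 0.01, so it is possible that the change is not practically significant.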

Make a launch decision:

  • We are not confident that the change is practically significant.
  • We do not recommend launching the feature.
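The reasoning behind this decision can be written as a small rule. The function below is a hypothetical sketch, not a standard procedure: it assumes a positive d is the desired direction and recommends launching only when the whole confidence interval lies at or above the practical significance boundary.

```python
def launch_decision(ci_low: float, ci_high: float, practical_boundary: float) -> str:
    """Compare a confidence interval for d with 0 and with the practical boundary."""
    if ci_low <= 0.0 <= ci_high:
        # The interval contains 0, so the difference may be pure noise.
        return "do not launch: not statistically significant"
    if ci_low >= practical_boundary:
        # Even the most pessimistic plausible effect clears the boundary.
        return "launch: confident the change is practically significant"
    return "do not launch: not confident the change is practically significant"


# CI from the button experiment and its practical boundary of 0.01.
print(launch_decision(0.0007, 0.0233, practical_boundary=0.01))
```

For the button experiment the interval excludes 0 but dips below 0.01, so the rule lands on the third outcome.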

Checking statistical significance:

  • Check whether the CI contains 0: if it does, the result is not statistically significant.
  • This is equivalent to comparing TS with the critical value.

2 Mean

Experiment: test whether a new feature changes the average number of posts


Correction: Mean of treatment is 1.7

What conclusion can you draw?

  • Assume variances are similar.

Significance:

  • Practical significance boundary: 0.05
  • Significance level $\alpha$: 0.05


Correction: $S_{pool}$ = 1.06

SS: sum of squares

The margin of error is t-score × $S_{pool}$ × $(1/n_c + 1/n_t)^{1/2}$, which comes to about 0.51. The CI is therefore $\hat{d}$ ± 0.51 = 0.6 ± 0.51, roughly 0.09 to 1.11, and its lower end is above the practical significance boundary of 0.05.
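The pooled standard deviation and the margin of error can be sketched as two small functions, assuming the usual equal-variance (pooled) two-sample t-test. The experiment's sample sizes and sums of squares are not given in these notes, so the inputs below are hypothetical and do not reproduce the 1.06 or 0.51 figures; the critical t-score is passed in as a parameter (about 1.984 for 100 degrees of freedom at $\alpha$ = 0.05, two-sided).

```python
import math


def pooled_std(ss_c: float, ss_t: float, n_c: int, n_t: int) -> float:
    """Pooled standard deviation from each group's sum of squared deviations (SS)."""
    return math.sqrt((ss_c + ss_t) / (n_c + n_t - 2))


def margin_of_error(t_crit: float, s_pool: float, n_c: int, n_t: int) -> float:
    """Half-width of the confidence interval for the difference in means."""
    return t_crit * s_pool * math.sqrt(1 / n_c + 1 / n_t)


# Hypothetical inputs for illustration only, NOT the experiment's data.
s_pool = pooled_std(ss_c=50.0, ss_t=50.0, n_c=51, n_t=51)  # df = 100
margin = margin_of_error(t_crit=1.984, s_pool=s_pool, n_c=51, n_t=51)
print(f"S_pool = {s_pool:.2f}, margin of error = {margin:.3f}")
```

The confidence interval is then the observed difference in means plus or minus this margin, and it is compared with 0 and with the practical significance boundary in the same way as for proportions.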


3 Welch’s T-test


Reference: https://www.youtube/watch?v=6uw0A3aKwMc
