t-Distribution

t-Distribution Overview

Overview of Student's t-Distribution

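The t-distribution is used in statistics mainly to analyze data from small samples. It resembles the normal distribution but has heavier tails, so extreme values are more likely. As the sample size grows (as a rule of thumb, n > 30), the t-distribution approaches the normal distribution. It is used mainly for t-tests and for computing confidence intervals (CIs).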

Key Properties

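Shape: The t-distribution is a symmetric, bell-shaped curve centered at 0. Its tails are heavier than those of the normal distribution, so extreme values are more likely.
Degrees of freedom (df): The shape of the t-distribution depends on the degrees of freedom. As the degrees of freedom increase, the distribution approaches the normal distribution.
Mean and variance:
- Mean: 0 (for \( \nu > 1 \))
- Variance: \( \frac{\nu}{\nu - 2} \) (for \( \nu > 2 \))
- \( \nu = df \) (the degrees of freedom)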
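
As a quick check of the variance formula above, the following minimal sketch compares the closed form \( \frac{\nu}{\nu - 2} \) with the value reported by SciPy. The helper name `t_variance` and the chosen degrees of freedom are illustrative assumptions, not part of the original article.

```python
from scipy.stats import t


def t_variance(nu: float) -> float:
    """Closed-form variance nu / (nu - 2); finite only for nu > 2."""
    if nu <= 2:
        raise ValueError("the variance is finite only for nu > 2")
    return nu / (nu - 2)


# Illustrative degrees of freedom (an assumption for this sketch)
for nu in (3, 5, 10, 30, 100):
    closed_form = t_variance(nu)
    from_scipy = float(t.var(nu))  # SciPy's variance for the same distribution
    print(f"nu={nu:>3}: formula={closed_form:.4f}, scipy={from_scipy:.4f}")
```

The printed variances shrink toward 1, the variance of the standard normal distribution, as the degrees of freedom increase.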

[Figure: t-distribution density curves for \( \nu = 1 \), \( \nu = 2 \), \( \nu = 5 \), and \( \nu = \infty \)]

t-Distribution vs. Normal Distribution

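Unlike the normal distribution, the t-distribution supports confidence intervals and hypothesis tests even when the sample is small, because it accounts for the extra uncertainty of estimating the standard deviation from the sample.
The smaller the sample (that is, the lower the degrees of freedom), the heavier the tails of the t-distribution.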
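
One way to see the effect of the heavier tails is to compare two-sided critical values. The sketch below is only an illustration: the helper `critical_value_t`, the 95% confidence level, and the listed degrees of freedom are assumptions made for this example.

```python
from scipy.stats import norm, t


def critical_value_t(df: int, confidence: float = 0.95) -> float:
    """Two-sided critical value of the t-distribution with df degrees of freedom."""
    upper_quantile = 1 - (1 - confidence) / 2
    return float(t.ppf(upper_quantile, df))


# Two-sided 95% critical value of the standard normal distribution
z_crit = float(norm.ppf(0.975))

for df in (2, 5, 10, 30, 100):
    print(f"df={df:>3}: t critical={critical_value_t(df):.4f}, z critical={z_crit:.4f}")
```

For low degrees of freedom the t critical value is noticeably larger than the normal one, so intervals built from small samples are wider; the gap narrows as the degrees of freedom grow.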

Example Uses of the t-Distribution

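t-test:
- One-sample t-test: tests whether a sample mean differs from a known population mean.
- Independent two-sample t-test: tests whether the means of two independent samples differ, that is, whether the samples could come from populations with the same mean.
- Paired t-test: tests whether the mean difference between two measurements taken on the same subjects is zero.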
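
The three tests listed above can be sketched with `scipy.stats`. The data here are synthetic and every variable name is hypothetical; the means, spreads, and sample sizes were chosen only to produce something to test, and the unequal-variance (Welch) variant of the two-sample test is one possible choice, not the article's prescription.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed values, for illustration only)
rng = np.random.default_rng(0)
before = rng.normal(loc=50.0, scale=5.0, size=12)
after = before + rng.normal(loc=2.0, scale=1.0, size=12)
other_group = rng.normal(loc=55.0, scale=5.0, size=15)

# One-sample t-test: does the mean of `before` differ from 50?
one_sample = stats.ttest_1samp(before, popmean=50.0)

# Independent two-sample t-test (Welch variant, no equal-variance assumption)
independent = stats.ttest_ind(before, other_group, equal_var=False)

# Paired t-test: two measurements on the same subjects
paired = stats.ttest_rel(before, after)

for name, result in [("one-sample", one_sample),
                     ("independent", independent),
                     ("paired", paired)]:
    print(f"{name:>11}: t={result.statistic:.3f}, p={result.pvalue:.4f}")
```

A paired t-test is equivalent to a one-sample t-test on the per-subject differences, which is why it fits repeated measurements on the same subjects.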

Confidence Interval:
- The t-distribution is used to compute a confidence interval for the mean when the sample is small.
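
A confidence interval for the mean can be computed as the sample mean plus or minus the t critical value times the standard error. The sketch below assumes a small, made-up sample; the function name `mean_confidence_interval` and the data values are hypothetical.

```python
import numpy as np
from scipy.stats import t


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the mean using the t-distribution with n - 1 df."""
    data = np.asarray(sample, dtype=float)
    n = data.size
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = data.mean()
    # Standard error of the mean, using the sample standard deviation (ddof=1)
    sem = data.std(ddof=1) / np.sqrt(n)
    # Two-sided critical value with n - 1 degrees of freedom
    t_crit = t.ppf(1 - (1 - confidence) / 2, n - 1)
    margin = t_crit * sem
    return float(mean - margin), float(mean + margin)


# Made-up measurements (an assumption for this sketch)
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.4, 4.7, 5.2]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

The degrees of freedom are n - 1 because one degree of freedom is spent estimating the mean before the standard deviation is computed.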

Visualizing the t-Distribution in Python

The following example uses Python to visualize the t-distribution.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, norm

# Set the degrees of freedom
df1 = 5
df2 = 10
df3 = 30

# Generate x values
x = np.linspace(-5, 5, 1000)

# Evaluate the t-distribution and standard normal densities
y_t1 = t.pdf(x, df1)
y_t2 = t.pdf(x, df2)
y_t3 = t.pdf(x, df3)
y_norm = norm.pdf(x)

# Create the plot
plt.figure(figsize=(10, 6))
plt.plot(x, y_t1, label=f't-distribution (df={df1})')
plt.plot(x, y_t2, label=f't-distribution (df={df2})')
plt.plot(x, y_t3, label=f't-distribution (df={df3})')
plt.plot(x, y_norm, label='Standard Normal Distribution', linestyle='dashed')

plt.title('Comparison of t-Distributions with Normal Distribution')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

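This code plots t-distributions with different degrees of freedom alongside the standard normal distribution. The graph shows how the t-distribution changes with the degrees of freedom and how it approaches the normal distribution as the degrees of freedom increase.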

Visualizing the t-Distribution in R

The following R code plots the t-distribution with 1 degree of freedom together with the standard normal distribution, using the specified colors.

# Load necessary library
library(ggplot2)

# Generate data for plotting
x <- seq(-5, 5, length.out = 1000)
normal_density <- dnorm(x, mean = 0, sd = 1)
t_density <- dt(x, df = 1)

# Create a data frame
data <- data.frame(x = x, normal = normal_density, t_dist = t_density)

# Plot using ggplot2
ggplot(data, aes(x = x)) +
  geom_line(aes(y = normal, color = "Normal Distribution (mean=0, sd=1)"), linetype = "dashed") +
  geom_line(aes(y = t_dist, color = "t-Distribution (df=1)")) +
  scale_color_manual(values = c("Normal Distribution (mean=0, sd=1)" = "red", 
                                "t-Distribution (df=1)" = "blue")) +
  labs(title = "Normal Distribution vs t-Distribution (df=1)",
       x = "Value",
       y = "Probability Density",
       color = "Distribution") +
  theme_minimal()

This code generates a plot similar to the one created with Python, showing the standard normal distribution as a dashed red line and the t-distribution with 1 degree of freedom in blue. You can run it in any R environment with ggplot2 installed.