Overview
This article explains the hypergeometric distribution, one of the discrete probability distributions.
Probability function
A random variable $X$ is said to follow the hypergeometric distribution with parameters $N, K, n$ if it has the following probability function.
$$
f_X(x) =
\begin{cases}
\frac{\binom{K}{x} \binom{N - K}{n - x}}{\binom{N}{n}} & x = 0, 1, \cdots, n \\
0 & \text{otherwise}
\end{cases}
$$
Here $N \in \{0, 1, \cdots\}$, $K \in \{0, 1, \cdots, N\}$, and $n \in \{0, 1, \cdots, N\}$, so that $N \ge K$ and $N \ge n$. A binomial coefficient $\binom{a}{b}$ with $b > a$ is taken to be 0, so values of $x$ that cannot occur have probability 0.
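As a minimal sketch, the probability function above can be evaluated directly with the standard library. The function name `hypergeom_pmf` and the example parameters ($N = 20$, $K = 7$, $n = 12$) are assumptions chosen for illustration, not part of the original article.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x) for population size N, K successes in the population, n draws."""
    if x < 0 or x > n:
        return 0.0
    # math.comb(a, b) returns 0 when b > a, so impossible outcomes get probability 0.
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


# Hypothetical example parameters.
N, K, n = 20, 7, 12
probs = [hypergeom_pmf(x, N, K, n) for x in range(n + 1)]
for x, p in enumerate(probs):
    print(x, p)
```

Because `math.comb` returns exact integers, the only rounding happens in the final division.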
Verification that it is a probability function
Every term is nonnegative, and by Vandermonde's identity the probabilities sum to 1:
$$
\begin{aligned}
\sum_{x = 0}^n \frac{\binom{K}{x} \binom{N - K}{n - x}}{\binom{N}{n}}
&= \frac{\binom{N}{n}}{\binom{N}{n}}
\quad \because \sum_{x = 0}^n \binom{K}{x} \binom{N - K}{n - x} = \binom{N}{n} \\
&= 1
\end{aligned}
$$
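The identity used in this step (Vandermonde's identity) can be spot-checked with exact integer arithmetic. This is only a numerical sanity check over a small, arbitrarily chosen range of parameters, not a proof; the helper name `vandermonde_lhs` is hypothetical.

```python
from math import comb


def vandermonde_lhs(N: int, K: int, n: int) -> int:
    """Left-hand side: sum over x of C(K, x) * C(N - K, n - x)."""
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(n + 1))


# Compare both sides for every valid (N, K, n) with N up to 12.
mismatches = [
    (N, K, n)
    for N in range(13)
    for K in range(N + 1)
    for n in range(N + 1)
    if vandermonde_lhs(N, K, n) != comb(N, n)
]
print("mismatches:", mismatches)
```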
Expected value
$$
\begin{aligned}
E[X]
&= \sum_{x = 0}^n x \frac{\binom{K}{x} \binom{N - K}{n - x}}{\binom{N}{n}} \\
&= \sum_{x = 1}^n x \frac{\binom{K}{x} \binom{N - K}{n - x}}{\binom{N}{n}} \quad \because \text{the } x = 0 \text{ term is } 0 \\
&= \sum_{x = 1}^n x \frac{K}{x} \frac{n}{N} \frac{\binom{K - 1}{x - 1} \binom{N - K}{n - x}}{\binom{N - 1}{n - 1}}
\quad \because \binom{a}{b} = \frac{a}{b} \binom{a - 1}{b - 1} \\
&= n \frac{K}{N} \sum_{x = 1}^n \frac{\binom{K - 1}{x - 1} \binom{N - K}{n - x}}{\binom{N - 1}{n - 1}} \\
&= n \frac{K}{N} \sum_{y = 0}^{n - 1} \frac{\binom{K - 1}{y} \binom{(N - 1) - (K - 1)}{(n - 1) - y}}{\binom{N - 1}{n - 1}} \quad \because y = x - 1 \\
&= n \frac{K}{N}
\end{aligned}
$$
The last step holds because the sum over $y$ is the total probability of the hypergeometric distribution with parameters $N - 1, K - 1, n - 1$, which equals 1.
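The result $E[X] = nK/N$ can be confirmed exactly for a concrete case by summing $x f_X(x)$ with rational arithmetic. The helper `pmf` and the parameters $N = 20$, $K = 7$, $n = 12$ are assumptions made for this sketch.

```python
from fractions import Fraction
from math import comb


def pmf(x: int, N: int, K: int, n: int) -> Fraction:
    """Exact hypergeometric probability P(X = x) as a rational number."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


# Hypothetical example parameters.
N, K, n = 20, 7, 12

# Direct definition of the expected value: sum of x * P(X = x).
mean_by_sum = sum(x * pmf(x, N, K, n) for x in range(n + 1))
# Closed form derived above.
mean_closed_form = Fraction(n * K, N)

print(mean_by_sum, mean_closed_form)
```

Using `Fraction` avoids floating-point rounding, so the two values can be compared with `==`.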
Variance
First compute the second factorial moment $E[X(X - 1)]$.
$$
\begin{aligned}
E[X(X - 1)]
&= \sum_{x = 0}^n x (x - 1) \frac{\binom{K}{x} \binom{N - K}{n - x}}{\binom{N}{n}} \\
&= \sum_{x = 2}^n x (x - 1) \frac{\binom{K}{x} \binom{N - K}{n - x}}{\binom{N}{n}}
\quad \because \text{the } x = 0 \text{ and } x = 1 \text{ terms are } 0 \\
&= \sum_{x = 2}^n x (x - 1)
\frac{K}{x} \frac{K - 1}{x - 1} \frac{n}{N} \frac{n - 1}{N - 1}
\frac{\binom{K - 2}{x - 2} \binom{N - K}{n - x}}{\binom{N - 2}{n - 2}}
\quad \because \binom{a}{b} = \frac{a}{b} \binom{a - 1}{b - 1} \\
&= n (n - 1) \frac{K (K - 1)}{N (N - 1)}
\sum_{x = 2}^n \frac{\binom{K - 2}{x - 2} \binom{N - K}{n - x}}{\binom{N - 2}{n - 2}} \\
&= n (n - 1) \frac{K (K - 1)}{N (N - 1)}
\sum_{y = 0}^{n - 2} \frac{\binom{K - 2}{y} \binom{(N - 2) - (K - 2)}{(n - 2) - y}}{\binom{N - 2}{n - 2}}
\quad \because y = x - 2 \\
&= n (n - 1) \frac{K (K - 1)}{N (N - 1)}
\end{aligned}
$$
The last step holds because the sum over $y$ is the total probability of the hypergeometric distribution with parameters $N - 2, K - 2, n - 2$, which equals 1.
Therefore, the variance is
$$
\begin{aligned}
\operatorname{Var}[X]
&= E[X^2] - (E[X])^2 \\
&= E[X(X - 1)] + E[X] - (E[X])^2 \\
&= n (n - 1) \frac{K (K - 1)}{N (N - 1)} + n \frac{K}{N} - \left( n \frac{K}{N} \right)^2 \\
&= n K \frac{N^2 + nK - nN - KN}{N^2 (N - 1)} \\
&= \frac{n K (N - K)(N - n)}{N^2 (N - 1)}
\end{aligned}
$$
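The algebra in this derivation can be checked exactly on a concrete case. The sketch below computes $E[X(X - 1)]$ and the variance from the definition and compares them with the closed forms; the helper `pmf` and the parameters $N = 20$, $K = 7$, $n = 12$ are assumptions for illustration.

```python
from fractions import Fraction
from math import comb


def pmf(x: int, N: int, K: int, n: int) -> Fraction:
    """Exact hypergeometric probability P(X = x) as a rational number."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


# Hypothetical example parameters.
N, K, n = 20, 7, 12
support = range(n + 1)

mean = sum(x * pmf(x, N, K, n) for x in support)
# Second factorial moment E[X(X - 1)] from the definition.
factorial_moment = sum(x * (x - 1) * pmf(x, N, K, n) for x in support)
# Var[X] = E[X(X - 1)] + E[X] - (E[X])^2
variance = factorial_moment + mean - mean**2

# Closed forms derived above.
factorial_moment_closed = Fraction(n * (n - 1) * K * (K - 1), N * (N - 1))
variance_closed = Fraction(n * K * (N - K) * (N - n), N**2 * (N - 1))

print(factorial_moment, variance, float(variance))
```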
Standard deviation
$$
\operatorname{Std}(X) = \sqrt{\operatorname{Var}(X)} = \left( \frac{n K (N - K)(N - n)}{N^2 (N - 1)} \right)^\frac{1}{2}
$$
The hypergeometric distribution in scipy.stats
scipy.stats.hypergeom creates a random variable that follows the hypergeometric distribution.
Sampling
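The original code for this step is not preserved, so the following is a minimal sketch of drawing samples. The parameters ($N = 20$, $K = 7$, $n = 12$), the sample size, and the seed are assumptions. Note that scipy names its arguments `M, n, N`, meaning population size, number of successes in the population, and number of draws, which correspond to this article's $N, K, n$ in that order.

```python
from scipy.stats import hypergeom

# Assumed example parameters in this article's notation.
N_pop, K_succ, n_draws = 20, 7, 12

# scipy's argument order is (M, n, N) = (population, successes, draws).
rv = hypergeom(N_pop, K_succ, n_draws)

# Draw 10 samples; random_state fixes the seed for reproducibility.
samples = rv.rvs(size=10, random_state=0)
print(samples)
```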
Probability mass function
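A sketch of evaluating the probability mass function with `pmf`, again under the assumed parameters $N = 20$, $K = 7$, $n = 12$ (the original code is not preserved).

```python
import numpy as np
from scipy.stats import hypergeom

# Assumed example parameters: population 20, successes 7, draws 12.
rv = hypergeom(20, 7, 12)

# Evaluate P(X = x) for x = 0, 1, ..., 12.
x = np.arange(0, 13)
pmf = rv.pmf(x)
for xi, p in zip(x, pmf):
    print(xi, p)
```

Values of $x$ above $K = 7$ cannot occur, so their probabilities are 0.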
Cumulative distribution function
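A sketch of evaluating the cumulative distribution function $P(X \le x)$ with `cdf`, under the same assumed parameters ($N = 20$, $K = 7$, $n = 12$).

```python
import numpy as np
from scipy.stats import hypergeom

# Assumed example parameters: population 20, successes 7, draws 12.
rv = hypergeom(20, 7, 12)

# Evaluate P(X <= x) for x = 0, 1, ..., 12.
x = np.arange(0, 13)
cdf = rv.cdf(x)
for xi, c in zip(x, cdf):
    print(xi, c)
```

The CDF is the running sum of the probability mass function, so it is nondecreasing and reaches 1 at the largest attainable value of $x$.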
Statistics
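A sketch of obtaining the mean, variance, and standard deviation from a frozen distribution. The parameters are an assumption: the original code is not preserved, but $N = 20$, $K = 7$, $n = 12$ give $nK/N = 4.2$ and a variance of $8736/7600 \approx 1.1495$, which is consistent with the values listed after this block up to floating-point rounding.

```python
from scipy.stats import hypergeom

# Assumed parameters: population 20, successes 7, draws 12
# (scipy's argument order is M, n, N).
rv = hypergeom(20, 7, 12)

mean = rv.mean()
var = rv.var()
std = rv.std()

print("mean", mean)
print("var", var)
print("std", std)
```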
mean 4.199999999999999
var 1.1494736842105264
std 1.0721351053904198