Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur or how likely it is that a proposition is true.
Sample Space : the set of all possible outcomes or results of an experiment or random trial. Usually it is denoted using set notation, and the possible ordered outcomes are listed as elements in the set.
Event : a subset of the sample space
total event( \(S\) ) : the event that includes every element of the sample space.
null event( \(\phi\) ) : the event that includes no element.
complementary event : the event consisting of all elements of the sample space that are not in a given event.
\[\text{(Markov's inequality) Assume the random variable } X \text{ is non-negative and } a > 0. \Rightarrow P(X \ge a) \le \frac{E(X)}{a}\]
\[\text{(Chebyshev's inequality) Let } X \text{ be a random variable with finite expected value } \mu \text{ and finite non-zero variance } \sigma ^2. \\\\ \text{ Then for any real number } k > 0, P(|X - \mu| \ge k \sigma) \le \frac{1}{k^2}.\]
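For example, taking \(k = 2\) in Chebyshev's inequality shows that no more than a quarter of the probability mass of any such distribution lies two or more standard deviations away from the mean:

\[P(|X - \mu| \ge 2 \sigma) \le \frac{1}{2^2} = \frac{1}{4}\]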
Bernoulli Distribution
The Bernoulli distribution is the discrete probability distribution of a random variable which takes the value 1 with probability \(p\) and the value \(0\) with probability \(q = 1 - p \).
\[f(x) = p I(x = 1) + (1-p) I(x = 0)\]
\[E(X) = p\]
\[Var(X) = pq = p(1-p)\]
\[M_X(t) = q + pe^t\]
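As a quick sanity check, differentiating this MGF at \(t = 0\) recovers the mean given above:

\[M'_X(t) = pe^t \Rightarrow E(X) = M'_X(0) = p\]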
Binomial Distribution
The binomial distribution with parameters \(n\) and \(p\) is the discrete probability distribution of the number of successes in a sequence of \(n\) independent experiments, each asking a yes-no question, and each with its own Boolean-valued outcome.
\[f(x) = \binom{n}{x}p^x(1-p)^{n-x}\]
\[E(X) = np\]
\[Var(X) = npq = np(1-p)\]
\[M_X(t) = (q + pe^t)^n\]
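These moments are easy to verify numerically; a minimal sketch using scipy.stats (the parameter values here are arbitrary examples):

```python
from scipy import stats

n, p = 10, 0.3                      # arbitrary example parameters
X = stats.binom(n, p)
print(X.mean(), n * p)              # E(X) = np = 3.0
print(X.var(), n * p * (1 - p))     # Var(X) = np(1-p) = 2.1
print(X.pmf(3))                     # f(3) = C(10,3) 0.3^3 0.7^7
```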
Multinomial Distribution
The multinomial distribution is a generalization of the binomial distribution.
It models the probability of counts for each side of a \(k\)-sided die rolled \(n\) times.
\[f(x_1, ..., x_k) = P(X_1 = x_1 \text{ and } \cdots \text{ and } X_k = x_k) = \frac{n!}{x_1 ! \cdots x_k !} p_1^{x_1} \times \cdots \times p_k^{x_k} \, I \left(\sum_{i=1}^k x_i = n\right)\]
It can be expressed using the gamma function: \(f(x_1, ..., x_k) = \frac{\Gamma(\sum_i x_i + 1)}{\prod_i \Gamma (x_i + 1)} \prod_{i = 1}^k p_i^{x_i}\)
\[{E(X_i) = np_i}\]
\[Var(X_i) = np_i(1 - p_i)\]
\[Cov(X_i, X_j) = -np_ip_j (i \neq j)\]
\[M_X(t) = (\sum_{i=1}^k p_i e^{t_i})^n\]
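A small simulation sketch (arbitrary parameters, assuming NumPy) that checks the mean and covariance formulas above by sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, np.array([0.2, 0.3, 0.5])          # arbitrary example parameters
draws = rng.multinomial(n, p, size=100_000)   # each row sums to n

print(draws.mean(axis=0), n * p)              # E(X_i) = n p_i
print(np.cov(draws[:, 0], draws[:, 1])[0, 1], # Cov(X_1, X_2) ...
      -n * p[0] * p[1])                       # ... should be near -1.2
```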
Hypergeometric Distribution
The hypergeometric distribution is a discrete probability distribution that describes the probability of \(k\) successes in \(n\) draws, without replacement, from a finite population of size \(N\) that contains exactly \(K\) objects with that feature, wherein each draw is either a success or a failure.
\[p_X(k) = P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}, \\\\ \text{where N is the population size,} \\\\ \text{K is the number of success states in the population,} \\\\ \text{n is the number of draws,} \\\\ \text{k is the number of observed successes}\]
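A minimal sketch with scipy.stats.hypergeom (note that SciPy's parameter names differ from the notation above: SciPy's \(M\), \(n\), \(N\) are this section's \(N\), \(K\), \(n\)):

```python
from scipy import stats

N, K, n = 50, 10, 5            # population, successes in it, draws (example values)
X = stats.hypergeom(N, K, n)   # SciPy order: total, success states, draws
print(X.pmf(2))                # P(exactly 2 successes in 5 draws)
print(X.mean(), n * K / N)     # E(X) = nK/N = 1.0
```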
Negative Binomial Distribution
The negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified number of successes \(r\) occurs.
Negative Hypergeometric Distribution
The negative hypergeometric distribution describes probabilities when sampling from a finite population without replacement in which each sample can be classified into two mutually exclusive categories.
\[f(k) = \frac{\binom{k+r-1}{k} \binom{N-r-k}{K-k}}{\binom{N}{K}} I(0 \le k \le K)\]
\[E(X) = r \frac{K}{N-K+1}\]
\[Var(X) = r \frac{(N+1)K}{(N-K+1)(N-K+2)} \left[1 - \frac{r}{N-K+1}\right]\]
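Recent SciPy versions (1.6+) include scipy.stats.nhypergeom, which can be used to check the mean formula above; a sketch with arbitrary example values:

```python
from scipy import stats

N, K, r = 50, 10, 5                   # population, successes, failure threshold
X = stats.nhypergeom(N, K, r)         # SciPy's (M, n, r) = this section's (N, K, r)
print(X.mean(), r * K / (N - K + 1))  # both approx 1.2195
```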
Poisson Distribution
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.
The experiment consists of counting the number of events that will occur during a specific interval of time or in a specific distance, area, or volume.
The probability that an event occurs is the same for every interval of equal size.
Each event is independent of all other events.
\[f(x) = \frac{\lambda ^x e ^{-\lambda}}{x !}\]
\[E(X) = \lambda\]
\[Var(X) = \lambda\]
\[M_X(t) = \exp(\lambda (e^t - 1))\]
Poisson Distribution, Binomial Distribution, Hypergeometric Distribution
The hypergeometric distribution approaches the binomial distribution as \(N \rightarrow \infty\) with \(K/N\) fixed, and the binomial distribution approaches the Poisson distribution as \(n \rightarrow \infty\) with \(np = \lambda\) fixed.
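A short numerical sketch of the binomial-to-Poisson limit (arbitrary \(\lambda\)), comparing the two pmfs as \(n\) grows with \(np = \lambda\) held fixed:

```python
import numpy as np
from scipy import stats

lam, ks = 3.0, np.arange(10)
for n in (10, 100, 1000):
    p = lam / n                                     # keep np = lambda fixed
    err = np.max(np.abs(stats.binom.pmf(ks, n, p)
                        - stats.poisson.pmf(ks, lam)))
    print(n, err)                                   # error shrinks as n grows
```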
Gamma Distribution
\[M_X(t) = (1 - \beta t)^{-\alpha} I \left(t < \frac{1}{\beta}\right)\]
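Differentiating this MGF at \(t = 0\) gives the gamma mean, a step worth recording since the notes list only the MGF:

\[M'_X(t) = \alpha \beta (1 - \beta t)^{-\alpha - 1} \Rightarrow E(X) = M'_X(0) = \alpha \beta\]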
Exponential Distribution
The exponential distribution is a special case of the gamma distribution.
It is the probability distribution of the time between events in a Poisson point process.
\[f(x) = \frac{1}{\beta} e ^{- \frac{x}{\beta}} I(x > 0)\]
\[E(X) = \beta\]
\[Var(X) = \beta^2\]
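The Poisson-process connection can be checked by simulation: sum exponential gaps to get arrival times, then count arrivals in unit intervals; the counts should have mean and variance near \(\lambda = 1/\beta\). A sketch with an arbitrary rate:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T = 4.0, 10_000                          # rate lambda; beta = 1/lam
gaps = rng.exponential(scale=1/lam, size=int(2 * lam * T))
arrivals = np.cumsum(gaps)                    # event times of the Poisson process
arrivals = arrivals[arrivals < T]
counts, _ = np.histogram(arrivals, bins=np.arange(0, T + 1))
print(counts.mean(), counts.var())            # both close to lam = 4
```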
Beta Distribution
It is important in Bayesian statistics.
The beta distribution is a family of continuous probability distributions defined on the interval [0, 1] parameterized by two positive shape parameters, denoted by \(\alpha \) and \(\beta \), that appear as exponents of the random variable and control the shape of the distribution.
The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines.
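For reference (the original notes omit the formulas), the beta density and mean are:

\[f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} I(0 \le x \le 1), \quad E(X) = \frac{\alpha}{\alpha + \beta}\]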
population : A population can be defined as including all people or items with the characteristic one wishes to understand.
complete enumeration
sample : a subset of individuals selected from a population.
sampling : sampling is the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.
random sampling
purposive sampling
\[\text{When } n \text{ independent random variables } X_1, X_2, X_3, ..., X_n \\\\ \text{ exist and each random variable has the same probability distribution } f(x), \\\\ \text{define } X_1, X_2, X_3, ..., X_n \text{ as a random sample of size } n \text{ from the population; the joint probability density function is } \\\\ f(x_1, x_2, ..., x_n) = f(x_1)f(x_2) \cdots f(x_n)\]
Sample Mean & Sample Variance of Random Samples
A random sample has these properties:
When drawing \(n\) samples, each sample is independent of the others.
Each sample has the same probability distribution.
The central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed.
If \(X_1, X_2, ..., X_n \) is a random sample of size \(n\) taken from a population with mean \(\mu \) and finite variance \(\sigma^2 \), and if \(\bar X \) is the sample mean, the limiting form of the distribution of \(Z = \frac{\bar X - \mu}{\sigma / \sqrt{n}} \) as \( n \rightarrow \infty \) is the standard normal distribution.
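A quick simulation sketch of the CLT using a clearly non-normal population (Exp(1), which has \(\mu = \sigma = 1\)); about 95% of the standardized sample means should land within \(\pm 1.96\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 100_000
mu, sigma = 1.0, 1.0                           # Exp(1): mean 1, variance 1
x = rng.exponential(scale=1.0, size=(reps, n))
z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))
print(np.mean(np.abs(z) <= 1.96))              # close to 0.95
```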
Chi-Squared Distribution
\[f(x) = \frac{1}{2^{v / 2} \Gamma (v / 2)} x^{v / 2 -1} e ^{- x / 2}\]
If random variables \(X_1, X_2, ..., X_n \) are independent of each other and follow \(N(\mu, \sigma^2) \), then \( Y = \sum_{i=1}^n (\frac{X_i - \mu}{\sigma})^2 \) follows \(\chi ^2 (n) \).
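This property is also easy to check by simulation; a \(\chi^2(n)\) variable has mean \(n\) and variance \(2n\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000
z = rng.standard_normal((reps, n))   # the (X_i - mu)/sigma terms
y = (z ** 2).sum(axis=1)             # sum of squared standard normals
print(y.mean(), y.var())             # near n = 5 and 2n = 10
```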
Degrees of freedom
Student-T Distribution
When \(X_1, X_2, ..., X_n \) is a random sample from a population that follows \(N(\mu, \sigma ^2) \) and \( S^2 \) is the sample variance, \(\frac{(n-1) S^2}{\sigma^2} = \sum_{i=1}^n \frac{(X_i - \bar X)^2}{\sigma ^2} \sim \chi ^2 (n-1) \)
When \(Z \sim N(0, 1) \) and \(V \sim \chi^2(v) \) are independent, \(T=\frac{Z}{\sqrt{V/v}} \sim T(v) \).
The t-distribution is the probability distribution used in place of the normal distribution when the population variance (or standard deviation) is unknown.
Snedecor's F-Distribution
The F-distribution, also known as Snedecor's F-distribution or the Fisher-Snedecor distribution, is a continuous probability distribution that arises frequently as the null distribution of a test statistic, most notably in the analysis of variance (ANOVA).
The null distribution is the probability distribution of the test statistic when the null hypothesis is true.
It is used to check whether two or more sample means are drawn from the same population.
When \(U\) and \(V\) are independent random variables that follow \(\chi^2\) distributions with \(v_1\) and \(v_2\) degrees of freedom, \(F = \frac{U/v_1}{V/v_2} \) follows the F-distribution with (\(v_1, v_2 \)) degrees of freedom.
Assuming that the variances of samples of sizes \(n_1\) and \(n_2\), drawn independently of each other from normal populations with population variances \(\sigma_1^2\) and \(\sigma_2^2\) respectively, are \(S_1^2\) and \(S_2^2\), \(F=\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2} \sim F(n_1-1, n_2 -1)\)
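A simulation sketch of this sampling result (arbitrary sizes and variances): the simulated CDF of \(F\) at a point should match scipy.stats.f:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n1, n2, reps = 8, 12, 100_000
x = rng.normal(0, 2.0, size=(reps, n1))            # sigma_1 = 2
y = rng.normal(0, 3.0, size=(reps, n2))            # sigma_2 = 3
f = (x.var(axis=1, ddof=1) / 4.0) / (y.var(axis=1, ddof=1) / 9.0)
print(np.mean(f <= 1.0))                           # simulated P(F <= 1)
print(stats.f.cdf(1.0, n1 - 1, n2 - 1))            # theoretical value
```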
Point Estimation
Statistical inference : estimation, test
Estimation : point estimation vs interval estimation
Point Estimator
minimum-variance mean-unbiased estimator (MVUE) : minimizes the risk of the squared-error loss function.
best linear unbiased estimator (BLUE)
minimum mean squared error (MMSE)
median-unbiased estimator : minimizes the risk of the absolute-error loss function
maximum likelihood estimator (MLE)
method of moments and generalized method of moments
\[\hat \theta = h (X_1, X_2, ..., X_n)\]
Moment method vs Maximum Likelihood method
Bias
\[B(\hat \theta) = E(\hat \theta) - \theta\]
When \(B(\hat \theta) = 0\), the estimator \(\hat \theta\) is called an unbiased estimator
\[Var(\hat \theta_1) < Var(\hat \theta_2) \Rightarrow \hat \theta_1 \text{ is better than } \hat \theta_2 \text{ (for unbiased estimators)}\]
likelihood function
\[L(x_1, x_2, ..., x_n; \theta)\]
If \(X_1, X_2, ..., X_n \) are independent random samples from a distribution with pdf \(f(x;\theta) \), then \(L(x_1, x_2, ..., x_n;\theta) = f(x_1;\theta) f(x_2; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^n f(x_i; \theta) \)
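As a worked example of maximum likelihood, for Bernoulli samples \(L(p) = p^{\sum x_i} (1 - p)^{n - \sum x_i}\), and setting the derivative of the log-likelihood to zero gives

\[\frac{d}{dp} \log L(p) = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1 - p} = 0 \Rightarrow \hat p = \frac{1}{n} \sum_{i=1}^n x_i = \bar x\]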
Estimating the Proportion
\[\hat p - z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p (1 - \hat p)}{n}} \le p \le \hat p + z_{\frac{\alpha}{2}} \sqrt{\frac{\hat p (1 - \hat p)}{n}}\]
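A minimal sketch computing this interval (the success count and sample size are made-up example numbers):

```python
import numpy as np
from scipy import stats

x, n = 52, 100                        # hypothetical: 52 successes in 100 trials
p_hat = x / n
z = stats.norm.ppf(0.975)             # z_{alpha/2} for a 95% interval
half = z * np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half, p_hat + half)     # approx (0.422, 0.618)
```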
Testing a Statistical Hypothesis
Null hypothesis
alternative hypothesis
Critical value : A critical value is a point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis. If the absolute value of your test statistic is greater than the critical value, you can declare statistical significance and reject the null hypothesis.
Error of Testing Statistical Hypothesis
type 1 error : the null hypothesis is true, but it is rejected.
type 2 error : the null hypothesis is false, but it is accepted.
\[\alpha = P(\text{type 1 error}) = P(H_0 \text{ is rejected} | H_0 \text { is true})\]
Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome variable’) and one or more independent variables (often called ‘predictors’, ‘covariates’, or ‘features’).
Simple Linear Regression Model
\[y_i = \beta_0 + \beta_1 x_i + \varepsilon _i, (i = 1 \text{ to } n)\]
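A least-squares fit of this model can be sketched with NumPy (the true coefficients and noise level here are made-up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, n = 2.0, 0.5, 200          # hypothetical true coefficients
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, 1.0, size=n)   # y_i = b0 + b1 x_i + eps_i
b1_hat, b0_hat = np.polyfit(x, y, deg=1) # highest-degree coefficient first
print(b0_hat, b1_hat)                    # close to 2.0 and 0.5
```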
For random variables \(X\) and \(Y\) with joint moment generating function \(M(s, t) = E(e^{sX + tY})\):
\[E(X^n) = \left. \frac{\partial ^n M(s, t)}{\partial s ^n} \right|_{(0, 0)}\]
\[E(Y^n) = \left. \frac{\partial ^n M(s, t)}{\partial t ^n} \right|_{(0, 0)}\]
\[E(XY) = \left. \frac{\partial ^2 M(s, t)}{\partial s \partial t} \right|_{(0, 0)}\]
\[\text{When Random Variables X, Y are independent, } M_{aX + bY}(t) = M_X(at)M_Y(bt)\]
\[\text{When Random Variables X, Y are independent and follow normal distributions respectively, } \\\\ aX+bY \sim N(a\mu_X + b \mu_Y, a^2 \sigma_X ^2 + b^2 \sigma_Y ^2)\]
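For example, with \(X \sim N(1, 4)\), \(Y \sim N(2, 9)\), \(a = 2\), and \(b = 3\):

\[2X + 3Y \sim N(2 \cdot 1 + 3 \cdot 2, \; 2^2 \cdot 4 + 3^2 \cdot 9) = N(8, 97)\]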