Probability
What is multinomial distribution?
The distribution that assigns probabilities to a number of discrete choices is called the multinomial distribution.
Draw a sample die roll in PyTorch
import torch
from torch.distributions import multinomial

# Probability vector for a fair six-sided die: each face has probability 1/6
fair_probs = torch.ones([6]) / 6
# Draw one roll; the result is a one-hot count vector over the six faces
multinomial.Multinomial(1, fair_probs).sample()
What is central limit theorem?
The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples of size n from the population with replacement, then the distribution of the sample means will be approximately normally distributed with mean μ and standard deviation σ/√n.
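A minimal numerical sketch of this, assuming NumPy is available: average many fair die rolls per sample, and the sample means cluster around μ = 3.5 with spread close to σ/√n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: a fair die, with mean mu = 3.5 and std sigma = sqrt(35/12)
mu, sigma = 3.5, np.sqrt(35 / 12)

n = 100              # rolls per sample
num_samples = 10_000
rolls = rng.integers(1, 7, size=(num_samples, n))
sample_means = rolls.mean(axis=1)

# By the CLT, the sample means are approximately N(mu, sigma / sqrt(n))
print(sample_means.mean())   # close to 3.5
print(sample_means.std())    # close to sigma / sqrt(n), about 0.17
```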
Write some code showing the estimated probabilities converging to the true value
import torch
from torch.distributions import multinomial
import matplotlib.pyplot as plt

fair_probs = torch.ones([6]) / 6
# 500 groups of experiments, with 10 die rolls in each group
counts = multinomial.Multinomial(10, fair_probs).sample((500,))
cum_counts = counts.cumsum(dim=0)
# Running estimate of each face's probability after each group
estimates = cum_counts / cum_counts.sum(dim=1, keepdim=True)
plt.figure(figsize=(6, 4.5))
for i in range(6):
    plt.plot(estimates[:, i].numpy(), label="P(die=" + str(i + 1) + ")")
plt.axhline(y=1/6, color='black', linestyle='dashed')  # true probability 1/6
plt.gca().set_xlabel('Groups of experiments')
plt.gca().set_ylabel('Estimated probability')
plt.legend();
What are the axioms of probability?
- For any event A, its probability is never negative, i.e., P(A) ≥ 0;
- The probability of the entire sample space is 1, i.e., P(S) = 1;
- For any countable sequence of events A1, A2, … that are mutually exclusive (Ai ∩ Aj = ∅ for all i ≠ j), the probability that any of them happens is equal to the sum of their individual probabilities: P(∪i Ai) = Σi P(Ai).
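The three axioms can be checked numerically for a fair die; this is a sketch in plain Python with the die as an assumed example distribution.

```python
# Sample space for one roll of a fair die
S = [1, 2, 3, 4, 5, 6]
P = {outcome: 1 / 6 for outcome in S}

# Axiom 1: no event has negative probability
assert all(p >= 0 for p in P.values())

# Axiom 2: the entire sample space has probability 1
assert abs(sum(P.values()) - 1) < 1e-12

# Axiom 3 (additivity): mutually exclusive events add,
# e.g. P(odd) = P(1) + P(3) + P(5)
p_odd = sum(P[o] for o in S if o % 2 == 1)
print(p_odd)   # 0.5
```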
What is a random variable?
A random variable can be pretty much any quantity and is not deterministic. It could take one value among a set of possibilities in a random experiment. Note that there is a subtle difference between discrete random variables, like the sides of a die, and continuous ones, like the weight and the height of a person. There is little point in asking whether two people have exactly the same height.
What is joint probability?
Given any values a and b, the joint probability lets us answer: what is the probability that A=a and B=b simultaneously? It is denoted P(A=a, B=b).
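As a small worked example (the choice of variables here is an illustrative assumption): roll two fair dice, let A be the first die and B the maximum of the two, and tabulate the joint distribution exactly with fractions.

```python
from fractions import Fraction
from itertools import product

# A = value of the first die, B = maximum of the two dice
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1, max(d1, d2))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

# P(A=3, B=5): the first die shows 3 and the max is 5,
# which happens only when the second die is 5
print(joint[(3, 5)])   # 1/36
```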
What is conditional probability?
Note that for any values a and b, P(A=a, B=b) ≤ P(A=a). This has to be the case, since for A=a and B=b to happen, A=a has to happen and B=b also has to happen (and vice versa). Thus, A=a and B=b cannot be more likely than A=a or B=b individually. This brings us to an interesting ratio: 0 ≤ P(A=a, B=b)/P(A=a) ≤ 1. We call this ratio a conditional probability and denote it by P(B=b | A=a): it is the probability of B=b, provided that A=a has occurred.
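The ratio definition can be computed directly; this sketch assumes two fair dice with A the first die and B their sum.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: A = first die, B = their sum
p_joint = Fraction(0)   # P(A=3, B=7)
p_a = Fraction(0)       # P(A=3)
for d1, d2 in product(range(1, 7), repeat=2):
    if d1 == 3:
        p_a += Fraction(1, 36)
        if d1 + d2 == 7:
            p_joint += Fraction(1, 36)

# Conditional probability as the ratio of joint to marginal
p_cond = p_joint / p_a
print(p_cond)   # 1/6: given the first die is 3, the sum is 7 iff the second die is 4
```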
What is Bayes theorem?
P(A | B) = P(B | A) P(A) / P(B)
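A classic numeric check, with hypothetical numbers assumed for illustration: A is having a disease, B is testing positive, and the marginal P(B) comes from the law of total probability.

```python
from fractions import Fraction

p_a = Fraction(1, 100)              # P(A): prevalence
p_b_given_a = Fraction(9, 10)       # P(B|A): sensitivity
p_b_given_not_a = Fraction(5, 100)  # P(B|not A): false-positive rate

# Marginal P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)   # 2/13, about 0.154
```

Even with a fairly accurate test, the posterior is small because the prior P(A) is small.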
What is Marginalization?
It is the operation of determining P(B) from P(A, B). The probability of B amounts to accounting for all possible choices of A and aggregating the joint probabilities over all of them: P(B) = Σ_A P(A, B).
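Marginalization is just a sum over the unwanted variable; this sketch reuses the assumed two-dice example with A the first die and B the maximum.

```python
from fractions import Fraction
from itertools import product

# Joint distribution: A = first die, B = maximum of two fair dice
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1, max(d1, d2))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

# Marginalize out A: P(B=b) = sum over a of P(A=a, B=b)
p_b = {}
for (a, b), p in joint.items():
    p_b[b] = p_b.get(b, Fraction(0)) + p

print(p_b[6])   # 11/36: the max is 6 in 11 of the 36 equally likely outcomes
```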
What is Independence?
Two random variables A and B being independent means that the occurrence of one event of A does not reveal any information about the occurrence of an event of B. In this case P(B | A) = P(B). Statisticians typically express this as A ⊥ B. From Bayes' theorem, it follows immediately that also P(A | B) = P(A). In all other cases we call A and B dependent.
Likewise, two random variables A and B are conditionally independent given another random variable C if and only if P(A, B | C) = P(A | C) P(B | C). This is expressed as A ⊥ B | C.
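Independence can be tested by checking whether the joint factorizes; this sketch assumes two fair dice, where the two individual dice are independent but the first die and their sum are not.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

# A = first die, B = second die: independent
p_a2 = sum(p for d1, d2 in outcomes if d1 == 2)
p_b5 = sum(p for d1, d2 in outcomes if d2 == 5)
p_ab = sum(p for d1, d2 in outcomes if d1 == 2 and d2 == 5)
print(p_ab == p_a2 * p_b5)   # True: P(A=2, B=5) = P(A=2) P(B=5)

# A = first die, S = sum of the dice: dependent
p_s4 = sum(p for d1, d2 in outcomes if d1 + d2 == 4)
p_as = sum(p for d1, d2 in outcomes if d1 == 2 and d1 + d2 == 4)
print(p_as == p_a2 * p_s4)   # False: the joint does not factorize
```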
What is Expectation?
The expectation (or average) of the random variable X is denoted as: E[X] = Σ_x x P(X = x).
When the input of a function f(x) is a random variable drawn from the distribution P with different values x, the expectation of f(x) is computed as: E_{x∼P}[f(x)] = Σ_x f(x) P(x).
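Both sums are direct to compute; a sketch using the fair die as the assumed distribution and f(x) = x² as an illustrative function.

```python
from fractions import Fraction

# Fair die: P(X = x) = 1/6 for x in 1..6
p = Fraction(1, 6)

# E[X] = sum over x of x * P(X = x)
e_x = sum(x * p for x in range(1, 7))
print(e_x)   # 7/2

# E[f(x)] for f(x) = x**2, weighted the same way
e_f = sum(x**2 * p for x in range(1, 7))
print(e_f)   # 91/6
```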
What are Variance and Standard Deviation?
In many cases we want to measure by how much the random variable X deviates from its expectation. This can be quantified by the variance: Var[X] = E[(X − E[X])²] = E[X²] − E[X]².
Its square root is called the standard deviation. The variance of a function of a random variable measures by how much the function deviates from the expectation of the function, as different values x of the random variable are sampled from its distribution: Var[f(x)] = E[(f(x) − E[f(x)])²].
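The two forms of the variance formula agree, which can be verified exactly; a sketch with the fair die as the assumed distribution.

```python
from fractions import Fraction

p = Fraction(1, 6)
xs = range(1, 7)

e_x = sum(x * p for x in xs)                    # E[X] = 7/2
var_def = sum((x - e_x)**2 * p for x in xs)     # E[(X - E[X])^2]
var_alt = sum(x**2 * p for x in xs) - e_x**2    # E[X^2] - E[X]^2

print(var_def)              # 35/12
print(var_def == var_alt)   # True: both forms agree
```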