What is the multinomial distribution?

The distribution that assigns probabilities to counts over a fixed number of discrete choices (e.g., the six faces of a die) is called the multinomial distribution.

Draw a sample die roll in PyTorch

import torch
from torch.distributions import multinomial

# A fair die: equal probability 1/6 for each of the six faces
fair_probs = torch.ones([6]) / 6

# One roll: the sample is a one-hot vector of counts over the six faces
multinomial.Multinomial(1, fair_probs).sample()

What is the central limit theorem?

The central limit theorem states that if you take sufficiently large random samples of size n (with replacement) from a population with mean μ and standard deviation σ, then the distribution of the sample means will be approximately normal with mean μ and standard deviation σ/√n.

Write some code to demonstrate the central limit theorem

import matplotlib.pyplot as plt

# 500 groups of 10 die rolls each; row i holds the counts for each face
counts = multinomial.Multinomial(10, fair_probs).sample((500,))
cum_counts = counts.cumsum(dim=0)
# Running estimate of each face's probability after each group
estimates = cum_counts / cum_counts.sum(dim=1, keepdim=True)

plt.figure(figsize=(6, 4.5))
for i in range(6):
    plt.plot(estimates[:, i].numpy(),
             label=("P(die=" + str(i + 1) + ")"))
plt.axhline(y=1/6, color='black', linestyle='dashed')
plt.gca().set_xlabel('Groups of experiments')
plt.gca().set_ylabel('Estimated probability')
plt.legend();
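
Strictly speaking, the plot above illustrates the law of large numbers: the running estimates converge to 1/6. A minimal sketch of the central limit theorem itself, assuming a fair six-sided die (population mean 3.5, standard deviation sqrt(35/12)); the sample sizes below are arbitrary choices:

import torch
import matplotlib.pyplot as plt

n = 100                # rolls per sample
num_samples = 10000    # number of sample means to collect

# Each row is one sample of n fair die rolls
rolls = torch.randint(1, 7, (num_samples, n)).float()
sample_means = rolls.mean(dim=1)

print(sample_means.mean())  # close to the population mean 3.5
print(sample_means.std())   # close to sqrt(35/12)/sqrt(n) ~= 0.171
plt.hist(sample_means.numpy(), bins=50);  # roughly bell-shaped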

What are the axioms of probability?

  1. For any event A, its probability is never negative, i.e., P(A) ≥ 0;
  2. The probability of the entire sample space S is 1, i.e., P(S) = 1;
  3. For any countable sequence of events A1, A2, … that are mutually exclusive (Ai ∩ Aj = ∅ for all i ≠ j), the probability that any of them happens is equal to the sum of their individual probabilities, i.e., P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + ⋯.
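
A quick numeric check of these axioms on the fair-die distribution; the events chosen below are illustrative:

import torch

fair_probs = torch.ones(6) / 6

# Axiom 1: no event has negative probability
assert (fair_probs >= 0).all()
# Axiom 2: the whole sample space has probability 1
assert torch.isclose(fair_probs.sum(), torch.tensor(1.0))
# Axiom 3: disjoint events add, e.g., A = {1, 2} and B = {6}
p_A = fair_probs[[0, 1]].sum()
p_B = fair_probs[5]
p_A_or_B = fair_probs[[0, 1, 5]].sum()
assert torch.isclose(p_A + p_B, p_A_or_B)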

What is a random variable?

A random variable is a quantity whose value is not deterministic: it takes one value among a set of possibilities in a random experiment. Note that there is a subtle difference between discrete random variables, like the sides of a die, and continuous ones, like the weight or the height of a person. There is little point in asking whether two people have exactly the same height; it makes more sense to ask whether a height falls within a given interval.
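
A sketch contrasting the two kinds using torch.distributions; the die and the height model (Normal with mean 170 cm, std 10 cm) are illustrative choices:

import torch
from torch.distributions import Categorical, Normal

# Discrete: a die side, one of six values (indices 0..5)
die = Categorical(probs=torch.ones(6) / 6)
print(die.sample())       # e.g., tensor(3), the fourth face

# Continuous: a height in cm, any real value
height = Normal(loc=170.0, scale=10.0)
print(height.sample())    # e.g., tensor(163.2741)
# An exact height has probability zero; only intervals are meaningful
print(height.cdf(torch.tensor(180.0)) - height.cdf(torch.tensor(160.0)))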

What is joint probability?

Given any values a and b, the joint probability P(A = a, B = b) lets us answer: what is the probability that A = a and B = b occur simultaneously?
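
A minimal sketch estimating a joint probability by simulation, assuming two independent fair dice A and B:

import torch

n = 100000
die_A = torch.randint(1, 7, (n,))
die_B = torch.randint(1, 7, (n,))

# Fraction of trials where A = 1 and B = 6 happen simultaneously
p_joint = ((die_A == 1) & (die_B == 6)).float().mean()
print(p_joint)   # close to 1/36 ~= 0.0278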

What is conditional probability?

Note that for any values a and b, P(A = a, B = b) ≤ P(A = a). This has to be the case, since for A = a and B = b to happen, A = a has to happen and B = b also has to happen (and vice versa). Thus, A = a and B = b cannot be more likely than A = a or B = b individually. This brings us to an interesting ratio: 0 ≤ P(A = a, B = b) / P(A = a) ≤ 1. We call this ratio a conditional probability and denote it by P(B = b | A = a): it is the probability of B = b, provided that A = a has occurred.
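
Continuing the two-dice simulation, a sketch of the conditional probability as the ratio of joint to marginal frequencies:

import torch

n = 100000
die_A = torch.randint(1, 7, (n,))
die_B = torch.randint(1, 7, (n,))

# P(B = 6 | A = 1) = P(A = 1, B = 6) / P(A = 1)
p_joint = ((die_A == 1) & (die_B == 6)).float().mean()
p_A = (die_A == 1).float().mean()
print(p_joint / p_A)   # close to 1/6, since the dice are independent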

What is Bayes' theorem?

P(A | B) = P(B | A) P(A) / P(B)
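
A worked sketch with made-up numbers (not from the source): A = has a disease, B = tests positive, with 90% sensitivity, a 1% base rate, and a 5% false positive rate.

# Illustrative rates, chosen for the example
p_B_given_A = 0.9       # sensitivity: P(B | A)
p_A = 0.01              # base rate: P(A)
p_B_given_not_A = 0.05  # false positive rate: P(B | not A)

# Marginalize to get P(B), then apply Bayes' theorem
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)      # ~0.154: a positive test is far from conclusive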

What is Marginalization?

It is the operation of determining P(B) from P(A, B). We can see that the probability of B amounts to accounting for all possible choices of A and aggregating the joint probabilities over all of them: P(B) = ∑_A P(A, B).
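
A sketch with a small, made-up joint table over two binary variables, marginalizing out A by summing along its axis:

import torch

# Joint distribution P(A, B): rows index A, columns index B
joint = torch.tensor([[0.1, 0.3],
                      [0.2, 0.4]])

# Marginalize out A: sum the joint over all values of A (dim=0)
p_B = joint.sum(dim=0)
print(p_B)          # tensor([0.3000, 0.7000])
print(p_B.sum())    # 1.0, as any distribution must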

What is Independence?

Two random variables A and B being independent means that the occurrence of one event of A does not reveal any information about the occurrence of an event of B. In this case P(B | A) = P(B). Statisticians typically express this as A ⊥ B. From Bayes' theorem, it follows immediately that also P(A | B) = P(A). In all other cases we call A and B dependent.

Likewise, two random variables A and B are conditionally independent given another random variable C if and only if P(A, B | C) = P(A | C) P(B | C). This is expressed as A ⊥ B | C.
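
A sketch testing independence numerically; the joint table below is made up so that it factorizes into the outer product of its marginals:

import torch

joint = torch.tensor([[0.12, 0.28],
                      [0.18, 0.42]])
p_A = joint.sum(dim=1)   # marginal of A
p_B = joint.sum(dim=0)   # marginal of B

# Independent iff P(A, B) = P(A) P(B) in every cell
print(torch.allclose(joint, torch.outer(p_A, p_B)))   # True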

What is Expectation?

The expectation (or average) of the random variable X is defined as: E[X] = ∑_x x P(X = x).

When the input of a function f(x) is a random variable drawn from the distribution P with different values x, the expectation of f(x) is computed as: E_{x~P}[f(x)] = ∑_x f(x) P(x).
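
A sketch computing both expectations for a fair die, with f(x) = x² as an illustrative function:

import torch

values = torch.arange(1, 7).float()   # die faces 1..6
probs = torch.ones(6) / 6

# E[X] = sum over x of x P(X = x)
print((values * probs).sum())         # 3.5

# E[f(X)] with f(x) = x^2
print((values ** 2 * probs).sum())    # 91/6 ~= 15.1667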

What are Variance and Standard Deviation?

In many cases we want to measure by how much the random variable X deviates from its expectation. This can be quantified by the variance: Var[X] = E[(X - E[X])²] = E[X²] - (E[X])².

Its square root is called the standard deviation. The variance of a function of a random variable measures by how much the function deviates from the expectation of the function, as different values x of the random variable are sampled from its distribution: Var[f(x)] = E[(f(x) - E[f(x)])²].
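
Continuing the fair-die example, a sketch checking that both forms of the variance formula agree:

import torch

values = torch.arange(1, 7).float()
probs = torch.ones(6) / 6

e_x = (values * probs).sum()                      # E[X] = 3.5
var_def = ((values - e_x) ** 2 * probs).sum()     # E[(X - E[X])²]
var_alt = (values ** 2 * probs).sum() - e_x ** 2  # E[X²] - (E[X])²
print(var_def, var_alt)   # both 35/12 ~= 2.9167
print(var_def.sqrt())     # standard deviation ~= 1.7078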