Notes On Probability Distributions

You Probably Noticed …

Mi'kail Eli'yah
Feb 20, 2023
“Forest Edge, Hokuto, Hokkaido, Japan”, 2004.
1. Logistic Distribution: Logistic Regression
2. Harmonic Distribution
3. Hypergeometric Distribution
4. Binomial Distribution
5. Poisson Distribution
6. Bernoulli Distribution
7. Geometric Distribution
8. Exponential Distribution
9. Gaussian (Normal) Distribution
10. Student's t-Distribution
11. Uniform Distribution
12. Weibull Distribution
13. Chi-Square Distribution
14. Lognormal Distribution
15. F Distribution
16. Logarithmic Distribution
17. Gamma Distribution
18. Beta Distribution
19. Pareto Distribution
The Less Usual
1. Zipf Distribution
_
The symmetry of chance that determines the sequence and number of tries may polarize the favor of 1 event over another. - 2008-06.02

Stochasticity? It is but our mortal point of confusion/ illusion. - 2007-05.20

Logistic Distribution: Logistic Regression

The probability of succeeding given the effort …

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# μ or x0 is the location parameter: the midpoint of the logistic curve
def logistic(x, x0, k, L):
    return L / (1 + np.exp(-k * (x - x0)))

# [main]
# data
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# model = LogisticRegression(solver='liblinear', random_state=0) # Create a Model and Train It
model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)
model.fit(x, y)

# The REPL echoes the fitted estimator, e.g. (from an older scikit-learn):
# LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
#                    intercept_scaling=1, l1_ratio=None, max_iter=100,
#                    multi_class='warn', n_jobs=None, penalty='l2',
#                    random_state=0, solver='liblinear', tol=0.0001, verbose=0,
#                    warm_start=False)

# model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y) # equivalent one-liner (default C)

# Evaluate the model
p_pred = model.predict_proba(x)
y_pred = model.predict(x)
"""
score_ = model.score(x, y)
conf_m = confusion_matrix(y, y_pred)
report = classification_report(y, y_pred)
"""
k = 1
x0 = np.mean(x)
index_of_x0 = list(x).index(round(x0))
logistic_y = logistic(x, x0, k, 1)

fig, ax = plt.subplots()
ax.plot(x, y, 'bo')

note_text = 'Most people likely could have given up around here'
# The key option here is `bbox`.
ax.annotate(note_text, xy=(x0, logistic_y[index_of_x0]), xytext=(-20, 20),
            textcoords='offset points', ha='center', va='bottom',
            bbox=dict(boxstyle='round,pad=0.2', fc='yellow', alpha=0.3),
            arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.5',
                            color='red'))

# plotting the points
plt.plot(x, y_pred)
plt.plot(x, y)
plt.plot(x, logistic_y)
# make latex equation
equation_latex = r'p(x) = \frac{1}{1+e^{-(x-\mu)/s}}'
plt.text(5, 0.1,'$%s$'%equation_latex)

# graphing x axis and y axis
plt.xlabel('effort')
plt.ylabel('probability of passing')
plt.title('Logistic Regression')
plt.show() # show plot

Harmonic Distribution

The expected number of attempts to blow out all the candles, given the number of candles …

"""
A birthday cake has n burning candles and you need to blow out all of them. Each time you try, the number of candles that remain burning is random: if the cake has k burning candles, then after your attempt it ends up with a whole number of burning candles between 0 and (k - 1), each possibility occurring with equal chance. Each attempt extinguishes at least 1 candle, and for k larger than 1 you blow out a random number of candles between 1 and k.

What is the expected number of attempts until all of the candles are blown out?
"""
import matplotlib.pyplot as plt
import numpy as np

# [main]
# data
n = 100 # number of candles
x = np.arange(n)

e_probability = [0, 1]  # e_0 = 0 (no candles), e_1 = 1 (one blow always suffices)

for k in range(2, n):
    # e_k = 1 + (1/k) * sum_{i=0}^{k-1} e_i, which telescopes to the harmonic number H_k
    e_probability.append(1 + (1/k) * sum(e_probability))

y = e_probability

# plotting the points
plt.plot(x, y)
# make latex equation
equation_latex = r'e_{(k+1)} = 1 + (\frac{1}{(k+1)}) * \sum^{k}_{0}(e_i)'
plt.text(5, 0.1,'$%s$'%equation_latex)

# graphing x axis and y axis
plt.xlabel('Number of candles')
plt.ylabel('Expected number of attempts to extinguish all candles')
plt.title('Harmonic Distribution')

plt.show() # show plot

_

import matplotlib.pyplot as plt
import numpy as np

# [main]
# data
n = 100
x = np.arange(n)

e_series = []
# harmonic series terms
for k in range(1, n + 1):
    e_series.append(1/k)

y = e_series

# plotting the points
plt.plot(x, y)
# make latex equation
equation_latex = r'e_{k} = \frac{1}{k}'
plt.text(40, 0.1,'$%s$'%equation_latex)

# graphing x axis and y axis
plt.xlabel('Number of candles')
plt.ylabel('e_k')
plt.title('Harmonic Distribution')

plt.show() # show plot

Hypergeometric Distribution

The probability of getting the number of desired items (k), given the available desired items (K) in the population (N) and the number of draws (n) you are allowed to make …

import matplotlib.pyplot as plt
import numpy as np
import math
from scipy.special import comb

def hypergeometric_pmf(N, K, n, x):
    ''' Probability Mass Function for Hypergeometric Distribution
    : N: population size
    : K: total number of desired items in N
    : n: number of draws made from N
    : x: number of desired items in our draw of n items
    :returns: PMF computed at x
    '''
    K_choose_x = comb(K, x)
    N_K_choose_n_x = comb(N-K, n-x)
    N_choose_n = comb(N, n)

    return K_choose_x * N_K_choose_n_x / N_choose_n

def hypergeometric_cdf(N, K, n, t, min_value=None):
    ''' Cumulative Distribution Function for Hypergeometric Distribution
    : N: population size
    : K: total number of desired items in N
    : n: number of draws made from N
    : t: number of desired items in our draw of n items, up to t
    :returns: CDF computed up to t
    '''
    if min_value:
        return np.sum([hypergeometric_pmf(N, K, n, x) for x in range(min_value, t+1)])

    return np.sum([hypergeometric_pmf(N, K, n, x) for x in range(t+1)])

def hypergeometric_plot(N, K, n):
    ''' Visualization of Hypergeometric Distribution for given parameters
    : N: population size
    : K: total number of desired items in N
    : n: number of draws made from N
    :returns: Plot of Hypergeometric Distribution for given parameters
    '''
    x = np.arange(0, n+1)
    y = [hypergeometric_pmf(N, K, n, x) for x in range(n+1)]
    plt.plot(x, y, 'bo')
    plt.vlines(x, 0, y, lw=2)
    plt.xlabel('(k) # of desired items in our draw')
    plt.ylabel('Probabilities')
    plt.title('Hypergeometric Distribution')
    # make latex equation
    equation_latex = r'p(X = k) = \frac{C^{K}_{k} \, C^{N-K}_{n-k}}{C^{N}_{n}}'
    plt.text(3, 0.3, '$%s$' % equation_latex)

    plt.show()

# [main]
# data
"""
N = 15 total candies in the jar.
n = 5 draws for the sample.
K = 5 red candies in the jar.
k = draw 2 red candies in the sample.
"""
N = 15
K = 5
n = 5

# p = hypergeometric_pmf(15, 5, 5, 2)
print(f"The probability of drawing precisely 2 red candies in our 5 random draws from the jar is {hypergeometric_pmf(15, 5, 5, 2)}.")
"""
The probability of drawing precisely 2 red candies in our 5 random draws from the jar is 0.3996003996003996.
"""

hypergeometric_plot(N, K, n)

"""
N = 52 # 52 cards in a deck of cards.
K = 13 # 13 cards per symbol (♠ ♥ ♦ ♣)
n = 5 # 5 card opening hand
k = 3 # draw 3 of the same symbol (♠ ♥ ♦ ♣)
"""

The binomial and Poisson distributions are discrete: they give non-zero probability only to (some) integers. The normal distribution is continuous, and every normal density is non-zero over all real numbers.
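
A quick check of that distinction, as a minimal sketch: a discrete pmf sums to 1 over the integers, while a continuous pdf integrates to 1 over the reals (the parameter values below are illustrative).

import numpy as np
from scipy.stats import binom, norm
from scipy.integrate import quad

# discrete: the binomial places all of its mass on the integers 0..n
n, p = 6, 0.6
print(sum(binom.pmf(k, n, p) for k in range(n + 1)))  # ~1.0

# continuous: the standard normal density is non-zero everywhere and integrates to 1
area, _ = quad(norm.pdf, -np.inf, np.inf)
print(area)  # ~1.0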

Binomial Distribution

This models events that arise in a binomial experiment.

Examples:
How many coin flips show heads?
How many scratch-off lottery tickets are winners?
How many of a doctor's patients die during surgery?
How many free throws do I make in 100 attempts?
Key assumptions for the binomial model:
· A fixed number of repeated, identical, independent trials. n is usually the parameter chosen to label the number of trials.
· Every trial results in either a success, with probability p, or a failure, with probability 1-p. These must be the only two outcomes for a trial.
· The random variable of interest is the total number of trials that ended in a success.

p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n

where \binom{a}{b} (also written {a \choose b}) is the binomial coefficient.

Tossing a biased coin (probability p = 0.6 of heads on each throw) n = 6 times, the probability of getting exactly k = 4 heads …

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom

def binomial_cdf(n, p):  # binom.cdf(x, n, p)
    ''' Cumulative Distribution Function for Binomial Distribution
    : n: number of attempts
    : p: probability of success in each trial
    :returns: CDF computed up to n
    '''
    distribution = [binom.pmf(x, n, p) for x in range(n+1)]

    cdf = [distribution[0]]
    for i in range(1, len(distribution)):
        cdf.append(distribution[i] + cdf[i-1])

    return cdf

def binomial_plot(n, p):
    ''' Visualization of Binomial Distribution for given parameters
    : n: number of attempts
    : p: probability of success in each trial
    :returns: Plot of Binomial Distribution for given parameters
    '''
    x = np.arange(0, n+1)
    y = [binom.pmf(x, n, p) for x in range(n+1)]
    # y = binomial_cdf(n, p)

    plt.plot(x, y, 'bo')
    plt.vlines(x, 0, y, lw=2)
    plt.xlabel('(k) number of heads')
    plt.ylabel('Probabilities')
    plt.title('Binomial Distribution')
    # make latex equation
    equation_latex = r'P(X = k, n, p) = \binom{n}{k} p^{k} (1-p)^{n-k}; \ \binom{n}{k} = \frac{n!}{k!(n-k)!}'
    plt.text(3, 0.4, '$%s$' % equation_latex)

    plt.show()

# [main]
# data
p = 0.6  # biased coin with this probability of heads
n = 6    # fixed number (n) of trials
x = 4    # number of heads; x = k

P = binom.pmf(x, n, p)
print(f"The probability of getting exactly {x} heads when tossing the biased coin {n} times is {P}.")

"""
The probability of getting exactly 4 heads when tossing the biased coin 6 times is 0.3110400000000001.
"""

binomial_plot(n = 6, p = 0.6)

Poisson Distribution

This models events that seem to take place over and over again in a completely haphazard way.

Examples:
How many magnitude 8+ earthquakes will take place in a particular year?
How many babies will be born in a large hospital on a particular day?
How many hits will a website get in a particular minute?

Key assumptions for the Poisson model:
· The random variable counts the number of events that take place in a given interval (usually of time or space)
· All events take place independently of all other events
· The rate at which events take place is constant, usually denoted λ

The probability mass function is given by:

p(x) = \frac{e^{-\lambda t}(\lambda t)^{x}}{x!}, \quad x = 0, 1, 2, \ldots

With an average rate of λ = 2 buses per 30 minutes, the probability of a given number of buses arriving within 30 mins …

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import poisson

def poisson_cdf(_lambda, n):
    ''' Cumulative Distribution Function for Poisson Distribution
    : _lambda: average occurrences of the event within an interval
    : n: maximal number of buses anticipated within the interval
    :returns: CDF computed up to n
    '''
    distribution = [poisson.pmf(x, _lambda) for x in range(n+1)]

    cdf = [distribution[0]]
    for i in range(1, len(distribution)):
        cdf.append(distribution[i] + cdf[i-1])

    return cdf

def poisson_plot(n, x, _lambda):
    ''' Visualization of Poisson Distribution for given parameters
    : x: number of buses arriving within 30 mins
    : n: maximal number of buses anticipated within the interval
    : _lambda: average occurrences of the event within the interval
    :returns: Plot of Poisson Distribution for given parameters
    '''
    # x = np.arange(0, n+1)
    y = poisson.pmf(x, _lambda)
    # y = poisson_cdf(_lambda, n)

    # fig, ax = plt.subplots(1, 1, figsize=(8, 6))
    plt.plot(x, y, 'bo', ms=8, label='poisson pmf')
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('number of buses arriving within 30 mins', fontsize="10")
    plt.ylabel('Probabilities', fontsize="10")
    plt.title('Poisson Distribution')
    # make latex equation
    equation_latex = r'P(X = k, \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}'
    plt.text(3, 0.4, '$%s$' % equation_latex)

    plt.show()

# [main]
_lambda = 2 # average occurrences of the event within an interval, e.g. every 30 minutes, 2 buses
n = 10 # maximal number of buses anticipated within the interval
x = np.arange(0, n+1) # number of buses arriving within 30 mins

P = poisson.pmf(x[2], _lambda)
print(f"The probability of {x[2]} buses arriving within 30 mins is {P}.")

"""
The probability of 2 buses arriving within 30 mins is 0.2706705664732254.
"""

poisson_plot(n, x, _lambda)

"""
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde, poisson

def sample_poisson(mu, size):
    return poisson(mu).rvs(size)

# Example usage
mu = 2.0
size = 1000
samples = sample_poisson(mu, size)

# Create PDF using KDE
kde = gaussian_kde(samples)
x = np.linspace(0, 10, 1000)
pdf = kde(x)

# Plot PDF
plt.plot(x, pdf)
plt.xlabel('x')
plt.ylabel('PDF')
plt.title('Poisson Distribution with mu = 2.0')
plt.show()
"""

Suppose buses are observed over a 60-minute window, arriving at time instants (a, b, c, d) in [0, 60], drawn so that the average gap between buses is 15 minutes. For instance, the arrival times could be (5 min, 15 min, 55 min, 60 min), giving gaps of (5, 10, 40, 5) minutes.

When you arrive at a random moment within the hour, you land in one of the gaps, but not uniformly: longer gaps are proportionally more likely to catch you.

E(waiting time) = (1/2)[(5/60)(5) + (10/60)(10) + (40/60)(40) + (5/60)(5)] ≈ 14.58 minutes

The factor 1/2 reflects the fact that if we arrive within an interval of, say, 30 minutes, we wait around 15 minutes on average, assuming we arrive at a uniformly random point inside it. In notation, the expected interval experienced by a randomly arriving passenger is

E[T_{experienced}] = \frac{\int T^2 \, p(T) \, dT}{\int T \, p(T) \, dT}

where p(T) is the distribution of intervals T between buses as they arrive at the bus stop. The expected wait time E[W] is then half the expected interval experienced by passengers, so E[W] = 0.5 E[T_{experienced}].

For exponential inter-arrival times, the expected value is E(X) = 1/λ, which can be interpreted as the average waiting time.
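
A small simulation of the inspection paradox sketched above; the interval distribution here is an assumed exponential with mean 15 minutes, matching the example's average gap:

import numpy as np

rng = np.random.default_rng(0)
mean_gap = 15.0  # assumed mean interval between buses, in minutes

# a long run of bus inter-arrival gaps and the resulting arrival times
gaps = rng.exponential(mean_gap, size=200_000)
arrivals = np.cumsum(gaps)

# passengers arrive uniformly at random over the whole span;
# each waits until the next bus after their arrival moment
passengers = rng.uniform(0, arrivals[-1], size=100_000)
waits = arrivals[np.searchsorted(arrivals, passengers)] - passengers

print(f"mean interval E[T]            : {gaps.mean():.2f} min")
print(f"mean wait (simulated)         : {waits.mean():.2f} min")
print(f"E[T^2] / (2 E[T]) (predicted) : {(gaps**2).mean() / (2 * gaps.mean()):.2f} min")
# for exponential gaps both land near 15, not 7.5: the inspection paradox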

We can derive the Poisson pmf mathematically from the binomial pmf, holding λ = np fixed as n approaches infinity.
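
With p = λ/n, \binom{n}{k} p^k (1-p)^{n-k} \to \frac{e^{-\lambda} \lambda^k}{k!} as n \to \infty. A quick numeric check of that limit (the values of λ and k below are illustrative):

from scipy.stats import binom, poisson

lam, k = 2.0, 3
for n in (10, 100, 1000, 10000):
    # binomial with p = lam/n approaches the Poisson pmf as n grows
    print(f"n={n:6d}: binom pmf = {binom.pmf(k, n, lam/n):.6f}")
print(f"poisson pmf: {poisson.pmf(k, lam):.6f}")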

You can also use the Poisson distribution to chart anticipated failures of systems over time.

Example: a Poisson process with an average time between events of 60 days.

Or the anticipation of meteors: expect 5 meteors per hour on average, i.e. 1 every 12 minutes, and assume we watch for 60 minutes.
What is the probability of observing 3 meteors in that 1 hour?
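
That last probability falls straight out of the pmf (λ = 5 events in the one-hour interval):

from scipy.stats import poisson

# 5 meteors per hour on average; probability of seeing exactly 3 in an hour
print(poisson.pmf(3, 5))  # ~0.1404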

Bernoulli Distribution

The probabilities of success and failure for a single, known binary outcome …

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import bernoulli

def bernoulli_plot(p):
    ''' Visualization of Bernoulli Distribution for given parameters
    : p: probability of success
    :returns: Plot of Bernoulli Distribution for given parameters
    '''
    x = [0, 1]
    # bd = bernoulli(p); bd.pmf(x)
    y = bernoulli(p).pmf(x)  # probability mass function: f = p**x * (1-p)**(1-x)

    # fig, ax = plt.subplots(1, 1, figsize=(8, 6))
    plt.plot(x, y, 'bo', ms=8, label='Bernoulli pmf')
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('Failure (0) or success (1)', fontsize="10")
    plt.ylabel('Probabilities', fontsize="10")
    plt.title('Bernoulli Distribution')
    # make latex equation
    equation_latex = r'P(X = k) = p^{k} (1-p)^{1-k}; \ k \in \{0, 1\}'
    plt.text(0.2, 0.35, '$%s$' % equation_latex)

    plt.show()

# [main]
p = 0.6 # probability of success

# P = bernoulli(p).pmf(1)
print(f"The probability of success is {bernoulli(p).pmf(1)} and failure is {bernoulli(p).pmf(0)}.")

"""
The probability of success is 0.6 and failure is 0.4.
"""

bernoulli_plot(p)

"""
k = 1 for Bernoulli Distribution
"""

Geometric Distribution

On the kth attempt, i.e. X = k, there are k - 1 failures before the first success. The probability that the first success occurs on the kth attempt, given that each attempt succeeds with probability p, is …

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import geom

def geometric_plot(x, p):
    ''' Visualization of Geometric Distribution for given parameters
    : x: attempt numbers at which to evaluate the pmf
    : p: probability of success
    :returns: Plot of Geometric Distribution for given parameters
    '''
    y = geom.pmf(x, p)

    # fig, ax = plt.subplots(1, 1, figsize=(8, 6))
    plt.plot(x, y, 'bo', ms=8, label='Geometric pmf')
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('# of attempts', fontsize="10")
    plt.ylabel('Probabilities', fontsize="10")
    plt.title('Geometric Distribution')
    # make latex equation
    equation_latex = r'P(X = k) = p (1-p)^{k-1}; \ k: \ kth \ attempt'
    plt.text(2, 0.35, '$%s$' % equation_latex)

    plt.show()

# [main]
p = 0.6 # probability of success
n = 10 # maximal number of trials
x = np.arange(0, n+1) # number of attempts
k = 4 # kth attempt
P = geom.pmf(k, p)
print(f"The probability that the first success occurs on the {k}th attempt is {P}.")

"""
The probability that the first success occurs on the 4th attempt is 0.03840000000000001.
"""

geometric_plot(x, p)

Exponential Distribution

The probability density of decay over time, and its complementary growth curve …

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import expon

def exponential_plot(x, location_start, _lambda):
    ''' Visualization of Exponential Distribution for given parameters
    : x: time points at which to evaluate the distribution
    : location_start: left edge of the support (scipy's loc)
    : _lambda: passed to scipy as the *scale* parameter, i.e. 1/lambda
    :returns: Plot of Exponential Distribution for given parameters
    '''
    # case: decay (the density decays in x)
    y = expon.pdf(x, location_start, _lambda)  # expon.cdf(x, location_start, _lambda)
    number_of_rows = 2
    number_of_columns = 1
    plt.subplot(number_of_rows, number_of_columns, 1)  # 1st row display: index 1
    plt.plot(x, y, 'bo', ms=10, label='Exponential pdf')
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('time (x)', fontsize="10")
    plt.ylabel('Probabilities', fontsize="10")
    plt.title('Exponential Distribution (Decay)', fontsize="10")
    # make latex equation
    equation_latex = r'f(x; \lambda) = \lambda e^{- \lambda x}, \ x \geq 0; \quad f(x; \lambda) = 0, \ x < 0'
    plt.text(5, 0.15, '$%s$' % equation_latex)

    # case: growth (the cumulative distribution grows toward 1)
    y2 = expon.cdf(x, location_start, _lambda)

    plt.subplot(number_of_rows, number_of_columns, 2)  # 2nd row display: index 2
    plt.plot(x, y2, 'bo', ms=10, label='Exponential cdf')
    plt.vlines(x, 0, y2, colors='b', lw=5, alpha=0.5)
    plt.xlabel('time (x)', fontsize="10")
    plt.ylabel('Probabilities', fontsize="10")
    plt.title('Exponential Distribution (Growth)', fontsize="10")
    # make latex equation
    equation_latex = r'F(x; \lambda) = 1 - e^{- \lambda x}, \ x \geq 0; \quad F(x; \lambda) = 0, \ x < 0'
    plt.text(5, 0.15, '$%s$' % equation_latex)

    plt.show()

# [main]
n = 10 # mins
x = np.arange(0, n+1) # time points in minutes
_lambda = 0.6 # NOTE: scipy's expon takes this argument as the *scale* = 1/λ, the mean time between events (e.g. a geyser with a mean of 40 minutes between eruptions would use scale = 40)
k = 5
location_start = 0 # min 0
P = expon.pdf(x, location_start, _lambda)
print(f"The probability of decay on the {k}th min is {P[k]}.")

"""
The probability of decay on the 5th min is 0.0004006157940325235.
"""

exponential_plot(x, location_start, _lambda)

Gaussian (Normal) Distribution

One reason it appears so often is the central limit theorem: any quantity that arises as an aggregate of many smaller independent (or weakly dependent) contributions will display an approximately normal distribution, as long as no small subset of those contributors dominates.
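
A minimal illustration of that effect: sums of many independent uniform draws already look bell-shaped (the counts below are arbitrary).

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# each observation aggregates 50 independent contributors, none dominating
sums = rng.uniform(0, 1, size=(100_000, 50)).sum(axis=1)
plt.hist(sums, bins=80, density=True)
plt.title('Sums of 50 uniforms (central limit theorem)')
plt.show()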

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

"""
def normal(x, mu, sigma):
p = 1 / math.sqrt(2 * math.pi * sigma**2)
return p * np.exp(-0.5 / sigma**2 * (x - mu)**2)
"""

def normal_plot(mu, std):
    ''' Visualization of Gaussian Distribution for given parameters
    : mu: mean
    : std: standard deviation
    :returns: Plot of Gaussian Distribution for given parameters
    '''
    x = np.linspace(-5, 5, 100)
    distribution_normal = norm(mu, std)
    y = distribution_normal.pdf(x)

    # fig, ax = plt.subplots(1, 1, figsize=(8, 6))
    plt.plot(x, y, 'bo', ms=8, label='Gaussian pdf')
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Gaussian Distribution (Normal)')
    # make latex equation
    equation_latex = r'P(X = k) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} (\frac{x - \mu}{\sigma})^2}'
    plt.text(2, 0.35, '$%s$' % equation_latex)

    plt.show()

# [main]
# Create a standard normal distribution with mean 0 and standard deviation 1
mu = 0
std = 1

normal_plot(mu, std)

Student’s t-Distribution

When you do not have enough data to be `normal` …

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import math

# Student's t-Distribution
def t_plot():
    mu = 1.5
    variance = 2
    sigma = math.sqrt(variance)

    x = np.linspace(mu - 5*sigma, mu + 3*sigma, 100)
    dof = len(x) - 1
    y = stats.t.pdf(x, dof)

    plt.plot(x, y, 'bo', ms=8, label='t-distribution pdf with dof: ' + str(dof))
    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('Observations, X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Student\'s t-Distribution')
    # make latex equation
    equation_latex = r'f(x; \nu) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}'
    plt.text(0, -0.02, '$%s$' % equation_latex)
    plt.legend()
    plt.show()

# [main]
t_plot()

"""
import pandas as pd
import seaborn as sns

pdf_3 = stats.t.pdf(x, 3)
pdf_10 = stats.t.pdf(x, 10)
pdf_100 = stats.t.pdf(x, 100)
df = pd.DataFrame({'pdf_3': pdf_3, 'pdf_10': pdf_10, 'pdf_100': pdf_100, 'x_axis': x_axis})

plt.figure(figsize=(15,8))
plt.plot(df['x_axis'], df['pdf_3'], label ='3 degree of freedom');
plt.plot(df['x_axis'], df['pdf_10'], label='10 degree of freedom');
plt.plot(df['x_axis'], df['pdf_100'], label='100 degree of freedom');
"""
"""
Student distribution is used to estimating the mean of a normally-distributed population in situations where the sample size is small and the population's standard deviation is unknown.

Student distribution has heavier tails, i.e. it is more prone to producing values that fall far from its mean. t-distribution is characterised by the degrees of freedom which equals to 𝜈=𝑛−1 (𝜈 is pronounced as nu).
"""

Uniform Distribution

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import uniform


def uniform_plot(x, a, b):
    ''' Visualization of Uniform Distribution for given parameters
    :returns: Plot of Uniform Distribution for given parameters
    '''
    # NOTE: scipy's uniform.pdf(x, loc, scale) has support [loc, loc+scale];
    # here loc=a and scale=b, so with a=0 this is [0, b]
    y = uniform.pdf(x, a, b)

    plt.plot(x, y, 'bo', ms=8, label='Uniform pdf')
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Uniform Distribution')
    # make latex equation
    equation_latex = r'p(x) = \frac{1}{b - a}, \ a \leq x \leq b'
    plt.text(0, 0.25, '$%s$' % equation_latex)

    plt.show()

# [main]
# Create a uniform distribution
a = 0
b = 5
x = np.linspace(a, b, 100)

uniform_plot(x, a, b)

Weibull Distribution

Anticipating when the failure will set in …

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import weibull_min


def weibull_plot():
    ''' Visualization of Weibull Distribution for given parameters
    :returns: Plot of Weibull Distribution for given parameters
    '''
    x = np.linspace(0, 80, 1000)
    y = weibull_min.pdf(x, 2.5, scale=30)

    plt.plot(x, y, 'bo', ms=8, label='Weibull pdf')
    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Weibull Distribution')
    # make latex equation
    equation_latex = r'f(x) = \frac{\gamma}{\alpha} \left(\frac{x - \mu}{\alpha}\right)^{\gamma-1} e^{-\left(\frac{x - \mu}{\alpha}\right)^{\gamma}}; \ x \geq \mu; \ \gamma, \alpha > 0'
    plt.text(0, 0.04, '$%s$' % equation_latex)
    """
    equation_latex = r'f(x;\lambda,k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}, \ x \geq 0'
    plt.text(0, -0.02, '$%s$' % equation_latex)
    equation_latex = r'f(x;\lambda,k) = 0 \ otherwise'
    plt.text(0, -0.03, '$%s$' % equation_latex)
    """

    plt.show()

# [main]
weibull_plot()

"""
import numpy as np
import matplotlib.pyplot as plt

def sample_weibull(u, lambda_val, kappa_val):
    return np.power(-(np.log(1.0 - u) / lambda_val), 1.0 / kappa_val)

# Example usage
u = np.random.uniform(size=1000)
samples = sample_weibull(u, 2.0, 1.5)

# Create PDF using KDE
from scipy.stats import gaussian_kde
kde = gaussian_kde(samples)
x = np.linspace(0, 5, 1000)
pdf = kde(x)

# Plot PDF
plt.plot(x, pdf)
plt.xlabel('x')
plt.ylabel('PDF')
plt.title('Weibull Distribution with lambda = 2.0, kappa = 1.5')
plt.show()
"""

Chi-Square Distribution

You gather observed data and want to check whether it 'fits' the expected values (the frequencies expected under the null hypothesis of your model). The chi-square statistic may:

1. be used to see if the data follows a well-known theoretical probability distribution;
2. assess a trained regression model's goodness of fit on the training, validation, and test data sets.

It is used to check for:
1. Independence
2. Goodness-of-Fit
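
Both tests can be run directly from scipy.stats; a minimal sketch, where the counts below are hypothetical and purely for illustration:

import numpy as np
from scipy.stats import chisquare, chi2_contingency

# goodness-of-fit: do observed die rolls match a fair-die expectation?
observed = np.array([16, 18, 16, 14, 12, 24])  # hypothetical counts
stat, p_value = chisquare(observed)            # expected defaults to uniform
print(f"goodness-of-fit: chi2={stat:.2f}, p={p_value:.3f}")

# independence: is the outcome independent of the group?
table = np.array([[90, 60], [30, 70]])         # hypothetical 2x2 contingency table
stat, p_value, dof, expected = chi2_contingency(table)
print(f"independence: chi2={stat:.2f}, p={p_value:.4f}, dof={dof}")
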
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import chi2

def chi_square_plot(dof):
    # x-axis ranges from 0 to 20 with .001 steps
    x = np.arange(0, 20, 0.001)
    y = chi2.pdf(x, df=dof)

    plt.plot(x, y, 'bo', ms=8, label='chi_square pdf')
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Chi-Square Distribution')
    # make latex equation
    equation_latex = r'f(x;k) = \frac{1}{2^{k/2}\Gamma(k/2)} x^{k/2-1} e^{-x/2}, \ x \geq 0'
    plt.text(0, -0.02, '$%s$' % equation_latex)
    equation_latex = r'f(x;k) = 0, \ x < 0'
    plt.text(0, -0.03, '$%s$' % equation_latex)

    plt.show()

# [main]
dof = 4 # degrees of freedom
chi_square_plot(dof)

Lognormal Distribution

You may observe it in nature in the amount of rainfall, milk production by cows, and most natural growth processes where the growth rate is independent of size …
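
A minimal sketch of why the lognormal shows up there: growth by many small independent multiplicative factors makes the log of the outcome a sum of independent terms, so the outcome itself comes out approximately lognormal (the factor range below is illustrative).

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# each row is one 'organism': the product of 200 random growth factors
factors = rng.uniform(0.95, 1.06, size=(100_000, 200))
sizes = factors.prod(axis=1)
plt.hist(sizes, bins=100, density=True)
plt.title('Product of 200 random growth factors (approximately lognormal)')
plt.show()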

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import lognorm

def lognormal_plot():
    # x = np.linspace(0, 80, 1000)
    # in scipy, loc shifts the curve along x; s (the sigma of the underlying
    # normal) controls the shape; scale corresponds to exp(mu)
    x = np.linspace(0, 4, 100)
    y = lognorm.pdf(x, s=0.5, loc=0)

    plt.plot(x, y, 'bo', ms=8, label='Lognormal pdf')
    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Lognormal Distribution')
    # make latex equation
    equation_latex = r'f(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}'
    plt.text(0, -0.03, '$%s$' % equation_latex)

    plt.show()

# [main]
lognormal_plot()

F Distribution

When you want to know whether you fall within the critical regions for hypothesis tests, and within the confidence intervals …

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import f

# F distribution (or the Fisher–Snedecor distribution)
def f_plot():
    rv1 = f(dfn=3, dfd=15, loc=0, scale=1)
    x = np.linspace(rv1.ppf(0.0001), rv1.ppf(0.9999), 100)
    y = rv1.pdf(x)

    plt.plot(x, y, 'bo', ms=8, label='F pdf')
    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('F-Distribution')
    # make latex equation
    equation_latex = r'f(x; d_1, d_2) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x \, B(\frac{d_1}{2}, \frac{d_2}{2})}'
    plt.text(0, -0.03, '$%s$' % equation_latex)

    plt.show()

# [main]
f_plot()

"""
dfn: degrees of freedom in the estimate of variance of numerator
dfd: degrees of freedom in the estimate of variance of denominator

f probability density function:
------------------------------
df2**(df2/2) * df1**(df1/2) * x**(df1/2-1)
F.pdf(x, df1, df2) = --------------------------------------------
(df2+df1*x)**((df1+df2)/2) * B(df1/2, df2/2)

"""

Logarithmic Distribution

You will notice it in the number of items purchased by a consumer in a particular period, the number of bird and plant species in an area, the number of parasites per host, the number of people clustering …

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import logser
import math

# Logarithmic Distribution
def logarithmic_plot():
    p = 0.6
    # mean, var, skew, kurt = logser.stats(p, moments='mvsk')
    x = np.arange(logser.ppf(0.01, p), logser.ppf(0.99, p))
    y = logser.pmf(x, p)
    plt.plot(x, y, 'bo', ms=8, label='Logarithmic pmf')
    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('Observations, X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Logarithmic Distribution')
    # make latex equation
    equation_latex = r'P(X = k) = \frac{-p^k}{k \ln(1-p)}; \ k \geq 1'
    plt.text(0, -1, '$%s$' % equation_latex)
    plt.legend()
    plt.show()

# [main]
logarithmic_plot()

Gamma Distribution

You observe the time between accidents or events …

Rate parameter (λ) and scale parameter (β):
β = 1 / λ # scale, e.g. if the time between accidents is measured in days and the scale parameter equals 4, accidents are 4 days apart on average
λ = 1 / β # rate
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gamma
import math

# Gamma Distribution
def gamma_plot():
    x = np.linspace(0, 40, 100)
    y = gamma.pdf(x, a=5, scale=3)

    plt.plot(x, y, 'bo', ms=8, label='Gamma pdf')
    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('Observations, X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Gamma Distribution')
    # make latex equation
    equation_latex = r'f(x;\alpha,\beta) = \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta}, \quad x \geq 0'
    plt.text(0, -0.03, '$%s$' % equation_latex)
    equation_latex = r'where \ \alpha>0 \ is \ the \ shape, \ \beta>0 \ the \ scale, \ and \ \Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}dt'
    plt.text(0, -0.04, '$%s$' % equation_latex)
    plt.legend()
    plt.show()

# [main]
gamma_plot()
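
A related way to see the gamma distribution, as a minimal sketch: the waiting time until the 5th event of a Poisson process is the sum of 5 independent exponential gaps, i.e. a Gamma(a = 5) variable (parameters match the plot above).

import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)

a, scale = 5, 3
# sum of `a` exponential inter-event gaps, simulated 100,000 times
simulated = rng.exponential(scale, size=(100_000, a)).sum(axis=1)
print(f"simulated mean: {simulated.mean():.2f}")
print(f"gamma mean    : {gamma.mean(a, scale=scale):.2f}")  # a * scale = 15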

Beta Distribution

You wonder about the probability that a political candidate will win a certain percentage of the votes, the likelihood of the audience rating a new movie release, the click-through rate of a website (the proportion of visitors who click), the conversion rate of buyers actually purchasing, the survival chance of a person with blood cancer …
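
Questions about an unknown proportion like these are often handled with a Beta prior that updates conjugately under binomial data; a minimal sketch with hypothetical numbers:

from scipy.stats import beta

a_prior, b_prior = 2, 2        # assumed prior belief about the rate
successes, failures = 14, 26   # hypothetical: 14 conversions in 40 visits
posterior = beta(a_prior + successes, b_prior + failures)
print(f"posterior mean conversion rate: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")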

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import beta
import math

# Beta Distribution
def beta_plot():
    x = np.linspace(0, 1, 10000)
    # alpha = 8 and beta = 2
    _alpha = 8
    _beta = 2
    y = beta.pdf(x, _alpha, _beta)

    plt.plot(x, y, 'bo', ms=8, label='Beta pdf')
    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)
    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)
    plt.xlabel('Observations, X', fontsize="10")
    plt.ylabel('Probabilities, P(X)', fontsize="10")
    plt.title('Beta Distribution')
    equation_latex = r'f(x;a,b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}, \quad 0 \leq x \leq 1'
    plt.text(0, -1, '$%s$' % equation_latex)
    equation_latex = r'where \ a>0, \ b>0, \ and \ B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}dt'
    plt.text(0, -1.2, '$%s$' % equation_latex)
    plt.legend()
    plt.show()

# [main]
beta_plot()

Pareto Distribution

80-20 …
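
The 80-20 rule corresponds to a shape parameter near α ≈ log4(5) ≈ 1.16; a quick simulation sketch (the sample size is illustrative, and the estimate is noisy because the tail is heavy):

import numpy as np

rng = np.random.default_rng(0)

alpha, x_m = 1.16, 1.0  # shape ~1.16 reproduces the classic 80/20 split
samples = (rng.pareto(alpha, 1_000_000) + 1) * x_m
samples.sort()
# share of the total held by the top 20% of samples
top_20_share = samples[int(0.8 * len(samples)):].sum() / samples.sum()
print(f"share of total held by the top 20%: {top_20_share:.2f}")  # ~0.8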

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pareto

number_of_samples = 1000
x_m = 1 # scale
alpha = [1, 2, 3, 4, 5] # list of values of shape parameters
samples = np.linspace(start=0, stop=5, num=number_of_samples)
for a in alpha:
    output = np.array([pareto.pdf(x=samples, b=a, loc=0, scale=x_m)])
    plt.plot(samples, output.T, label='alpha {0}'.format(a))

plt.xlabel('samples', fontsize=15)
plt.ylabel('PDF', fontsize=15)
plt.title('Probability Density Function', fontsize=15)
plt.grid(visible=True, color='grey', alpha=0.3, linestyle='-.', linewidth=2)  # 'b=' was renamed 'visible=' in newer matplotlib
plt.rcParams["figure.figsize"] = [5, 5]
plt.legend(loc='best')
plt.show()

alpha_chosen = alpha[2]
# drawing samples from the distribution
samples = (np.random.pareto(alpha_chosen, number_of_samples) + 1) * x_m
count, bins, _ = plt.hist(samples, 100, density=True)
fit = alpha_chosen * x_m**alpha_chosen / bins**(alpha_chosen + 1)
plt.plot(bins, max(count) * fit / max(fit), linewidth=2, color='r')
plt.xlabel('bins', fontsize=15)
plt.ylabel('probability density', fontsize=15)
plt.title('Pareto Distribution')
equation_latex = r'P(t) = \frac{\alpha \, t_m^{\alpha}}{t^{\alpha + 1}}'
plt.text(0, -1, '$%s$' % equation_latex, fontsize=15)
plt.grid(visible=True, color='grey', alpha=0.3, linestyle='-.', linewidth=2)
plt.rcParams['figure.figsize'] = [8, 8]
plt.show()

The Less Usual

Zipf Distribution

We observe that frequency is inversely proportional to rank … and we do not know why …

About 100 words make up half of what we communicate.
Perhaps languages' Zipf mystery is, if not caused by it, at least strengthened by preferential attachment. Once a word (and/ or something or someone) is used (or preferred), it's more likely to be used again soon. Critical points may play a role as well. Writing and conversation often stick to a topic until a critical point is reached and the subject is changed and the vocabulary shifts. Processes like these are known to result in power laws. So, in the end, it seems tenable that all these mechanisms might collude to make Zipf's law the most natural way for language to be. Perhaps some of our vocabulary and grammar was developed randomly, according to Mandelbrot's theory. And the natural way conversation and discussion follow preferential attachment and criticality, coupled with the principle of least effort when speaking and listening are all responsible for the relationship between word rank and frequency. - Michael Stevens, Vsauce
Pareto observed …
Zipf observed …
import matplotlib.pyplot as plt
import numpy as np
from numpy import random
from scipy.stats import zipf
import math
import seaborn as sns

# Zipf Distribution
def zipf_plot():
    length_of_rv = 1000
    a = 2

    rv = random.zipf(a, size=length_of_rv)
    # rv = np.random.normal(size=length_of_rv)
    frequency, bins = np.histogram(rv, bins=10, range=[0, 100])
    y = frequency
    x = np.linspace(0, 1, len(y))
    plt.title('Zipf Distribution')
    # sns.distplot was removed in newer seaborn; histplot is the replacement
    sns.histplot(rv[rv < 10], kde=False)
    # make latex equation
    equation_latex = r'P(X=k, \alpha, n) = \frac{1/k^\alpha}{\sum_{i=1}^n 1/i^\alpha}'
    plt.text(5, 50, '$%s$' % equation_latex)
    plt.show()

# [main]
zipf_plot()
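
A quick numeric check of the rank-frequency relationship on the same sampler: the empirical frequency of value k should fall off roughly as 1/k^a, so freq(k) * k^a stays approximately constant.

import numpy as np

rng = np.random.default_rng(0)
a = 2.0
samples = rng.zipf(a, size=200_000)

for k in range(1, 6):
    freq = np.mean(samples == k)
    # under Zipf, freq * k**a hovers near the same constant (1/zeta(a))
    print(f"k={k}: freq={freq:.4f}, freq * k^a = {freq * k**a:.4f}")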

_


[UNDER CONSTRUCTION] Come back later. Notes to be added to elucidate the use cases.

* ps. You can suggest more probability distributions; I will put them on the list.

… to be continued …

Translate your numbers into an analogy, an effect, something that pricks a person so bad, he cannot ignore them and has to act on them. Make the sense sensible and sensual. Let them hear, see and feel the numbers. - 2011-09.06

If it is not reproducible, it should be at least reasonable or derivable, even if it is not feasible or unlikely. - 2013-04.03
Reference: List of probability distributions
