# Notes On Probability Distributions

## You Probably Noticed …

1. Logistic Distribution: Logistic Regression2. Harmonic Distribution3. Hypergeometric Distribution4. Binomial Distribution5. Poisson Distribution6. Bernoulli Distribution7. Geometric Distribution8. Exponential Distribution9. Gaussian (Normal) Distribution10. Student's t-Distribution11. Uniform Distribution12. Weibull Distribution13. Chi-Square Distribution14. Lognormal Distribution15. F distribution16. Logarithmic Distribution17. Gamma Distribution18. Beta Distribution 19. Pareto DistributionThe Less Usual1. Zipf Distribution
The symmetry of chance that determines the sequence and number of tries may polarize the favor of 1 event over another. - 2008-06.02Stochasticity? It is but our mortal point of confusion/ illusion. - 2007-05.20

# Logistic Distribution: Logistic RegressionPareto Distribution

The probability of succeeding given the effort …

import matplotlib.pyplot as pltimport pandas as pdimport numpy as npimport scipy as spfrom sklearn.linear_model import LogisticRegressionfrom sklearn.metrics import classification_report, confusion_matrix# μ or x0 is a location parameter or the midpoint of the logistic curve def logistic(x, x0, k, L):    return L/(1+np.exp(-k*(x-x0)))# [main]# data  x = np.arange(10).reshape(-1, 1)y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])# model = LogisticRegression(solver='liblinear', random_state=0) # Create a Model and Train Itmodel = LogisticRegression(solver='liblinear', C=10.0, random_state=0)model.fit(x, y)LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,                   intercept_scaling=1, l1_ratio=None, max_iter=100,                   multi_class='warn', n_jobs=None, penalty='l2',                   random_state=0, solver='liblinear', tol=0.0001, verbose=0,                   warm_start=False)                   model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)# Evaluate the modelp_pred = model.predict_proba(x)y_pred = model.predict(x)"""score_ = model.score(x, y)conf_m = confusion_matrix(y, y_pred)report = classification_report(y, y_pred)"""k = 1x0 = np.mean(x)index_of_x0 = list(x).index(round(x0))logistic_y = logistic(x, x0, k, 1)fig, ax = plt.subplots()ax.plot(x, y, 'bo')note_text = 'Most people likely could have given up around here'# The key option here is bbox. ax.annotate(note_text, xy=(x0, logistic_y[index_of_x0]), xytext=(-20,20),             textcoords='offset points', ha='center', va='bottom',            bbox=dict(boxstyle='round,pad=0.2', fc='yellow', alpha=0.3),            arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.5',                             color='red'))# plotting the points plt.plot(x, y_pred)plt.plot(x, y)plt.plot(x, logistic_y)# make latex equationequation_latex = r'p(x) = \frac{1}{1+e^{-(x-\mu)/s}}'  plt.text(5, 0.1,'$%s$'%equation_latex)# graphing x axis and y axisplt.xlabel('probability of effort')plt.ylabel('probability of passing')plt.title('Logistic Regression')plt.show() # show plot

# Harmonic Distribution

The probability of blowing out candles given the number of candles …

"""A birthday cake has n burning candles and you need to blow out all the candles. Each time you try to blow out the candles, the number of candles that remain burning is a random event. If the cake starts with k burning candles, then after your attempt to blow them out, it will end up with a whole number of burning candles between 0 and (k -1) and each possibility occurs with equal chance. Each time you will blow out at least 1 candle and for k larger than 1, you will blow a random number of candles between 1 and k. What is the expected number of times you must blow out the candles until all of the end candles are blown out?"""
import matplotlib.pyplot as pltimport numpy as np# [main]# data  n = 100 # number of candlesx = np.arange(n)e_probability = []e_probability.append(0)e_probability.append(1)for k in range(2, n): e_probability.append(1 + ((1/(k+1)) * sum(e_probability)))y = e_probability# plotting the points plt.plot(x, y)# make latex equationequation_latex = r'e_{(k+1)} = 1 + (\frac{1}{(k+1)}) * \sum^{k}_{0}(e_i)'  plt.text(5, 0.1,'$%s$'%equation_latex)# graphing x axis and y axisplt.xlabel('Number of candles')plt.ylabel('Expected number of times of attempt to extinguish all candles')plt.title('Harmonic Distribution')  plt.show() # show plot

_

import matplotlib.pyplot as pltimport numpy as np# [main]# data  n = 100x = np.arange(n)e_series = []# for k in range(1, (n+1)): e_series.append(1/k) y = e_series# plotting the points plt.plot(x, y)# make latex equationequation_latex = r'e_{k} = \frac{1}{k}'  plt.text(40, 0.1,'$%s$'%equation_latex)# graphing x axis and y axisplt.xlabel('Number of candles')plt.ylabel('e_k')plt.title('Harmonic Distribution')plt.show() # show plot

# Hypergeometric Distribution

The probability of getting the number of desired items (k) given the available items (K) in the midst (N) and the number of times (n) you are allow to draw …

import matplotlib.pyplot as pltimport numpy as npimport mathfrom scipy.special import combdef hypergeometric_pmf(N, K, n, x):    ''' Probability Mass Function for Hypergeometric Distribution    : N: population size    : K: total number of desired items in N    : n: number of draws made from N    : x: number of desired items in our draw of n items    :returns: PMF computed at x    '''    K_choose_x = comb(K,x)    N_K_choose_n_x = comb(N-K, n-x)    N_choose_n = comb(N,n)        return (K_choose_x)*N_K_choose_n_x/N_choose_n    def hypergeometric_cdf(N, K, n, t, min_value=None):    ''' Cumulative Density Funtion for Hypergeometric Distribution    : N: population size    : K: total number of desired items in N    : n: number of draws made from N    : t: number of desired items in our draw of n items up to t    :returns: CDF computed up to t    '''    if min_value:        return np.sum([hypergeometric_pmf(N, K, n, x) for x in range(min_value, t+1)])        return np.sum([hypergeometric_pmf(N, K, n, x) for x in range(t+1)])def hypergeometric_plot(N, K, n):    ''' Visualization of Hypergeometric Distribution for given parameters    : N: population size    : K: total number of desired items in N    : n: number of draws made from N    :returns: Plot of Hypergeometric Distribution for given parameters    '''    x = np.arange(0, n+1)    y = [hypergeometric_pmf(N, K, n, x) for x in range(n+1)]    plt.plot(x, y, 'bo')    plt.vlines(x, 0, y, lw=2)    plt.xlabel('(k) # of desired items in our draw')    plt.ylabel('Probablities')    plt.title('Hypergeometric Distribution')    # make latex equation    equation_latex = r'p(X = k) = \frac{C^{K}_{k}*C^{N-K}_{n-k}}{C^{N}_{n}}'      plt.text(3, 0.3,'$%s$'%equation_latex)    plt.show()# [main]# data  """N = 15 total candies in the jar.n = 5 draws for the sample.K = 5 red candies in the jar.k = draw 2 red candies in the sample."""N = 15K = 5n = 5# p = hypergeometric_pmf(15, 5, 5, 2)print(f"The probability of drawing precisely 2 red candies in our 5 random draws from the jar is {hypergeometric_pmf(15, 5, 5, 2)}.")"""The probability of drawing precisely 2 red candies in our 5 random draws from the jar is 0.3996003996003996."""hypergeometric_plot(N, K, n)""" N = 52 # 52 cards in a deck of cards.K = 13 # 13 cards per symbol (♠ ♥ ♦ ♣)n = 5  # 5 card opening handk = 3  # draw 3 of the same symbol (♠ ♥ ♦ ♣)"""

# Binomial Distribution

The probability of random experiment of tossing a biased coin of probability (p = 0.6 in each throw) for (n = 6) times where the probability of getting a head on the (k = 4) th trial …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import binom    def binomial_cdf(n, p): # binom.cdf(x, n, p)    ''' Cumulative Density Funtion for Binomial Distribution    : n: number of attempts    : p: probability of success in each trial    : x: range of kth success    :returns: CDF computed up to n    '''    distribution = [binom.pmf(x, n, p) for x in range(n+1)]        cdf = []    cdf.append(distribution)        for i in range(1, len(distribution)):      cdf.append((distribution[i] + cdf[i-1]))        return cdfdef binomial_plot(n, p):    ''' Visualization of Binomial Distribution for given parameters    : n: number of attempts    : p: probability of success in each trial    :returns: Plot of Binomial Distribution for given parameters    '''    x = np.arange(0, n+1)    y = [binom.pmf(x, n, p) for x in range(n+1)]    # y = binomial_cdf(n, p)        plt.plot(x, y, 'bo')    plt.vlines(x, 0, y, lw=2)    plt.xlabel('probability of getting a head')    plt.ylabel('Probablities')    plt.title('Binomial Distribution')    # make latex equation    equation_latex = r'P(X = k, n, p) = \binom{n}{k} p^{k} * (1-p)^{n-k}; \binom{n}{k} = \frac{n!}{k!(n-k)!}'      plt.text(3, 0.4,'$%s$'%equation_latex)    plt.show()# [main]# data  """"""p = 0.6 # biased coin of such porbabilityn = 6 # fixed number (n) of trialsx = 4 # kth of attempts; x = kP = binom.pmf(x, n, p)print(f"The probability of random experiment of tossing a biased coin {n} times where the probability of getting a head on the {x}th trial is {P}.")"""The probability of random experiment of tossing a biased coin 6 times where the probability of getting a head on the 4th trial is 0.3110400000000001."""binomial_plot(n = 6, p = 0.6)"""

# Poisson Distribution

The probability of ( λ= 2) buses arriving within 30 mins …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import poisson    def poisson_cdf(_lambda, n): #     ''' Cumulative Density Funtion for Poisson Distribution    : _lambda:  average occurences of event within an interval    : n     : maximal number of buses anticipated within the interval    :returns: CDF computed up to n    '''    distribution = [poisson.pmf(x, _lambda) for x in range(n+1)]        cdf = []    cdf.append(distribution)        for i in range(1, len(distribution)):      cdf.append((distribution[i] + cdf[i-1]))        return cdfdef poisson_plot(n, x, _lambda):    ''' Visualization of Poisson Distribution for given parameters    : x: number of buses arriving within 30 mins    : n: maximal number of buses anticipated within the interval    :returns: Plot of Poisson Distribution for given parameters    '''    # x = np.arange(0, n+1)    y = poisson.pmf(x, _lambda) #     # y = poisson_cdf(lmbda)        # fig, ax = plt.subplots(1, 1, figsize=(8, 6))    plt.plot(x, y, 'bo', ms=8, label='poisson pmf')    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('number of buses arriving within 30 mins', fontsize="10")    plt.ylabel('Probablities', fontsize="10")    plt.title('Poisson Distribution')    # make latex equation    equation_latex = r'P(X = k, \lambda) = \frac{\lambda ^k * e^{-\lambda}}{k!}'      plt.text(3, 0.4,'$%s$'%equation_latex)    plt.show()# [main]_lambda = 2 # average occurences of event within an interval, e.g. every 30 minutes, 2 busesn = 10 # maximal number of buses anticipated within the intervalx = np.arange(0, n+1)   # number of buses arriving within 30 minsP = poisson.pmf(x, _lambda)print(f"The probability of {x} buses arriving within 30 mins is {P}.")"""The probability of 2 buses arriving within 30 mins is 0.2706705664732254. """poisson_plot(n, x, _lambda)"""import numpy as npimport matplotlib.pyplot as pltfrom scipy.stats import gaussian_kde, poissondef sample_poisson(mu, size):    return poisson(mu).rvs(size)# Example usagemu = 2.0size = 1000samples = sample_poisson(mu, size)# Create PDF using KDEkde = gaussian_kde(samples)x = np.linspace(0, 10, 1000)pdf = kde(x)# Plot PDFplt.plot(x, pdf)plt.xlabel('x')plt.ylabel('PDF')plt.title('Poisson Distribution with mu = 2.0')plt.show()"""

If the time interval is observed to be 60 mins per bus, and the time instances at which the buses arrived are (a, b, c, d), where they belong to [0,60]. We sample from a distribution where the average of the time gaps is: 15 minutes. Thus, the values can be (5 min, 15 min, 55 min, 60 min).

When you randomly arrive at within the hour, you can choose any of the gaps, but this would not be uniform.

E(waiting time) = (1/2)[(5/60)*(5) + (10/60)*(10) + (40/60)*40 + (5/60)*5]=14.58 minutes

1/2 is assumed to be due to the fact that if we arrive in a time interval of let’s say 30 minutes, then the time we wait would be around 15 minutes if we assume we arrive at random within that time. In notation, the expectation of the arrival times is:

where p(T) is the distribution of intervals T between buses as they arrive at the bus stop. The expected wait time E[W] would then be 1/2 of the expected interval experienced by passengers so, E[W] = 0.5*E[T].

The expected value for this distribution is E(X) = 1/λ, which can be interpreted as the waiting time.

We can derive the Poisson formula mathematically from the Binomial PMF.

You can also use Poisson distribution to chart for anticipated failures of systems over time. Example Poisson Process with the average time between events of 60 days.

or anticipation of meteors. … expect 5 meteors per hour on average or 1 every 12 minutes, and assumed we waited for 60 minutes. Probability of observing 3 meteors in 1 hour.

# Bernoulli Distribution

The probability of success and failure of a given known binary outcome is …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import bernoullidef bernoulli_plot(p):     ''' Visualization of Bernoulli Distribution for given parameters    : p: probability of success     :returns: Plot of Bernoulli Distribution for given parameters    '''    x = [0, 1]    # bd = bernoulli(p); bd.pmf(X);    y = bernoulli(p).pmf(x) # probability mass function: f = p**x*(1-p)**(1-x)        # fig, ax = plt.subplots(1, 1, figsize=(8, 6))    plt.plot(x, y, 'bo', ms=8, label='Bernoulli pmf')    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('Success or failure (0, 1)', fontsize="10")    plt.ylabel('Probablities', fontsize="10")    plt.title('Bernoulli Distribution')    # make latex equation    equation_latex = r'P(X = k) = p^{k} * (1-p)^{n-k}; k = 1, n = 2;' # 2 possibilities, 1 outcome    plt.text(0.2, 0.35,'$%s$'%equation_latex)    plt.show()# [main]p = 0.6 # probability of success # P = bernoulli(p).pmf(1) print(f"The probability of success is {bernoulli(p).pmf(1)} and failure is {bernoulli(p).pmf(0)}.")"""The probability of success is 0.6 and failure is 0.4."""bernoulli_plot(p)"""k = 1 for Bernoulli Distribution"""

# Geometric Distribution

On kth attempt, i.e. X = k, there would be X = k— 1 failures before you get your success. The probability of success on the kth attempt given each time the success of probability is p is …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import geomdef geometric_plot(x, p):     ''' Visualization of Geometric Distribution for given parameters    : p: probability of success     :returns: Plot of Geometric Distribution for given parameters    '''    y = geom.pmf(x, p) #         # fig, ax = plt.subplots(1, 1, figsize=(8, 6))    plt.plot(x, y, 'bo', ms=8, label='Geometric pmf')    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('# of attempts', fontsize="10")    plt.ylabel('Probablities', fontsize="10")    plt.title('Geometric Distribution')    # make latex equation    equation_latex = r'P(X = k) = p * (1-p)^{k-1}; k: \ kth \ attempt'      plt.text(2, 0.35,'$%s$'%equation_latex)    plt.show()# [main]p = 0.6 # probability of success n = 10 # maximal number of trialsx = np.arange(0, n+1)   # number of attemptsk = 4   # kth attemptP = geom.pmf(k, p)print(f"The probability of success on the {k}th attempt is {P}.")"""The probability of success on the 4th attempt is 0.03840000000000001."""geometric_plot(x, p)

# Exponential Distribution

The probability of decay and growth …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import expondef exponential_plot(x, location_start, _lambda):     ''' Visualization of Exponential Distribution for given parameters    : p: probability of success     :returns: Plot of Exponential Distribution for given parameters    '''    # case: decay    y = expon.pdf(x, location_start, _lambda) # expon.cdf(x, location_start, _lambda)    number_of_rows = 2    number_of_columns = 1    plt.subplot(number_of_rows, number_of_columns, 1) # 1st row display: index 1     plt.plot(x, y, 'bo', ms=10, label='Geometric pdf')    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('# of incidence', fontsize="10")    plt.ylabel('Probablities', fontsize="10")    plt.title('Exponential Distribution (Decay)', fontsize="10")    # make latex equation    equation_latex = r'P(X = x, \lambda) = \lambda * e^{- \lambda x}; x >= 0; P(X = x, \lambda) = 0; x < 0'      plt.text(5, 0.15,'$%s$'%equation_latex)    # case: growth    y2 = []    for i in range(0, len(y)):      y2.append(1 - y[i])        plt.subplot(number_of_rows, number_of_columns, 2) # 2nd row display: index 2    plt.plot(x, y2, 'bo', ms=10, label='Geometric pdf')    plt.vlines(x, 0, y2, colors='b', lw=5, alpha=0.5)    plt.xlabel('# of incidence', fontsize="10")    plt.ylabel('Probablities', fontsize="10")    plt.title('Exponential Distribution (Growth)', fontsize="10")    # make latex equation    equation_latex = r'P(X = x, \lambda) = 1 - \lambda * e^{- \lambda x}; x >= 0; P(X = x, \lambda) = 0; x < 0'      plt.text(5, -0.15,'$%s$'%equation_latex)    plt.show()# [main]n = 10  # minsx = np.arange(0, n+1)   # number of attempts_lambda = 0.6   # rate parameter: λ: the rate parameter (calculated as λ = 1/μ), e.g. mean number of minutes between eruptions for a certain geyser is 40 minutesk = 5location_start = 0 # min 0P = expon.pdf(x, location_start, _lambda)print(f"The probability of decay on the {k}th min is {P[k]}.")"""The probability of decay on the 5th min is 0.0004006157940325235."""exponential_plot(x, location_start, _lambda)

# Gaussian (Normal) Distribution

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import norm"""def normal(x, mu, sigma):    p = 1 / math.sqrt(2 * math.pi * sigma**2)    return p * np.exp(-0.5 / sigma**2 * (x - mu)**2)"""def normal_plot(mu, std):     ''' Visualization of Gaussian Distribution for given parameters    : p: probability of success     :returns: Plot of Gaussian Distribution for given parameters    '''    x = np.linspace(-5, 5, 100)    distribution_normal = norm(mu, std)    y = distribution_normal.pdf(x)        # fig, ax = plt.subplots(1, 1, figsize=(8, 6))    plt.plot(x, y, 'bo', ms=8, label='Guassian pdf')    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('X', fontsize="10")    plt.ylabel('Probablities, P(X)', fontsize="10")    plt.title('Gaussian Distribution (Normal)')    # make latex equation    equation_latex = r'P(X = k) = \frac{1} {\sigma \sqrt{2 * \pi}} * e^{- \frac{1}{2} (\frac{ x - \mu }{ \sigma })^2}'      plt.text(2, 0.35,'$%s$'%equation_latex)    plt.show()# [main]# Create a standard normal distribution with mean as 0 and standard deviation as 1mu = 0std = 1normal_plot(mu, std)

# Student’s t-Distribution

When you do not have enough data to be normal …

import matplotlib.pyplot as pltimport numpy as npimport scipy.stats as statsimport math# Student's t-Distributiondef t_plot():     mu = 1.5    variance = 2    sigma = math.sqrt(variance)        x = np.linspace(mu - 5*sigma, mu + 3*sigma, 100)    dof = len(x) - 1    y = stats.t.pdf(x, dof)        plt.plot(x, y, 'bo', ms=8, label='t-distribution pdf' + ' with dof: ' + str(dof))    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('Observations, X', fontsize="10")    plt.ylabel('Probablities, P(X)', fontsize="10")    plt.title('Student\'s t-Distribution')    # make latex equation    equation_latex = r'f(x; \nu) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\Gamma(\nu/2)}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}'      plt.text(0, -0.02,'$%s$'%equation_latex)     plt.legend();    plt.show()# [main]t_plot()"""import pandas as pdimport seaborn as snspdf_3 = stats.t.pdf(x, 3)pdf_10 = stats.t.pdf(x, 10)pdf_100 = stats.t.pdf(x, 100)df = pd.DataFrame({'pdf_3': pdf_3, 'pdf_10': pdf_10, 'pdf_100': pdf_100, 'x_axis': x_axis})plt.figure(figsize=(15,8))plt.plot(df['x_axis'], df['pdf_3'], label ='3 degree of freedom');plt.plot(df['x_axis'], df['pdf_10'], label='10 degree of freedom');plt.plot(df['x_axis'], df['pdf_100'], label='100 degree of freedom');""""""Student distribution is used to estimating the mean of a normally-distributed population in situations where the sample size is small and the population's standard deviation is unknown.Student distribution has heavier tails, i.e. it is more prone to producing values that fall far from its mean. t-distribution is characterised by the degrees of freedom which equals to 𝜈=𝑛−1 (𝜈 is pronounced as nu)."""

# Uniform Distribution

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import uniformdef uniform_plot(x, a, b):     ''' Visualization of Uniform Distribution for given parameters    :returns: Plot of Uniform Distribution for given parameters    '''    variable_random = uniform.pdf(x, a, b)     y = variable_random        plt.plot(x, y, 'bo', ms=8, label='Uniform pdf')    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('X', fontsize="10")    plt.ylabel('Probablities, P(X)', fontsize="10")    plt.title('Uniform Distribution')    # make latex equation    equation_latex = r'p(x) = 1/( b – a), a ≤ x ≤ b'      plt.text(0, 0.25,'$%s$'%equation_latex)    plt.show()# [main]# Create an uniform distribution a = 0b = 5x = np.linspace(a, b, 100)uniform_plot(x, a, b)

# Weibull Distribution

Anticipating when the failure will set in …

# Lognormal Distribution

You may observe in nature on the amount of rainfall, milk production by cows, and for most natural growth processes, where the growth rate is independent of size …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import lognormdef lognormal_plot():     # x = np.linspace(0, 80, 1000)        # loc (mu) where the graph is located, s (standard deviation) influence the shape    x = np.linspace(0, 4, 100)     y = lognorm.pdf(x, s=0.5, loc=0)  # green        plt.plot(x, y, 'bo', ms=8, label='Lognormal pdf')    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('X', fontsize="10")    plt.ylabel('Probablities, P(X)', fontsize="10")    plt.title('Lognormal Distribution')    # make latex equation    equation_latex = r'f(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}}e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}};'      plt.text(0, -0.03,'$%s$'%equation_latex)     plt.show()# [main]lognormal_plot()

# F Distribution

When you want to know if you are within the critical regions for hypothesis tests and and if you are within the confidence intervals …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import f# F distribution (or the Fisher–Snedecor distribution) def f_plot():     rv1 = f(dfn=3, dfd=15, loc=0, scale=1)    x = np.linspace(rv1.ppf(0.0001), rv1.ppf(0.9999), 100)    y = rv1.pdf(x)        plt.plot(x, y, 'bo', ms=8, label='Lognormal pmf')    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('X', fontsize="10")    plt.ylabel('Probablities, P(X)', fontsize="10")    plt.title('F-Distribution')    # make latex equation    equation_latex = r'f(x; d_1, d_2) = \frac{\sqrt{\frac{(d_1 x)^{d_1}d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x B(\frac{d_1}{2}, \frac{d_2}{2})};'      plt.text(0, -0.03,'$%s$'%equation_latex)     plt.show()# [main]f_plot()"""dfn: degrees of freedom in the estimate of variance of numeratordfd: degrees of freedom in the estimate of variance of denominatorf probability density function:------------------------------                     df2**(df2/2) * df1**(df1/2) * x**(df1/2-1)F.pdf(x, df1, df2) = --------------------------------------------                     (df2+df1*x)**((df1+df2)/2) * B(df1/2, df2/2)"""

# Logarithmic Distribution

You will notice that the number of items purchased by a consumer in a particular period, the number of bird and plant species in an area, the number of parasites per host, the number of people clustering …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import logserimport math# Logarithmic Distributiondef logarithmic_plot():     p = 0.6    # mean, var, skew, kurt = logser.stats(p, moments='mvsk')    x = np.arange(logser.ppf(0.01, p),              logser.ppf(0.99, p))    y = logser.pmf(x, p)    plt.plot(x, y, 'bo', ms=8, label='Logarithmic pdf')    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('Observations, X', fontsize="10")    plt.ylabel('Probablities, P(X)', fontsize="10")    plt.title('Logarithmic Distribution')    # make latex equation    equation_latex = r'P(X = k) = \frac{\frac{-1}{\log(1-p)} \cdot \left(\frac{p}{1-p}\right)^k}{1 - \left(\frac{p}{1-p}\right)^{k+1}};'      plt.text(0, -1,'$%s$'%equation_latex)     plt.legend();    plt.show()# [main]logarithmic_plot()

# Gamma Distribution

You observe time between accidents or events …

Rate Parameter (λ):β = 1 / λ.  # e.g. measure the time between accidents or events in days and the scale parameter equals 4 (4 days between accidents on average)λ = 1 / β.  # scale parameter
import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import gammaimport math# Gamma Distributiondef gamma_plot():     x = np.linspace (0, 40, 100)     y = gamma.pdf(x, a=5, scale=3)        plt.plot(x, y, 'bo', ms=8, label='Gamma pdf')    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('Observations, X', fontsize="10")    plt.ylabel('Probablities, P(X)', fontsize="10")    plt.title('Gamma Distribution')    # make latex equation    equation_latex = r'f(x;\alpha,\beta) = \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta} \qquad x\geq 0;'      plt.text(0, -0.03,'$%s$'%equation_latex)     equation_latex = r'where\ \alpha>0 $is the shape parameter,$\beta>0$is the scale parameter, and$\Gamma(\alpha)$is the gamma function defined as$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}dt'      plt.text(0, -0.04,'$%s$'%equation_latex)        plt.legend();    plt.show()# [main]gamma_plot()

# Beta Distribution

You wonder the probability that a political candidate will win a certain percentage of the votes, likelihood of the audience rating the new movie release, the click-through rate of the website (which is the proportion of visitors), the conversion rate for buyers actually purchasing, the survival chance of a person having blood cancer …

import matplotlib.pyplot as pltimport numpy as npfrom scipy.stats import betaimport math# Beta Distributiondef beta_plot():     x = np.linspace(0, 1, 10000)    # α = 8 and β = 2    _alpha = 8     _beta = 2    y = beta.pdf(x, _alpha, _beta)        plt.plot(x, y, 'bo', ms=8, label='Beta pdf')    # plt.fill_between(x, x * 0, y, facecolor='b', alpha=0.1)    plt.vlines(x, 0, y, colors='b', lw=5, alpha=0.5)    plt.xlabel('Observations, X', fontsize="10")    plt.ylabel('Probablities, P(X)', fontsize="10")    plt.title('Beta Distribution')    equation_latex = r'f(x;a,b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}\qquad 0\leq x \leq 1;'      plt.text(0, -1,'$%s$'%equation_latex)     equation_latex = r'where\ a>0\ and\ b>0,\ and\ B(a,b)$is the beta function defined as$B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}dt'      plt.text(0, -1.2,'$%s$'%equation_latex)      plt.legend();    plt.show()# [main]beta_plot()

# Pareto Distribution

80-20 …

import numpy as npimport matplotlib.pyplot as pltfrom scipy.stats import paretonumber_of_samples = 1000x_m = 1 #scalealpha = [1, 2, 3, 4, 5] #list of values of shape parameterssamples = np.linspace(start=0, stop=5, num=number_of_samples)for a in alpha:    output = np.array([pareto.pdf(x=samples, b=a, loc=0, scale=x_m)])    plt.plot(samples, output.T, label='alpha {0}' .format(a))plt.xlabel('samples', fontsize=15)plt.ylabel('PDF', fontsize=15)plt.title('Probability Density function', fontsize=15)plt.grid(b=True, color='grey', alpha=0.3, linestyle='-.', linewidth=2)plt.rcParams["figure.figsize"] = [5, 5]plt.legend(loc='best')plt.show()alpha_chosen = alpha#drawing samples from distributionsamples = (np.random.pareto(alpha_chosen, number_of_samples) + 1) * x_mcount, bins, _ = plt.hist(samples, 100, density = True)fit = alpha_chosen*x_m**alpha_chosen / bins**(alpha_chosen+1)plt.plot(bins, max(count)*fit/max(fit), linewidth=2, color='r')plt.xlabel('bins', fontsize=15)plt.ylabel('probability density', fontsize=15)# plt.title('Probability Density Function', fontsize=15)plt.title('Pareto Distribution')equation_latex = r'P(t)= \frac{\alpha * t_m^{\alpha}}{t^{\alpha + 1}}'  plt.text(0, -1,'$%s$'%equation_latex, fontsize=15)plt.grid(b=True, color='grey', alpha=0.3, linestyle='-.', linewidth=2)plt.rcParams['figure.figsize'] = [8, 8]plt.show()

# Zipf Distribution

We observe that the rank is the inverse of the frequency … and we do not know why …

Perhaps languages' Zipf mystery is, if not caused by it, at least strengthened by preferential attachment. Once a word (and/ or something or someone) is used (or preferred), it's more likely to be used again soon. Critical points may play a role as well. Writing and conversation often stick to a topic until a critical point is reached and the subject is changed and the vocabulary shifts. Processes like these are known to result in power laws. So, in the end, it seems tenable that all these mechanisms might collude to make Zipf's law the most natural way for language to be. Perhaps some of our vocabulary and grammar was developed randomly, according to Mandelbrot's theory. And the natural way conversation and discussion follow preferential attachment and criticality, coupled with the principle of least effort when speaking and listening are all responsible for the relationship between word rank and frequency. - Michael Stevens, Vsauce
import matplotlib.pyplot as pltimport numpy as npfrom numpy import randomfrom scipy.stats import zipfimport mathimport seaborn as sns# Zipf Distributiondef zipf_plot():     length_of_rv = 1000    a = 2    rv = random.zipf(a, size=length_of_rv)    # rv = np.random.normal(size = length_of_rv)    frequency, bins = np.histogram(rv, bins=10, range=[0, 100])    y = frequency    x = np.linspace(0, 1, len(y))    plt.title('Zipf Distribution')    sns.distplot(rv[rv<10], kde=False)    """    """    # make latex equation    equation_latex = r'P(X=k, \alpha, n) = \frac{1/k^\alpha}{\sum_{i=1}^n 1/i^\alpha}'      plt.text(5, 50,'$%s$'%equation_latex)    plt.show()# [main]zipf_plot()

_

_

_

[UNDER CONSTRUCTION] Come back later.Notes to be added to elucidate on the use cases. * ps. You can suggest more Probability Distributions, I will put on the list.

… to be continued …

Translate your numbers into an analogy, an effect, something that pricks a person so bad, he cannot ignore them and has to act on them. Make the sense sensible and sensual. Let them hear, see and feel the numbers. - 2011-09.06If it is not reproducible, it should be at least reasonable or derivable, even if it is not feasible or unlikely. - 2013-04.03
Reference: List of probability_distributions