Probability theory

Prabhat Dixit
7 min read · Jun 29, 2021

Since our school days, we have been taught probability through examples like tossing a coin or rolling a die, but the power of probability goes well beyond that. Probability theory is the branch of mathematics concerned with the analysis of random phenomena. The outcome of a random event cannot be determined before it occurs, but it may be any one of several possible outcomes. The outcome of such a random phenomenon is captured by what we call a random variable.

In other words, a random variable is a function defined on a sample space, where the sample space is the collection of all possible outcomes of a random experiment. The function assigns a number to each outcome.

There are two types of random variables: discrete and continuous. Let's take an example of each to understand them better.

Example for Discrete Random variable:-

James, a good friend of yours, has an ice cream stand and wants to analyze his sales to decide whether to employ an assistant. As you are a data scientist, James asked you to help him with this. You collected data on the number of ice creams bought and the number of customers who bought that many, and tabulated it like this:-

Here we can define X as the number of ice creams bought by a customer. X is our random variable because it can take several different values. You then estimate the probability of each value of X from its relative frequency in the data, i.e., the number of customers who bought that many ice creams divided by the total number of customers. And it looks like this.

So X in the above example is a discrete random variable, as it takes on a countable number of possible numerical values.
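To make the relative-frequency idea concrete, here is a minimal sketch in Python. The customer counts below are made up purely for illustration (they are not James's actual data), and the names `customers_by_count` and `pmf` are my own.

```python
from fractions import Fraction

# Hypothetical tallies, NOT the real data: ice creams bought -> number of customers
customers_by_count = {1: 40, 2: 35, 3: 15, 4: 10}

total_customers = sum(customers_by_count.values())

# The relative frequency of each value of X is our estimate of P(X = x)
pmf = {x: Fraction(n, total_customers) for x, n in customers_by_count.items()}

for x, p in pmf.items():
    print(f"P(X = {x}) = {float(p):.2f}")

# The estimated probabilities add up to exactly 1
print(sum(pmf.values()))
```

Using `Fraction` keeps the arithmetic exact, so the probabilities sum to exactly 1 instead of something like 0.9999999.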

Example for Continuous Random variable:-

Continuing with the same example, suppose James is now interested in the weight of ice cream he serves in one or two scoops, or in how long he takes to serve a customer. These are examples of continuous random variables, as they can take any value within a range, including fractions: the weight takes a range of values in grams, and the time takes a range of values in minutes or seconds.

Probability Distributions:-

Now that we have seen the different types of random variables, let us discuss the probability distributions associated with each of them:-

  1. Discrete probability distributions:- A discrete distribution is a probability distribution over discrete (individually countable) outcomes, such as 0, 1, 2, 3… The function fX(x) = P(X=x), called the probability mass function, specifies how the total probability of 1 is divided up amongst the possible values of X and so gives the probability distribution of X.

In the above example, where our friend James needs help with his ice-cream business, the table of estimated probabilities we created is exactly such a probability distribution.

There are many discrete probability distributions to be used in different scenarios. Let’s mention some of them:-

i) Uniform discrete distribution:- A random variable X has a discrete uniform distribution if each of the n values in its range, say x1, x2, . . . , xn, has equal probability. Then,

fX(x) = P(X=x) = 1/n

The number that comes up when we roll a fair die is a perfect example of a discrete uniform distribution: each of the six faces has probability 1/6.

Uniform Distribution
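As a quick sketch of the formula fX(x) = 1/n, the snippet below builds the distribution for a fair six-sided die. The variable names are my own choice for illustration.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
n = len(faces)

# Discrete uniform distribution: every value gets the same probability 1/n
pmf = {x: Fraction(1, n) for x in faces}

# Probability of rolling an even number: add up the probabilities of 2, 4 and 6
p_even = sum(p for x, p in pmf.items() if x % 2 == 0)

print(pmf[3])   # 1/6
print(p_even)   # 1/2
```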

ii) Binomial Distribution:- Suppose a trial has only two outcomes, denoted by S for success and F for failure with P(S) = p and P(F) = 1 − p. For example, a coin toss where a Head is a success S and a Tail is a failure F. Such a trial is called a Bernoulli trial. If we perform a random experiment by repeating n independent Bernoulli trials, then the random variable X representing the number of successes in the n trials has a binomial distribution.

Binomial Distribution
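The probability of exactly k successes in n independent Bernoulli trials is given by the standard binomial formula, P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). A small sketch of it in Python (the function name `binomial_pmf` is just my own label):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: probability of exactly 5 heads in 10 tosses of a fair coin
p_five_heads = binomial_pmf(5, 10, 0.5)
print(p_five_heads)  # 0.24609375

# Sanity check: the probabilities over k = 0..n add up to 1
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(total)
```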

iii) Geometric Distribution:- The geometric distribution describes the number of failures before the first success in a series of independent Bernoulli trials, each with the same success probability. To understand it better, consider it this way: if, as the popular story goes, Thomas Edison failed 1000 times before creating a working bulb, and we treat each attempt as an independent trial, then 1000 is one observed value of a geometrically distributed random variable.

iv) Poisson Distribution:- This distribution describes the number of events that occur in a fixed interval of time or space, when the events happen independently of one another at a constant average rate. For example, consider the number of calls a call center receives per hour: we can estimate the average number of calls, but we cannot know the exact number or the exact time of each call. The Poisson distribution is a limiting case of the Binomial Distribution as n goes to infinity while the expected number of successes np remains fixed.
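For the "number of failures before the first success" version of the geometric distribution, the standard formula is P(X = k) = (1 − p)^k · p. Here is a minimal sketch; the success probability 0.5 is just an illustrative choice.

```python
def geometric_pmf(k, p):
    """P(X = k): exactly k failures before the first success."""
    return (1 - p) ** k * p

# Example: with a fair coin (p = 0.5), probability of 3 tails before the first head
p_three_failures = geometric_pmf(3, 0.5)
print(p_three_failures)  # 0.0625

# The probabilities over k = 0, 1, 2, ... add up to 1 (truncated here at 200 terms)
total = sum(geometric_pmf(k, 0.5) for k in range(200))
print(total)
```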
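The link between the Binomial and Poisson distributions can be checked numerically. In the sketch below, the average of 4 calls per hour is a made-up number, and the function names are my own. We keep np fixed at 4 while n grows and watch the binomial probability settle on the Poisson one.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with average rate lam."""
    return lam**k * exp(-lam) / factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam = 4  # hypothetical average number of calls per hour
k = 3    # we ask for the probability of exactly 3 calls

# Keep the expected number of successes n * p = lam fixed while n grows
for n in (10, 100, 10_000):
    print(n, binomial_pmf(k, n, lam / n), poisson_pmf(k, lam))
```

As n increases, the two printed probabilities get closer and closer.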

2. Continuous probability distributions:- A continuous probability distribution is one in which the random variable X can take on any value in a continuous range. Because there are infinitely many values that X could assume, the probability of X taking on any one specific value is zero. Therefore we speak of probabilities over ranges of values, e.g., P(X>0) = 0.50.

Some commonly used continuous probability distributions are:-

i) Normal Distribution:- A normal distribution, sometimes called the bell curve, is a distribution that occurs naturally in many situations (that is why it is called normal, I guess). It is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.
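To see "probability over a range" in action, here is a small sketch using Python's built-in `statistics.NormalDist`. I am assuming a standard normal variable purely for illustration (mean 0, standard deviation 1), since it is symmetric around 0 and therefore gives P(X > 0) = 0.50.

```python
from statistics import NormalDist

# Assumption for illustration: X follows a standard normal distribution
X = NormalDist(mu=0, sigma=1)

# Probability over a range comes from the cumulative distribution function (CDF)
p_greater_than_zero = 1 - X.cdf(0)
p_between = X.cdf(1) - X.cdf(-1)

# A "range" that collapses to a single point has probability zero
p_exactly_one = X.cdf(1) - X.cdf(1)

print(p_greater_than_zero)  # 0.5
print(round(p_between, 4))  # 0.6827
print(p_exactly_one)        # 0.0
```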

ii) Standard Normal Distribution:- The standard normal distribution is a special case of the normal distribution. It is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one. This allows you to easily calculate the probability of certain values occurring in your distribution, or to compare data sets with different means and standard deviations.

A value from the standard normal distribution is called a standard score or a z score. Every normal random variable X with mean μ and standard deviation σ can be transformed into a z score via the following equation:

z = (X - μ) / σ
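Here is a minimal sketch of the z score transformation. The numbers are hypothetical: I am pretending that scoop weights are normally distributed with a mean of 60 g and a standard deviation of 5 g.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Convert a value x from a normal distribution into a standard score."""
    return (x - mu) / sigma

# Hypothetical numbers: scoop weights with mean 60 g and standard deviation 5 g
mu, sigma = 60.0, 5.0
x = 70.0

z = z_score(x, mu, sigma)
print(z)  # 2.0

# P(X <= 70) on the original scale equals P(Z <= 2) on the standard scale
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)
print(round(p_original, 4), round(p_standard, 4))
```

This is why the standard normal distribution is so handy: once everything is converted to z scores, one distribution answers all the questions.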

iii) Exponential distribution:- The exponential distribution is a continuous probability distribution used to model the time we need to wait before a given event occurs. For example, the waiting time at a bus stop before your bus arrives can be modeled as exponentially distributed, assuming buses arrive independently at a constant average rate.

iv) Chi-square Distribution:- A chi-square distribution is a continuous distribution with k degrees of freedom. It is the distribution of the sum of the squares of k independent standard normal random variables. It is used to test the goodness of fit of a distribution to data, to test whether data series are independent, and to estimate confidence intervals for the variance and standard deviation of a random variable from a normal distribution.

v) t-Distribution:- The t-distribution is a bell-shaped distribution, similar to the normal distribution but with heavier tails. It is used for smaller sample sizes where the population variance is unknown and has to be estimated from the sample, and its shape depends on the degrees of freedom. The t-distribution approaches the normal distribution as the degrees of freedom increase.

vi) F-distribution:- The F-distribution is a continuous probability distribution used especially in the analysis of variance. It is the distribution of the ratio of two independent random variables, each of which has a chi-square distribution and is divided by its number of degrees of freedom.
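For an exponential distribution with rate λ, the probability of waiting at most t is 1 − e^(−λt). A short sketch follows, assuming (hypothetically) that a bus arrives on average once every 10 minutes:

```python
from math import exp

def exponential_cdf(t, rate):
    """P(T <= t) for an exponentially distributed waiting time T."""
    if t < 0:
        return 0.0
    return 1 - exp(-rate * t)

# Hypothetical: on average one bus every 10 minutes, so rate = 1/10 per minute
rate = 1 / 10

p_within_5 = exponential_cdf(5, rate)
p_more_than_20 = 1 - exponential_cdf(20, rate)

print(round(p_within_5, 4))      # 0.3935
print(round(p_more_than_20, 4))  # 0.1353
```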
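A chi-square variable with k degrees of freedom can be built by hand as a sum of k squared independent standard normal draws. The simulation below is only a sketch: the seed, the sample size and k = 4 are arbitrary choices of mine. It checks that the sample average lands near k, which is the known mean of this distribution.

```python
import random

def chi_square_sample(k, rng):
    """One draw: the sum of k squared independent standard normal values."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(k))

rng = random.Random(0)  # fixed seed so the run is repeatable
k = 4                   # degrees of freedom (arbitrary choice for illustration)

samples = [chi_square_sample(k, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)

# The mean of a chi-square distribution equals its degrees of freedom
print(k, round(sample_mean, 2))
```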
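One way to see how the t-distribution approaches the normal distribution is to simulate it. The sketch below uses the standard construction T = Z / sqrt(V / df), where Z is standard normal and V is an independent chi-square variable with df degrees of freedom. The seed, sample size and cutoff of 2 are arbitrary choices of mine.

```python
import random
from math import sqrt
from statistics import NormalDist

def t_sample(df, rng):
    """One draw from a t-distribution with df degrees of freedom."""
    z = rng.gauss(0, 1)
    v = sum(rng.gauss(0, 1) ** 2 for _ in range(df))  # chi-square with df d.o.f.
    return z / sqrt(v / df)

def tail_fraction(df, rng, n=20_000, cutoff=2.0):
    """Fraction of simulated draws whose absolute value exceeds the cutoff."""
    return sum(abs(t_sample(df, rng)) > cutoff for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable

# Exact two-sided tail probability beyond 2 for the standard normal
normal_tail = 2 * (1 - NormalDist().cdf(2.0))

t3_tail = tail_fraction(3, rng)
t30_tail = tail_fraction(30, rng)

print("df=3 :", t3_tail)
print("df=30:", t30_tail)
print("normal:", round(normal_tail, 4))
```

With only 3 degrees of freedom, far more of the draws land beyond 2 than the normal distribution would predict, while with 30 degrees of freedom the fraction is already close to the normal value.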
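The ratio definition of the F-distribution can also be simulated directly. In this sketch the degrees of freedom (5 and 10), the seed and the sample size are arbitrary choices of mine. As a sanity check, the sample average is compared against the known mean d2 / (d2 − 2), which holds when d2 > 2.

```python
import random

def chi_square_sample(k, rng):
    """One draw: the sum of k squared independent standard normal values."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(k))

def f_sample(d1, d2, rng):
    """One draw: ratio of two chi-square draws, each divided by its d.o.f."""
    return (chi_square_sample(d1, rng) / d1) / (chi_square_sample(d2, rng) / d2)

rng = random.Random(0)  # fixed seed so the run is repeatable
d1, d2 = 5, 10          # degrees of freedom (arbitrary choice for illustration)

samples = [f_sample(d1, d2, rng) for _ in range(50_000)]
sample_mean = sum(samples) / len(samples)
theoretical_mean = d2 / (d2 - 2)

print(round(sample_mean, 2), theoretical_mean)
```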

So this brings us to the end of our article. I hope you have all learned about the different types of random variables and the distributions associated with them. I will come back with more such topics super soon; until then, sayonara!
