Bayes’ theorem

Key focus: Bayes’ theorem is a method for revising the prior probability of a specific event in light of new evidence available about that event.

Introduction

In statistics, the process of drawing conclusions from data that is subject to random variation is called “statistical inference”. In a typical random experiment, observations are recorded and conclusions must be drawn from the recorded data set. Conclusions about the underlying random process are necessary to establish one or more of the following:

* Estimation of a parameter of interest (for example, carrier frequency estimation in a receiver)
* Confidence and credibility of the estimate
* Rejecting a preconceived hypothesis
* Classification of data set into groups

Several schools of statistical inference have evolved over time. Bayesian inference is one of them.

Bayes’ theorem

Bayes’ theorem is central to scientific discovery and a core tool in machine learning/AI. It has numerous applications in areas such as mathematics, medicine, finance, marketing and engineering.

Bayes’ theorem is the workhorse of Bayesian inference, which usually deals with a sequence of events: as new information becomes available about a subsequent event, that information is used to update the probability of the initial event. In this context, we encounter two flavors of probability: prior probability and posterior probability.

Prior probability: This is the initial probability of an event before any information about it is available. In other words, it is the initial belief about a particular hypothesis before any evidence about that hypothesis is available.

Posterior probability: This is the probability value revised using new information obtained from a subsequent event. In other words, it is the updated belief about the hypothesis as new evidence becomes available.

The formula for Bayes’ theorem is

P(H|E) = P(E|H) × P(H) / P(E)

where H is the hypothesis, E is the evidence, P(H) is the prior probability of the hypothesis, P(E|H) is the probability of observing the evidence when the hypothesis is true, P(E) is the overall probability of the evidence, and P(H|E) is the posterior probability of the hypothesis given the evidence.

Figure 1: Formula for Bayes’ theorem
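
As a quick illustration, here is a minimal Python sketch of this formula; the function name and the example numbers are illustrative assumptions of mine, not from the original post.

```python
def bayes_posterior(p_h, p_e_given_h, p_e):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

# Illustrative numbers: prior P(H) = 0.5, likelihood P(E|H) = 0.8,
# overall evidence probability P(E) = 0.6
print(bayes_posterior(0.5, 0.8, 0.6))  # 0.666... -> belief in H strengthens
```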

A very simple thought experiment

You are asked to conduct a random experiment with a given coin. You are told that the coin is unbiased, i.e., the probability of obtaining a head or a tail is exactly 50%. Before conducting the experiment, you believe that the coin is unbiased and that the chance of getting a head or a tail is 0.5.

Assume that you have not looked at both sides of the coin; you simply start to conduct the experiment. You toss the coin repeatedly and record the outcomes (this is the observed new information/evidence). On the first toss, the coin lands with the head facing up. On the second toss, again the head shows up. On every subsequent toss, the coin keeps showing heads. After 100 tosses, you have observed only heads. Now what will you think about the coin? You will seriously start to suspect that both sides of the coin are engraved with “head” (no tail etched on the coin). Based on the new evidence, your belief about the “unbiasedness” of the coin is altered.

This is what Bayes’ theorem, or Bayesian inference, is all about. It is a general principle for learning from experience. It connects beliefs (called prior probabilities) with evidence (observed data). Based on the evidence, the degree of belief is refined. The degree of belief after conducting the experiment is called the posterior probability.

Figure 2: Bayes’ theorem – the process
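
To make the coin experiment concrete, here is a minimal Python sketch that pits two hypotheses against each other, “fair coin” versus “double-headed coin”. The prior values are my own illustrative assumptions:

```python
# Priors over the two hypotheses (illustrative assumptions)
p_fair = 0.99     # initial belief that the coin is fair
p_double = 0.01   # initial belief that both sides are heads

for toss in range(100):      # all 100 tosses come up heads
    like_fair = 0.5          # P(head | fair coin)
    like_double = 1.0        # P(head | double-headed coin)
    p_head = like_fair * p_fair + like_double * p_double  # P(evidence)
    # Bayes update: the posterior after this toss becomes the next prior
    p_fair = like_fair * p_fair / p_head
    p_double = like_double * p_double / p_head

print(f"P(double-headed | 100 heads) = {p_double:.10f}")  # virtually 1
```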

Real world example

Suppose a person X falls sick and goes to the doctor for diagnosis. The doctor runs a series of tests, and the result comes back positive for a rare disease that affects 0.1% of the population. The accuracy of the test is 99%. That is, the test correctly identifies 99% of the people who have the disease, and it incorrectly reports the disease in only 1% of the people who do not have it. Now, how certain is it that person X actually has the disease?

In this scenario, we can apply the extended form of Bayes’ theorem:

P(H|E) = P(E|H) × P(H) / [ P(E|H) × P(H) + P(E|H̅) × P(H̅) ]

Figure 3: Bayes’ theorem – extended form

The extended form of Bayes’ theorem applies when the hypothesis H is binary, i.e., it can take only two possible states. In the given problem, the hypothesis can take only two states: H (“having the disease”) and H̅ (“not having the disease”).

For the given problem, we can come up with the following numbers for the various quantities in the extended form of Bayes’ theorem.

P(H) = prior probability of having the disease, before the test results are available. This is often guesswork, but luckily we can use the disease’s prevalence in the population (0.1% = 0.001) in its place.
P(E|H) = probability of testing positive given that person X has the disease (99% = 0.99)
P(H̅) = prior probability of NOT having the disease (1 − 0.001 = 0.999)
P(E|H̅) = probability of testing positive given that person X does NOT have the disease, i.e., the false-positive rate (1% = 0.01)
P(H|E) = probability that person X actually has the disease, given that the test result is positive.

Plugging these numbers into the extended form of Bayes’ theorem, we get

P(H|E) = (0.99 × 0.001) / (0.99 × 0.001 + 0.01 × 0.999) = 0.00099 / 0.01098 ≈ 0.09016

so the probability that X actually has the disease is just about 9%.

Figure 4: Calculation using extended form of Bayes’ theorem

Person X doubts the result, goes to another doctor for a second opinion, and gets tested at an independent laboratory. The second test result comes back positive too. Now what is the probability that person X actually has the disease?

P(H) = the posterior probability from the first test, which now serves as the prior (we are refining the belief formed after the first test) = 9.016% = 0.09016
P(E|H) = probability of testing positive given that person X has the disease (99% = 0.99)
P(H̅) = probability of NOT having the disease after the first test (1 − 0.09016 = 0.90984)
P(E|H̅) = probability of the second test falsely reporting positive when the disease is absent (1% = 0.01)
P(H|E) = probability that person X actually has the disease, given that the second test result is also positive.

Figure 5: Refining the belief about the first test using results from second test

Therefore, the updated probability based on two positive tests is

P(H|E) = (0.99 × 0.09016) / (0.99 × 0.09016 + 0.01 × 0.90984) = 0.08926 / 0.09836 ≈ 0.9075

which implies that there is a 90.75% chance that person X has the disease.
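
A minimal Python sketch of this sequential update (the function and variable names are mine) reproduces both numbers:

```python
def posterior(prior, p_pos_given_disease=0.99, p_pos_given_healthy=0.01):
    """Extended form of Bayes' theorem for one positive test result."""
    # P(E): total probability of a positive result over both states of H
    evidence = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / evidence

after_first = posterior(0.001)         # prevalence 0.1% as the initial prior
after_second = posterior(after_first)  # first posterior becomes the new prior
print(f"{after_first:.5f}, {after_second:.5f}")  # 0.09016, 0.90750
```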

I hope this has given the reader a better understanding of what Bayes’ theorem is, the various quantities in its equation, and how to apply it.




Introduction to concepts in probability

What is Probability?

Probability is a branch of mathematics that deals with uncertainty. The term “probability” is used to quantify the degree of belief or confidence that something is true (or false). It gives us the likelihood of occurrence of a given event, expressed as a number in the closed interval [0,1].

Consider the following experiment describing a simple communication system. A user transmits data through a noisy medium and another user receives it. Here, the sender utters a single syllable over the phone. Due to the noise characteristics of the communication medium, we do not know whether the user at the destination will hear exactly what the sender spoke. Before performing the experiment, we would like to know the likelihood that the user at the destination hears that particular syllable (given the noise characteristics). This likelihood of the particular event is called the probability of the event.

Experiment:

Any activity that can produce observable results is called an experiment. Examples include tossing a coin (observable results: head/tail), rolling a die (observable results: the numbers on the faces of the die), drawing a card from a deck (observable results: the symbols, numbers and letters on the cards), and sending & receiving bits in a communication system (observable results: bits/alphabets transferred, or the voltage level at the receiver).

Sample Space:

Given an experiment, the sample space is the set of all possible outcomes of the experiment. It plays the role of the universal set when modeling the experiment and is denoted by the letter ‘S’. For example, the sample space for a coin toss is S = {head, tail}, and the sample space for the roll of a die is S = {1, 2, 3, 4, 5, 6}.

Event:

An event is also a set of outcomes of an experiment; it is a subset of the sample space. Each time the experiment is run, a particular event either occurs or does not occur. Events are associated with a probability number.

Types of Events:

Events can be classified according to their relationship with one another, for example: mutually exclusive events, independent events, and complementary events. The following table shows the classification of events and their definitions.

Computing Probability:

The probability of the occurrence of an event (say ‘A’) is given by the ratio of the number of ways that particular event can happen to the total number of all possible outcomes:

P(A) = (number of outcomes favorable to A) / (total number of possible outcomes)

For example, consider the experiment of rolling an unbiased die. The sample space is S = {1, 2, 3, 4, 5, 6}. Let’s say an event A is defined as getting ‘4’ when you roll the die. Since exactly one of the six equally likely outcomes is favorable, the probability of this event is P(A) = 1/6.
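
This counting definition translates directly into a few lines of Python; the variable names are mine:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of a single die roll
A = {4}                  # event: getting '4'

# P(A) = (number of favorable outcomes) / (total number of outcomes)
p_A = Fraction(len(A), len(S))
print(p_A)  # 1/6
```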

Axioms of Probability:

The following definitions are assumed for the axioms listed below: ‘S’ denotes the sample space of an experiment, ‘A’ and ‘B’ are events, and P(A) denotes the probability of occurrence of event ‘A’.

* Axiom 1: P(A) ≥ 0, i.e., probabilities are non-negative
* Axiom 2: P(S) = 1, i.e., one of the outcomes in the sample space always occurs
* Axiom 3: If A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B)

Properties of Probability:

The definition of probability leads to the properties listed below. Here the symbol Ø indicates the null event and Ā indicates that event A does NOT occur.

* P(Ø) = 0
* P(Ā) = 1 − P(A)
* P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
* If A is a subset of B, then P(A) ≤ P(B)
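
These properties are easy to verify numerically on a small sample space; the events below are my own examples:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of a single die roll
A = {1, 2}
B = {2, 3, 4}

def P(E):
    """Classical probability of event E on the finite sample space S."""
    return Fraction(len(E), len(S))

assert P(set()) == 0                       # P(null event) = 0
assert P(S - A) == 1 - P(A)                # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)  # inclusion-exclusion
print("all properties hold")
```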

Joint probability and Marginal probability:

Joint probability is defined as the probability that two or more events occur simultaneously. For two events A and B, the joint probability is denoted by P(A,B) or P(A∩B).

Given two or more events, the marginal probability is the probability of occurrence of a single event. It is also called a-priori probability.

The following table illustrates the concept of computing joint and marginal probabilities. Here, four events (P, Q, R, S) are used for illustration, with a, b, c and d denoting counts of joint occurrences out of n trials:

              P          Q          Row total
  R          a/n        b/n        (a+b)/n
  S          c/n        d/n        (c+d)/n
  Col. total (a+c)/n    (b+d)/n    1

For example, the table indicates that the probability of occurrence of both events R & Q is b/n; this is the joint probability of R and Q. Adding the probabilities row-wise or column-wise gives the marginal probability of a single event: adding a/n and b/n gives the marginal probability of event R, and similarly, adding a/n and c/n gives the marginal probability of event P.
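
The same bookkeeping in Python, using illustrative counts of my own for the four events:

```python
# Counts of joint occurrences out of n trials (illustrative numbers)
a, b, c, d = 20, 30, 10, 40   # a: P&R, b: Q&R, c: P&S, d: Q&S
n = a + b + c + d

joint_R_Q = b / n             # P(R ∩ Q), read off a single cell
marginal_R = (a + b) / n      # row-wise sum
marginal_P = (a + c) / n      # column-wise sum
print(joint_R_Q, marginal_R, marginal_P)  # 0.3 0.5 0.3
```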

Conditional probability or a-posteriori probability:

Conditional probability (also called a-posteriori probability) deals with dependent events. It is used to calculate the probability of an event given that some other event has already occurred.

It is denoted P(B|A), meaning ‘the probability of event B given that event A has already occurred’. It is called “a-posteriori” because it is available only “after” observing A (the first event).

The conditional probability P(B|A) is mathematically computed as

P(B|A) = P(A ∩ B) / P(A), provided P(A) > 0
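
For instance, on a single die roll (an example of my own choosing):

```python
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: the roll is even
B = {4, 5, 6}   # event B: the roll is greater than 3

p_A = len(A) / len(S)
p_A_and_B = len(A & B) / len(S)   # P(A ∩ B): even AND greater than 3
p_B_given_A = p_A_and_B / p_A     # P(B|A) = P(A ∩ B) / P(A)
print(p_B_given_A)                # 0.666...: knowing A occurred raises P(B)
```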
