Keywords: Markov Chain, Python, probability, data analysis, data science
Markov Chain
Markov chain is a probabilistic models that describe a sequence of observations whose occurrence are statistically dependent only on the previous ones. This article is about implementing Markov chain in Python
We will model a car’s behavior using the same transition matrix and starting probabilities described in the earlier post for modeling the corresponding Markov chain model (refer Figure 1). The matrix defines the probabilities of transitioning between different states, including accelerating, maintaining a constant speed, idling, and braking.
The starting probabilities indicate that the car starts in the break state with probability 1, which means it is already stopped and not moving.
Python implementation
Here’s the sample code in Python that implements the above model:
import random
# Define a transition matrix for the Markov chain
transition_matrix = {
'accelerate': {'accelerate': 0.3, 'constant speed': 0.2, 'idling': 0 , 'break': 0.5 },
'constant speed': {'accelerate': 0.1, 'constant speed': 0.4, 'idling': 0 , 'break': 0.5 },
'idling': {'accelerate': 0.8, 'constant speed': 0, 'idling': 0.2 , 'break': 0 },
'break': {'accelerate': 0.4, 'constant speed': 0.05, 'idling': 0.5 , 'break': 0.05 },
}
# Define starting probabilities for each state
starting_probabilities = {'accelerate': 0, 'constant speed': 0, 'idling': 0, 'break': 1}
# Choose the starting state randomly based on the starting probabilities
current_state = random.choices(
population=list(starting_probabilities.keys()),
weights=list(starting_probabilities.values())
)[0]
# Generate a sequence of states using the transition matrix
num_iterations = 10
for i in range(num_iterations):
print(current_state)
next_state = random.choices(
population=list(transition_matrix[current_state].keys()),
weights=list(transition_matrix[current_state].values())
)[0]
current_state = next_state
In this example, we use the random.choices() function to choose the starting state randomly based on the starting probabilities. We then generate a sequence of 10 states using the transition matrix, and print out the sequence of states as they are generated. A sample output of the program is given below.
Markov chainsare useful in computing the probability of events that are observable. However, in many real world applications, the events that we are interested in are usually hidden, that is we don’t observe them directly. These hidden events need to be inferred. For example, given a sentence in a natural language we only observe the words and characters directly. The parts-of-speech from the sentence are hidden, they have to be inferred. This bring us to the following topic– the hidden Markov models.
Hidden Markov models enables us to visualize both observations and the associated hidden events. Let’s consider an example for understanding the concept.
The cheating casino and the gullible gambler
Consider a dishonest casino that deceives it player by using two types of die : a fair die (F) and a loaded die (L). For a fair die, each of the faces has the same probability of landing facing up. For the loaded die, the probabilities of the faces are skewed as given next
When the gambler throws the die, numbers land facing up. These are our observations at a given time t (denoted as Ot = {1,2,3,4,5,6}). At any given time t, whether these number are rolled from a fair die (state St = F) or a loaded die (St = L), is unknown to an observer and therefore they are the hidden events.
Emission probabilities
The probabilities associated with the observations are the observation likelihoods, also called emission probabilities (B).
Initial probabilities
The initial probability of starting (at time = 0) with any of fair die or loaded die (hidden event) is 50%.
Transition probabilities
A gullible gambler switches from the fair die to loaded die with 10% probability. He switches back from loaded die to fair die with 5% probability.
The probabilities of transitioning from one hidden event to another is described by the transition probability matrix (A). The elements of the probability transition matrix, are the transition probabilities (pij) of moving from one hidden state i to another hidden state j.
The transition probabilities from time t-1 to t, for the hidden events are
Therefore, the transition probability matrix is
Based on the given information so far, a probability model is constructed. This is the Hidden Markov Model (HMM) for the given problem.
In the model given here, the probability of a given hidden state depends only on the previous hidden state. This is a typical first order Markov chain assumption.
2. Assumption on Output
The probability of any observation (output) depends on the hidden state that produce it and not on any other hidden state or output observations.
Problems and Algorithms
Let’s briefly discuss the different problems and the related algorithms for HMMs. The algorithms will be explained in detail in the future articles.
In the dishonest casino, the gambler rolls the following numbers:
1. Evaluation
Given the model of the dishonest casino, what is the probability of obtaining the above sequence ? This is a typical evaluation problem in HMMs. Forward algorithm is applied for such evaluation problems.
2. Decoding
What is the most likely sequence of die (hidden states) given the above sequence ? Such problems are addressed by Viterbi decoding.
What is the probability of fourth die being loaded, given the above sequence ? Forward-backward algorithm to our rescue.
3. Learning
Learning problems involve parametrization of the model. In learning problems, we attempt to find the various parameters (transition probabilities, emission probabilities) of the HMM, given the observation. Baum-Welch algorithm helps us to find the unknown parameters of a HMM.
Some real-life examples
Here are some real-life examples of HMM applications:
Speech recognition: HMMs are widely used in speech recognition systems to model the variability of speech sounds. In this application, the observable events are the acoustic features of the speech signal, while the hidden states represent the phonemes or words that generate the speech signal.
Handwriting recognition: HMMs can be used to recognize handwritten characters by modeling the temporal variability of the pen strokes. In this application, the observable events are the coordinates of the pen on the writing surface, while the hidden states represent the letters or symbols that generate the handwriting.
Stock price prediction: HMMs can be used to model the behavior of stock prices and predict future price movements. In this application, the observable events are the daily price movements, while the hidden states represent the different market conditions that generate the price movements.
Gene prediction: HMMs can be used to identify genes in DNA sequences. In this application, the observable events are the nucleotides in the DNA sequence, while the hidden states represent the different regions of the genome that generate the sequence.
Natural language processing: HMMs are used in many natural language processing tasks, such as part-of-speech tagging and named entity recognition. In these applications, the observable events are the words in the text, while the hidden states represent the grammatical structures or semantic categories that generate the text.
Image and video analysis: HMMs can be used to analyze images and videos, such as for object recognition and tracking. In this application, the observable events are the pixel values in the image or video, while the hidden states represent the object or motion that generates the pixel values.
Bio-signal analysis: HMMs can be used to analyze physiological signals, such as electroencephalograms (EEGs) and electrocardiograms (ECGs). In this application, the observable events are the signal measurements, while the hidden states represent the physiological states that generate the signal.
Radar signal processing: HMMs can be used to process radar signals and detect targets in noisy environments. In this application, the observable events are the radar measurements, while the hidden states represent the presence or absence of targets.
Rate this post: Note: There is a rating embedded within this post, please visit this post to rate it.
Key focus: Markov chains are a probabilistic models that describe a sequence of observations whose occurrence are statistically dependent only on the previous ones.
● Time-series data like speech, stock price movements. ● Words in a sentence. ● Base pairs on the rung of a DNA ladder.
States and transitions
Assume that we want to model the behavior of a driver behind the wheel. The possible behaviors are
● accelerate (state 1) ● constant speed (state 2) ● idling (engine running slowly but the vehicle is not moving – (state 3)) ● brake (state 4)
Let’s refer each of these behaviors as a state. In the given example, there are N=4 states, refer them as Q = {q1,q2,q3,q4}.
We observe the following pattern in the driver’s behavior (Figure 1). That is, the driver operates the vehicle through a certain sequence of states. In the graph shown in Figure 1, the states are represented as nodes and the transitions as edges.
We see that, sometimes, the driver changes the state of the vehicle from one state to another and sometimes stays in the same state (as indicated by the arrows).
We also note that either the vehicle stays in the same state or changes to the next state. Therefore, from this model, if we want to predict the future state, all that matters is the current state of the vehicle. The past states has no bearing on the future state except through the current state. Take note of this important assumption for now.
Probabilistic model
We know that we cannot be certain about the driver’s behavior at any given point in time. Therefore, to model this uncertainty, the model is turned into a probabilistic model. A probabilistic model allows us to account for the likelihood of the behaviors or change of states.
An example for a probabilistic model for the given problem is given in Figure 2.
In this probabilistic model, we have assigned probability values to the transitions.These probabilities are collectively called transition probabilities. For example, considering the state named “idling”, the probability of the car to transition from this state to the next state (accelerate) is 0.8. In probability mathematics this is expressed as a conditional probability conditioned on the previous state.
Usually, the transition probabilities are formulated in the form of matrix called transition probability matrix. The transition probability matrix is shown in Figure 2. In a transition matrix, denoted as A, each element aij represent the probability of transitioning from state i to state j. The elements of the transition matrix satisfy the following property.
That is, the sum of transition probabilities leaving any given state is 1.
As we know, in this example, the driver cannot start car in any state (example, it is impossible to start the car in “constant speed” state). He can only start the car from at rest (i.e, brake state). To model this uncertainty, we introduce πi – the probability that the Markov chain starts in a given state i. The set of starting probabilities for all the N states are called initial probability distribution (π = π1, π2, …, πN). In Figure 3, the starting probabilities are denoted by green arrows.
Markov Assumption
As noted in the definition, the Markov chain in this example, assumes that the occurrence of each event/observation is statistically dependent only on the previous one. This is a first order Markov chain (or termed as bigram language model in natural language processing application). For the states Q = {q1, …, qn}, predicting the probability of a future state depends only on the current observation, all other previous observations do not matter. In probabilistic terms, this first order Markov chain assumption is denoted as
Extending the assumption for mthorder Markov chain, predicting the probability of a future observation depends only on the previous m observations. This is an m-gram model.
Given a set of n arbitrary random variables/observations Q = {q1, …, qn}, their joint probability distribution is usually computed by applying the following chain rule.
However, if the random observations in Q are of sequential in nature and follows the generic mth order Markov chain model, then the computation of joint probability gets simplified.
The Markov assumptions for first and second order of Markov models are summarized in Figure 4.Figure 4: Assumptions for 1st order and 2nd order Markov chains
Hidden Markov Model (HMM)
Markov chains are useful in computing the probability of events that are observable. However, in many real world applications, the events that we are interested in are usually hidden, that is we don’t observe them directly. These hidden events need to be inferred. For example, given a sentence in a natural language we only observe the words and characters directly. The parts-of-speech from the sentence are hidden, they have to be inferred. This brings us to the next topic of discussion – the hidden Markov models.
Rate this post: Note: There is a rating embedded within this post, please visit this post to rate it.
This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
Cookie
Duration
Description
cookielawinfo-checbox-analytics
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-analytics
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional
11 months
The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-functional
11 months
The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checbox-others
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy
11 months
The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.