Score, Fisher Information and Estimator Sensitivity

As we have seen in the previous articles, the estimation of a parameter from a set of data samples depends strongly on the underlying PDF. The accuracy of the estimation is inversely related to the variance of the underlying PDF: the smaller the variance of the PDF, the higher the estimation accuracy, and vice versa. In other words, the estimation accuracy depends on the sharpness of the PDF curve; the sharper the curve, the better the accuracy.

Gradient and score:

In geometry, the gradient (also called the slope) of a curve is zero at its maximum and minimum points. The gradient of a function (representing a curve) is obtained from its first derivative. The gradient of the log likelihood function is called the score, and it is used to find the Maximum Likelihood (ML) estimate of a parameter.

Figure 1: The gradient of the log likelihood function is called the score

Denoting the score as u(θ),

u(\theta) = \frac{\partial }{\partial \theta} \left[ \ln L(\theta) \right ]

At the maximum of the log likelihood function, i.e., at the ML estimate \hat{\theta}, the gradient is zero. Thus, equating the score to zero and solving for \hat{\theta} gives the ML estimate of θ (provided the log likelihood function is concave).

u(\hat{\theta}) = 0
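
To make this concrete, here is a minimal sketch (not part of the original article) that derives the ML estimate of a Gaussian mean by forming the score symbolically and setting it to zero. The three sample symbols x1, x2, x3, the known noise variance and the use of sympy are assumptions made purely for illustration.

# Minimal sketch: score of a Gaussian log likelihood and its root (the MLE).
# Assumes three symbolic samples x1, x2, x3 drawn from N(theta, sigma^2).
import sympy as sp

theta = sp.symbols('theta', real=True)
sigma = sp.symbols('sigma', positive=True)
x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
samples = [x1, x2, x3]

# Log likelihood ln L(theta) of independent N(theta, sigma^2) samples
log_L = sum(-sp.log(sp.sqrt(2*sp.pi)*sigma) - (x - theta)**2/(2*sigma**2)
            for x in samples)

# Score: u(theta) = d/dtheta ln L(theta)
score = sp.diff(log_L, theta)

# Equating the score to zero gives the ML estimate
theta_hat = sp.solve(sp.Eq(score, 0), theta)[0]
print(sp.simplify(theta_hat))   # (x1 + x2 + x3)/3  -> the sample mean

As expected for the Gaussian case, solving u(\hat{\theta}) = 0 yields the sample mean as the ML estimate.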

Curvature and Fisher Information:

In geometry, the sharpness of a curve is measured by its curvature. The sharpness of a PDF curve is governed by its variance: the larger the variance, the flatter (less sharp) the curve, and vice versa. The accuracy of the estimator is therefore measured by the sharpness of the underlying PDF curve. In differential geometry, curvature is related to the second derivative of a function.

The mean of the score, evaluated at the true value of the parameter θ, is zero. This gives,

E[u(\theta)] = 0
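
As a sanity check, this zero-mean property can be verified numerically. The sketch below (an illustration, not part of the original article) draws many realizations of N Gaussian samples with an assumed true mean and known variance, evaluates the score at the true parameter, and confirms that its average is close to zero.

# Numerical check that E[u(theta)] = 0 at the true parameter value.
# The true mean, variance and sample size below are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
true_theta, sigma, N = 2.0, 1.5, 10
num_trials = 100000

# For N(theta, sigma^2) samples, the score is u(theta) = sum(x - theta)/sigma^2
x = rng.normal(true_theta, sigma, size=(num_trials, N))
scores = np.sum(x - true_theta, axis=1) / sigma**2

print("mean of score:", scores.mean())   # close to 0, as E[u(theta)] = 0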

Under this regularity condition that the expectation of the score is zero, the variance of the score is called Fisher Information. Equivalently, the Fisher Information is the negative of the expected second derivative of the log likelihood function. It measures the sharpness of the log likelihood function: the larger the Fisher Information, the sharper the curve, and vice versa. So if we can calculate the Fisher Information of a log likelihood function, we know more about the accuracy or sensitivity of the estimator with respect to the parameter to be estimated.

Figure 2: The variance of the score is called Fisher Information

The Fisher Information denoted by I(θ) is given by the variance of the score.

\displaystyle{\begin{aligned} I(\theta) &= var \left[ u (\theta) \right] \\ &= E \left[ u (\theta) \, u^\ast (\theta) \right] \\ &= - E \left[ \frac{\partial^2 \; \ln L(\theta) }{\partial \theta^2} \right] \end{aligned}}

Here the operator ∗ indicates complex conjugation. The negative sign in the last line appears because the second derivative (curvature) of the log likelihood function is negative at its maximum; negating it makes the Fisher Information positive, so that a sharper log likelihood curve (i.e., a lower-variance estimator) corresponds to a larger Fisher Information. As we can see from the above equation, the Fisher Information is related to the second derivative (curvature or sharpness) of the log likelihood function. When the expectation is dropped and the negative second derivative is evaluated directly at the observed data, the resulting quantity is called the Observed Fisher Information.
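
The two equivalent forms above can also be checked numerically for the Gaussian-mean example, where the Fisher Information has the closed form I(θ) = N/σ². The following sketch (illustrative assumptions: Gaussian samples, known variance, arbitrary true mean) compares the Monte Carlo variance of the score with the negated second derivative of the log likelihood.

# Comparing var[u(theta)] with -E[d^2 ln L / dtheta^2] for a Gaussian mean.
# The true mean, variance and sample size below are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(1)
true_theta, sigma, N = 2.0, 1.5, 10
num_trials = 100000

x = rng.normal(true_theta, sigma, size=(num_trials, N))

# Variance of the score u(theta) = sum(x - theta)/sigma^2
scores = np.sum(x - true_theta, axis=1) / sigma**2
fisher_from_score = scores.var()

# For this model d^2 ln L / dtheta^2 = -N/sigma^2 (a constant), so the
# negated expected second derivative is exactly N/sigma^2
fisher_from_curvature = N / sigma**2

print("var of score          :", fisher_from_score)      # ~ N/sigma^2
print("-E[d2 lnL / dtheta2]  :", fisher_from_curvature)   # exactly N/sigma^2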


For further reading

[1] Songfeng Zheng, “Fisher Information and Cramer-Rao Bound”, lecture notes, Statistical Theory II, Missouri State University.

Topics in this series

[1] An Introduction to Estimation Theory
[2] Bias of an Estimator
[3] Minimum Variance Unbiased Estimators (MVUE)
[4] Maximum Likelihood Estimation
[5] Maximum Likelihood Decoding
[6] Probability and Random Process
[7] Likelihood Function and Maximum Likelihood Estimation (MLE)
[8] Score, Fisher Information and Estimator Sensitivity
[9] Introduction to Cramer Rao Lower Bound (CRLB)
[10] Cramer Rao Lower Bound for Scalar Parameter Estimation
[11] Applying Cramer Rao Lower Bound (CRLB) to find a Minimum Variance Unbiased Estimator (MVUE)
[12] Efficient Estimators and CRLB
[13] Cramer Rao Lower Bound for Phase Estimation
[14] Normalized CRLB - an alternate form of CRLB and its relation to estimator sensitivity
[15] Cramer Rao Lower Bound (CRLB) for Vector Parameter Estimation
[16] The Mean Square Error – Why do we use it for estimation problems
[17] How to estimate unknown parameters using Ordinary Least Squares (OLS)
[18] Essential Preliminary Matrix Algebra for Signal Processing
[19] Why Cholesky Decomposition? A sample case
[20] Tests for Positive Definiteness of a Matrix
[21] Solving a Triangular Matrix using Forward & Backward Substitution
[22] Cholesky Factorization - Matlab and Python
[23] LTI system models for random signals – AR, MA and ARMA models
[24] Comparing AR and ARMA model - minimization of squared error
[25] Yule Walker Estimation
[26] AutoCorrelation (Correlogram) and persistence – Time series analysis
[27] Linear Models - Least Squares Estimator (LSE)
[28] Best Linear Unbiased Estimator (BLUE)

