von Mises–Fisher distribution

In directional statistics, the von Mises–Fisher distribution (named after Richard von Mises and Ronald Fisher) is a probability distribution on the $(p-1)$ -sphere in $\mathbb {R} ^{p}$ . If $p=2$ , the distribution reduces to the von Mises distribution on the circle.

Definition

The probability density function of the von Mises–Fisher distribution for the random p-dimensional unit vector $\mathbf {x}$ is given by:

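f_{p}(\mathbf {x} ;{\boldsymbol {\mu }},\kappa )=C_{p}(\kappa )\exp \left(\kappa {\boldsymbol {\mu }}^{\mathsf {T}}\mathbf {x} \right),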

where $\kappa \geq 0,\left\Vert {\boldsymbol {\mu }}\right\Vert =1$ and the normalization constant $C_{p}(\kappa )$ is equal to

C_{p}(\kappa )={\frac {\kappa ^{p/2-1}}{(2\pi )^{p/2}I_{p/2-1}(\kappa )}},

where $I_{v}$ denotes the modified Bessel function of the first kind at order $v$ . If $p=3$ , the normalization constant reduces to

C_{3}(\kappa )={\frac {\kappa }{4\pi \sinh \kappa }}={\frac {\kappa }{2\pi (e^{\kappa }-e^{-\kappa })}}.

The parameters ${\boldsymbol {\mu }}$ and $\kappa$ are called the mean direction and concentration parameter, respectively. The greater the value of $\kappa$ , the higher the concentration of the distribution around the mean direction ${\boldsymbol {\mu }}$ . The distribution is unimodal for $\kappa >0$ , and is uniform on the sphere for $\kappa =0$ .
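
As a minimal sketch of the definition for $p=3$ , the following Python function evaluates the log-density using the closed form of $C_{3}(\kappa )$ given above. The name `vmf3_log_pdf` is illustrative, and the code assumes that `x` and `mu` are already unit vectors.

```python
import math

def vmf3_log_pdf(x, mu, kappa):
    """Log-density of the von Mises-Fisher distribution on the 2-sphere (p = 3).

    Assumes x and mu are unit 3-vectors and kappa >= 0.
    """
    dot = sum(xi * mi for xi, mi in zip(x, mu))
    if kappa == 0.0:
        # Uniform distribution on the sphere: density 1 / (4*pi).
        return -math.log(4.0 * math.pi)
    # log C_3(kappa) = log(kappa) - log(2*pi) - log(e^kappa - e^-kappa),
    # with log(e^kappa - e^-kappa) = kappa + log(1 - e^(-2*kappa)),
    # so that large kappa does not overflow.
    log_c3 = (math.log(kappa) - math.log(2.0 * math.pi)
              - kappa - math.log(-math.expm1(-2.0 * kappa)))
    return log_c3 + kappa * dot
```

Working in the log domain is a design choice made here for numerical stability; exponentiating the result gives the density itself.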

The von Mises–Fisher distribution for $p=3$ is also called the Fisher distribution.[1][2] It was first used to model the interaction of electric dipoles in an electric field.[3] Other applications are found in geology, bioinformatics, and text mining.

Relation to normal distribution

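Starting from a normal distribution with isotropic covariance $\kappa ^{-1}\mathbf {I}$ and mean ${\boldsymbol {\mu }}$ of length $r>0$ , whose density is

G_{p}(\mathbf {x} ;{\boldsymbol {\mu }},\kappa )=\left({\frac {\kappa }{2\pi }}\right)^{p/2}\exp \left(-\kappa {\frac {(\mathbf {x} -{\boldsymbol {\mu }})'(\mathbf {x} -{\boldsymbol {\mu }})}{2}}\right),

the von Mises–Fisher distribution is obtained by conditioning on $\left\|\mathbf {x} \right\|=1$ . By expanding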

(\mathbf {x} -{\boldsymbol {\mu }})'(\mathbf {x} -{\boldsymbol {\mu }})=\mathbf {x} '\mathbf {x} +{\boldsymbol {\mu }}'{\boldsymbol {\mu }}-2{\boldsymbol {\mu }}'\mathbf {x} ,

and using the fact that the first two right-hand-side terms are fixed on the unit sphere, the von Mises–Fisher density $f_{p}(\mathbf {x} ;r^{-1}{\boldsymbol {\mu }},r\kappa )$ is recovered by recomputing the normalization constant, integrating $\mathbf {x}$ over the unit sphere. If $r=0$ , we get the uniform distribution, with density $f_{p}(\mathbf {x} ;{\text{arbitrary}},0)$ .

More succinctly, the restriction of any isotropic multivariate normal density to the unit hypersphere gives a von Mises–Fisher density, up to normalization.

This construction can be generalized by starting with a normal distribution with a general covariance matrix, in which case conditioning on $\left\|\mathbf {x} \right\|=1$ gives the Fisher-Bingham distribution.
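
The proportionality claimed in this section can be checked numerically. The sketch below, with illustrative helper names and parameter values that are not part of the source, compares the isotropic normal log-density on a few points of the unit sphere with the unnormalized von Mises–Fisher log-density for parameters $(r^{-1}{\boldsymbol {\mu }},r\kappa )$ ; the difference should be the same constant at every point.

```python
import math

def normal_log_pdf(x, mean, kappa):
    """Log-density of an isotropic normal with covariance (1/kappa) * I."""
    p = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return 0.5 * p * math.log(kappa / (2.0 * math.pi)) - 0.5 * kappa * sq

def vmf_log_kernel(x, mu, kappa):
    """Unnormalized von Mises-Fisher log-density: kappa * mu^T x."""
    return kappa * sum(xi * mi for xi, mi in zip(x, mu))

# Illustrative values: a normal mean of length r = 2 along the z-axis.
r, kappa = 2.0, 1.5
mean = (0.0, 0.0, r)
unit_mean = (0.0, 0.0, 1.0)

# A few points on the unit sphere.
points = [(1.0, 0.0, 0.0), (0.0, 0.6, 0.8), (0.0, 0.0, -1.0)]

# On the sphere the two log-densities differ only by a constant, so the
# restricted normal is von Mises-Fisher with parameters (mean / r, r * kappa).
offsets = [normal_log_pdf(x, mean, kappa) - vmf_log_kernel(x, unit_mean, r * kappa)
           for x in points]
```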

Estimation of parameters

Mean direction

Suppose a series of N independent unit vectors $x_{i}$ is drawn from a von Mises–Fisher distribution. The maximum likelihood estimate of the mean direction $\mu$ is simply the normalized arithmetic mean, a sufficient statistic:[3]

\mu ={\bar {x}}/{\bar {R}},{\text{where }}{\bar {x}}={\frac {1}{N}}\sum _{i}^{N}x_{i},{\text{and }}{\bar {R}}=\|{\bar {x}}\|.
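
A minimal sketch of this estimate in plain Python follows. The function name `mean_direction` and the three sample vectors are illustrative only.

```python
import math

def mean_direction(samples):
    """Return (mu_hat, R_bar) for a list of unit vectors of equal dimension."""
    n = len(samples)
    dim = len(samples[0])
    # Arithmetic mean of the samples, component by component.
    x_bar = [sum(x[j] for x in samples) / n for j in range(dim)]
    # Mean resultant length R_bar = ||x_bar||.
    r_bar = math.sqrt(sum(c * c for c in x_bar))
    if r_bar == 0.0:
        raise ValueError("mean resultant length is zero; direction undefined")
    return [c / r_bar for c in x_bar], r_bar

samples = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mu_hat, r_bar = mean_direction(samples)
```

For these three orthogonal samples the estimated direction points along the diagonal, and the mean resultant length equals $1/{\sqrt {3}}$ .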

Concentration parameter

Use the modified Bessel function of the first kind to define

A_{p}(\kappa )={\frac {I_{p/2}(\kappa )}{I_{p/2-1}(\kappa )}}.

Then:

\kappa =A_{p}^{-1}({\bar {R}}).

Thus $\kappa$ is the solution to

A_{p}(\kappa )={\frac {\left\|\sum _{i}^{N}x_{i}\right\|}{N}}={\bar {R}}.

A simple approximation to $\kappa$ is (Sra, 2011)

{\hat {\kappa }}={\frac {{\bar {R}}(p-{\bar {R}}^{2})}{1-{\bar {R}}^{2}}},

A more accurate inversion can be obtained by applying a few iterations of Newton's method:

{\hat {\kappa }}_{1}={\hat {\kappa }}-{\frac {A_{p}({\hat {\kappa }})-{\bar {R}}}{1-A_{p}({\hat {\kappa }})^{2}-{\frac {p-1}{\hat {\kappa }}}A_{p}({\hat {\kappa }})}},

{\hat {\kappa }}_{2}={\hat {\kappa }}_{1}-{\frac {A_{p}({\hat {\kappa }}_{1})-{\bar {R}}}{1-A_{p}({\hat {\kappa }}_{1})^{2}-{\frac {p-1}{{\hat {\kappa }}_{1}}}A_{p}({\hat {\kappa }}_{1})}}.
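
The approximation and the two Newton steps above can be sketched as follows. The helper `bessel_ratio` evaluates $A_{p}(\kappa )$ from the power series of the modified Bessel function; this hand-written series is an assumption made to keep the example self-contained, and a library routine (for instance SciPy's `scipy.special.ive`) would usually be preferable in practice. Both function names are illustrative.

```python
import math

def bessel_ratio(p, kappa, max_terms=10000):
    """A_p(kappa) = I_{p/2}(kappa) / I_{p/2-1}(kappa) via the power series.

    Intended for moderate kappa (roughly kappa < 500); the partial sums
    grow like exp(kappa) and would overflow for much larger values.
    """
    v = p / 2.0 - 1.0
    q = (kappa / 2.0) ** 2

    def scaled_sum(order):
        # Sum over m of (kappa/2)^(2m) * Gamma(order+1) / (m! * Gamma(m+order+1)).
        total, term = 1.0, 1.0
        for m in range(1, max_terms):
            term *= q / (m * (m + order))
            total += term
            if term < 1e-17 * total:
                break
        return total

    # The common factor (kappa/2)^v / Gamma(v+1) cancels in the ratio.
    return (kappa / 2.0) / (v + 1.0) * scaled_sum(v + 1.0) / scaled_sum(v)

def estimate_kappa(r_bar, p, newton_steps=2):
    """Approximate kappa from R_bar (0 < R_bar < 1), then refine by Newton steps."""
    kappa = r_bar * (p - r_bar ** 2) / (1.0 - r_bar ** 2)
    for _ in range(newton_steps):
        a = bessel_ratio(p, kappa)
        # Derivative of A_p: 1 - A_p^2 - (p - 1) / kappa * A_p.
        kappa -= (a - r_bar) / (1.0 - a * a - (p - 1.0) / kappa * a)
    return kappa
```

As a usage example, `estimate_kappa(bessel_ratio(3, 2.0), 3)` should return a value close to 2, since the input is exactly $A_{3}(2)$ .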

Standard error

For N ≥ 25, the estimated spherical standard error of the sample mean direction can be computed as:[4]

{\hat {\sigma }}=\left({\frac {d}{N{\bar {R}}^{2}}}\right)^{1/2}

where

d=1-{\frac {1}{N}}\sum _{i}^{N}\left(\mu ^{T}x_{i}\right)^{2}

It is then possible to approximate a $100(1-\alpha )\%$ spherical confidence interval (a confidence cone) about $\mu$ with semi-vertical angle:

q=\arcsin \left(e_{\alpha }^{1/2}{\hat {\sigma }}\right),

where

e_{\alpha }=-\ln(\alpha ).

For example, for a 95% confidence cone, $\alpha =0.05,e_{\alpha }=-\ln(0.05)=2.996,$ and thus $q=\arcsin(1.731{\hat {\sigma }}).$
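
The confidence cone computation can be sketched in plain Python as below. The function name `confidence_cone` is illustrative, and the code assumes at least 25 unit vectors, as required above, and that the argument of the arcsine does not exceed one.

```python
import math

def confidence_cone(samples, alpha=0.05):
    """Semi-vertical angle q (radians) of the approximate confidence cone.

    Assumes `samples` holds N >= 25 unit vectors of equal dimension.
    """
    n = len(samples)
    dim = len(samples[0])
    x_bar = [sum(x[j] for x in samples) / n for j in range(dim)]
    r_bar = math.sqrt(sum(c * c for c in x_bar))
    mu = [c / r_bar for c in x_bar]
    # d = 1 - (1/N) * sum_i (mu^T x_i)^2
    d = 1.0 - sum(sum(m * c for m, c in zip(mu, x)) ** 2 for x in samples) / n
    # Estimated spherical standard error of the sample mean direction.
    sigma = math.sqrt(d / (n * r_bar ** 2))
    # e_alpha = -ln(alpha); q = arcsin(sqrt(e_alpha) * sigma).
    return math.asin(math.sqrt(-math.log(alpha)) * sigma)

# Illustrative data: 30 unit vectors at angle 0.2 rad from the z-axis.
theta = 0.2
samples = [(math.sin(theta) * math.cos(2.0 * math.pi * k / 30),
            math.sin(theta) * math.sin(2.0 * math.pi * k / 30),
            math.cos(theta)) for k in range(30)]
q = confidence_cone(samples)
```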

Expected value

The expected value of the von Mises–Fisher distribution is not on the unit hypersphere, but instead has a length of less than one. This length is given by $A_{p}(\kappa )$ as defined above. For a von Mises–Fisher distribution with mean direction ${\boldsymbol {\mu }}$ and concentration $\kappa >0$ , the expected value is:

A_{p}(\kappa ){\boldsymbol {\mu }}.

For $\kappa =0$ , the expected value is at the origin. For finite $\kappa >0$ , the length of the expected value is strictly between zero and one and is a monotonically increasing function of $\kappa$ .

The empirical mean (arithmetic average) of a collection of points on the unit hypersphere behaves in a similar manner, being close to the origin for widely spread data and close to the sphere for concentrated data. Indeed, the expected value of the von Mises–Fisher distribution fitted by maximum likelihood to a collection of points is equal to the empirical mean of those points: with the estimates $\mu ={\bar {x}}/{\bar {R}}$ and $A_{p}(\kappa )={\bar {R}}$ given above, $A_{p}(\kappa )\mu ={\bar {x}}$ .
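
To illustrate how the length of the expected value grows with $\kappa$ , the sketch below uses the closed form $A_{3}(\kappa )=\coth \kappa -1/\kappa$ , a standard identity for $p=3$ that is not stated in the text above. The function name and the chosen values of $\kappa$ are illustrative.

```python
import math

def mean_resultant_length_3d(kappa):
    """A_3(kappa) = coth(kappa) - 1/kappa, the length of E[x] for p = 3."""
    if kappa < 1e-4:
        # Leading terms of the Taylor series, avoiding cancellation near zero.
        return kappa / 3.0 - kappa ** 3 / 45.0
    return 1.0 / math.tanh(kappa) - 1.0 / kappa

# Lengths for increasingly concentrated distributions.
lengths = [mean_resultant_length_3d(k) for k in (0.0, 0.1, 1.0, 10.0, 100.0)]
```

The resulting lengths start at zero for the uniform case and increase towards one as the concentration grows.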

Entropy and KL divergence

The expected value can be used to compute differential entropy and KL divergence.

The differential entropy of $f_{p}(\mathbf {x} ;{\boldsymbol {\mu }},\kappa )$ is:

-\log f_{p}(A_{p}(\kappa ){\boldsymbol {\mu }};{\boldsymbol {\mu }},\kappa )=-\log C_{p}(\kappa )-\kappa A_{p}(\kappa ).

Notice that the entropy is a function of $\kappa$ only.

The KL divergence between $f_{p}(\mathbf {x} ;{\boldsymbol {\mu _{0}}},\kappa _{0})$ and $f_{p}(\mathbf {x} ;{\boldsymbol {\mu _{1}}},\kappa _{1})$ is:

\log {\frac {f_{p}(A_{p}(\kappa _{0}){\boldsymbol {\mu _{0}}};{\boldsymbol {\mu _{0}}},\kappa _{0})}{f_{p}(A_{p}(\kappa _{0}){\boldsymbol {\mu _{0}}};{\boldsymbol {\mu _{1}}},\kappa _{1})}}
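
For $p=3$ both quantities have simple closed forms, which the following sketch evaluates. It assumes $\kappa >0$ and unit-length mean directions, and it relies on the identity $A_{3}(\kappa )=\coth \kappa -1/\kappa$ , which is not stated in the text above. All function names are illustrative.

```python
import math

def log_c3(kappa):
    """log C_3(kappa) for kappa > 0, with C_3 = kappa / (4*pi*sinh(kappa))."""
    return (math.log(kappa) - math.log(2.0 * math.pi)
            - kappa - math.log(-math.expm1(-2.0 * kappa)))

def a3(kappa):
    """A_3(kappa) = coth(kappa) - 1/kappa (closed form for p = 3)."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa

def entropy_3d(kappa):
    """Differential entropy: -log C_3(kappa) - kappa * A_3(kappa)."""
    return -log_c3(kappa) - kappa * a3(kappa)

def kl_3d(mu0, kappa0, mu1, kappa1):
    """KL divergence between vMF(mu0, kappa0) and vMF(mu1, kappa1) for p = 3."""
    cos_angle = sum(a * b for a, b in zip(mu0, mu1))
    # log f0(m) - log f1(m), evaluated at the expected value m = A_3(kappa0) * mu0.
    return (log_c3(kappa0) - log_c3(kappa1)
            + a3(kappa0) * (kappa0 - kappa1 * cos_angle))
```

As a sanity check, the divergence of a distribution from itself is zero, and the entropy approaches $\log(4\pi )$ , the entropy of the uniform distribution on the sphere, as $\kappa$ tends to zero.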

Generalizations

The matrix von Mises-Fisher distribution (also known as matrix Langevin distribution[5][6]) has the density

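f_{n,p}(\mathbf {X} ;\mathbf {F} )\propto \exp \left(\operatorname {tr} \left(\mathbf {F} ^{\mathsf {T}}\mathbf {X} \right)\right)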

supported on the Stiefel manifold of $n\times p$ orthonormal p-frames $\mathbf {X}$ , where $\mathbf {F}$ is an arbitrary $n\times p$ real matrix.[7][8]

Distribution of polar angle

For $p=3$ , the angle θ between $\mathbf {x}$ and ${\boldsymbol {\mu }}$ satisfies $\cos \theta ={\boldsymbol {\mu }}^{\mathsf {T}}\mathbf {x}$ . It has the distribution

p(\theta )=\int d^{2}x\,f(\mathbf {x} ;{\boldsymbol {\mu }},\kappa )\,\delta \left(\theta -\arccos({\boldsymbol {\mu }}^{\mathsf {T}}\mathbf {x} )\right),

which can be easily evaluated as

p(\theta )=2\pi C_{3}(\kappa )\,\sin \theta \,e^{\kappa \cos \theta }.
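
As a quick numerical check, the polar-angle density should integrate to one over $[0,\pi ]$ . The sketch below verifies this with a simple midpoint rule; the function names, the value $\kappa =5$ and the number of integration steps are illustrative choices.

```python
import math

def polar_angle_pdf(theta, kappa):
    """p(theta) = 2*pi*C_3(kappa) * sin(theta) * exp(kappa*cos(theta)), kappa > 0."""
    c3 = kappa / (4.0 * math.pi * math.sinh(kappa))
    return 2.0 * math.pi * c3 * math.sin(theta) * math.exp(kappa * math.cos(theta))

def integrate_midpoint(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Total probability mass of the polar angle for kappa = 5.
total = integrate_midpoint(lambda t: polar_angle_pdf(t, 5.0), 0.0, math.pi)
```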

References

Fisher, R. A. (1953). "Dispersion on a sphere". Proc. Roy. Soc. Lond. A. 217 (1130): 295–305. Bibcode:1953RSPSA.217..295F. doi:10.1098/rspa.1953.0064. S2CID 123166853.
Watson, G. S. (1980). "Distributions on the Circle and on the Sphere". J. Appl. Probab. 19: 265–280. doi:10.2307/3213566. JSTOR 3213566.
Mardia, Kanti; Jupp, P. E. (1999). Directional Statistics. John Wiley & Sons Ltd. ISBN 978-0-471-95333-3.
Fisher, N. I.; Lewis, T.; Embleton, B. J. J. (1993). Statistical analysis of spherical data (1st pbk. ed.). Cambridge: Cambridge University Press. pp. 115–116. ISBN 0-521-45699-1.
Pal, Subhadip; Sengupta, Subhajit; Mitra, Riten; Banerjee, Arunava (2020). "Conjugate Priors and Posterior Inference for the Matrix Langevin Distribution on the Stiefel Manifold". Bayesian Analysis. 15 (3): 871–908. doi:10.1214/19-BA1176. ISSN 1936-0975. Retrieved 10 July 2021.
Chikuse, Yasuko (1 May 2003). "Concentrated matrix Langevin distributions". Journal of Multivariate Analysis. 85 (2): 375–394. doi:10.1016/S0047-259X(02)00065-9. ISSN 0047-259X.
Jupp (1979). "Maximum likelihood estimators for the matrix von Mises-Fisher and Bingham distributions". The Annals of Statistics. 7 (3): 599–606. doi:10.1214/aos/1176344681.
Downs (1972). "Orientational statistics". Biometrika. 59 (3): 665–676. doi:10.1093/biomet/59.3.665.