Fréchet inception distance
The Fréchet inception distance (FID) is a metric used to assess the quality of images created by a generative model, like a generative adversarial network (GAN).[1][2] Unlike the earlier inception score (IS), which evaluates only the distribution of generated images, the FID compares the distribution of generated images with the distribution of real images that were used to train the generator.[1][3]
The FID is the squared Wasserstein metric between two multidimensional Gaussian distributions: $\mathcal{N}(\mu, \Sigma)$, the distribution of some neural network features of the images generated by the GAN, and $\mathcal{N}(\mu_w, \Sigma_w)$, the distribution of the same features computed from the "world" or real images used to train the GAN. The network commonly used is Inception v3 trained on ImageNet. The FID can therefore be computed from the mean and the covariance of the activations obtained when the synthesized and real images are fed into the Inception network:[1][3][4]

$$\text{FID} = \lVert \mu - \mu_w \rVert_2^2 + \operatorname{tr}\left(\Sigma + \Sigma_w - 2\left(\Sigma^{1/2}\, \Sigma_w\, \Sigma^{1/2}\right)^{1/2}\right)$$
Rather than comparing images pixel by pixel (as the L2 norm does, for example), the FID compares the mean and covariance of the activations of one of the deeper layers of Inception v3, a convolutional neural network. These layers are closer to the output nodes, which correspond to real-world objects such as a specific breed of dog or an airplane, and further from the shallow layers near the input image. As a result, their activations tend to mimic human perception of similarity in images.
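The closed-form Fréchet distance between two Gaussians can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `frechet_distance` and `fid_from_features` are hypothetical, the feature arrays are assumed to have already been extracted (for example from an Inception v3 layer), and the trace of the matrix square root is obtained from the eigenvalues of the covariance product rather than from an explicit matrix square root.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Computes ||mu1 - mu2||^2 + tr(sigma1) + tr(sigma2) - 2 * tr(S), where
    S = (sigma1^{1/2} sigma2 sigma1^{1/2})^{1/2}.
    """
    diff = mu1 - mu2
    # sigma1 @ sigma2 has the same eigenvalues as sigma1^{1/2} sigma2 sigma1^{1/2},
    # so tr(S) is the sum of the square roots of those eigenvalues.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    # Clip tiny negative values caused by floating-point round-off.
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def fid_from_features(feats_gen, feats_real):
    """Fit a Gaussian to each feature set (rows = images) and compare them."""
    mu_g, mu_r = feats_gen.mean(axis=0), feats_real.mean(axis=0)
    sigma_g = np.cov(feats_gen, rowvar=False)
    sigma_r = np.cov(feats_real, rowvar=False)
    return frechet_distance(mu_g, sigma_g, mu_r, sigma_r)


if __name__ == "__main__":
    # Assumption: random vectors stand in for real network activations.
    rng = np.random.default_rng(0)
    real = rng.standard_normal((1000, 16))
    fake = rng.standard_normal((1000, 16)) + 0.5  # shifted mean
    print(fid_from_features(fake, real))
```

Widely used implementations typically compute an explicit matrix square root instead; the eigenvalue form is chosen here only to keep the sketch dependent on NumPy alone.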
As of 2020, the FID is the standard metric for assessing the quality of GANs. It has been used to evaluate many recent GANs,[3] including the high-resolution StyleGAN1[5] and StyleGAN2[6] networks.
Variants
Specialized variants of the FID have been proposed as evaluation metrics in other domains: the Fréchet Audio Distance (FAD) for music enhancement algorithms,[7] the Fréchet Video Distance (FVD) for generative models of video,[8] and the Fréchet ChemNet Distance (FCD) for AI-generated molecules.[9]
History
The FID metric was introduced in 2017.[1] It is inspired by the distance between probability distributions introduced in 1957 by M. Fréchet,[10] which was later generalized to the Wasserstein metric.
Limitations
Chong and Forsyth[11] showed the FID to be statistically biased, in the sense that its expected value over a finite sample is not its true value. Also, because the FID measures the Wasserstein distance to the ground-truth distribution, it is inadequate for evaluating the quality of generators in domain adaptation setups or in zero-shot generation. Finally, while the FID is more consistent with human judgment than the previously used inception score, there are cases where it is inconsistent with human judgment (e.g. Figures 3 and 5 in Liu et al.[12]).
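The finite-sample bias can be illustrated with synthetic data. In the sketch below, both feature sets are drawn from the same standard Gaussian, so the true distance is zero, yet the estimate from finite samples is positive and shrinks as the sample size grows. This is an illustrative toy setup, not the experiment of Chong and Forsyth; the function names, the dimension of 8, and the sample sizes are assumptions chosen for the example.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between two Gaussians (eigenvalue form)."""
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def estimated_fid(n_samples, dim, rng):
    """Estimate the distance between two samples of the SAME distribution."""
    a = rng.standard_normal((n_samples, dim))
    b = rng.standard_normal((n_samples, dim))
    return frechet_distance(a.mean(axis=0), np.cov(a, rowvar=False),
                            b.mean(axis=0), np.cov(b, rowvar=False))


def mean_estimated_fid(n_samples, dim=8, trials=20, seed=0):
    """Average the estimate over several independent trials."""
    rng = np.random.default_rng(seed)
    return float(np.mean([estimated_fid(n_samples, dim, rng) for _ in range(trials)]))


if __name__ == "__main__":
    # The true distance is 0; the printed averages stay above 0
    # and decrease as the number of samples increases.
    for n in (50, 500, 5000):
        print(n, mean_estimated_fid(n))
```

Because the estimate depends on the number of samples, FID values are only comparable when computed with the same sample size.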
References
- Heusel, Martin; Ramsauer, Hubert; Unterthiner, Thomas; Nessler, Bernhard; Hochreiter, Sepp (2017). "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium". Advances in Neural Information Processing Systems. 30. arXiv:1706.08500.
- Heusel, Martin; Ramsauer, Hubert; Unterthiner, Thomas; Nessler, Bernhard; Klambauer, Günter; Hochreiter, Sepp (2017-06-26). "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium". arXiv:1706.08500 [cs.LG].
- Jean, Neal (15 July 2018). "Fréchet Inception Distance". Neal Jean. Retrieved 3 July 2020.
- Dowson, D. C.; Landau, B. V. (1 September 1982). "The Fréchet distance between multivariate normal distributions". Journal of Multivariate Analysis. 12 (3): 450–455. doi:10.1016/0047-259X(82)90077-X. ISSN 0047-259X.
- Karras, Tero; Laine, Samuli; Aila, Timo (2020). "A Style-Based Generator Architecture for Generative Adversarial Networks". IEEE Transactions on Pattern Analysis and Machine Intelligence. PP (12): 4217–4228. arXiv:1812.04948. doi:10.1109/TPAMI.2020.2970919. PMID 32012000. S2CID 211022860.
- Karras, Tero; Laine, Samuli; Aittala, Miika; Hellsten, Janne; Lehtinen, Jaakko; Aila, Timo (23 March 2020). "Analyzing and Improving the Image Quality of StyleGAN". arXiv:1912.04958 [cs.CV].
- Kilgour, Kevin; Zuluaga, Mauricio; Roblek, Dominik; Sharifi, Matthew (2019-09-15). "Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms". Interspeech 2019: 2350–2354. doi:10.21437/Interspeech.2019-2219. S2CID 202725406.
- Unterthiner, Thomas; Steenkiste, Sjoerd van; Kurach, Karol; Marinier, Raphaël; Michalski, Marcin; Gelly, Sylvain (2019-03-27). "FVD: A new Metric for Video Generation". Open Review.
- Preuer, Kristina; Renz, Philipp; Unterthiner, Thomas; Hochreiter, Sepp; Klambauer, Günter (2018-09-24). "Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery". Journal of Chemical Information and Modeling. 58 (9): 1736–1741. arXiv:1803.09518. doi:10.1021/acs.jcim.8b00234. PMID 30118593. S2CID 51892387.
- Fréchet, M. (1957). "Sur la distance de deux lois de probabilité". C. R. Acad. Sci. Paris. 244: 689–692.
- Chong, Min Jin; Forsyth, David (2020-06-15). "Effectively Unbiased FID and Inception Score and where to find them". arXiv:1911.07023 [cs.CV].
- Liu, Shaohui; Wei, Yi; Lu, Jiwen; Zhou, Jie (2018-07-19). "An Improved Evaluation Framework for Generative Adversarial Networks". arXiv:1803.07474 [cs.CV].