Conformal prediction

Conformal prediction (CP) is a set of algorithms devised to assess the uncertainty of predictions produced by a machine learning model. CP algorithms do this by computing nonconformity measures (often referred to as α-values) for examples from the training set and comparing them with the nonconformity measure computed for a test example. Conformal predictors can be divided into inductive and transductive predictors. These mainly differ in their computational complexity and in whether they are applied to regression or classification tasks. Inductive algorithms train one or several machine learning models that are re-used for future test objects, and can be used for both classification and regression tasks. Transductive algorithms re-train the model for every test object and are mainly used for classification tasks.

Conformal prediction requires a user-specified significance level for which the algorithm should produce its predictions. This significance level bounds the frequency of errors that the algorithm is allowed to make. For example, a significance level of 0.1 means that, in the long run, at most 10% of the predictions may be erroneous. To meet this requirement, the output is a set prediction instead of the point prediction produced by standard supervised machine learning models. For classification tasks, this means that a prediction is not a single class, for example 'cat', but a set such as {'cat', 'dog'}. These sets can be smaller or larger depending on two things: how good the underlying model is (how well it can discern between cats, dogs and other animals) and the specified significance level. For regression tasks, the outputs are prediction intervals. A smaller significance level (fewer allowed errors) produces wider, less specific intervals. Conversely, more allowed errors produce tighter prediction intervals.[1][2][3][4]

Theory

Conformal prediction was presented systematically in a 2005 book by Vovk, Gammerman and Shafer.[1] A tutorial by Vovk and Shafer was published in 2008.[5] The data must satisfy certain assumptions, most importantly exchangeability, which is slightly weaker than the IID assumption made in standard machine learning. For conformal prediction, an n% prediction region is said to be valid if it contains the true label at least n% of the time.[1] Efficiency refers to the size of the output. For classification, this is the number of classes in the prediction set; for regression, it is the interval width. Smaller outputs are more efficient.[5]
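
Validity and efficiency can be measured on a labelled test set. The sketch below is a minimal illustration, not part of any conformal prediction library. The helper names `empirical_coverage`, `average_set_size` and `average_interval_width` are hypothetical, and the cat/dog sets are made-up data.

```python
from typing import Hashable, Sequence, Set, Tuple

def empirical_coverage(prediction_sets: Sequence[Set[Hashable]],
                       labels: Sequence[Hashable]) -> float:
    """Validity check: fraction of test examples whose true label is in the output."""
    hits = sum(1 for output, y in zip(prediction_sets, labels) if y in output)
    return hits / len(labels)

def average_set_size(prediction_sets: Sequence[Set[Hashable]]) -> float:
    """Efficiency for classification: mean number of classes per prediction set."""
    return sum(len(output) for output in prediction_sets) / len(prediction_sets)

def average_interval_width(intervals: Sequence[Tuple[float, float]]) -> float:
    """Efficiency for regression: mean width of the prediction intervals."""
    return sum(high - low for low, high in intervals) / len(intervals)

# Hypothetical outputs of a conformal classifier and the true labels.
sets = [{"cat"}, {"cat", "dog"}, {"dog"}, {"cat"}]
truth = ["cat", "dog", "cat", "cat"]
print(empirical_coverage(sets, truth))  # 0.75
print(average_set_size(sets))  # 1.25
```

Over a long sequence of test examples, a valid predictor at significance level 0.1 should reach a coverage of about 0.9 or more. Among valid predictors, the one with the smaller average set size or interval width is the more efficient.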

In its purest form, conformal prediction is designed for an online (transductive) setting. That is, after a prediction is made for an example, its true label is revealed before the next prediction is made. The underlying model can therefore be re-trained using this new data point. The next prediction is then made on a calibration set containing n + 1 data points, where the previous model had n data points.[5]
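
The online transductive setting can be sketched as follows. The sketch assumes a nearest-neighbour nonconformity measure: the distance to the nearest example with the same label, divided by the distance to the nearest example with a different label. It also assumes one-dimensional features. Both are illustrative choices. All function names are hypothetical. For each candidate label, the test object is added to the bag of seen examples and every α-value is recomputed. This recomputation is the per-object "re-training" that makes transductive CP expensive.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[float, str]  # (one-dimensional feature, label), to keep the sketch short

def _ratio(same: float, other: float) -> float:
    # Small score: the point is closer to its own class than to any other class.
    if other == 0.0 or math.isinf(same):
        return math.inf
    return same / other  # other == inf (no other class seen yet) gives 0.0

def nn_alphas(bag: Sequence[Example]) -> List[float]:
    """Nearest-neighbour nonconformity score for every example in the bag."""
    alphas = []
    for i, (xi, yi) in enumerate(bag):
        same = min((abs(xi - xj) for j, (xj, yj) in enumerate(bag)
                    if j != i and yj == yi), default=math.inf)
        other = min((abs(xi - xj) for xj, yj in bag if yj != yi), default=math.inf)
        alphas.append(_ratio(same, other))
    return alphas

def transductive_p_values(seen: Sequence[Example], x: float,
                          labels: Sequence[str]) -> Dict[str, float]:
    """Try every candidate label: add (x, y) to the bag and rank its alpha-value."""
    p = {}
    for y in labels:
        alphas = nn_alphas(list(seen) + [(x, y)])
        test_alpha = alphas[-1]
        p[y] = sum(1 for a in alphas if a >= test_alpha) / len(alphas)
    return p

def online_predict(stream: Sequence[Example], labels: Sequence[str],
                   significance: float) -> List[Set[str]]:
    seen: List[Example] = []
    outputs = []
    for x, y_true in stream:
        p = transductive_p_values(seen, x, labels)
        outputs.append({y for y, pv in p.items() if pv > significance})
        seen.append((x, y_true))  # the true label is revealed before the next prediction
    return outputs

train = [(0.0, "a"), (0.1, "a"), (1.0, "b"), (1.1, "b")]
print(transductive_p_values(train, 0.05, ["a", "b"]))  # {'a': 0.8, 'b': 0.2}
```

The first prediction in a stream has no reference examples, so every label receives a p-value of 1. Prediction sets become informative only as labelled examples accumulate.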

Classification algorithms

The goal of standard classification algorithms is to assign a test object to one of several discrete classes. Conformal classifiers instead compute and output a p-value for each available class. They do so by ranking the nonconformity measure (α-value) of the test object against those of examples from the training data set. As in standard hypothesis testing, the p-value is compared with a threshold, referred to as the significance level in the CP field, to decide whether the label is included in the prediction set. For example, at a significance level of 0.1, all classes with a p-value greater than 0.1 are added to the prediction set. Transductive algorithms compute the nonconformity scores using all available training data. Inductive algorithms compute them on a held-out subset of the training set (the calibration set).
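
A minimal sketch of the ranking step follows. It computes the p-value as the fraction of reference α-values (including the test object's own) that are at least as large as the test object's α-value, then keeps every class whose p-value exceeds the significance level. The reference scores stand for the calibration set in the inductive case. The α-values for 'cat', 'dog' and 'bird' are hypothetical numbers chosen only for illustration.

```python
from typing import Dict, Sequence, Set

def p_value(reference_alphas: Sequence[float], test_alpha: float) -> float:
    """Rank the test object's alpha-value among the reference alpha-values.

    The test object itself is counted too, hence the +1 terms.
    """
    at_least_as_strange = sum(1 for a in reference_alphas if a >= test_alpha)
    return (at_least_as_strange + 1) / (len(reference_alphas) + 1)

def prediction_set(p_values: Dict[str, float], significance: float) -> Set[str]:
    """Keep every class whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}

calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
test_alphas = {"cat": 0.05, "dog": 0.85, "bird": 0.95}  # hypothetical scores per class
p = {label: p_value(calibration, a) for label, a in test_alphas.items()}
print(p)  # {'cat': 1.0, 'dog': 0.2, 'bird': 0.1}
print(prediction_set(p, 0.1))  # contains 'cat' and 'dog'
```

A class with an unusually large α-value (here 'bird') receives a small p-value and drops out of the set first as the significance level grows.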

Inductive conformal prediction (ICP)

Inductive conformal prediction was originally known as inductive confidence machines,[6] but was later re-introduced as ICP. It has gained popularity in practical settings because the underlying model does not need to be retrained for every new test example. This makes it attractive for any model that is computationally expensive to train, such as neural networks.[7]

Mondrian inductive conformal prediction (MICP)

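In MICP, the α-values are class-dependent (Mondrian). The α-value of a test object for a given class is compared only with the α-values of calibration examples of that class. The underlying model does not follow the original online setting introduced in 2005.[2]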

Training algorithm:

  1. Train a machine learning model (MLM)
  2. Run a calibration set through the MLM and save the output from the chosen stage
    1. In deep learning, the softmax values are often used
  3. Use a nonconformity function to compute α-values
    1. Each data point in the calibration set yields an α-value for its true class
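
The training steps above can be sketched as follows. The sketch assumes the nonconformity function α = 1 − (softmax probability of the true class). This is one common choice, not the only one. `calibrate_mondrian` is a hypothetical helper, and the softmax values are made up. The α-values are stored per true class, which is what makes the predictor Mondrian.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def calibrate_mondrian(softmax_outputs: Sequence[Sequence[float]],
                       true_labels: Sequence[int]) -> Dict[int, List[float]]:
    """Compute one alpha-value per calibration example, grouped by its true class.

    Assumed nonconformity function: alpha = 1 - softmax probability of the true class.
    """
    per_class: Dict[int, List[float]] = defaultdict(list)
    for probs, y in zip(softmax_outputs, true_labels):
        per_class[y].append(1.0 - probs[y])
    # Sorting is not needed for the p-value, but makes the table easy to inspect.
    return {c: sorted(alphas) for c, alphas in per_class.items()}

# Hypothetical softmax outputs of an already trained model on four calibration examples.
calibration_softmax = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
calibration_labels = [0, 0, 1, 1]
table = calibrate_mondrian(calibration_softmax, calibration_labels)
print(table)  # approximately {0: [0.1, 0.4], 1: [0.2, 0.3]}
```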

Prediction algorithm:

  1. For a test data point, compute an α-value for each candidate class
  2. For each class, compute a p-value by comparing the test α-value with the calibration α-values of that class
  3. If the p-value is greater than the significance level, include the class in the output[2]
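
The prediction steps above can be sketched as follows. The sketch assumes the same α = 1 − softmax nonconformity function as during calibration and a per-class table of calibration α-values. `mondrian_predict` and the numbers below are hypothetical.

```python
from typing import Dict, List, Sequence, Set

def mondrian_predict(test_softmax: Sequence[float],
                     calibration: Dict[int, List[float]],
                     significance: float) -> Set[int]:
    """Return every class whose class-conditional p-value exceeds the significance level."""
    output = set()
    for c, class_alphas in calibration.items():
        alpha = 1.0 - test_softmax[c]  # same assumed nonconformity function as in calibration
        # Mondrian: compare only with calibration examples whose true class is c.
        count = sum(1 for a in class_alphas if a >= alpha)
        p = (count + 1) / (len(class_alphas) + 1)
        if p > significance:
            output.add(c)
    return output

calibration = {0: [0.05, 0.1, 0.2, 0.3], 1: [0.1, 0.15, 0.25, 0.4]}
print(mondrian_predict([0.75, 0.25], calibration, 0.1))  # {0, 1}
print(mondrian_predict([0.75, 0.25], calibration, 0.3))  # {0}
```

Class 0 gets a p-value of 0.4 and class 1 a p-value of 0.2. Raising the significance level from 0.1 to 0.3 therefore removes class 1 from the output.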

Regression algorithms

Conformal prediction was initially formulated for classification tasks, but was later adapted for regression. Classification can output p-values without a given significance level. Regression, by contrast, requires a fixed significance level at prediction time in order to produce prediction intervals for a new test object. Transductive conformal regression is rarely used in practice. A naive transductive algorithm would have to try every possible label for a new test object, which is impossible because the label space is continuous. Exact transductive solutions exist only for special cases such as ridge regression. The commonly used algorithms are therefore formulated in the inductive setting, which computes a prediction rule once and applies it to all future predictions.

Inductive conformal prediction (ICP)

All inductive algorithms require splitting the available training examples into two disjoint sets. One set is used for training the underlying model (the proper training set) and the other for calibrating the predictions (the calibration set). In ICP, this split is done only once, so a single ML model is trained. If the split is performed randomly and the data are exchangeable, the ICP is provably valid: its error rate does not exceed the chosen significance level.

Training algorithm:

  1. Split the training data into a proper training set and a calibration set
  2. Train the underlying ML model using the proper training set
  3. Predict the examples from the calibration set using the derived ML model → ŷ-values
  4. Optional: if using a normalized nonconformity function
    1. Train the normalization ML model
    2. Predict normalization scores → 𝜺-values
  5. Compute the nonconformity measures (α-values) for all calibration examples, using the ŷ- and 𝜺-values
  6. Sort the nonconformity measures to obtain the list of nonconformity scores
  7. Save the underlying ML model, the normalization ML model (if any) and the nonconformity scores
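
The training steps above can be sketched as follows. The sketch assumes one-dimensional inputs, an ordinary least-squares line standing in for "any underlying ML model", and the non-normalized nonconformity function α = |y − ŷ|. The optional normalization step 4 is omitted. `fit_line`, `train_icp` and the `calibration_fraction` and `seed` parameters are hypothetical names.

```python
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least squares for y = a + b*x, standing in for any underlying ML model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def train_icp(xs: Sequence[float], ys: Sequence[float],
              calibration_fraction: float = 0.25, seed: int = 0) -> Tuple[Model, List[float]]:
    # Step 1: a single random split into proper training and calibration sets.
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_cal = int(len(indices) * calibration_fraction)
    calibration, proper = indices[:n_cal], indices[n_cal:]
    # Step 2: train the underlying model on the proper training set only.
    model = fit_line([xs[i] for i in proper], [ys[i] for i in proper])
    # Steps 3, 5 and 6: predict the calibration set, compute alpha = |y - y_hat|, sort.
    scores = sorted(abs(ys[i] - model(xs[i])) for i in calibration)
    # Step 7: keep the model and the sorted nonconformity scores.
    return model, scores

xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free toy data, so all scores are close to 0
model, scores = train_icp(xs, ys)
```

The calibration examples never influence the fitted model. This separation is what allows their scores to act as an unbiased reference for new test objects.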

Prediction algorithm:

Required input: significance level (s)

  1. Predict the test object using the ML model → ŷt
  2. Optional: if using a normalized nonconformity function
    1. Predict the test object using the normalization model → 𝜺t
  3. From the sorted list of nonconformity scores produced by the calibration set during training, pick the score corresponding to the significance level s → αs (for n calibration scores, typically the ⌈(1 − s)(n + 1)⌉-th smallest)
  4. Compute the prediction interval half-width d by rearranging the nonconformity function with αs (and optionally 𝜺t) as input → d
  5. Output the prediction interval (ŷt − d, ŷt + d) for the given significance level s
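
The prediction steps above can be sketched as follows. The sketch assumes the nonconformity function α = |y − ŷ|, or α = |y − ŷ| / 𝜺 when normalized. Rearranging gives d = αs, or d = αs · 𝜺t. Other normalized forms exist. `icp_interval` and its `epsilon` parameter are hypothetical names, and the 19 calibration scores are made up.

```python
import math
from typing import Optional, Sequence, Tuple

def icp_interval(y_hat: float, sorted_scores: Sequence[float], significance: float,
                 epsilon: Optional[float] = None) -> Tuple[float, float]:
    """Prediction interval for one test object at significance level s."""
    n = len(sorted_scores)
    # Step 3: the ceil((1 - s)(n + 1))-th smallest calibration score.
    k = math.ceil((1 - significance) * (n + 1))
    if k > n:
        # Too few calibration examples for this s: only the trivial interval is valid.
        return (-math.inf, math.inf)
    alpha_s = sorted_scores[k - 1]
    # Step 4: invert alpha = |y - y_hat| (or |y - y_hat| / epsilon when normalized).
    d = alpha_s * epsilon if epsilon is not None else alpha_s
    # Step 5: symmetric interval around the point prediction.
    return (y_hat - d, y_hat + d)

scores = [float(v) for v in range(1, 20)]  # 19 hypothetical sorted calibration scores
print(icp_interval(100.0, scores, 0.25))  # (85.0, 115.0)
print(icp_interval(100.0, scores, 0.25, epsilon=2.0))  # (70.0, 130.0)
```

With 19 calibration scores and s = 0.25, k = ⌈0.75 · 20⌉ = 15, so the 15th smallest score sets the half-width. A very small s, such as 0.01, asks for a score beyond the largest one and yields an unbounded interval.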

Aggregated conformal prediction (ACP)

An aggregated conformal predictor (ACP) can be considered an ensemble of ICPs. The term split conformal prediction (SCP) is usually reserved for the single-split ICP described above. An ACP usually improves the efficiency of predictions (that is, it creates smaller prediction intervals) compared to a single ICP. However, it loses the automatic validity guarantee of the generated predictions.

A common type of ACP is the cross-conformal predictor (CCP). It splits the training data into proper training and calibration sets multiple times, in a strategy similar to k-fold cross-validation. Regardless of the splitting technique, the algorithm performs k splits and trains an ICP for each split. When predicting a new test object, it takes the median ŷ and the median d from the k ICPs and outputs the final prediction interval (ŷmedian − dmedian, ŷmedian + dmedian).
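
The splitting and aggregation steps can be sketched as follows. The sketch assumes interleaved folds in the style of k-fold cross-validation and shows only the two ends of the procedure. Training one ICP per split is omitted. `kfold_splits` and `aggregate_intervals` are hypothetical helpers, and the (ŷ, d) pairs are made up. Following the description above, the medians of ŷ and d are taken independently.

```python
from statistics import median
from typing import Iterator, List, Sequence, Tuple

def kfold_splits(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (proper_training, calibration) index lists; fold i is calibration set i."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        proper = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield proper, folds[i]

def aggregate_intervals(icp_outputs: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine one (y_hat, d) pair per ICP, all computed at the same significance level."""
    y_median = median([y_hat for y_hat, _ in icp_outputs])
    d_median = median([d for _, d in icp_outputs])
    return (y_median - d_median, y_median + d_median)

# Hypothetical outputs of three ICPs for the same test object.
print(aggregate_intervals([(10.0, 2.0), (11.0, 3.0), (12.0, 1.0)]))  # (9.0, 13.0)
```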

Applications

Types of learning models

Many machine learning models can be used in conjunction with conformal prediction. Studies have shown that it can be applied to neural networks,[8] support-vector machines and other models.

Data used

Conformal prediction is used in a variety of fields and is an active area of research. In biomedicine, for example, it has been used to quantify uncertainty in breast cancer diagnosis[9] and stroke risk assessment,[10] among other tasks. Within language technology, conformal prediction papers are routinely presented at COPA.[11]

Conferences

Conformal prediction is one of the main subjects of the annual COPA conference (Symposium on Conformal and Probabilistic Prediction with Applications). Both the theory and the applications of conformal prediction are presented there by leading researchers in the field. The conference has been held since 2012.[11] It has been hosted in several European countries, including Greece, Great Britain, Italy and Sweden.

References

  1. Vovk, Vladimir; Gammerman, Alexander; Shafer, Glenn (2005). Algorithmic Learning in a Random World. New York: Springer. ISBN 978-0-387-00152-4. OCLC 209818494.
  2. Toccaceli, Paolo; Gammerman, Alexander (2019-03-01). "Combination of inductive mondrian conformal predictors". Machine Learning. 108 (3): 489–510. doi:10.1007/s10994-018-5754-9. ISSN 1573-0565.
  3. Norinder, Ulf; Carlsson, Lars; Boyer, Scott; Eklund, Martin (2014-06-23). "Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination". Journal of Chemical Information and Modeling. 54 (6): 1596–1603. doi:10.1021/ci5001168. ISSN 1549-9596.
  4. Alvarsson, Jonathan; McShane, Staffan Arvidsson; Norinder, Ulf; Spjuth, Ola (2021-01-01). "Predicting With Confidence: Using Conformal Prediction in Drug Discovery". Journal of Pharmaceutical Sciences. 110 (1): 42–49. doi:10.1016/j.xphs.2020.09.055. ISSN 0022-3549. PMID 33075380.
  5. Vovk, Vladimir; Shafer, Glenn (2008-08-03). "A Tutorial on Conformal Prediction" (PDF). Journal of Machine Learning Research. 9: 371–421.
  6. Papadopoulos, Harris; Proedrou, Kostas; Vovk, Volodya; Gammerman, Alex (2002). Elomaa, Tapio; Mannila, Heikki; Toivonen, Hannu (eds.). "Inductive Confidence Machines for Regression". Machine Learning: ECML 2002. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 345–356. doi:10.1007/3-540-36755-1_29. ISBN 978-3-540-36755-0.
  7. Papadopoulos, Harris; Haralambous, Haris (2010). Diamantaras, Konstantinos; Duch, Wlodek; Iliadis, Lazaros S. (eds.). "Neural Networks Regression Inductive Conformal Predictor and Its Application to Total Electron Content Prediction". Artificial Neural Networks – ICANN 2010. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 32–41. doi:10.1007/978-3-642-15819-3_4. ISBN 978-3-642-15819-3.
  8. Papadopoulos, Harris; Vovk, Volodya; Gammerman, Alex (October 2007). "Conformal Prediction with Neural Networks". 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007). 2: 388–395. doi:10.1109/ICTAI.2007.47.
  9. Lambrou, A.; Papadopoulos, H.; Gammerman, A. (November 2009). "Evolutionary Conformal Prediction for Breast Cancer Diagnosis". 2009 9th International Conference on Information Technology and Applications in Biomedicine: 1–4. doi:10.1109/ITAB.2009.5394447.
  10. Lambrou, Antonis; Papadopoulos, Harris; Kyriacou, Efthyvoulos; Pattichis, Constantinos S.; Pattichis, Marios S.; Gammerman, Alexander; Nicolaides, Andrew (2010), Papadopoulos, Harris; Andreou, Andreas S.; Bramer, Max (eds.), "Assessment of Stroke Risk Based on Morphological Ultrasound Image Analysis with Conformal Prediction", Artificial Intelligence Applications and Innovations, Berlin, Heidelberg: Springer Berlin Heidelberg, vol. 339, pp. 146–153, doi:10.1007/978-3-642-16239-8_21, ISBN 978-3-642-16238-1, retrieved 2021-09-15
  11. "10th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2021)". cml.rhul.ac.uk. Retrieved 2021-09-15.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.