qumphy.uq_metrics module
File: qumphy/uq_metrics.py
Project: QUMPHY
Contact: vivek.desai@npl.co.uk
Gitlab: https://gitlab.com/qumphy
Description: Evaluation metrics for assessing model calibration for classification and regression tasks.
- qumphy.uq_metrics.adaptive_calibration_error(predictions, targets, *args, range_number=15, threshold=0.0)[source]
Calculate the Adaptive Calibration Error (ACE) for the predicted class probabilities. This metric will work for both the binary and multiclass classification cases, with the option to apply thresholding as done in the original paper.
References:
The ACE is introduced in the following paper: <https://arxiv.org/pdf/1904.01685>. This implementation is heavily inspired by the more general calibration error functions in: <https://github.com/JeremyNixon/uncertainty-metrics-1/tree/master>. We present a simplified version here, focused on the ACE alone.
Parameters:
- predictions: np.ndarray
Model output probabilities with shape (n, m) for classification. Note m >= 2 i.e. probabilities given for all classes.
- targets: np.ndarray
Target class labels with shape (n, 1)
- *args:
Used to catch any un-used arguments in the main UQ evaluation notebook.
- range_number: int, optional
Number of ranges to be used (note: ranges are equal frequency bins in the mean confidence axis), by default 15
- threshold: float, optional
Threshold value below which probabilities are ignored, by default 0.0 for the ACE i.e. no thresholding.
Returns:
:
- ACE: float
The ACE metric value.
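As a rough illustration of the equal-frequency ranges described above, the following sketch computes an ACE-style value with NumPy only. It is not the QUMPHY implementation: the name `ace_sketch`, the use of `np.array_split` to form the ranges, and the averaging over non-empty ranges are all assumptions made for this example.

```python
import numpy as np

def ace_sketch(probs, labels, range_number=15, threshold=0.0):
    """Illustrative ACE: per-class equal-frequency ranges, averaged |accuracy - confidence|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels).reshape(-1)
    gaps = []
    for k in range(probs.shape[1]):
        conf = probs[:, k]
        hit = (labels == k).astype(float)
        keep = conf >= threshold  # ignore probabilities below the threshold
        conf, hit = conf[keep], hit[keep]
        order = np.argsort(conf, kind="stable")
        # array_split gives ranges holding (almost) equal numbers of samples
        for idx in np.array_split(order, range_number):
            if idx.size:
                gaps.append(abs(hit[idx].mean() - conf[idx].mean()))
    return float(np.mean(gaps)) if gaps else 0.0
```

With `threshold=0.0` every class probability contributes; raising the threshold drops low-confidence entries before the ranges are formed.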
- qumphy.uq_metrics.continuous_ranked_probability_score(predictions, targets, uncertainties, converter='Gaussian', distributions=None, is_var=True)[source]
Calculate the Continuous Ranked Probability Score (CRPS) from the given model predictions, uncertainties, and corresponding ground truths. The option is given to calculate the CRPS either using the analytical expression given the model predictions parameterising a Gaussian, or using numerical integration. The numerical integration method for a distribution estimated with KDE yields errors of <1% compared to the analytical expression for a Gaussian.
- Return type:
float
References:
The CRPS and its analytical expression for a univariate Gaussian is given in Gneiting et al., 2007: <https://sites.stat.washington.edu/raftery/Research/PDF/Gneiting2007jasa.pdf>.
Parameters:
- predictions (np.ndarray):
An array of the model predictions with shape (n, 1).
- targets (np.ndarray):
Array of target values with shape (n, 1).
- uncertainties (np.ndarray):
Array of the total uncertainties associated with the model predictions from a chosen UQ method, with shape (n, 1).
- converter (str, optional):
Specify whether the converter used was “Gaussian” or “KDE”; evaluates CRPS using numerical integration of CDF if “KDE”, otherwise uses analytical expression for Gaussian. Defaults to “Gaussian”.
- distributions (List[distribution objects], optional):
Option to include the list of distributions as a direct input - needed if input is from KDE conversion of prediction interval to distribution. Defaults to None.
- is_var (bool, optional):
Boolean parameter that sets whether to convert the uncertainties from variances to standard deviations. Defaults to True.
Returns:
:
- CRPS (float):
The average CRPS value across the dataset.
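For the Gaussian case, the closed-form expression from Gneiting et al. can be sketched as below. This is a minimal illustration, not the module's code; the function name `gaussian_crps_sketch` is hypothetical, and the KDE / numerical-integration branch is not covered.

```python
import math
import numpy as np

def gaussian_crps_sketch(predictions, targets, uncertainties, is_var=True):
    """Mean CRPS of N(mu, sigma^2) forecasts using the analytical Gaussian expression."""
    mu = np.asarray(predictions, dtype=float).reshape(-1)
    y = np.asarray(targets, dtype=float).reshape(-1)
    sigma = np.asarray(uncertainties, dtype=float).reshape(-1)
    if is_var:
        sigma = np.sqrt(sigma)  # variances -> standard deviations
    z = (y - mu) / sigma
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    # CRPS = sigma * [ z (2 Phi(z) - 1) + 2 phi(z) - 1/sqrt(pi) ]
    crps = sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
    return float(np.mean(crps))
```

The score is in the units of the target and scales linearly with the predicted standard deviation when the prediction is centred on the target.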
- qumphy.uq_metrics.coverage_calibration_error(predictions, targets, uncertainties, filepath=None, is_var=True, save_fig=False, number_of_intervals=10)[source]
Calculate the confidence interval-based metric for model predictions and their corresponding uncertainties. If uncertainties are given as variances, they are converted to standard deviations.
Parameters:
- predictions: np.ndarray
Array of model predictions with shape (n, 1)
- targets: np.ndarray
Array of target values with shape (n, 1)
- uncertainties: np.ndarray
Array of total uncertainties (aleatoric + epistemic) with shape (n, 1)
- filepath: str
Path to directory to save the coverage calibration curve.
- is_var: bool, optional
Boolean parameter that determines whether to convert the uncertainties from variances to standard deviations, by default True
- save_fig: bool, optional
Boolean parameter that sets whether to save the plot to the filepath, by default False.
- number_of_intervals: int, optional
The number of confidence intervals to evaluate the calibration error on, by default 10 (range 0 to 1 in steps of 0.1)
Returns:
:
- calibration_error: float
The confidence interval-based calibration error metric
- qumphy.uq_metrics.entropy(p)[source]
Calculate the normalised entropy of the probabilities array.
- Return type:
ndarray
Parameters:
- p (np.ndarray):
Probabilities from classifier.
Returns:
- :
- ents (np.ndarray):
Entropy values for each sample in the probability array.
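One common way to normalise the Shannon entropy is to divide by log(k) for k classes, so that values lie in [0, 1]. The sketch below assumes that convention and a small clipping constant `eps`; both are assumptions for illustration and may differ from what `entropy` does internally.

```python
import numpy as np

def normalised_entropy_sketch(p, eps=1e-12):
    """Per-sample Shannon entropy divided by log(k), for an (n, k) probability array."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)  # avoid log(0)
    ents = -np.sum(p * np.log(p), axis=1)
    return ents / np.log(p.shape[1])
```

A uniform prediction gives a value of 1, and a one-hot prediction gives a value close to 0.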
- qumphy.uq_metrics.expected_calibration_error(predictions, targets, filepath=None, save_fig=False, bin_number=15)[source]
Calculate the Expected Calibration Error (ECE) for the predicted class model probabilities for classification tasks.
References:
The script below is heavily based on the following implementations of the ECE: <https://medium.com/towards-data-science/expected-calibration-error-ece-a-step-by-step-visual-explanation-with-python-code-c3e9aa12937d>
and
<https://github.com/gpleiss/temperature_scaling/blob/master/temperature_scaling.py>.
Alternative packages that calculate an ECE value exist, such as the netcal package. Depending on how binning is implemented in each package, the ECE values differ by small amounts (~1e-3). In the QUMPHY evaluation framework, the quoted ECE value is determined using this function.
Parameters:
- predictions: np.ndarray
Model output probabilities with shape (n, k) for k-class classification.
- targets: np.ndarray
Target class labels with shape (n, 1).
- filepath: str
The filepath for the directory to save the reliability diagram to.
- save_fig: bool
Boolean to determine whether to save the reliability diagram to the given filepath.
- bin_number: int, optional
Number of bins to be used (note: bins are equal width in the mean confidence axis), by default 15.
Returns:
:
- ECE: float
ECE value for the input data
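The equal-width binning mentioned above can be sketched as follows. This is a simplified stand-in (no plotting, hypothetical name `ece_sketch`) in the spirit of the referenced implementations, not a copy of this function.

```python
import numpy as np

def ece_sketch(probs, labels, bin_number=15):
    """Illustrative ECE with equal-width bins on the top-class confidence."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels).reshape(-1)
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, bin_number + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # weight each bin's |accuracy - confidence| gap by its share of samples
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)
```

Small differences between packages typically come from how bin edges are treated, for example whether a confidence that lies exactly on an edge goes to the lower or upper bin.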
- qumphy.uq_metrics.expected_cumulative_calibration_errors(predictions, targets, filepath, save_fig=True)[source]
Calculate the ECCE-MAD (aka Kolmogorov-Smirnov statistic) and ECCE-R (aka Kuiper statistic) metrics. The saved plot shows the deviation from zero of the cumulative errors.
References:
The ECCE metrics are introduced in the following: <https://arxiv.org/pdf/2205.09680>. This implementation is from the cumulative function in: <https://github.com/facebookresearch/ecevecce/blob/main/codes/calibration.py>.
The alterations to the cumulative function simply involve editing the format of the inputs to the function to be consistent with the other metrics. In addition, the normalised ECCE_MAD and ECCE_R metrics are given, as this is easier to compare with the target limits of convergence under the null hypothesis of perfect calibration.
Parameters:
- predictions: np.ndarray
Model output probabilities with shape (n, 2) for binary classification
- targets: np.ndarray
Target class labels with shape (n, 1)
- filepath: str
Path to directory to save the plot of the cumulative calibration errors.
- save_fig: bool
Boolean to determine whether to save the plot of the cumulative calibration errors to the given filepath.
Returns:
:
- norm_ECCE-MAD: float
The maximum absolute deviation from zero statistic (Kolmogorov-Smirnov) for the cumulative differences between the confidences and outcomes
- norm_ECCE-R: float
The range of deviation from zero statistic (Kuiper) for the cumulative differences between the confidences and outcomes
- statistical_significance_scale: float
Normalisation factor that is used to assess statistically significant deviations from the asymptotic expected values (under large sample size n)
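The three returned quantities can be illustrated with a short NumPy sketch of the cumulative differences. It is written from the description above rather than from the referenced code, so the name `ecce_sketch`, the sign convention (outcome minus score), and the exact form of the scale are assumptions.

```python
import numpy as np

def ecce_sketch(probs, labels):
    """Illustrative normalised ECCE-MAD and ECCE-R for (n, 2) binary probabilities."""
    scores = np.asarray(probs, dtype=float)[:, 1]  # positive-class probability
    outcomes = np.asarray(labels, dtype=float).reshape(-1)
    order = np.argsort(scores, kind="stable")
    s, r = scores[order], outcomes[order]
    n = s.size
    # cumulative differences between outcomes and scores, starting from zero
    cumulative = np.concatenate(([0.0], np.cumsum(r - s))) / n
    ecce_mad = np.abs(cumulative).max()             # Kolmogorov-Smirnov style
    ecce_r = cumulative.max() - cumulative.min()    # Kuiper style
    scale = np.sqrt(np.sum(s * (1.0 - s))) / n      # assumed significance scale
    return ecce_mad / scale, ecce_r / scale, scale
```

Dividing by the scale expresses both statistics in units of their expected fluctuation under perfect calibration, which is what makes values comparable across sample sizes.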
- qumphy.uq_metrics.expected_normalised_calibration_error(predictions, targets, uncertainties, filepath=None, is_var=False, save_fig=False, bin_number=15)[source]
Calculate the Expected Normalised Calibration Error (ENCE) for the model predictions and the corresponding total uncertainties.
Parameters:
- predictions: np.ndarray
An array of the model predictions with shape (n, 1)
- targets: np.ndarray
Array of target values with shape (n, 1)
- uncertainties: np.ndarray
Array of the total uncertainties associated with the model predictions from a chosen UQ method, with shape (n, 1)
- filepath: str
Path of directory to save the reliability diagram.
- is_var: bool, optional
Boolean parameter that sets whether to convert the uncertainties from variances to standard deviations, by default False
- save_fig: bool, optional
Boolean parameter that sets whether to save the plot to the filepath, by default False.
- bin_number: int, optional
Chosen number of bins for calculating ENCE (note: the bins are defined to have an equal number of samples per bin), by default 15
Returns:
:
- ENCE: float
The value of the ENCE calculated for the given inputs
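A minimal sketch of an ENCE-style calculation with equal-count bins is given below. It assumes the usual definition (mean over bins of |RMV - RMSE| / RMV, with bins ordered by predicted uncertainty); the name `ence_sketch` is hypothetical and plotting is omitted.

```python
import numpy as np

def ence_sketch(predictions, targets, uncertainties, is_var=False, bin_number=15):
    """Illustrative ENCE: compare root-mean-variance and RMSE within equal-count bins."""
    mu = np.asarray(predictions, dtype=float).reshape(-1)
    y = np.asarray(targets, dtype=float).reshape(-1)
    var = np.asarray(uncertainties, dtype=float).reshape(-1)
    if not is_var:
        var = var**2  # inputs were standard deviations
    order = np.argsort(var, kind="stable")
    terms = []
    for idx in np.array_split(order, bin_number):
        if idx.size == 0:
            continue
        rmv = np.sqrt(var[idx].mean())                     # root mean variance
        rmse = np.sqrt(np.mean((mu[idx] - y[idx]) ** 2))   # empirical error
        terms.append(abs(rmv - rmse) / rmv)
    return float(np.mean(terms))
```

A value of 0 means the predicted spread matches the observed error in every bin; a value of 1 in this sketch corresponds to errors that are twice the predicted spread.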
- qumphy.uq_metrics.flatten_list_of_batches(output_list, concat_axis=-1)[source]
Function to flatten model output arrays before they are used to calculate the calibration error metrics.
Parameters:
- output_list: list
List to be flattened into 1D np.ndarray
- concat_axis: int, optional
Axis on which to perform the flattening, by default -1 (last axis)
Returns:
- flattened_array: np.ndarray
Flattened array
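The effect of flattening a list of batches can be pictured with `np.concatenate`. Whether `flatten_list_of_batches` uses exactly this call is an assumption; the snippet only shows the intended result for batches of unequal length.

```python
import numpy as np

# Two batches of model outputs with different batch sizes (made-up values)
batches = [np.array([0.1, 0.9]), np.array([0.4, 0.6, 0.2])]

# Join along the last axis to obtain a single 1D array
flat = np.concatenate(batches, axis=-1)
```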
- qumphy.uq_metrics.negative_log_likelihood(predictions, targets, *args)[source]
Compute the negative log likelihood metric for the predicted probabilities for k-class classification. Values are clipped to avoid divide by zero errors when taking the base-2 logarithm.
- Return type:
float
Parameters:
- predictions: np.ndarray
Model output probabilities with shape (n, k) for k-class classification.
- targets: np.ndarray
Target class labels with shape (n, 1).
- *args:
Used to capture other arguments passed to the function that are not necessary e.g. filepath for reliability diagram.
Returns:
- :
nll (float): Calculated NLL value.
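The clipping step can be sketched as follows. The base-2 logarithm follows the description above; the clipping constant `eps`, the averaging over samples, and the name `nll_sketch` are assumptions for illustration.

```python
import numpy as np

def nll_sketch(probs, labels, eps=1e-12):
    """Illustrative mean negative log-likelihood (base 2) of the true class."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # keep log2 finite
    labels = np.asarray(labels).reshape(-1).astype(int)
    true_class_probs = probs[np.arange(labels.size), labels]
    return float(-np.mean(np.log2(true_class_probs)))
```

Without clipping, a single sample whose true class received probability 0 would make the result infinite.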
- qumphy.uq_metrics.picp(predictions, targets, confidence_level, is_interval=True, uncertainties=None, is_var=True)[source]
Calculate the Prediction Interval Coverage Probability (PICP) for predictions given either as intervals or point-predictions, for a specified confidence level. If inputs are given as point-predictions, uncertainty (as variance/standard deviations) arrays must be given, from which intervals are derived, assuming a Gaussian distribution.
The Mean Prediction Interval Width (MPIW) is also returned. Note this is an unbounded non-negative metric.
- Return type:
tuple
Parameters:
- predictions: np.ndarray
Array of model predictions with shape either (n, 2) for intervals or (n, 1) for point predictions
- targets: np.ndarray
Array of target values with shape (n, 1)
- confidence_level: float
The confidence level of the prediction intervals. If point-predictions are given, intervals will be created assuming Gaussian distribution at this confidence level.
- is_interval: bool
Boolean flag to set whether predictions are given as intervals or point-predictions. Defaults to True.
- uncertainties: np.ndarray
Array of model uncertainties with shape (n, 1). Defaults to None, must be present if is_interval is False.
- is_var: bool
Boolean flag indicating whether the uncertainties are given as variances or standard deviations. Defaults to True (variances).
Returns:
:
- picp: float
The prediction interval coverage probability of the model predictions. Aim is for the PICP to match the confidence level.
- MPIW: float
The Mean Prediction Interval Width of the model predictions.
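Both input modes can be illustrated with a short sketch. Here the interval mode is selected by leaving `uncertainties` as None rather than through an `is_interval` flag, and the Gaussian quantile comes from the standard library's `statistics.NormalDist`; these choices, and the name `picp_sketch`, are assumptions of the example rather than details of `picp`.

```python
from statistics import NormalDist
import numpy as np

def picp_sketch(predictions, targets, confidence_level, uncertainties=None, is_var=True):
    """Illustrative PICP and MPIW for interval or Gaussian point predictions."""
    preds = np.asarray(predictions, dtype=float)
    y = np.asarray(targets, dtype=float).reshape(-1)
    if uncertainties is None:
        # predictions already hold (lower, upper) bounds with shape (n, 2)
        lower, upper = preds[:, 0], preds[:, 1]
    else:
        sigma = np.asarray(uncertainties, dtype=float).reshape(-1)
        if is_var:
            sigma = np.sqrt(sigma)
        # two-sided Gaussian quantile, e.g. about 1.96 for a 0.95 confidence level
        z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
        centre = preds.reshape(-1)
        lower, upper = centre - z * sigma, centre + z * sigma
    covered = (y >= lower) & (y <= upper)
    return float(covered.mean()), float(np.mean(upper - lower))
```

A PICP close to the confidence level with a small MPIW is the desirable combination; a high PICP obtained with very wide intervals is less informative.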
- qumphy.uq_metrics.set_coverage(prediction_sets, targets)[source]
Calculate the prediction set coverage for model outputs given from either basic conformal prediction, or sets created using top-k selection from model probabilities. Note: top-k selection from model probabilities has no coverage guarantee, in contrast to conformal methods.
Parameters:
- prediction_sets (np.ndarray(List)):
Array of lists containing the predicted classes.
- targets (np.ndarray):
The ground truth class labels.
Returns:
- :
float: The empirical frequency of the ground truth class labels in the set.
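Empirical set coverage is simply the fraction of samples whose true label appears in the predicted set. A minimal sketch, with the hypothetical name `set_coverage_sketch`:

```python
import numpy as np

def set_coverage_sketch(prediction_sets, targets):
    """Fraction of samples whose ground-truth label is contained in its prediction set."""
    targets = np.asarray(targets).reshape(-1)
    hits = [label in pred_set for pred_set, label in zip(prediction_sets, targets)]
    return float(np.mean(hits))
```

For conformal sets built at miscoverage level alpha, this value would be expected to be close to or above 1 - alpha on exchangeable data; top-k sets carry no such guarantee.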
- qumphy.uq_metrics.smooth_expected_calibration_error(predictions, targets, filepath, save_fig=True)[source]
Calculate the smooth ECE value (smECE) for the predicted class probabilities for the classification task.
- Return type:
float
References:
Details on the relplot python package can be found in the following repository: <https://github.com/apple/ml-calibration/tree/main>. The accompanying paper for the smooth ECE metric, and the smooth reliability diagrams can be found here: <https://arxiv.org/pdf/2309.12236>.
Parameters:
- predictions: np.ndarray
Model output probabilities with shape (n, k) for k class classification.
- targets: np.ndarray
Target class labels with shape (n, 1).
- filepath: str
Path to directory to save the reliability diagram.
- save_fig: bool
Boolean to determine whether to save the reliability diagram to the given filepath.
Returns:
:
- smECE: float
The calculated smECE value using the relplot python package.
- qumphy.uq_metrics.split_into_classes(predictions, targets)[source]
Helper function to split classification predictions into predictions conditioned on the ground truth class label.
- Return type:
tuple
Parameters:
- predictions: np.ndarray
Model output probabilities with shape (n, k) for k-class classification.
- targets: np.ndarray
Target class labels with shape (n, 1).
Returns:
- tuple:
Predictions per class for k classes, with corresponding ground truth class labels for each class. Size of tuple is dependent on number of classes.
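One possible way to condition predictions on the true class is boolean masking. The layout of the returned tuple below (one `(predictions, labels)` pair per class) is an assumption made for this sketch; the actual ordering of the tuple returned by `split_into_classes` may differ.

```python
import numpy as np

def split_by_class_sketch(probs, labels):
    """Return one (predictions, labels) pair per class, selected by the true label."""
    probs = np.asarray(probs)
    labels = np.asarray(labels).reshape(-1)
    return tuple(
        (probs[labels == k], labels[labels == k]) for k in range(probs.shape[1])
    )
```

Each pair can then be passed to a calibration metric to obtain a per-class value.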
- qumphy.uq_metrics.uncertainty_calibration_error(predictions, targets, filepath=None, save_fig=False, prediction_entropies=None, bin_number=15)[source]
Calculate the Uncertainty Calibration Error (UCE) for model predictions and target labels for classification. An array of entropies of shape (n, 1) can optionally be provided; if None, the entropies are calculated from the probabilities.
References:
The UCE was introduced in the following paper: <https://arxiv.org/pdf/1909.13550>. This implementation is based on: <https://github.com/mlaves/bayesian-temperature-scaling/blob/master/uce.py>.
Parameters:
- predictions: np.ndarray
Model output probabilities with shape (n, k) for k-class classification.
- targets: np.ndarray
Target class labels with shape (n, 1).
- filepath: str
Path to directory to save the reliability diagram.
- prediction_entropies: np.ndarray, optional
The entropies for the predictions, by default None; if None, then the entropies are calculated from the prediction probabilities.
- bin_number: int, optional
Number of bins to be used (note: bins are equal width in the normalised entropy axis), by default 15.
Returns:
:
- UCE: float
The UCE for the model predictions and their corresponding entropies.
- xs: np.ndarray
Array of the mean normalised predicted entropies, used for plotting the reliability diagram.
- ys: np.ndarray
Array of the empirical inaccuracy per bin, used for plotting the reliability diagram.
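The relationship between the three return values can be sketched as follows: samples are binned by normalised entropy, and each bin contributes the gap between its error rate and its mean entropy. The name `uce_sketch`, the log(k) normalisation, and the clipping constant are assumptions; plotting and the `prediction_entropies` override are left out.

```python
import numpy as np

def uce_sketch(probs, labels, bin_number=15, eps=1e-12):
    """Illustrative UCE with equal-width bins on the normalised entropy axis."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels).reshape(-1)
    clipped = np.clip(probs, eps, 1.0)
    ent = -np.sum(clipped * np.log(clipped), axis=1) / np.log(probs.shape[1])
    ent = np.clip(ent, 0.0, 1.0)
    wrong = (probs.argmax(axis=1) != labels).astype(float)
    # equal-width bin index in [0, bin_number - 1]
    bin_idx = np.minimum((ent * bin_number).astype(int), bin_number - 1)
    uce, xs, ys = 0.0, [], []
    for b in np.unique(bin_idx):
        in_bin = bin_idx == b
        xs.append(ent[in_bin].mean())    # mean normalised entropy in the bin
        ys.append(wrong[in_bin].mean())  # empirical error rate in the bin
        uce += in_bin.mean() * abs(ys[-1] - xs[-1])
    return float(uce), np.array(xs), np.array(ys)
```

Plotting `ys` against `xs` gives the reliability diagram; points on the diagonal correspond to bins where the entropy matches the error rate.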
- qumphy.uq_metrics.variation_calibration_error(predictions, targets, filepath, save_fig=True, bin_number=15)[source]
Compute the Variation Calibration Error (VCE) metric, which extends the ECE to a measure of distributional calibration for multiclass problems. The measure of variation employed for this metric is the Shannon entropy. Bins can be defined with either equal width or equal frequency.
- Return type:
Tuple[float,ndarray,ndarray]
Parameters:
- predictions (np.ndarray):
Prediction array with shape (n, k) for n samples and k classes.
- targets (np.ndarray):
Target class labels with shape (n, 1).
- save_fig (bool, default = True):
Flag to save the reliability diagram.
- filepath (str):
Filepath to which the reliability diagram is saved.
- bin_number (int, default = 15):
Number of bins used to compute metric and plot reliability diagram.
Returns:
- VCE (float):
VCE metric value.
- xs (np.ndarray):
Mean entropy of predictions array for plotting the reliability diagram.
- ys (np.ndarray):
Entropy of the rank distribution array for plotting the reliability diagram.
- qumphy.uq_metrics.z_variance_error(predictions, targets, uncertainties, is_var=True, bin_number=15)[source]
Calculate the Z-variance error (ZVE) for the model predictions and corresponding total uncertainties.
Parameters:
- predictions: np.ndarray
An array of the model predictions with shape (n, 1)
- targets: np.ndarray
Array of target values with shape (n, 1)
- uncertainties: np.ndarray
Array of the total uncertainties associated with the model predictions from a chosen UQ method, with shape (n, 1)
- is_var: bool, optional
Boolean parameter that sets whether to convert the uncertainties from variances to standard deviations, by default True
- bin_number: int, optional
Chosen number of bins for calculating ZVE (note: the bins are defined to have an equal number of samples per bin), by default 15
Returns:
:
- ZVE: float
The ZVE value for the input predictions and uncertainties