Uncertainty Quantification in Machine Learning with an Easy Python Interface
The ML Uncertainty Package

Enter ML Uncertainty: a Python package designed to address the problem of quantifying uncertainty in machine learning predictions. Built on top of popular Python libraries such as SciPy and scikit-learn, ML Uncertainty provides a very intuitive interface to estimate uncertainties in ML predictions and, where possible, model parameters. Requiring only about four lines of code to perform these estimations, the package leverages powerful and theoretically rigorous mathematical methods in the background. It exploits the underlying statistical properties of the ML model in question, making the package computationally inexpensive. This approach also extends its applicability to real-world use cases where often only small amounts of data are available.
Motivation
I have been an avid Python user for the last 10 years. I love the large number of powerful libraries that have been created and maintained, and the very active community behind them. The idea for ML Uncertainty came to me when I was working on a hybrid ML problem. I had built an ML model to predict stress-strain curves of some polymers. Stress-strain curves, an important property of polymers, obey certain physics-based rules: for instance, they have a linear region at low strain values, and the tensile modulus decreases with temperature.
From the literature, I found some non-linear models to describe the curves and these behaviors, thereby reducing the stress-strain curves to a set of parameters, each with some physical meaning. Then, I trained an ML model to predict these parameters from some easily measurable polymer attributes. Notably, I only had a few hundred data points, as is quite common in scientific applications. Having trained the model, fine-tuned the hyperparameters, and performed the outlier analysis, one of the stakeholders asked me: “This is all good, but what are the error estimates on your predictions?” And I realized that there wasn’t an elegant way to estimate this with Python. I also realized that this wasn’t going to be the last time this problem would arise. That led me down the path that culminated in this package.
Having spent some time studying statistics, I suspected that the math for this wasn’t impossible or even that hard. I began researching, reading books like Introduction to Statistical Learning and Elements of Statistical Learning,1,2 and found some answers there. ML Uncertainty is my attempt at implementing some of those methods in Python to integrate statistics more tightly into machine learning. I believe that the future of machine learning depends on our ability to increase the reliability of predictions and the interpretability of models, and this is a small step towards that goal. Since developing this package, I have used it frequently in my work, and it has benefited me greatly.
This is an introduction to ML Uncertainty with an overview of the theories underpinning it. I have included some equations to explain the theory, but if those are overwhelming, feel free to gloss over them. For every equation, I have stated the key idea it represents.
Getting started: An example
We often learn best by doing. So, before diving deeper, let’s consider an example. Say we are working on a good old-fashioned linear regression problem where the model is trained with scikit-learn. We think that the model has been trained well, but we want more information. For instance, what are the prediction intervals for the outputs? With ML Uncertainty, this can be done in 4 lines as shown below and discussed in this example.
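Here is a minimal sketch of what that looks like. The ParametricModelInference class and get_intervals function are named in the package's documentation, but the import path and call signatures below are my assumptions and may differ between versions; see the linked example for the canonical usage.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
# Assumed import path; check the package docs for the exact module layout.
from ml_uncertainty.model_inference import ParametricModelInference

# Train an ordinary scikit-learn linear model.
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(100, 2))
y_train = X_train @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)
model = LinearRegression().fit(X_train, y_train)

# The four lines: wrap the fitted estimator and request prediction intervals
# (method names assumed; consult the linked example for the released API).
inf = ParametricModelInference()
inf.set_up_model_inference(X_train=X_train, y_train=y_train, estimator=model)
X_new = rng.uniform(size=(5, 2))
intervals = inf.get_intervals(X_new)  # prediction intervals for X_new
```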
All examples for this package can be found here: https://github.com/architdatar/ml_uncertainty/tree/main/examples.
Delving deeper: A peek under the hood
ML Uncertainty performs these computations by having the ParametricModelInference class wrap around the LinearRegression estimator from scikit-learn and extract all the information it needs for the uncertainty calculations. It follows the standard procedure for uncertainty estimation detailed in many statistics textbooks,2 an overview of which is shown below.
Since this is a linear model that can be expressed in terms of parameters \( \beta \) as \( y = X\beta \), ML Uncertainty first computes the degrees of freedom for the model (\( p \)), the error degrees of freedom (\( n - p - 1 \)), and the residual variance estimate (\( \hat{\sigma}^2 \), the residual sum of squares divided by the error degrees of freedom). Then, it computes the uncertainty in the model parameters; i.e., the variance-covariance matrix.3
\[ \text{Var}(\hat{\beta}) = \hat{\sigma}^2 (J^T J)^{-1} \]
Where \( J \) is the Jacobian matrix for the parameters. For linear regression, this translates to:
\[ \text{Var}(\hat{\beta}) = \hat{\sigma}^2 (X^T X)^{-1} \]
Finally, the get_intervals function computes the prediction intervals by propagating the uncertainties in the inputs as well as the parameters. Thus, for data \( X^* \) where predictions and uncertainties are to be estimated, the predictions \( \hat{y}^* \) along with the \( (1 - \alpha) \times 100\% \) prediction interval are:
\[ \hat{y}^* \pm t_{1-\alpha/2,\, n-p-1} \sqrt{\text{Var}(\hat{y}^*)} \]
Where,
\[ \text{Var}(\hat{y}^*) = (\nabla_X f)(\delta X^*)^2(\nabla_X f)^T + (\nabla_\beta f)(\delta \hat{\beta})^2(\nabla_\beta f)^T + \hat{\sigma}^2 \]
In English, this means that the uncertainty in the output depends on the uncertainty in the inputs, uncertainty in the parameters, and the residual uncertainty. Simplified for a multiple linear model and assuming no uncertainty in inputs, this translates to:
\[ \text{Var}(\hat{y}^*) = \hat{\sigma}^2 \left( 1 + X^* (X^T X)^{-1} X^{*T} \right) \]
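To make these formulas concrete, here is a from-scratch NumPy sketch of the same computation for ordinary least squares (illustrative only; this is not the package's internal code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(size=(n, p))])  # intercept + p features
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.3, size=n)

# Fit by least squares and estimate the residual variance.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
dof = n - p - 1                                    # error degrees of freedom
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / dof

# Variance-covariance matrix of the parameters: sigma^2 (X^T X)^{-1}.
XtX_inv = np.linalg.inv(X.T @ X)
cov_beta = sigma2_hat * XtX_inv

# 95% prediction interval at a new point x_star.
x_star = np.array([1.0, 0.5, 0.5])
y_star = x_star @ beta_hat
var_y_star = sigma2_hat * (1.0 + x_star @ XtX_inv @ x_star)
t_crit = stats.t.ppf(0.975, dof)
lower, upper = y_star - t_crit * np.sqrt(var_y_star), y_star + t_crit * np.sqrt(var_y_star)
```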
Extensions to linear regression
So, this is what goes on under the hood when those four lines of code are executed for linear regression. But this isn’t all. ML Uncertainty comes equipped with two more powerful capabilities:
- Regularization: ML Uncertainty supports L1, L2, and L1+L2 regularization. Combined with linear regression, this means that it can cater to LASSO, ridge, and elastic net regressions. Check out this example.
- Weighted least squares regression: Sometimes, not all observations are equal. We might want to give more weight to some observations and less to others. This commonly happens in science when some observations have a high amount of uncertainty while others are more precise. We want our regression to reflect the more precise observations, but we cannot fully discard the noisier ones. For such cases, weighted least squares regression is used.
Most importantly, a key assumption of linear regression is homoscedasticity; i.e., that the samples of the response variable are drawn from populations with similar variances. If this is not the case, it is handled by assigning weights to responses based on the inverse of their variance. In ML Uncertainty, this is done by simply specifying the sample weights used during training in the y_train_weights parameter of the ParametricModelInference class, and the rest is handled automatically. An application is shown in this example, albeit for a non-linear regression case, and a sketch of the idea follows below.
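Here is a minimal sketch of inverse-variance weighting with plain scikit-learn; in ML Uncertainty, the same weights would go into y_train_weights (assumed usage; see the linked example for the real call):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 1))
noise_sd = np.where(X[:, 0] > 0.5, 0.5, 0.05)  # heteroscedastic noise
y = 3.0 * X[:, 0] + rng.normal(scale=noise_sd)

# Weight each observation by the inverse of its variance, so the precise
# observations dominate the fit without discarding the noisy ones.
weights = 1.0 / noise_sd**2
model = LinearRegression().fit(X, y, sample_weight=weights)
# In ML Uncertainty, these weights would be passed to ParametricModelInference
# via its y_train_weights parameter (assumed usage).
```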
Basis expansions
I am always fascinated by how much ML we can get done by just doing linear regression properly. Many kinds of data, such as trends, time series, audio, and images, can be represented by basis expansions. These representations behave like linear models with many amazing properties. ML Uncertainty can be used to compute uncertainties for these models easily. Check out the examples called spline_synthetic_data, spline_wage_data, and fourier_basis.
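For instance, a spline expansion keeps the model linear in its coefficients, so the linear-regression machinery above still applies to the expanded features. A minimal sketch using scikit-learn's SplineTransformer (my choice for illustration; the package's spline examples may be set up differently):

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

# Expand x into a cubic B-spline basis; the fit is still y = (basis) @ beta,
# i.e., a linear model to which the uncertainty machinery can be applied.
basis = SplineTransformer(degree=3, n_knots=8)
X_basis = basis.fit_transform(X)
model = LinearRegression().fit(X_basis, y)
```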
Beyond linear regression
We often encounter situations where the underlying model cannot be expressed as a linear model. This commonly occurs in science, for instance, when modeling complex reaction kinetics, transport phenomena, or process control problems. Standard Python packages like scikit-learn don’t allow one to directly fit these non-linear models and perform uncertainty estimation on them. ML Uncertainty ships with a class called NonLinearRegression capable of handling non-linear models. The user specifies the model to be fit, and the class handles fitting with a scikit-learn-like interface, using SciPy’s least_squares function in the background. It integrates easily with the ParametricModelInference class for seamless uncertainty estimation. As with linear regression, weighted least squares and regularization are supported for non-linear regression as well. Here is an example.
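To show the underlying mechanics, here is a from-scratch fit of an illustrative non-linear model with SciPy's least_squares; this is the kind of thing that happens behind the scenes, not the NonLinearRegression API itself:

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative non-linear model: y = beta0 * exp(beta1 * x).
def model(beta, X):
    return beta[0] * np.exp(beta[1] * X[:, 0])

def residuals(beta, X, y):
    return y - model(beta, X)

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(100, 1))
y = 2.0 * np.exp(1.5 * X[:, 0]) + rng.normal(scale=0.1, size=100)

fit = least_squares(residuals, x0=np.array([1.0, 1.0]), args=(X, y))
beta_hat = fit.x
# fit.jac holds the Jacobian at the solution; plugging it into
# Var(beta_hat) = sigma^2 (J^T J)^{-1} yields the parameter uncertainties.
```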
Random Forests
Random Forests have gained significant popularity in the field. They operate by averaging the predictions of decision trees. Decision trees, in turn, identify a set of rules to divide the predictor variable space (input space) and assign a response value to each terminal node (leaf). The predictions from decision trees are averaged to provide a prediction for the random forest.1 They are particularly useful because they can identify complex relationships in data, are accurate, and make fewer assumptions about the data than regressions do.
While random forests are implemented in popular ML libraries like scikit-learn, there is no straightforward way to estimate prediction intervals from them. This is particularly important for regression, as random forests, given their high flexibility, tend to overfit their training data. Moreover, since random forests don’t have parameters like traditional regression models do, uncertainty quantification needs to be performed differently.
We use the basic idea of estimating prediction intervals using bootstrapping as described by Hastie et al. in Chapter 7 of their book Elements of Statistical Learning.2 The central idea we can exploit is that the variance of the predictions \( S(Z) \) for some data \( Z \) can be estimated via predictions of its bootstrap samples as follows:
\[ \widehat{\text{Var}}[S(Z)] = \frac{1}{B - 1} \sum_{b=1}^{B} \left( S(Z^{*b}) - \bar{S}^{*} \right)^2 \]
Where \( \bar{S}^{*} = \sum_b S(Z^{*b}) / B \). Bootstrap samples are drawn repeatedly and independently from the original dataset, with repetitions allowed. Lucky for us, random forests are trained using one bootstrap sample per decision tree. So, the predictions from the individual trees form a distribution whose variance estimates the variance of the prediction. But there is still one problem. Let’s say we want to obtain the variance in the prediction for the \( i^{\text{th}} \) training sample. If we simply use the formula above, some predictions will come from trees that included the \( i^{\text{th}} \) sample in the bootstrap sample they were trained on. This could lead to an unrealistically small variance estimate.
To solve this problem, the algorithm implemented in ML Uncertainty only considers predictions from trees which did not use the \( i^{\text{th}} \) sample for training. This results in an unbiased estimate of the variance.
The beautiful thing about this approach is that we don’t need any additional re-training steps. Instead, the EnsembleModelInference class elegantly wraps around the RandomForestRegressor estimator in scikit-learn and obtains all the necessary information from it.
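To make the idea concrete, here is a from-scratch sketch of the out-of-bag variance computation using bagged trees (scikit-learn's BaggingRegressor exposes the bootstrap indices publicly via estimators_samples_); this is an illustration of the algorithm, not the package's implementation:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=200)

# Bagged trees: each member is fit on its own bootstrap sample.
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=200,
                       random_state=0).fit(X, y)
tree_preds = np.stack([t.predict(X) for t in bag.estimators_])  # shape (B, n)

# For training sample i, keep only predictions from trees whose bootstrap
# sample did NOT contain i, then take the variance across those trees.
n = X.shape[0]
oob_var = np.full(n, np.nan)
in_bag = [set(idx) for idx in bag.estimators_samples_]
for i in range(n):
    oob = np.array([i not in s for s in in_bag])
    if oob.sum() > 1:
        oob_var[i] = tree_preds[oob, i].var(ddof=1)
```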
This method is benchmarked using the criterion described in Zhang et al.,4 which states that a correct \( (1 - \alpha) \times 100\% \) prediction interval is one for which the probability of it containing the observed response is \( (1 - \alpha) \times 100\% \). Mathematically,
\[ P(Y \in I_{\alpha}) \approx 1 - \alpha \]
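On held-out data, this check reduces to computing the empirical coverage (a small illustrative helper, not part of the package):

```python
import numpy as np

def empirical_coverage(y_obs, lower, upper):
    """Fraction of observed responses inside their prediction intervals."""
    y_obs, lower, upper = map(np.asarray, (y_obs, lower, upper))
    return float(np.mean((y_obs >= lower) & (y_obs <= upper)))

# For a correct 95% prediction interval, this should be close to 0.95.
```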
Here is an example to see ML Uncertainty in action for random forest models.
Uncertainty propagation (Error propagation)
How much does a certain amount of uncertainty in the input variables and/or model parameters affect the uncertainty in the response variable? How does this (epistemic) uncertainty compare to the inherent (aleatoric) uncertainty in the response variables? Often, it is important to answer these questions to decide on a course of action. For instance, if one finds that the uncertainty in model parameters contributes substantially to the uncertainty in predictions, one could collect more data or investigate alternative models to reduce it. Conversely, if the epistemic uncertainty is smaller than the aleatoric uncertainty, trying to reduce it further might be pointless. With ML Uncertainty, these questions can be answered easily.
Given a model relating the predictor variables to the response variable, the ErrorPropagation class can easily compute the uncertainty in the responses. Say the responses \( y \) are related to the predictor variables \( X \) via some function \( f \) and some parameters \( \beta \), expressed as:
\[ y = f(X, \beta) \]
We wish to obtain prediction intervals for the responses \( \hat{y}^* \) for some predictor data \( X^* \), with model parameters estimated as \( \hat{\beta} \). The uncertainties in \( X^* \) and \( \hat{\beta} \) are given by \( \delta X^* \) and \( \delta \hat{\beta} \), respectively. Then, the \( (1 - \alpha) \times 100\% \) prediction interval of the response variables will be given as:
\[ \hat{y}^* \pm t_{1-\alpha/2,\, n-p-1} \sqrt{\text{Var}(\hat{y}^*)} \]
Where,
\[ \text{Var}(\hat{y}^*) = (\nabla_X f)(\delta X^*)^2(\nabla_X f)^T + (\nabla_\beta f)(\delta \hat{\beta})^2(\nabla_\beta f)^T + \hat{\sigma}^2 \]
The important thing here is to notice how the uncertainty in predictions includes contributions from the inputs, parameters, as well as the inherent uncertainty of the response.
The ability of the ML Uncertainty package to propagate both input and parameter uncertainties makes it very handy, particularly in science, where we strongly care about the error (uncertainty) in each value being predicted. Consider the often talked about concept of hybrid machine learning. Here, we model known relationships in data through first principles and unknown ones using black-box models. Using ML Uncertainty, the uncertainties obtained from these different methods can be easily propagated through the computation graph.
A very simple example is the Arrhenius model for predicting reaction rate constants. The formula \( k = A e^{-E_a / RT} \) is very well known. Say the parameters \( A \) and \( E_a \) were predicted from some ML model and each carries an uncertainty of 5%. We wish to know how much error that translates to in the reaction rate constant.
This can be very easily accomplished with ML Uncertainty as shown in this example.
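For intuition, the same propagation can also be done by hand with the first-order (delta-method) formula above; the parameter values below are made up for illustration:

```python
import numpy as np

R = 8.314                      # gas constant, J/(mol K)
A, Ea = 1.0e7, 8.0e4           # illustrative parameter estimates
dA, dEa = 0.05 * A, 0.05 * Ea  # 5% uncertainty in each parameter
T = 350.0                      # temperature, K

k = A * np.exp(-Ea / (R * T))

# Gradients of k = A * exp(-Ea / (R T)) with respect to A and Ea.
dk_dA = np.exp(-Ea / (R * T))
dk_dEa = -k / (R * T)

# First-order propagation, assuming independent parameter uncertainties.
var_k = (dk_dA * dA) ** 2 + (dk_dEa * dEa) ** 2
print(f"k = {k:.3e} ± {np.sqrt(var_k):.3e}")
```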
Limitations
As of v0.1.1, ML Uncertainty only works with ML models trained in scikit-learn. It natively supports the following models: random forest, linear regression, LASSO regression, ridge regression, elastic net, and regression splines. For any other model, the user can define the model, residual, loss function, etc., as shown in the non-linear regression example. The package has not been tested with neural networks, transformers, or other deep learning models.
Contributions from the open-source ML community are welcome and highly appreciated. While there is much to be done, some key areas of effort are adapting ML Uncertainty to other frameworks such as PyTorch and TensorFlow, adding support for other ML models, highlighting issues, and improving documentation.
Benchmarking
The ML Uncertainty code has been benchmarked against the statsmodels package in Python. Specific cases can be found here.
Background
Uncertainty quantification (UQ) in machine learning has been studied in the ML community, and there is growing interest in this field. However, as of now, the existing solutions are applicable to very specific use cases and have key limitations.
For linear models, the statsmodels library can provide UQ capabilities. While theoretically rigorous, it cannot handle non-linear models. Moreover, the model needs to be expressed in a format specific to the package. This means that the user cannot take advantage of the powerful preprocessing, training, visualization, and other capabilities provided by ML packages like scikit-learn. While it can provide confidence intervals based on uncertainty in the model parameters, it cannot propagate uncertainty in predictor variables (input variables).
Another family of solutions is model-agnostic UQ. These solutions take subsamples of the training data, train the model repeatedly on them, and use the results to estimate prediction intervals. While sometimes useful in the limit of large data, these techniques may not provide accurate estimates for small training datasets, where the samples chosen might lead to substantially different estimates. Moreover, this is a computationally expensive exercise, since the model needs to be retrained multiple times. Some packages using this approach are MAPIE, PUNCC, UQPy, and ml_uncertainty by NIST (same name, different package), among many others.5–8
With ML Uncertainty, the goals have been to keep the training of the model and its UQ separate, cater to more generic models beyond linear regression, exploit the underlying statistics of the models, and avoid retraining the model multiple times to make it computationally inexpensive.
Summary and future work
This was an introduction to ML Uncertainty, a Python package to easily compute uncertainties in machine learning. The main features of the package have been introduced here, and some of the philosophy behind its development has been discussed. More detailed documentation and theory can be found in the docs. While this is only a start, there is immense scope to expand it. Questions, discussions, and contributions are always welcome. The code can be found on GitHub, and the package can be installed from PyPI. Give it a try with pip install ml-uncertainty.
References
(1) James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer US: New York, NY, 2021. https://doi.org/10.1007/978-1-0716-1418-1.
(2) Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer New York: New York, NY, 2009. https://doi.org/10.1007/978-0-387-84858-7.
(3) Börlin, N. Nonlinear Optimization. https://www8.cs.umu.se/kurser/5DA001/HT07/lectures/lsq-handouts.pdf.
(4) Zhang, H.; Zimmerman, J.; Nettleton, D.; Nordman, D. J. Random Forest Prediction Intervals. Am Stat 2020, 74 (4), 392–406. https://doi.org/10.1080/00031305.2019.1585288.
(5) Cordier, T.; Blot, V.; Lacombe, L.; Morzadec, T.; Capitaine, A.; Brunel, N. Flexible and Systematic Uncertainty Estimation with Conformal Prediction via the MAPIE Library. In Conformal and Probabilistic Prediction with Applications; 2023.
(6) Mendil, M.; Mossina, L.; Vigouroux, D. PUNCC: A Python Library for Predictive Uncertainty and Conformalization. In Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications; Papadopoulos, H., Nguyen, K. A., Boström, H., Carlsson, L., Eds.; Proceedings of Machine Learning Research; PMLR, 2023; Vol. 204, pp 582–601.
(7) Tsapetis, D.; Shields, M. D.; Giovanis, D. G.; Olivier, A.; Novak, L.; Chakroborty, P.; Sharma, H.; Chauhan, M.; Kontolati, K.; Vandanapu, L.; Loukrezis, D.; Gardner, M. UQpy v4.1: Uncertainty Quantification with Python. SoftwareX 2023, 24, 101561. https://doi.org/10.1016/j.softx.2023.101561.
(8) Sheen, D. Machine Learning Uncertainty Estimation Toolbox. https://github.com/usnistgov/ml_uncertainty_py.