Creating a Multivariate PLS-model for estimating ethanol concentrations


We measured ethanol concentrations this way, using a spectrophotometer, to evaluate ethanol production and at the same time explore some unknown territory. The easiest way to measure ethanol concentrations in our cultures would probably be to use a GC, but this seemed much more fun.

Creating the model

PLS stands for Partial Least Squares and is a method for finding patterns in complex data sets by projecting the many measured variables onto a small number of latent variables that best explain the response (here, the ethanol concentration). In this way, instead of looking at a few variables/peaks, we can consider thousands of wavelengths simultaneously and extract the common information.

We started out by setting the limits for our model: we created a calibration set of samples with properties as diverse as possible (ethanol concentration, set by adding a known amount, growth phase, etc.) and at the same time fixed the measuring procedure so that all samples are treated in the same way.

We used the Varian Cary 50 broad scan spectrophotometer and scanned the near-IR from 850 nm to 1050 nm with a resolution of 1 wavelength and a scan rate of 0.1 nm/s. We chose this part of the spectrum because we had determined that ethanol has a distinct absorption pattern here, with low background interference from sugars and other complex contaminants.

Using the spectra from all the samples in the calibration set, we constructed a PLS model in the Unscrambler software. This model can then be used to predict the ethanol concentration in samples where it is unknown.
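To make the idea concrete, here is a minimal sketch of fitting a PLS model to calibration spectra and predicting unknown samples. It is not what the Unscrambler does internally and it does not use our data: the spectra are simulated, the band positions and shapes (`ethanol_band`, `background_band`), the noise levels, the concentrations and the choice of 3 components are all made-up assumptions, and the PLS1 (NIPALS-style) routine is a textbook version written with NumPy.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1, NIPALS-style deflation)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centre spectra and response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                        # direction of max covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                           # scores of this latent variable
        p = Xr.T @ t / (t @ t)               # spectral loading
        c = (yr @ t) / (t @ t)               # response loading
        Xr = Xr - np.outer(t, p)             # deflate: remove what was explained
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression vector in wavelength space
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

# --- Simulated spectra (all shapes and numbers below are invented) ---
rng = np.random.default_rng(0)
wavelengths = np.arange(850, 1051)           # 850..1050 nm in 1 nm steps
ethanol_band = np.exp(-0.5 * ((wavelengths - 905) / 15.0) ** 2)
background_band = np.exp(-0.5 * ((wavelengths - 970) / 40.0) ** 2)

def simulate(conc_mM):
    n = len(conc_mM)
    background = rng.uniform(0.5, 1.5, n)    # varies from culture to culture
    offset = rng.normal(0.0, 0.02, n)        # baseline shift per sample
    noise = rng.normal(0.0, 1e-4, (n, wavelengths.size))
    return (np.outer(conc_mM, 1e-3 * ethanol_band)
            + np.outer(background, 0.3 * background_band)
            + offset[:, None] + noise)

calibration_conc = np.linspace(0.0, 200.0, 20)   # known (spiked) ethanol, mM
model = pls1_fit(simulate(calibration_conc), calibration_conc, n_components=3)

unknown_conc = np.array([25.0, 75.0, 150.0])     # "unknown" only to the model
predicted = pls1_predict(model, simulate(unknown_conc))
print(np.round(predicted, 1))
```

In this toy setup three components are enough because only three things vary between samples (ethanol, background and baseline). Real culture spectra are messier, so the number of components would have to be chosen from validation results.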



Calibration/Validation plot

This plot shows the calibration set in blue and the validation set in red. The validation is a cross-validation: each sample is taken out of the model, the model is recalculated to fit the remaining samples, and the residual for the left-out sample is calculated. This is repeated for each sample, and the combined residuals are plotted as the validation curve. In a perfect model the calibration and validation curves would be identical, but this is never the case because of noise. From this plot we estimate a limit of quantification of 50 mM (±20%) ethanol and a limit of detection of about 12.5 mM ethanol.
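The leave-one-out procedure described above can be sketched as follows. Again this is only an illustration under stated assumptions: the spectra are simulated with invented band shapes and noise levels, the number of components (3) is arbitrary, and `pls1_fit_predict` and `leave_one_out` are hypothetical helper names for a textbook PLS1 routine, not functions from the Unscrambler.

```python
import numpy as np

def pls1_fit_predict(X_train, y_train, X_test, n_components):
    """Fit a textbook PLS1 model on the training set and predict the test set."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xr, yr = X_train - x_mean, y_train - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        p = Xr.T @ t / (t @ t)
        c = (yr @ t) / (t @ t)
        Xr = Xr - np.outer(t, p)             # deflate spectra
        yr = yr - c * t                      # deflate response
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return (X_test - x_mean) @ beta + y_mean

def leave_one_out(X, y, n_components):
    """Predict each sample from a model built on all the other samples."""
    preds = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i        # everything except sample i
        preds[i] = pls1_fit_predict(X[keep], y[keep], X[i:i + 1], n_components)[0]
    return preds

# --- Simulated calibration set (all shapes and numbers are invented) ---
rng = np.random.default_rng(1)
wavelengths = np.arange(850, 1051)
ethanol_band = np.exp(-0.5 * ((wavelengths - 905) / 15.0) ** 2)
background_band = np.exp(-0.5 * ((wavelengths - 970) / 40.0) ** 2)
conc = np.linspace(0.0, 200.0, 20)           # known ethanol concentrations, mM
spectra = (np.outer(conc, 1e-3 * ethanol_band)
           + np.outer(rng.uniform(0.5, 1.5, conc.size), 0.3 * background_band)
           + rng.normal(0.0, 0.02, (conc.size, 1))
           + rng.normal(0.0, 1e-4, (conc.size, wavelengths.size)))

calibrated = pls1_fit_predict(spectra, conc, spectra, 3)   # fit and predict same samples
cross_validated = leave_one_out(spectra, conc, 3)          # each sample left out in turn

rmsec = np.sqrt(np.mean((calibrated - conc) ** 2))         # calibration error
rmsecv = np.sqrt(np.mean((cross_validated - conc) ** 2))   # cross-validation error
print(f"RMSEC = {rmsec:.3f} mM, RMSECV = {rmsecv:.3f} mM")
```

The cross-validated error is usually somewhat larger than the calibration error, since each sample is predicted by a model that has never seen it. The gap between the two corresponds to the gap between the blue and red curves in the plot.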


Matrix plot of the spectra of the samples whose ethanol concentration we are trying to determine. Quite similar, right? :-)

Pred graf.jpg

Bar plot of the predicted values with deviations. To the left is a series of samples from cultures containing the ethanol and butanol constructs under various conditions; to the right are samples from cultures containing a non-functional ethanol construct.

Numerical values for the graph above

Pred tab.jpg

The results are not too good, to be honest, but they give an indication of the ethanol concentrations. There is a clear difference between the functional and non-functional inserts. The model is not fitted for butanol but seems to detect larger alcohols as well; to avoid this, these alcohols/contaminants have to be included in the calibration set. A bit surprising is the lack of difference between the IPTG-induced and non-induced samples, which could mean that our promoter is leaky. Maybe our M15 strain doesn't contain the lac repressor after all.
