5  Resampling methods

Author

phonchi

Published

October 3, 2022



5.1 The Validation Set Approach

import pandas as pd
import numpy as np

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, LeaveOneOut, cross_val_score, KFold
from sklearn import metrics
from sklearn.utils import resample

import statsmodels.formula.api as smf
import statsmodels.api as sm

from functools import partial
from tqdm import tqdm
tqdm = partial(tqdm, position=0, leave=True)

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
plt.style.use('seaborn-white')
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 2.5})

In this lab, we explore the resampling techniques covered in this chapter. Some of the commands in this lab may take a while to run on your computer.

We explore the use of the validation set approach in order to estimate the test error rates that result from fitting various linear models on the Auto data set.

Before we begin, we use the random_state argument in order to set a seed for Python’s random number generator, so that the reader of this book will obtain precisely the same results as those shown below. It is generally a good idea to set a random seed when performing an analysis such as cross-validation that contains an element of randomness, so that the results obtained can be reproduced precisely at a later time.

We begin by using the train_test_split() function to split the set of observations into two halves, by selecting a random subset of \(196\) observations out of the original \(392\) observations. We refer to these observations as the training set.

from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
# Load data
Auto = pd.read_csv("/content/drive/MyDrive/Lab/Data/Auto.csv", index_col=0)
print(Auto.shape)
Auto.info()
(392, 9)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 392 entries, 1 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           392 non-null    float64
 1   cylinders     392 non-null    int64  
 2   displacement  392 non-null    float64
 3   horsepower    392 non-null    int64  
 4   weight        392 non-null    int64  
 5   acceleration  392 non-null    float64
 6   year          392 non-null    int64  
 7   origin        392 non-null    int64  
 8   name          392 non-null    object 
dtypes: float64(3), int64(5), object(1)
memory usage: 30.6+ KB
Auto.head()
mpg cylinders displacement horsepower weight acceleration year origin name
1 18.0 8 307.0 130 3504 12.0 70 1 chevrolet chevelle malibu
2 15.0 8 350.0 165 3693 11.5 70 1 buick skylark 320
3 18.0 8 318.0 150 3436 11.0 70 1 plymouth satellite
4 16.0 8 304.0 150 3433 12.0 70 1 amc rebel sst
5 17.0 8 302.0 140 3449 10.5 70 1 ford torino

We first need to organize the data into NumPy arrays.

X = Auto.horsepower.values.reshape(-1,1)
y = Auto.mpg.values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

We now fit a linear regression on the training set, use the predict() function to estimate the response for the \(196\) observations in the validation set, and use the mean_squared_error() function to calculate the validation set MSE.

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
y_pred = lr_model.predict(X_test)


print ("The MAE is: {:.5}".format( metrics.mean_absolute_error(y_test, y_pred) )  )  # (True, Predict) 
print ("The MSE is: {:.5}".format( metrics.mean_squared_error(y_test, y_pred) )  )
print ("The RMSE is: {:.5}".format( np.sqrt( metrics.mean_squared_error(y_test, y_pred)))) 
The MAE is: 3.8385
The MSE is: 23.617
The RMSE is: 4.8597

Therefore, the estimated test MSE for the linear regression fit is \(23.6\). We can use the PolynomialFeatures() transformer to estimate the test error for the quadratic and cubic regressions.

# quadratic polynomial
# include_bias=False omits the constant column (X**0), since LinearRegression adds the intercept itself
poly = PolynomialFeatures(2, include_bias=False)
poly.fit(X_train)
X_train_quad = poly.transform(X_train)
X_test_quad = poly.transform(X_test)

lr_model.fit(X_train_quad,y_train)
y_pred = lr_model.predict(X_test_quad)

print ("The MAE is: {:.5}".format( metrics.mean_absolute_error(y_test, y_pred) )  )  # (True, Predict) 
print ("The MSE is: {:.5}".format( metrics.mean_squared_error(y_test, y_pred) )  )
print ("The RMSE is: {:.5}".format( np.sqrt( metrics.mean_squared_error(y_test, y_pred)))) 
The MAE is: 3.2235
The MSE is: 18.763
The RMSE is: 4.3316
#cubic polynomial
poly = PolynomialFeatures(3,include_bias=False)
poly.fit(X_train)
X_train_cubic = poly.transform(X_train)
X_test_cubic = poly.transform(X_test)

lr_model.fit(X_train_cubic,y_train)
y_pred = lr_model.predict(X_test_cubic)

print ("The MAE is: {:.5}".format( metrics.mean_absolute_error(y_test, y_pred) )  )  # (True, Predict) 
print ("The MSE is: {:.5}".format( metrics.mean_squared_error(y_test, y_pred) )  )
print ("The RMSE is: {:.5}".format( np.sqrt( metrics.mean_squared_error(y_test, y_pred)))) 
The MAE is: 3.219
The MSE is: 18.797
The RMSE is: 4.3355

These error rates are \(18.76\) and \(18.80\), respectively. If we choose a different training set instead, then we will obtain somewhat different errors on the validation set.

# repeat with a different train/validation split:
# earlier we used random_state=0, now we use random_state=2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

lr_model.fit(X_train,y_train)
y_pred = lr_model.predict(X_test)
print('Mean squared error is ',metrics.mean_squared_error(y_test,y_pred))

# quadratic polynomial
poly = PolynomialFeatures(2,include_bias=False)
poly.fit(X_train)
X_train_quad = poly.transform(X_train)
X_test_quad = poly.transform(X_test)

lr_model.fit(X_train_quad,y_train)
y_pred = lr_model.predict(X_test_quad)

print('Mean squared error for quadratic is ',metrics.mean_squared_error(y_test,y_pred))

#cubic polynomial
poly = PolynomialFeatures(3,include_bias = False)
poly.fit(X_train)
X_train_cubic = poly.transform(X_train)
X_test_cubic = poly.transform(X_test)

lr_model.fit(X_train_cubic,y_train)
y_pred = lr_model.predict(X_test_cubic)

print('Mean squared error for cubic is ',metrics.mean_squared_error(y_test,y_pred))
Mean squared error is  23.442643969985735
Mean squared error for quadratic is  18.550198801910312
Mean squared error for cubic is  18.59522229455435

Using this split of the observations into a training set and a validation set, we find that the validation set error rates for the models with linear, quadratic, and cubic terms are \(23.44\), \(18.55\), and \(18.60\), respectively.

These results are consistent with our previous findings: a model that predicts mpg using a quadratic function of horsepower performs better than a model that involves only a linear function of horsepower, and there is little evidence in favor of a model that uses a cubic function of horsepower.

5.2 Leave-One-Out Cross-Validation

The LOOCV estimate can be automatically computed using cross_val_score or LeaveOneOut.

loocv = LeaveOneOut()
loocv.get_n_splits(X)
392
loocv_mse = []
lm = LinearRegression()

for train_index, test_index in loocv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    
    lm1_fit = lm.fit(X_train, y_train)
    lm1_predict = lm1_fit.predict(X_test)
    
    loocv_mse.append(metrics.mean_squared_error(y_test, lm1_predict))
    
np.array(loocv_mse).mean()
24.231513517929226
loocv_mse2 = []

for train_index, test_index in loocv.split(X):  
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    lm2 = sm.OLS(y_train, sm.add_constant(X_train))
        
    lm2_fit = lm2.fit()
    lm2_predict = lm2_fit.predict(sm.add_constant(X_test, has_constant='add'))

    loocv_mse2.append(metrics.mean_squared_error(y_test, lm2_predict))
    
np.array(loocv_mse2).mean()
24.23151351792922

The value corresponds to the LOOCV statistic given in (5.1). Our cross-validation estimate for the test error is approximately \(24.23\).
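
For reference, (5.1) defines the LOOCV estimate as the average of the \(n\) held-out squared errors,

\[\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{MSE}_i,\]

which is exactly what the two loops above compute.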

We can repeat this procedure for increasingly complex polynomial fits.

cv_error = []
lm = LinearRegression()

for i in tqdm(range(1,11)):
    loocv_mse = []
    for train_index, test_index in loocv.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        poly = PolynomialFeatures(i,include_bias=False)
        poly.fit(X_train)
        X_train_2 = poly.transform(X_train)
        X_test_2 = poly.transform(X_test)
        
        lm1_fit = lm.fit(X_train_2, y_train)
        lm1_predict = lm1_fit.predict(X_test_2)
        
        loocv_mse.append(metrics.mean_squared_error(y_test, lm1_predict))
    cv_error.append(np.array(loocv_mse).mean())
100%|██████████| 10/10 [00:02<00:00,  3.38it/s]
cv_error
[24.231513517929226,
 19.24821312448967,
 19.33498406402931,
 19.42443031024277,
 19.03321248615882,
 18.97863406819667,
 19.129480449254846,
 19.224150660848743,
 19.133322843461364,
 18.93976572079586]

We can instead use cross_val_score to calculate the validation error. See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html for more details.

error_list = []
for power in tqdm(range(1,11)):
    poly = PolynomialFeatures(power,include_bias=False)
    X2 = poly.fit_transform(X)  
    lr = LinearRegression()
    # for LOOCV, the number of folds will be n, the size of the data
    error_list.append(-1*cross_val_score(lr,X2,y, cv = len(X), n_jobs=-1, scoring = 'neg_mean_squared_error').mean())
100%|██████████| 10/10 [00:05<00:00,  1.78it/s]
pd.DataFrame({"DEGREE":np.arange(1,11),"MEAN SQUARED ERROR":error_list})
DEGREE MEAN SQUARED ERROR
0 1 24.231514
1 2 19.248213
2 3 19.334984
3 4 19.424430
4 5 19.033212
5 6 18.978634
6 7 19.129480
7 8 19.224151
8 9 19.133323
9 10 18.939766

As in Figure 5.4, we see a sharp drop in the estimated test MSE between the linear and quadratic fits, but then no clear improvement from using higher-order polynomials.
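
To visualize this, here is a quick plot of the LOOCV error against polynomial degree (a sketch, not part of the original lab, that reuses error_list from the cell above and mirrors the left-hand panel of Figure 5.4).

# Plot the LOOCV estimate of the test MSE against polynomial degree
plt.figure(figsize=(8, 5))
plt.plot(np.arange(1, 11), error_list, '-o')
plt.xlabel('Degree of Polynomial')
plt.ylabel('LOOCV MSE')
plt.show()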

5.3 \(k\)-Fold Cross-Validation

The KFold splitter and the cross_val_score function can also be used to implement \(k\)-fold CV. Below we use \(k=10\), a common choice for \(k\), on the Auto data set. We once again set a random seed and initialize a list in which we will store the CV errors corresponding to the polynomial fits of orders one to ten.
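
Recall that the \(k\)-fold CV estimate averages the held-out MSE over the \(k\) folds:

\[\mathrm{CV}_{(k)} = \frac{1}{k}\sum_{i=1}^{k} \mathrm{MSE}_i.\]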

cv_error = []
lm = LinearRegression()

kf = KFold(n_splits=10, shuffle=True, random_state=1)
for i in tqdm(range(1,11)):
    kfold_mse = []
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        
        poly = PolynomialFeatures(i,include_bias=False)
        poly.fit(X_train)
        X_train_2 = poly.transform(X_train)
        X_test_2 = poly.transform(X_test)
        
        lm1_fit = lm.fit(X_train_2, y_train)
        lm1_predict = lm1_fit.predict(X_test_2)
        
        kfold_mse.append(metrics.mean_squared_error(y_test, lm1_predict))
    cv_error.append(np.array(kfold_mse).mean())
100%|██████████| 10/10 [00:00<00:00, 67.33it/s]
cv_error
[24.097675731883058,
 19.17888986488955,
 19.21385952370859,
 19.21280701624055,
 18.757991658614383,
 18.6424339955243,
 18.810193380203554,
 18.975900958024067,
 18.937570150409766,
 18.79101740940429]
lr = LinearRegression()
error_list = []
kf_10 = KFold(n_splits=10, shuffle=True, random_state=1)
for power in tqdm(range(1,11)):
    poly = PolynomialFeatures(power,include_bias=False)
    X2 = poly.fit_transform(X) 
    error_list.append(-1*cross_val_score(lr,X2,y, cv = kf_10, n_jobs=-1, scoring = 'neg_mean_squared_error').mean()) 
    # pass an explicit KFold object; an integer cv would use unshuffled folds

print('K FOLD CV')    
pd.DataFrame({"DEGREE":np.arange(1,11),"MEAN SQUARED ERROR":error_list})
100%|██████████| 10/10 [00:00<00:00, 23.77it/s]
K FOLD CV
DEGREE MEAN SQUARED ERROR
0 1 24.097676
1 2 19.178890
2 3 19.213860
3 4 19.212807
4 5 18.757992
5 6 18.642434
6 7 18.810193
7 8 18.975901
8 9 18.937570
9 10 18.791017

The results may differ slightly from those of the manual loop above, depending on how the folds are constructed; see https://stackoverflow.com/questions/60432894/cross-val-score-and-stratifiedkfold-give-different-result for an explanation.
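
As a minimal sketch of this point (not part of the original lab; it reuses lr and the degree-10 features X2 from the loop above): for a regressor, an integer cv is expanded to an unshuffled KFold, so it need not match a shuffled KFold object.

# An integer cv uses unshuffled folds; a shuffled KFold object generally differs
mse_int = -cross_val_score(lr, X2, y, cv=10,
                           scoring='neg_mean_squared_error').mean()
mse_kf = -cross_val_score(lr, X2, y, cv=KFold(n_splits=10, shuffle=True, random_state=1),
                          scoring='neg_mean_squared_error').mean()
print(mse_int, mse_kf)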

Notice that the computation time is shorter than that of LOOCV. (In principle, the computation time for LOOCV for a least squares linear model should be faster than for \(k\)-fold CV, due to the availability of the formula (5.2) for LOOCV; unfortunately, however, cross_val_score does not make use of this formula.) We still see little evidence that using cubic or higher-order polynomial terms leads to lower test error than simply using a quadratic fit.
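
For reference, (5.2) states that for least squares,

\[\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2,\]

where \(h_i\) is the leverage of observation \(i\). A minimal sketch (not part of the original lab) that computes this shortcut from a single fit using statsmodels:

# LOOCV for least squares from one fit, via the leverage values h_i
fit = sm.OLS(y, sm.add_constant(X)).fit()
h = fit.get_influence().hat_matrix_diag  # leverage of each observation
print(np.mean(((y - fit.fittedvalues) / (1 - h)) ** 2))  # ~24.23, matching the LOOCV loop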

5.4 The Bootstrap

We illustrate the use of the bootstrap in the simple example of Section 5.2, as well as on an example involving estimating the accuracy of the linear regression model on the Auto data set.

5.4.1 Estimating the Accuracy of a Statistic of Interest

One of the great advantages of the bootstrap approach is that it can be applied in almost all situations. No complicated mathematical calculations are required.

The Portfolio data set in the ISLR2 package is simulated data of \(100\) pairs of returns, generated in the fashion described in Section 5.2. To illustrate the use of the bootstrap on this data, we must first create a function, alpha.fn(), which takes as input the \((X,Y)\) data as well as a vector indicating which observations should be used to estimate \(\alpha\). The function then outputs the estimate for \(\alpha\) based on the selected observations.
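
For reference, (5.7) is

\[\hat{\alpha} = \frac{\hat{\sigma}_Y^2 - \hat{\sigma}_{XY}}{\hat{\sigma}_X^2 + \hat{\sigma}_Y^2 - 2\hat{\sigma}_{XY}},\]

where \(\hat{\sigma}_X^2\), \(\hat{\sigma}_Y^2\), and \(\hat{\sigma}_{XY}\) denote the sample variances and covariance of the two returns.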

Portfolio = pd.read_csv('/content/drive/MyDrive/Lab/Data/Portfolio.csv')
print(Portfolio.shape)
Portfolio.head()
(100, 2)
X Y
0 -0.895251 -0.234924
1 -1.562454 -0.885176
2 -0.417090 0.271888
3 1.044356 -0.734198
4 -0.315568 0.841983
# First, we define a function equivalent to alpha.fn() in the R lab.
# It takes the data plus start and end indices; the selected slice of
# observations is used to compute the estimate of alpha.
def alpha_fn(data, start_index, end_index):
    X = data['X'][start_index:end_index]
    Y = data['Y'][start_index:end_index]
    return ((np.var(Y) - np.cov(X, Y)[0][1]) / (np.var(X) + np.var(Y) - 2 * np.cov(X, Y)[0][1]))

This function returns, or outputs, an estimate for \(\alpha\) based on applying (5.7) to the observations selected by the arguments start_index and end_index. For instance, the following command tells Python to estimate \(\alpha\) using all \(100\) observations.

alpha_fn(Portfolio, 0, 100)
0.5766511516104116

The next command uses the resample() function to randomly select \(100\) observations from the data, with replacement. This is equivalent to constructing a new bootstrap data set and recomputing \(\hat{\alpha}\) based on the new data set.

portfolio_bs = resample(Portfolio, replace=True, n_samples=100, random_state=0)

alpha_fn(portfolio_bs, 0, 100)
0.560336658007497

We can implement a bootstrap analysis by performing this command many times, recording all of the corresponding estimates for \(\alpha\), and computing the resulting standard deviation. Note that you should not pass a fixed integer random_state to resample() inside the loop; otherwise every bootstrap sample would be identical and the variance would be \(0\). Instead, pass a shared RandomState instance, following https://scikit-learn.org/stable/common_pitfalls.html#controlling-randomness.
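
The bootstrap standard error estimate is given by (5.8),

\[\mathrm{SE}_B(\hat{\alpha}) = \sqrt{\frac{1}{B-1}\sum_{r=1}^{B}\left(\hat{\alpha}^{*r} - \frac{1}{B}\sum_{r'=1}^{B}\hat{\alpha}^{*r'}\right)^2},\]

where \(\hat{\alpha}^{*r}\) is the estimate from the \(r\)-th bootstrap sample. (Note that np.std in the cell below divides by \(B\) rather than \(B-1\), a negligible difference when \(B=1000\).)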

rng = np.random.RandomState(0)
bs_alpha = []

for i in range(0, 1000):
    bs_alpha.append(
        alpha_fn(resample(Portfolio, replace=True, n_samples=100, random_state=rng), 0, 100)
    )

bs_alpha = np.array(bs_alpha)

print('Bootstrapped alpha:', bs_alpha.mean())
print('SE:',  bs_alpha.std())
Bootstrapped alpha: 0.5805913095922189
SE: 0.08980409161302538

The final output shows that the bootstrap estimate of \(\alpha\) is \(0.581\) (compared with \(\hat{\alpha}=0.577\) on the original data), and that the bootstrap estimate for \({\rm SE}(\hat{\alpha})\) is \(0.0898\).

5.4.2 Estimating the Accuracy of a Linear Regression Model

The bootstrap approach can be used to assess the variability of the coefficient estimates and predictions from a statistical learning method. Here we use the bootstrap approach in order to assess the variability of the estimates for \(\beta_0\) and \(\beta_1\), the intercept and slope terms for the linear regression model that uses horsepower to predict mpg in the Auto data set. We will compare the estimates obtained using the bootstrap to those obtained using the formulas for \({\rm SE}(\hat{\beta}_0)\) and \({\rm SE}(\hat{\beta}_1)\) described in Section 3.1.2.

We first create a simple function, boot.fn(), which takes in the Auto data set as well as a set of indices for the observations, and returns the intercept and slope estimates for the linear regression model. We then apply this function to the full set of \(392\) observations in order to compute the estimates of \(\beta_0\) and \(\beta_1\) on the entire data set using the usual linear regression coefficient estimate formulas from Chapter 3.

# reload the Auto data used earlier in the notebook
Auto = pd.read_csv("/content/drive/MyDrive/Lab/Data/Auto.csv", index_col=0)
Auto.head()
mpg cylinders displacement horsepower weight acceleration year origin name
1 18.0 8 307.0 130 3504 12.0 70 1 chevrolet chevelle malibu
2 15.0 8 350.0 165 3693 11.5 70 1 buick skylark 320
3 18.0 8 318.0 150 3436 11.0 70 1 plymouth satellite
4 16.0 8 304.0 150 3433 12.0 70 1 amc rebel sst
5 17.0 8 302.0 140 3449 10.5 70 1 ford torino
def boot_fn(data, start_index, end_index):
    m = LinearRegression(fit_intercept=True).fit(
        data['horsepower'][start_index:end_index].values.reshape(-1, 1),
        data['mpg'][start_index:end_index]
    )
    
    return m.intercept_, m.coef_
    
boot_fn(Auto, 0, 392)
(39.93586102117047, array([-0.15784473]))
boot_fn(resample(Auto, replace=True, n_samples=392, random_state=0), 0, 392)
(40.480438868243674, array([-0.16156162]))

Next, we compute the standard errors of 1,000 bootstrap estimates for the intercept and slope terms by repeatedly applying boot_fn() to bootstrap samples drawn with resample().

rng = np.random.RandomState(0)
bs_boot = {'t1': [], 't2': []}

for i in range(0, 1000):
    est = boot_fn(resample(Auto, replace=True, n_samples=392, random_state=rng), 0, 392)
    bs_boot['t1'].append(
        est[0]
    )
    bs_boot['t2'].append(
        est[1][0]
    )

t1_es = np.array(bs_boot['t1']).mean()
t1_se = np.array(bs_boot['t1']).std()
t2_es = np.array(bs_boot['t2']).mean()
t2_se = np.array(bs_boot['t2']).std()

print('t1 bs estimate & se:', t1_es, t1_se)
print('t2 bs estimate & se:', t2_es, t2_se)
t1 bs estimate & se: 39.996156672438836 0.8528864796025836
t2 bs estimate & se: -0.158407507124534 0.0072655098695998096

This indicates that the bootstrap estimate for \({\rm SE}(\hat{\beta}_0)\) is \(0.85\), and that the bootstrap estimate for \({\rm SE}(\hat{\beta}_1)\) is \(0.0073\). As discussed in Section 3.1.2, standard formulas can be used to compute the standard errors for the regression coefficients in a linear model. These can be obtained using the summary() function.

# now let's compare with the standard errors reported by the OLS summary
ols = smf.ols('mpg ~ horsepower', data=Auto).fit()
ols.summary().tables[1]
coef std err t P>|t| [0.025 0.975]
Intercept 39.9359 0.717 55.660 0.000 38.525 41.347
horsepower -0.1578 0.006 -24.489 0.000 -0.171 -0.145

The standard error estimates for \(\hat{\beta}_0\) and \(\hat{\beta}_1\) obtained using the formulas from Section 3.1.2 are \(0.717\) for the intercept and \(0.006\) for the slope. Interestingly, these are somewhat different from the estimates obtained using the bootstrap. Does this indicate a problem with the bootstrap? In fact, it suggests the opposite. Recall that the standard formulas given in Equation 3.8 on page 66 rely on certain assumptions. For example, they depend on the unknown parameter \(\sigma^2\), the noise variance. We then estimate \(\sigma^2\) using the RSS. Now although the formulas for the standard errors do not rely on the linear model being correct, the estimate for \(\sigma^2\) does.
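
For reference, (3.8) gives

\[\mathrm{SE}(\hat{\beta}_0)^2 = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right], \qquad \mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2},\]

where \(\sigma^2 = \mathrm{Var}(\epsilon)\).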

We see in Figure 3.8 on page 91 that there is a non-linear relationship in the data, and so the residuals from a linear fit will be inflated, and so will \(\hat{\sigma}^2\). Secondly, the standard formulas assume (somewhat unrealistically) that the \(x_i\) are fixed, and all the variability comes from the variation in the errors \(\epsilon_i\). The bootstrap approach does not rely on any of these assumptions, and so it is likely giving a more accurate estimate of the standard errors of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) than is the summary() function.

Below we compute the bootstrap standard error estimates and the standard linear regression estimates that result from fitting the quadratic model to the data. Since this model provides a good fit to the data (Figure 3.8), there is now a better correspondence between the bootstrap estimates and the standard estimates of \({\rm SE}(\hat{\beta}_0)\), \({\rm SE}(\hat{\beta}_1)\) and \({\rm SE}(\hat{\beta}_2)\).

def boot_fn2(data, start_index, end_index):
    m = LinearRegression(fit_intercept=True).fit(
        data[['horsepower', 'horsepower_2']][start_index:end_index],
        data['mpg'][start_index:end_index]
    )
    
    return m.intercept_, m.coef_
Auto['horsepower_2'] = np.power(Auto.horsepower, 2)
rng = np.random.RandomState(0)

bs_boot2 = {'t1': [], 't2': [], 't3': []}

for i in range(0, 1000):
    est = boot_fn2(resample(Auto, replace=True, n_samples=392, random_state=rng), 0, 392)
    bs_boot2['t1'].append(
        est[0]
    )
    bs_boot2['t2'].append(
        est[1][0]
    )
    bs_boot2['t3'].append(
        est[1][1]
    )


t1_es = np.array(bs_boot2['t1']).mean()
t1_se = np.array(bs_boot2['t1']).std()
t2_es = np.array(bs_boot2['t2']).mean()
t2_se = np.array(bs_boot2['t2']).std()
t3_es = np.array(bs_boot2['t3']).mean()
t3_se = np.array(bs_boot2['t3']).std()

print('t1 bs estimate & se:', t1_es, t1_se)
print('t2 bs estimate & se:', t2_es, t2_se)
print('t3 bs estimate & se:', t3_es, t3_se)


# now let's compare with the standard errors reported by the OLS summary
ols2 = smf.ols('mpg ~ horsepower + horsepower_2', data=Auto).fit()
ols2.summary().tables[1]
t1 bs estimate & se: 56.95462099069789 2.0847008918695753
t2 bs estimate & se: -0.46692993431680474 0.03307843231128892
t3 bs estimate & se: 0.0012328331475612475 0.00011942814562882365
coef std err t P>|t| [0.025 0.975]
Intercept 56.9001 1.800 31.604 0.000 53.360 60.440
horsepower -0.4662 0.031 -14.978 0.000 -0.527 -0.405
horsepower_2 0.0012 0.000 10.080 0.000 0.001 0.001

5.5 Cross-Validation in sklearn

See https://scikit-learn.org/stable/modules/cross_validation.html for an overview of scikit-learn's cross-validation utilities. Below, we repeat the LOOCV polynomial experiment, this time fitting with statsmodels' OLS.

cv_error2 = []


for i in tqdm(range(1,11)):
    loocv_mse2 = []
    for train_index, test_index in loocv.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        poly = PolynomialFeatures(i)  # include_bias left at its default True; the constant column provides the intercept for sm.OLS
        poly.fit(X_train) 
            
        X_train_2 = poly.transform(X_train)
        X_test_2 = poly.transform(X_test)

        lm2 = sm.OLS(y_train, X_train_2)
            
        lm2_fit = lm2.fit()
        lm2_predict = lm2_fit.predict(X_test_2)
            
        loocv_mse2.append(metrics.mean_squared_error(y_test, lm2_predict))
    cv_error2.append(np.array(loocv_mse2).mean())
100%|██████████| 10/10 [00:03<00:00,  3.00it/s]
cv_error2
[24.23151351792922,
 19.24821312448954,
 19.334984064022773,
 19.424430290207603,
 19.03320647959501,
 19.00693693541756,
 18.995137417415865,
 23.214193821020206,
 38.92432074370519,
 75.92002510349722]
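
Unlike the LOOCV results in Section 5.2, the errors for degrees eight and above blow up here. A plausible explanation (our reading, not stated in the original lab) is numerical ill-conditioning: raw powers of horsepower reach enormous magnitudes at high degree, and sm.OLS on the raw design matrix suffers more from this than LinearRegression, which centers the data internally when fitting an intercept. A minimal sketch of one remedy, standardizing the predictor before the expansion:

# A sketch: standardizing horsepower before the polynomial expansion
# improves the conditioning of the design matrix at high degrees.
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance
X10_raw = PolynomialFeatures(10).fit_transform(X)
X10_std = PolynomialFeatures(10).fit_transform(X_std)
print(np.linalg.cond(X10_raw), np.linalg.cond(X10_std))  # the raw design is far worse conditioned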