All subset regression fits every possible subset of the set of candidate independent variables. If there are k candidate independent variables (besides the constant), then there are \(2^{k}\) distinct subsets of them to be tested, or \(2^{k} - 1\) if the intercept-only model is left out, which is why the four-predictor example below lists 15 models. For example, if you have 10 candidate independent variables, the number of subsets to be tested is \(2^{10}\), which is 1024, and if you have 20 candidate variables, the number is \(2^{20}\), which is more than one million.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_all_subset(model)
## # A tibble: 15 x 6
## Index N Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
## <int> <int> <chr> <chr> <chr> <chr>
## 1 1 1 wt 0.75283 0.74459 12.48094
## 2 2 1 disp 0.71834 0.70895 18.12961
## 3 3 1 hp 0.60244 0.58919 37.11264
## 4 4 1 qsec 0.17530 0.14781 107.06962
## 5 5 2 hp wt 0.82679 0.81484 2.36900
## 6 6 2 wt qsec 0.82642 0.81444 2.42949
## 7 7 2 disp wt 0.78093 0.76582 9.87910
## 8 8 2 disp hp 0.74824 0.73088 15.23312
## 9 9 2 disp qsec 0.72156 0.70236 19.60281
## 10 10 2 hp qsec 0.63688 0.61183 33.47215
## 11 11 3 hp wt qsec 0.83477 0.81706 3.06167
## 12 12 3 disp hp wt 0.82684 0.80828 4.36070
## 13 13 3 disp wt qsec 0.82642 0.80782 4.42934
## 14 14 3 disp hp qsec 0.75420 0.72786 16.25779
## 15 15 4 disp hp wt qsec 0.83514 0.81072 5.00000
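The enumeration itself can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the names `predictors`, `subsets` and `results` are made up for the example, and only R-squared is computed.

```r
# Enumerate every non-empty subset of the candidate predictors and fit each one.
predictors <- c("disp", "hp", "wt", "qsec")

# combn() lists all subsets of a given size; loop over sizes 1..k and flatten.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(m) combn(predictors, m, simplify = FALSE)),
  recursive = FALSE
)

# Fit one lm() per subset and record its size and R-squared.
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "mpg"), data = mtcars)
  data.frame(
    n          = length(vars),
    predictors = paste(vars, collapse = " "),
    r_square   = summary(fit)$r.squared
  )
}))

nrow(results)  # 2^4 - 1 = 15 models
```

Because the number of fits doubles with every extra candidate, a loop like this is only practical for a small number of predictors.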
The plot method shows a panel of fit criteria for all the models fitted by all subset regression.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_all_subset(model)
plot(k)
Select the subset of predictors that best meets some well-defined objective criterion, such as having the largest \(R^{2}\) value or the smallest MSE, Mallows' \(C_p\) or AIC.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_best_subset(model)
## Best Subsets Regression
## ------------------------------
## Model Index Predictors
## ------------------------------
## 1 wt
## 2 hp wt
## 3 hp wt qsec
## 4 disp hp wt qsec
## ------------------------------
##
## Subsets Regression Summary
## -------------------------------------------------------------------------------------------------------------------------------
## Adj. Pred
## Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
## -------------------------------------------------------------------------------------------------------------------------------
## 1 0.7528 0.7446 0.7087 12.4809 166.0294 74.2916 170.4266 9.8972 9.8572 0.3199 0.2801
## 2 0.8268 0.8148 0.7811 2.3690 156.6523 66.5755 162.5153 7.4314 7.3563 0.2402 0.2091
## 3 0.8348 0.8171 0.782 3.0617 157.1426 67.7238 164.4713 7.6140 7.4756 0.2461 0.2124
## 4 0.8351 0.8107 0.771 5.0000 159.0696 70.0408 167.8640 8.1810 7.9497 0.2644 0.2259
## -------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria
## SBIC: Sawa's Bayesian Information Criteria
## SBC: Schwarz Bayesian Criteria
## MSEP: Estimated error of prediction, assuming multivariate normality
## FPE: Final Prediction Error
## HSP: Hocking's Sp
## APC: Amemiya Prediction Criteria
The plot method shows a panel of fit criteria for the best subset regression models.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_best_subset(model)
plot(k)
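As a rough check on one of the criteria, Mallows' \(C_p\) can be computed by hand in base R. The sketch below uses the textbook definition, SSE of the subset model divided by the MSE of the full model, minus (n - 2p), where p counts the subset model's parameters including the intercept. That the package uses exactly this formula is an assumption, and `mallows_cp` is a hypothetical helper name.

```r
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
sub  <- lm(mpg ~ hp + wt, data = mtcars)

# Textbook Mallows' Cp of a subset model relative to the full model.
mallows_cp <- function(sub, full) {
  n <- nobs(full)
  p <- length(coef(sub))                              # parameters incl. intercept
  mse_full <- sum(resid(full)^2) / df.residual(full)  # error variance estimate
  sum(resid(sub)^2) / mse_full - (n - 2 * p)
}

mallows_cp(full, full)  # equals the number of parameters in the full model (5)
mallows_cp(sub, full)   # Cp of the hp + wt model
```

A subset model whose \(C_p\) is close to its own parameter count is usually read as having little bias relative to the full model.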
Build a regression model from a set of candidate predictor variables by entering predictors based on p values, one at a time, until no remaining variable qualifies for entry (in the output below, the entry threshold penter is 0.3). The model passed to the function should include all the candidate predictor variables. If details is set to TRUE, each step is displayed.
# stepwise forward regression
model <- lm(y ~ ., data = surgical)
ols_step_forward(model)
## We are selecting variables based on p value...
## 1 variable(s) added....
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## No more variables satisfy the condition of penter: 0.3
## Forward Selection Method
##
## Candidate Terms:
##
## 1 . bcs
## 2 . pindex
## 3 . enzyme_test
## 4 . liver_test
## 5 . age
## 6 . gender
## 7 . alc_mod
## 8 . alc_heavy
##
## ------------------------------------------------------------------------------
## Selection Summary
## ------------------------------------------------------------------------------
## Variable Adj.
## Step Entered R-Square R-Square C(p) AIC RMSE
## ------------------------------------------------------------------------------
## 1 liver_test 0.4545 0.4440 62.5119 771.8753 296.2992
## 2 alc_heavy 0.5667 0.5498 41.3681 761.4394 266.6484
## 3 enzyme_test 0.6590 0.6385 24.3379 750.5089 238.9145
## 4 pindex 0.7501 0.7297 7.5373 735.7146 206.5835
## 5 bcs 0.7809 0.7581 3.1925 730.6204 195.4544
## ------------------------------------------------------------------------------
model <- lm(y ~ ., data = surgical)
k <- ols_step_forward(model)
## We are selecting variables based on p value...
## 1 variable(s) added....
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## No more variables satisfy the condition of penter: 0.3
plot(k)
# stepwise forward regression
model <- lm(y ~ ., data = surgical)
ols_step_forward(model, details = TRUE)
## We are selecting variables based on p value...
## 1 variable(s) added....
## Variable Selection Procedure
## Dependent Variable: y
##
## Forward Selection: Step 1
##
## Variable liver_test Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.674 RMSE 296.299
## R-Squared 0.455 Coef. Var 42.202
## Adj. R-Squared 0.444 MSE 87793.232
## Pred R-Squared 0.386 MAE 212.857
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 3804272.477 1 3804272.477 43.332 0.0000
## Residual 4565248.060 52 87793.232
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------------
## (Intercept) 15.191 111.869 0.136 0.893 -209.290 239.671
## liver_test 250.305 38.025 0.674 6.583 0.000 174.003 326.607
## -------------------------------------------------------------------------------------------
## 1 variable(s) added...
## Forward Selection: Step 2
##
## Variable alc_heavy Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.753 RMSE 266.648
## R-Squared 0.567 Coef. Var 37.979
## Adj. R-Squared 0.550 MSE 71101.387
## Pred R-Squared 0.487 MAE 187.393
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 4743349.776 2 2371674.888 33.356 0.0000
## Residual 3626170.761 51 71101.387
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## --------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## --------------------------------------------------------------------------------------------
## (Intercept) -5.069 100.828 -0.050 0.960 -207.490 197.352
## liver_test 234.597 34.491 0.632 6.802 0.000 165.353 303.841
## alc_heavy 342.183 94.156 0.338 3.634 0.001 153.157 531.208
## --------------------------------------------------------------------------------------------
## 1 variable(s) added...
## Forward Selection: Step 3
##
## Variable enzyme_test Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.812 RMSE 238.914
## R-Squared 0.659 Coef. Var 34.029
## Adj. R-Squared 0.639 MSE 57080.128
## Pred R-Squared 0.567 MAE 170.603
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 5515514.136 3 1838504.712 32.209 0.0000
## Residual 2854006.401 50 57080.128
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## ---------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ---------------------------------------------------------------------------------------------
## (Intercept) -344.559 129.156 -2.668 0.010 -603.976 -85.141
## liver_test 183.844 33.845 0.495 5.432 0.000 115.865 251.823
## alc_heavy 319.662 84.585 0.315 3.779 0.000 149.769 489.555
## enzyme_test 6.263 1.703 0.335 3.678 0.001 2.843 9.683
## ---------------------------------------------------------------------------------------------
## 1 variable(s) added...
## Forward Selection: Step 4
##
## Variable pindex Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.866 RMSE 206.584
## R-Squared 0.750 Coef. Var 29.424
## Adj. R-Squared 0.730 MSE 42676.744
## Pred R-Squared 0.669 MAE 146.473
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 6278360.060 4 1569590.015 36.779 0.0000
## Residual 2091160.477 49 42676.744
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## -----------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -----------------------------------------------------------------------------------------------
## (Intercept) -789.012 153.372 -5.144 0.000 -1097.226 -480.799
## liver_test 125.474 32.358 0.338 3.878 0.000 60.448 190.499
## alc_heavy 359.875 73.754 0.355 4.879 0.000 211.660 508.089
## enzyme_test 7.548 1.503 0.404 5.020 0.000 4.527 10.569
## pindex 7.876 1.863 0.335 4.228 0.000 4.133 11.620
## -----------------------------------------------------------------------------------------------
## 1 variable(s) added...
## Forward Selection: Step 5
##
## Variable bcs Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.884 RMSE 195.454
## R-Squared 0.781 Coef. Var 27.839
## Adj. R-Squared 0.758 MSE 38202.426
## Pred R-Squared 0.700 MAE 137.656
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 6535804.090 5 1307160.818 34.217 0.0000
## Residual 1833716.447 48 38202.426
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## ------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ------------------------------------------------------------------------------------------------
## (Intercept) -1178.330 208.682 -5.647 0.000 -1597.914 -758.746
## liver_test 58.064 40.144 0.156 1.446 0.155 -22.652 138.779
## alc_heavy 317.848 71.634 0.314 4.437 0.000 173.818 461.878
## enzyme_test 9.748 1.656 0.521 5.887 0.000 6.419 13.077
## pindex 8.924 1.808 0.380 4.935 0.000 5.288 12.559
## bcs 59.864 23.060 0.241 2.596 0.012 13.498 106.230
## ------------------------------------------------------------------------------------------------
## No more variables satisfy the condition of penter: 0.3
## Forward Selection Method
##
## Candidate Terms:
##
## 1 . bcs
## 2 . pindex
## 3 . enzyme_test
## 4 . liver_test
## 5 . age
## 6 . gender
## 7 . alc_mod
## 8 . alc_heavy
##
## ------------------------------------------------------------------------------
## Selection Summary
## ------------------------------------------------------------------------------
## Variable Adj.
## Step Entered R-Square R-Square C(p) AIC RMSE
## ------------------------------------------------------------------------------
## 1 liver_test 0.4545 0.4440 62.5119 771.8753 296.2992
## 2 alc_heavy 0.5667 0.5498 41.3681 761.4394 266.6484
## 3 enzyme_test 0.6590 0.6385 24.3379 750.5089 238.9145
## 4 pindex 0.7501 0.7297 7.5373 735.7146 206.5835
## 5 bcs 0.7809 0.7581 3.1925 730.6204 195.4544
## ------------------------------------------------------------------------------
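The entry rule can be sketched with base R's `add1()`. This is a minimal illustration under stated assumptions, not the package's code: it uses `mtcars` so that it runs without extra data, the partial F-test p value as the entry statistic, and the names `penter`, `candidates` and `entered` are chosen for the example.

```r
penter     <- 0.3                               # entry threshold on the p value
candidates <- c("disp", "hp", "wt", "qsec")
scope      <- reformulate(candidates)           # ~ disp + hp + wt + qsec
fit        <- lm(mpg ~ 1, data = mtcars)        # start from the intercept-only model
entered    <- character(0)

repeat {
  # Stop once every candidate is already in the model.
  if (length(setdiff(candidates, entered)) == 0) break

  # Partial F-test for adding each remaining candidate to the current model.
  cand <- add1(fit, scope = scope, test = "F")
  cand <- cand[rownames(cand) != "<none>", , drop = FALSE]

  best <- which.min(cand[["Pr(>F)"]])
  if (cand[["Pr(>F)"]][best] > penter) break    # nothing qualifies for entry

  var     <- rownames(cand)[best]
  fit     <- update(fit, as.formula(paste(". ~ . +", var)))
  entered <- c(entered, var)
}

entered  # variables in the order they entered
```

Once a variable has entered it is never reconsidered, which is the main difference from the combined stepwise procedure.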
Build a regression model from a set of candidate predictor variables by removing predictors based on p values, one at a time, until no remaining variable qualifies for removal (in the output below, the removal threshold prem is 0.3). The model passed to the function should include all the candidate predictor variables. If details is set to TRUE, each step is displayed.
# stepwise backward regression
model <- lm(y ~ ., data = surgical)
ols_step_backward(model)
## We are eliminating variables based on p value...
## No more variables satisfy the condition of prem: 0.3
## Backward Elimination Method
##
## Candidate Terms:
##
## 1 . bcs
## 2 . pindex
## 3 . enzyme_test
## 4 . liver_test
## 5 . age
## 6 . gender
## 7 . alc_mod
## 8 . alc_heavy
##
## --------------------------------------------------------------------------
## Elimination Summary
## --------------------------------------------------------------------------
## Variable Adj.
## Step Removed R-Square R-Square C(p) AIC RMSE
## --------------------------------------------------------------------------
## 1 alc_mod 0.7818 0.7486 7.0141 734.4068 199.2637
## 2 gender 0.7814 0.7535 5.0870 732.4942 197.2921
## 3 age 0.7809 0.7581 3.1925 730.6204 195.4544
## --------------------------------------------------------------------------
model <- lm(y ~ ., data = surgical)
k <- ols_step_backward(model)
## We are eliminating variables based on p value...
## No more variables satisfy the condition of prem: 0.3
plot(k)
# stepwise backward regression
model <- lm(y ~ ., data = surgical)
ols_step_backward(model, details = TRUE)
## We are eliminating variables based on p value...
## Backward Elimination: Step 1
##
## Variable alc_mod Removed
##
## Model Summary
## -----------------------------------------------------------------
## R 0.884 RMSE 199.264
## R-Squared 0.782 Coef. Var 28.381
## Adj. R-Squared 0.749 MSE 39706.040
## Pred R-Squared 0.678 MAE 137.053
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 6543042.709 7 934720.387 23.541 0.0000
## Residual 1826477.828 46 39706.040
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## ------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ------------------------------------------------------------------------------------------------
## (Intercept) -1145.971 238.536 -4.804 0.000 -1626.119 -665.822
## bcs 62.274 24.187 0.251 2.575 0.013 13.589 110.959
## pindex 8.987 1.850 0.382 4.857 0.000 5.262 12.711
## enzyme_test 9.875 1.720 0.528 5.743 0.000 6.414 13.337
## liver_test 50.763 44.379 0.137 1.144 0.259 -38.567 140.093
## age -0.911 2.599 -0.025 -0.351 0.728 -6.142 4.320
## gender 15.786 57.840 0.020 0.273 0.786 -100.639 132.212
## alc_heavy 315.854 73.849 0.312 4.277 0.000 167.202 464.505
## ------------------------------------------------------------------------------------------------
##
##
## Backward Elimination: Step 2
##
## Variable gender Removed
##
## Model Summary
## -----------------------------------------------------------------
## R 0.884 RMSE 197.292
## R-Squared 0.781 Coef. Var 28.101
## Adj. R-Squared 0.754 MSE 38924.162
## Pred R-Squared 0.692 MAE 138.160
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 6540084.920 6 1090014.153 28.004 0.0000
## Residual 1829435.617 47 38924.162
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## ------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ------------------------------------------------------------------------------------------------
## (Intercept) -1143.080 235.943 -4.845 0.000 -1617.737 -668.424
## bcs 61.424 23.748 0.248 2.586 0.013 13.649 109.199
## pindex 8.974 1.832 0.382 4.900 0.000 5.290 12.659
## enzyme_test 9.852 1.700 0.527 5.794 0.000 6.431 13.273
## liver_test 54.053 42.288 0.146 1.278 0.207 -31.019 139.125
## age -0.850 2.563 -0.024 -0.332 0.742 -6.007 4.307
## alc_heavy 314.585 72.974 0.310 4.311 0.000 167.781 461.390
## ------------------------------------------------------------------------------------------------
##
##
## Backward Elimination: Step 3
##
## Variable age Removed
##
## Model Summary
## -----------------------------------------------------------------
## R 0.884 RMSE 195.454
## R-Squared 0.781 Coef. Var 27.839
## Adj. R-Squared 0.758 MSE 38202.426
## Pred R-Squared 0.700 MAE 137.656
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 6535804.090 5 1307160.818 34.217 0.0000
## Residual 1833716.447 48 38202.426
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## ------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ------------------------------------------------------------------------------------------------
## (Intercept) -1178.330 208.682 -5.647 0.000 -1597.914 -758.746
## bcs 59.864 23.060 0.241 2.596 0.012 13.498 106.230
## pindex 8.924 1.808 0.380 4.935 0.000 5.288 12.559
## enzyme_test 9.748 1.656 0.521 5.887 0.000 6.419 13.077
## liver_test 58.064 40.144 0.156 1.446 0.155 -22.652 138.779
## alc_heavy 317.848 71.634 0.314 4.437 0.000 173.818 461.878
## ------------------------------------------------------------------------------------------------
## No more variables satisfy the condition of prem: 0.3
## Backward Elimination Method
##
## Candidate Terms:
##
## 1 . bcs
## 2 . pindex
## 3 . enzyme_test
## 4 . liver_test
## 5 . age
## 6 . gender
## 7 . alc_mod
## 8 . alc_heavy
##
## --------------------------------------------------------------------------
## Elimination Summary
## --------------------------------------------------------------------------
## Variable Adj.
## Step Removed R-Square R-Square C(p) AIC RMSE
## --------------------------------------------------------------------------
## 1 alc_mod 0.7818 0.7486 7.0141 734.4068 199.2637
## 2 gender 0.7814 0.7535 5.0870 732.4942 197.2921
## 3 age 0.7809 0.7581 3.1925 730.6204 195.4544
## --------------------------------------------------------------------------
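Backward elimination mirrors the forward rule and can be sketched with base R's `drop1()`. As before, this is an illustration on `mtcars` rather than the package's implementation, and `prem` and `removed` are names chosen for the example.

```r
prem    <- 0.3                                   # removal threshold on the p value
fit     <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)  # start from the full model
removed <- character(0)

repeat {
  # Stop if only the intercept is left.
  if (length(attr(terms(fit), "term.labels")) == 0) break

  # Partial F-test for dropping each term from the current model.
  cand <- drop1(fit, test = "F")
  cand <- cand[rownames(cand) != "<none>", , drop = FALSE]

  worst <- which.max(cand[["Pr(>F)"]])
  if (cand[["Pr(>F)"]][worst] <= prem) break     # every term is worth keeping

  var     <- rownames(cand)[worst]
  fit     <- update(fit, as.formula(paste(". ~ . -", var)))
  removed <- c(removed, var)
}

removed  # variables in the order they were dropped
```

The p values are recomputed after every removal, because dropping one predictor changes the partial tests of the others.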
Build a regression model from a set of candidate predictor variables by entering and removing predictors based on p values, in a stepwise manner, until no variable is left to enter or remove. The model passed to the function should include all the candidate predictor variables. If details is set to TRUE, each step is displayed.
# stepwise regression
model <- lm(y ~ ., data = surgical)
ols_stepwise(model)
## We are selecting variables based on p value...
## 1 variable(s) added....
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## No more variables to be added or removed.
## Stepwise Selection Method
##
## Candidate Terms:
##
## 1 . bcs
## 2 . pindex
## 3 . enzyme_test
## 4 . liver_test
## 5 . age
## 6 . gender
## 7 . alc_mod
## 8 . alc_heavy
##
## ------------------------------------------------------------------------------------------
## Stepwise Selection Summary
## ------------------------------------------------------------------------------------------
## Added/ Adj.
## Step Variable Removed R-Square R-Square C(p) AIC RMSE
## ------------------------------------------------------------------------------------------
## 1 liver_test addition 0.455 0.444 62.5120 771.8753 296.2992
## 2 alc_heavy addition 0.567 0.550 41.3680 761.4394 266.6484
## 3 enzyme_test addition 0.659 0.639 24.3380 750.5089 238.9145
## 4 pindex addition 0.750 0.730 7.5370 735.7146 206.5835
## 5 bcs addition 0.781 0.758 3.1920 730.6204 195.4544
## ------------------------------------------------------------------------------------------
model <- lm(y ~ ., data = surgical)
k <- ols_stepwise(model)
## We are selecting variables based on p value...
## 1 variable(s) added....
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## No more variables to be added or removed.
plot(k)
# stepwise regression
model <- lm(y ~ ., data = surgical)
ols_stepwise(model, details = TRUE)
## We are selecting variables based on p value...
## 1 variable(s) added....
## Variable Selection Procedure
## Dependent Variable: y
##
## Stepwise Selection: Step 1
##
## Variable liver_test Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.674 RMSE 296.299
## R-Squared 0.455 Coef. Var 42.202
## Adj. R-Squared 0.444 MSE 87793.232
## Pred R-Squared 0.386 MAE 212.857
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 3804272.477 1 3804272.477 43.332 0.0000
## Residual 4565248.060 52 87793.232
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------------
## (Intercept) 15.191 111.869 0.136 0.893 -209.290 239.671
## liver_test 250.305 38.025 0.674 6.583 0.000 174.003 326.607
## -------------------------------------------------------------------------------------------
## 1 variable(s) added...
## Stepwise Selection: Step 2
##
## Variable alc_heavy Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.753 RMSE 266.648
## R-Squared 0.567 Coef. Var 37.979
## Adj. R-Squared 0.550 MSE 71101.387
## Pred R-Squared 0.487 MAE 187.393
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 4743349.776 2 2371674.888 33.356 0.0000
## Residual 3626170.761 51 71101.387
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## --------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## --------------------------------------------------------------------------------------------
## (Intercept) -5.069 100.828 -0.050 0.960 -207.490 197.352
## liver_test 234.597 34.491 0.632 6.802 0.000 165.353 303.841
## alc_heavy 342.183 94.156 0.338 3.634 0.001 153.157 531.208
## --------------------------------------------------------------------------------------------
## 1 variable(s) added...
## Stepwise Selection: Step 3
##
## Variable enzyme_test Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.812 RMSE 238.914
## R-Squared 0.659 Coef. Var 34.029
## Adj. R-Squared 0.639 MSE 57080.128
## Pred R-Squared 0.567 MAE 170.603
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 5515514.136 3 1838504.712 32.209 0.0000
## Residual 2854006.401 50 57080.128
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## ---------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ---------------------------------------------------------------------------------------------
## (Intercept) -344.559 129.156 -2.668 0.010 -603.976 -85.141
## liver_test 183.844 33.845 0.495 5.432 0.000 115.865 251.823
## alc_heavy 319.662 84.585 0.315 3.779 0.000 149.769 489.555
## enzyme_test 6.263 1.703 0.335 3.678 0.001 2.843 9.683
## ---------------------------------------------------------------------------------------------
## 1 variable(s) added...
## Stepwise Selection: Step 4
##
## Variable pindex Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.866 RMSE 206.584
## R-Squared 0.750 Coef. Var 29.424
## Adj. R-Squared 0.730 MSE 42676.744
## Pred R-Squared 0.669 MAE 146.473
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 6278360.060 4 1569590.015 36.779 0.0000
## Residual 2091160.477 49 42676.744
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## -----------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -----------------------------------------------------------------------------------------------
## (Intercept) -789.012 153.372 -5.144 0.000 -1097.226 -480.799
## liver_test 125.474 32.358 0.338 3.878 0.000 60.448 190.499
## alc_heavy 359.875 73.754 0.355 4.879 0.000 211.660 508.089
## enzyme_test 7.548 1.503 0.404 5.020 0.000 4.527 10.569
## pindex 7.876 1.863 0.335 4.228 0.000 4.133 11.620
## -----------------------------------------------------------------------------------------------
## 1 variable(s) added...
## Stepwise Selection: Step 5
##
## Variable bcs Entered
##
## Model Summary
## -----------------------------------------------------------------
## R 0.884 RMSE 195.454
## R-Squared 0.781 Coef. Var 27.839
## Adj. R-Squared 0.758 MSE 38202.426
## Pred R-Squared 0.700 MAE 137.656
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 6535804.090 5 1307160.818 34.217 0.0000
## Residual 1833716.447 48 38202.426
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## ------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ------------------------------------------------------------------------------------------------
## (Intercept) -1178.330 208.682 -5.647 0.000 -1597.914 -758.746
## liver_test 58.064 40.144 0.156 1.446 0.155 -22.652 138.779
## alc_heavy 317.848 71.634 0.314 4.437 0.000 173.818 461.878
## enzyme_test 9.748 1.656 0.521 5.887 0.000 6.419 13.077
## pindex 8.924 1.808 0.380 4.935 0.000 5.288 12.559
## bcs 59.864 23.060 0.241 2.596 0.012 13.498 106.230
## ------------------------------------------------------------------------------------------------
## No more variables to be added or removed.
## Stepwise Selection Method
##
## Candidate Terms:
##
## 1 . bcs
## 2 . pindex
## 3 . enzyme_test
## 4 . liver_test
## 5 . age
## 6 . gender
## 7 . alc_mod
## 8 . alc_heavy
##
## ------------------------------------------------------------------------------------------
## Stepwise Selection Summary
## ------------------------------------------------------------------------------------------
## Added/ Adj.
## Step Variable Removed R-Square R-Square C(p) AIC RMSE
## ------------------------------------------------------------------------------------------
## 1 liver_test addition 0.455 0.444 62.5120 771.8753 296.2992
## 2 alc_heavy addition 0.567 0.550 41.3680 761.4394 266.6484
## 3 enzyme_test addition 0.659 0.639 24.3380 750.5089 238.9145
## 4 pindex addition 0.750 0.730 7.5370 735.7146 206.5835
## 5 bcs addition 0.781 0.758 3.1920 730.6204 195.4544
## ------------------------------------------------------------------------------------------
Build a regression model from a set of candidate predictor variables by entering predictors based on the Akaike Information Criterion (AIC), one at a time, until no remaining variable is left to enter. The model passed to the function should include all the candidate predictor variables. If details is set to TRUE, each step is displayed.
# stepwise aic forward regression
model <- lm(y ~ ., data = surgical)
ols_stepaic_forward(model)
## --------------------------------------------------------------------------
## Variable AIC Sum Sq RSS R-Sq Adj. R-Sq
## --------------------------------------------------------------------------
## liver_test 771.875 3804272.477 4565248.060 0.455 0.444
## alc_heavy 761.439 4743349.776 3626170.761 0.567 0.550
## enzyme_test 750.509 5515514.136 2854006.401 0.659 0.639
## pindex 735.715 6278360.060 2091160.477 0.750 0.730
## bcs 730.620 6535804.090 1833716.447 0.781 0.758
## --------------------------------------------------------------------------
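The AIC column can be cross-checked by hand. The sketch below assumes the reported values follow the usual Gaussian log-likelihood AIC for a linear model (the definition used by `stats::AIC()`); the helper name `aic_from_rss` is hypothetical and not part of the package. The inputs are read off the printed output: the ANOVA table in the detailed output reports 53 total degrees of freedom, so n = 54, and the model `y ~ liver_test` has two regression coefficients.

```r
# Hypothetical helper: Gaussian AIC of a linear model expressed through its RSS.
# n = number of observations, k = number of regression coefficients
# (intercept included); the extra +1 counts the error variance.
aic_from_rss <- function(rss, n, k) {
  n * log(2 * pi) + n * log(rss / n) + n + 2 * (k + 1)
}

# First row of the summary table: y ~ liver_test, RSS = 4565248.060
aic_liver <- aic_from_rss(rss = 4565248.060, n = 54, k = 2)
round(aic_liver, 3)

# Intercept-only model: its RSS is the total sum of squares, 8369520.537
aic_null <- aic_from_rss(rss = 8369520.537, n = 54, k = 1)
round(aic_null, 3)
```

Under that assumption the two results agree with the 771.875 shown for liver_test and the 802.606 shown for Step 0 in the detailed output.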
model <- lm(y ~ ., data = surgical)
k <- ols_stepaic_forward(model)
plot(k)
# stepwise aic forward regression
model <- lm(y ~ ., data = surgical)
ols_stepaic_forward(model, details = TRUE)
## Step 0: AIC = 802.606
## y ~ 1
##
## --------------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## --------------------------------------------------------------------------------
## liver_test 1 771.875 3804272.477 4565248.060 0.455 0.444
## enzyme_test 1 782.629 2798309.881 5571210.656 0.334 0.322
## pindex 1 794.100 1479766.754 6889753.784 0.177 0.161
## alc_heavy 1 794.301 1454057.255 6915463.282 0.174 0.158
## bcs 1 797.697 1005151.658 7364368.879 0.120 0.103
## alc_mod 1 802.828 271062.330 8098458.207 0.032 0.014
## gender 1 802.956 251808.570 8117711.967 0.030 0.011
## age 1 803.834 118862.559 8250657.978 0.014 -0.005
## --------------------------------------------------------------------------------
##
##
## Step 1 : AIC = 771.8753
## y ~ liver_test
##
## -------------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## -------------------------------------------------------------------------------
## alc_heavy 1 761.439 939077.300 3626170.761 0.567 0.550
## enzyme_test 1 762.077 896004.331 3669243.729 0.562 0.544
## pindex 1 770.387 285591.786 4279656.274 0.489 0.469
## alc_mod 1 771.141 225396.238 4339851.822 0.481 0.461
## gender 1 773.802 6162.222 4559085.838 0.455 0.434
## age 1 773.831 3726.297 4561521.763 0.455 0.434
## bcs 1 773.867 685.256 4564562.805 0.455 0.433
## -------------------------------------------------------------------------------
##
##
## Step 2 : AIC = 761.4394
## y ~ liver_test + alc_heavy
##
## -------------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## -------------------------------------------------------------------------------
## enzyme_test 1 750.509 772164.360 2854006.401 0.659 0.639
## pindex 1 756.125 459358.635 3166812.126 0.622 0.599
## bcs 1 763.063 25195.587 3600975.173 0.570 0.544
## age 1 763.110 22048.109 3604122.652 0.569 0.544
## alc_mod 1 763.428 784.551 3625386.210 0.567 0.541
## gender 1 763.433 443.343 3625727.417 0.567 0.541
## -------------------------------------------------------------------------------
##
##
## Step 3 : AIC = 750.5089
## y ~ liver_test + alc_heavy + enzyme_test
##
## -----------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## -----------------------------------------------------------------------------
## pindex 1 735.715 762845.924 2091160.477 0.750 0.730
## bcs 1 750.782 89836.308 2764170.093 0.670 0.643
## alc_mod 1 752.403 5607.570 2848398.831 0.660 0.632
## age 1 752.416 4896.081 2849110.320 0.660 0.632
## gender 1 752.509 5.958 2854000.443 0.659 0.631
## -----------------------------------------------------------------------------
##
##
## Step 4 : AIC = 735.7146
## y ~ liver_test + alc_heavy + enzyme_test + pindex
##
## -----------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## -----------------------------------------------------------------------------
## bcs 1 730.620 257444.030 1833716.447 0.781 0.758
## age 1 737.680 1325.880 2089834.596 0.750 0.724
## gender 1 737.712 90.186 2091070.290 0.750 0.724
## alc_mod 1 737.713 60.620 2091099.857 0.750 0.724
## -----------------------------------------------------------------------------
##
##
## Step 5 : AIC = 730.6204
## y ~ liver_test + alc_heavy + enzyme_test + pindex + bcs
##
## ---------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## ---------------------------------------------------------------------------
## age 1 732.494 4280.830 1829435.617 0.781 0.754
## gender 1 732.551 2360.288 1831356.159 0.781 0.753
## alc_mod 1 732.614 216.992 1833499.455 0.781 0.753
## ---------------------------------------------------------------------------
## No more variables to be added.
## Model Summary
## -----------------------------------------------------------------
## R 0.884 RMSE 195.454
## R-Squared 0.781 Coef. Var 27.839
## Adj. R-Squared 0.758 MSE 38202.426
## Pred R-Squared 0.700 MAE 137.656
## -----------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------
## Regression 6535804.090 5 1307160.818 34.217 0.0000
## Residual 1833716.447 48 38202.426
## Total 8369520.537 53
## -----------------------------------------------------------------------
##
## Parameter Estimates
## ------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ------------------------------------------------------------------------------------------------
## (Intercept) -1178.330 208.682 -5.647 0.000 -1597.914 -758.746
## liver_test 58.064 40.144 0.156 1.446 0.155 -22.652 138.779
## alc_heavy 317.848 71.634 0.314 4.437 0.000 173.818 461.878
## enzyme_test 9.748 1.656 0.521 5.887 0.000 6.419 13.077
## pindex 8.924 1.808 0.380 4.935 0.000 5.288 12.559
## bcs 59.864 23.060 0.241 2.596 0.012 13.498 106.230
## ------------------------------------------------------------------------------------------------
## --------------------------------------------------------------------------
## Variable AIC Sum Sq RSS R-Sq Adj. R-Sq
## --------------------------------------------------------------------------
## liver_test 771.875 3804272.477 4565248.060 0.455 0.444
## alc_heavy 761.439 4743349.776 3626170.761 0.567 0.550
## enzyme_test 750.509 5515514.136 2854006.401 0.659 0.639
## pindex 735.715 6278360.060 2091160.477 0.750 0.730
## bcs 730.620 6535804.090 1833716.447 0.781 0.758
## --------------------------------------------------------------------------
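The forward search traced in the detailed output can be approximated with a few lines of base R. This is a minimal sketch rather than the package's implementation: `forward_aic` is a hypothetical helper, it uses `stats::AIC()` as the criterion, and it runs on the built-in `mtcars` data so that it is self-contained.

```r
# Hypothetical helper: greedy forward selection by AIC, base R only.
forward_aic <- function(data, response, candidates) {
  selected <- character(0)
  current <- lm(reformulate("1", response), data = data)  # intercept-only start
  path <- data.frame(variable = character(0), aic = numeric(0))
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # AIC of the current model extended by each remaining candidate in turn
    aics <- vapply(remaining, function(v) {
      AIC(lm(reformulate(c(selected, v), response), data = data))
    }, numeric(1))
    best <- names(which.min(aics))
    if (aics[[best]] >= AIC(current)) break  # no candidate lowers the AIC
    selected <- c(selected, best)
    current <- lm(reformulate(selected, response), data = data)
    path <- rbind(path, data.frame(variable = best, aic = aics[[best]]))
  }
  list(model = current, path = path)
}

res_forward <- forward_aic(mtcars, "mpg", c("disp", "hp", "wt", "qsec"))
res_forward$path            # variables in order of entry, with the AIC after each step
formula(res_forward$model)  # final model
```

Each pass mirrors one "Step" block of the detailed output: every remaining candidate is scored, the one with the lowest AIC enters, and the search stops as soon as the best candidate no longer improves on the current model.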
Build a regression model from a set of candidate predictor variables by removing predictors one at a time based on the Akaike Information Criterion (AIC), stopping when removing any remaining variable would no longer lower the AIC. The model passed to the function should include all the candidate predictor variables. If details is set to TRUE, each step is displayed.
# stepwise aic backward regression
model <- lm(y ~ ., data = surgical)
k <- ols_stepaic_backward(model)
k
##
##
## Backward Elimination Summary
## -------------------------------------------------------------------------
## Variable AIC RSS Sum Sq R-Sq Adj. R-Sq
## -------------------------------------------------------------------------
## Full Model 736.390 1825905.713 6543614.824 0.782 0.743
## alc_mod 734.407 1826477.828 6543042.709 0.782 0.749
## gender 732.494 1829435.617 6540084.920 0.781 0.754
## age 730.620 1833716.447 6535804.090 0.781 0.758
## -------------------------------------------------------------------------
### Plot
model <- lm(y ~ ., data = surgical)
k <- ols_stepaic_backward(model)
plot(k)
# stepwise aic backward regression
model <- lm(y ~ ., data = surgical)
ols_stepaic_backward(model, details = TRUE)
## Step 0: AIC = 736.3899
## y ~ bcs + pindex + enzyme_test + liver_test + age + gender + alc_mod + alc_heavy
##
## --------------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## --------------------------------------------------------------------------------
## alc_mod 1 734.407 572.115 1826477.828 0.782 0.749
## gender 1 734.478 2990.338 1828896.051 0.781 0.748
## age 1 734.544 5231.108 1831136.821 0.781 0.748
## liver_test 1 735.878 51016.156 1876921.869 0.776 0.742
## bcs 1 741.677 263780.393 2089686.106 0.750 0.712
## alc_heavy 1 749.210 576636.222 2402541.935 0.713 0.669
## pindex 1 756.624 930187.311 2756093.024 0.671 0.621
## enzyme_test 1 763.557 1307756.930 3133662.644 0.626 0.569
## --------------------------------------------------------------------------------
##
## Step 1 : AIC = 734.4068
## y ~ bcs + pindex + enzyme_test + liver_test + age + gender + alc_heavy
##
## --------------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## --------------------------------------------------------------------------------
## gender 1 732.494 2957.789 1829435.617 0.781 0.754
## age 1 732.551 4878.331 1831356.159 0.781 0.753
## liver_test 1 733.921 51951.343 1878429.171 0.776 0.747
## bcs 1 739.677 263219.094 2089696.922 0.750 0.718
## alc_heavy 1 750.486 726328.685 2552806.513 0.695 0.656
## pindex 1 754.759 936543.762 2763021.590 0.670 0.628
## enzyme_test 1 761.595 1309433.007 3135910.834 0.625 0.577
## --------------------------------------------------------------------------------
##
## Step 2 : AIC = 732.4942
## y ~ bcs + pindex + enzyme_test + liver_test + age + alc_heavy
##
## --------------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## --------------------------------------------------------------------------------
## age 1 730.620 4280.830 1833716.447 0.781 0.758
## liver_test 1 732.339 63596.190 1893031.807 0.774 0.750
## bcs 1 737.680 260398.979 2089834.596 0.750 0.724
## alc_heavy 1 748.486 723371.473 2552807.090 0.695 0.663
## pindex 1 752.777 934511.071 2763946.688 0.670 0.635
## enzyme_test 1 759.596 1306482.666 3135918.283 0.625 0.586
## --------------------------------------------------------------------------------
##
## Step 3 : AIC = 730.6204
## y ~ bcs + pindex + enzyme_test + liver_test + alc_heavy
##
## --------------------------------------------------------------------------------
## Variable DF AIC Sum Sq RSS R-Sq Adj. R-Sq
## --------------------------------------------------------------------------------
## liver_test 1 730.924 79919.825 1913636.272 0.771 0.753
## bcs 1 735.715 257444.030 2091160.477 0.750 0.730
## alc_heavy 1 747.181 752122.827 2585839.274 0.691 0.666
## pindex 1 750.782 930453.646 2764170.093 0.670 0.643
## enzyme_test 1 757.971 1324076.125 3157792.572 0.623 0.592
## --------------------------------------------------------------------------------
## No more variables to be removed.
##
##
## Backward Elimination Summary
## -------------------------------------------------------------------------
## Variable AIC RSS Sum Sq R-Sq Adj. R-Sq
## -------------------------------------------------------------------------
## Full Model 736.390 1825905.713 6543614.824 0.782 0.743
## alc_mod 734.407 1826477.828 6543042.709 0.782 0.749
## gender 732.494 1829435.617 6540084.920 0.781 0.754
## age 730.620 1833716.447 6535804.090 0.781 0.758
## -------------------------------------------------------------------------
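Backward elimination follows the same pattern in reverse. The sketch below is an illustration under stated assumptions, not the package's code: `backward_aic` is a hypothetical helper, the criterion is `stats::AIC()`, the data are the built-in `mtcars`, and for simplicity the loop never drops the last remaining predictor.

```r
# Hypothetical helper: greedy backward elimination by AIC, base R only.
backward_aic <- function(data, response, candidates) {
  kept <- candidates
  current <- lm(reformulate(kept, response), data = data)  # start from the full model
  removed <- character(0)
  while (length(kept) > 1) {
    # AIC of the current model with each remaining predictor dropped in turn
    aics <- vapply(kept, function(v) {
      AIC(lm(reformulate(setdiff(kept, v), response), data = data))
    }, numeric(1))
    drop_var <- names(which.min(aics))
    if (aics[[drop_var]] >= AIC(current)) break  # every removal raises the AIC
    kept <- setdiff(kept, drop_var)
    removed <- c(removed, drop_var)
    current <- lm(reformulate(kept, response), data = data)
  }
  list(model = current, removed = removed)
}

candidates <- c("disp", "hp", "wt", "qsec")
res_backward <- backward_aic(mtcars, "mpg", candidates)
res_backward$removed         # variables in order of removal
formula(res_backward$model)  # final model
```

The stopping rule corresponds to the last block of the detailed output, where dropping any of the remaining variables gives an AIC above that of the current model.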
Build a regression model from a set of candidate predictor variables by entering and removing predictors based on the Akaike Information Criterion (AIC), in a stepwise manner, stopping when no addition or removal lowers the AIC. The model passed to the function should include all the candidate predictor variables. If details is set to TRUE, each step is displayed.
# stepwise aic regression
model <- lm(y ~ ., data = surgical)
ols_stepaic_both(model)
## No more variables to be added or removed.
##
##
## Stepwise Summary
## --------------------------------------------------------------------------------------
## Variable Method AIC RSS Sum Sq R-Sq Adj. R-Sq
## --------------------------------------------------------------------------------------
## liver_test addition 771.875 4565248.060 3804272.477 0.455 0.444
## alc_heavy addition 761.439 3626170.761 4743349.776 0.567 0.550
## enzyme_test addition 750.509 2854006.401 5515514.136 0.659 0.639
## pindex addition 735.715 2091160.477 6278360.060 0.750 0.730
## bcs addition 730.620 1833716.447 6535804.090 0.781 0.758
## --------------------------------------------------------------------------------------
model <- lm(y ~ ., data = surgical)
k <- ols_stepaic_both(model)
## No more variables to be added or removed.
plot(k)
# stepwise aic regression
model <- lm(y ~ ., data = surgical)
ols_stepaic_both(model, details = TRUE)
## Step 0: AIC = 802.606
## y ~ 1
##
##
##
## Step 1 : AIC = 771.8753
## y ~ liver_test
##
##
##
## Step 2 : AIC = 761.4394
## y ~ liver_test + alc_heavy
##
##
##
## Step 3 : AIC = 750.5089
## y ~ liver_test + alc_heavy + enzyme_test
##
##
##
## Step 4 : AIC = 735.7146
## y ~ liver_test + alc_heavy + enzyme_test + pindex
##
##
##
## Step 5 : AIC = 730.6204
## y ~ liver_test + alc_heavy + enzyme_test + pindex + bcs
## No more variables to be added or removed.
##
##
## Stepwise Summary
## --------------------------------------------------------------------------------------
## Variable Method AIC RSS Sum Sq R-Sq Adj. R-Sq
## --------------------------------------------------------------------------------------
## liver_test addition 771.875 4565248.060 3804272.477 0.455 0.444
## alc_heavy addition 761.439 3626170.761 4743349.776 0.567 0.550
## enzyme_test addition 750.509 2854006.401 5515514.136 0.659 0.639
## pindex addition 735.715 2091160.477 6278360.060 0.750 0.730
## bcs addition 730.620 1833716.447 6535804.090 0.781 0.758
## --------------------------------------------------------------------------------------
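As a cross-check outside the package, base R's `stats::step()` performs a comparable add-or-remove search. The sketch below uses the built-in `mtcars` data so that it runs without the `surgical` data set. Note that `step()` scores models with `extractAIC()`, which omits an additive constant that `AIC()` includes; for a fixed data set the ranking of models is the same, but the printed values differ.

```r
# Start from the intercept-only model and allow both additions and removals
null_model <- lm(mpg ~ 1, data = mtcars)
both <- step(
  null_model,
  scope = list(lower = ~ 1, upper = ~ disp + hp + wt + qsec),
  direction = "both",
  trace = 0  # suppress the per-step printout
)

formula(both)  # selected model
both$anova     # one row per step taken in the search
```

Setting `trace = 1` prints each step, which plays a role similar to `details = TRUE` in the functions above.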