In this post I give an example of making a prediction with R stats for a type of business problem that requires a classification prediction.
The business question being answered is to evaluate the safety standard of cars based on certain parameters and classify them to predict Car Acceptability.
This data is obtained from UCI Machine learning repository.
There are many types of models that can be used for this such as Logistic Regression, Discriminant Analysis, Random Forests, and many others. You can spend a lifetime studying these.
We'll use an R package called Nnet to fit a single-hidden-layer neural network.
But before we do the modelling we need to load the data and process it to test/training sets.
#Process data car_eval <- read.csv("C:\\Users\\lee_h\\Documents\\R Stats\\Classification\\Data\\car.data.txt", header=FALSE) colnames(car_eval)<-c("buying","maint","doors","persons","lug_boot","safety","class")
We're predicting the
class which is a factor containing 4 categories.
Have a look at the data with the following commands
We'll train the model on 70% of random rows leaving the other 30% for testing.
#Split to training/test random_rows = sort(sample(nrow(car_eval), nrow(car_eval)*.7)) training_data <- car_eval[random_rows,] test_data <- car_eval[-random_rows,]
The model is built.
# Neural Network library(nnet) # Build the model model<-nnet(class~buying+maint+doors+persons+lug_boot+safety,data=training_data,size = 4,decay = 0.0001,maxit = 500)
The parameters used in the nnet() function can be tuned to improve performance.
- Size : Number of units in the hidden layer
- Decay : Weight decay specifies regularisation in the neural network. As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have the higher this term should be.
- Maxit : Maxium iterations, the default is 100. There's no way to determine the optimal iterations, it's a case of adjusting, running the model, checking performance. Try changing the iterations to 1000 and see the impact on accuracy.
You can check the residuals with
Once you're happy with the model you can run the prediction.
Evaluate the prediction.
mtab<-table(test_data$pred_nnet,test_data$class) confusionMatrix(mtab) Confusion Matrix and Statistics acc good unacc vgood acc 111 0 9 0 good 2 15 0 3 unacc 1 0 364 0 vgood 0 0 0 14 Overall Statistics Accuracy : 0.9711 95% CI : (0.9528, 0.9837) No Information Rate : 0.7187 P-Value [Acc > NIR] : < 2.2e-16 Kappa : 0.9346 Mcnemar's Test P-Value : NA Statistics by Class: Class: acc Class: good Class: unacc Class: vgood Sensitivity 0.9737 1.00000 0.9759 0.82353 Specificity 0.9778 0.99008 0.9932 1.00000 Pos Pred Value 0.9250 0.75000 0.9973 1.00000 Neg Pred Value 0.9925 1.00000 0.9416 0.99406 Prevalence 0.2197 0.02890 0.7187 0.03276 Detection Rate 0.2139 0.02890 0.7013 0.02697 Detection Prevalence 0.2312 0.03854 0.7033 0.02697 Balanced Accuracy 0.9757 0.99504 0.9845 0.91176
With Machine Learning models an important communication ability to being able to explain a model in terms of intuition. This can be tricky with Neural Networks as it's modelled after the brain !
I recommend the article here.
You can plot a Neural Network from nnet() to assist in this task. You need to install an extra function.
library(devtools) source_url('https://gist.githubusercontent.com/fawda123/7471137/raw/466c1474d0a505ff044412703516c34f1a4684a5/nnet_plot_update.r') plot.nnet(model)
Neural Networks provide a powerful learning method to make predictions. On this data set I found Neural Networks beat Random Forests. You should always run different models to find the optimal results.
Be careful not to overfit. This is where the model that is run on the training data is fit so close that it is useless on the real data. We have the decay function to manage overfitting. Neural Networks are not so good at generalising across different business problems with different datasets. Always better to train on the actual data relevant to the question.
- Update : In 2018 I came across XGBoost which is usually used to train gradient-boosted decision trees and other gradient boosted models. My initial findings are that it uses less resources than Neural Networks and is somewhat easier to explain. I will run the XGBoost on this data when I get time.