This analysis uses k-nearest neighbors (KNN) classification to predict which type of vehicle passes through a toll booth, replacing the slow process of reviewing camera images manually. Arford Inc. has started using new software in its toll booth cameras. The software records three approximate length measurements along with the weight of the vehicle. The toll is $7 for a car, $9 for a truck, and $15 for a large truck. High prediction accuracy matters: a misclassified vehicle means we either overcharge the driver or lose income.
Load in Data and Import Packages
We will first load the dataset and look at its structure. It has four numeric columns and one character column.
## tibble [150 × 5] (S3: tbl_df/tbl/data.frame)
## $ fwindshield: num [1:150] 137.7 63.7 61.1 59.8 65 ...
## $ tirewidth : num [1:150] 94.5 39 41.6 40.3 46.8 50.7 44.2 44.2 37.7 40.3 ...
## $ weight : num [1:150] 37.8 18.2 16.9 19.5 18.2 22.1 18.2 19.5 18.2 19.5 ...
## $ clearance : num [1:150] 5.4 5.4 5.4 5.4 5.4 10.8 8.1 5.4 5.4 2.7 ...
## $ vehicle : chr [1:150] "car" "car" "car" "car" ...
The following packages will be used for this model.
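Judging by the functions called in the rest of this analysis (the pipe and mutate_at(), sample.split(), knn()), the packages involved are most likely dplyr, caTools, and class. ggplot2 is included only as an assumption for the plots, since the plotting package is not named. A minimal loading sketch that reports anything not installed instead of stopping:

```r
# Packages inferred from the function calls in this analysis.
# ggplot2 is an assumption: the plotting package is not named in the text.
pkgs <- c("dplyr", "caTools", "class", "ggplot2")

# Check which packages are installed, without attaching them yet
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Install before running: ", paste(missing_pkgs, collapse = ", "))
}

# Attach the packages that are available
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```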
Splitting the data
A 70/30 split between training and testing data is common for k-nearest neighbors classification.
tolldata <- as.data.frame(tolldata)
tolldata <- tolldata %>% mutate(vehicle = as.factor(vehicle))
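# Splitting data into train and test data.
# sample.split() expects the label vector; given the whole data frame it
# returns one flag per column, which is then recycled across the rows
# (the outputs in this report, with 60 test rows, reflect that behavior).
split <- sample.split(tolldata$vehicle, SplitRatio = 0.7)
train_cl <- subset(tolldata, split)
test_cl <- subset(tolldata, !split)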
# Feature Scaling
train_scale <- scale(train_cl[, 1:4])
test_scale <- scale(test_cl[, 1:4])
head(train_scale)
head(test_scale)
## fwindshield tirewidth weight clearance
## 1 4.6267266 6.68678539 -0.5012242 -1.290892
## 3 -1.2414830 0.19034926 -1.3887350 -1.290892
## 4 -1.3410740 0.03070149 -1.2783270 -1.290892
## 6 -0.5443458 1.30788364 -1.1679189 -1.032426
## 8 -0.9427099 0.50964480 -1.2783270 -1.290892
## 9 -1.5402560 -0.28859404 -1.3335310 -1.290892
## fwindshield tirewidth weight clearance
## 2 -1.1724769 -0.18115482 -1.368268 -1.325089
## 5 -1.0392409 1.23657852 -1.368268 -1.325089
## 7 -1.5721850 0.76400074 -1.368268 -1.192359
## 10 -1.1724769 0.05513407 -1.308735 -1.457819
## 12 -1.3057129 0.76400074 -1.249202 -1.325089
## 15 0.0266472 2.18173409 -1.487334 -1.325089
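The code above standardizes the test set with its own means and standard deviations. A common alternative is to reuse the training set's centering and scaling values so that both sets share one scale. The sketch below shows that variant. It is not the approach used for the outputs in this report: the built-in iris data stands in for tolldata, and the every-third-row holdout is hypothetical.

```r
# Stand-in data: iris has the same shape as tolldata (150 rows, 4 numeric + 1 label)
data(iris)
test_idx <- seq(3, nrow(iris), by = 3)   # hypothetical holdout rows
train_x <- iris[-test_idx, 1:4]
test_x  <- iris[test_idx, 1:4]

# Standardize the training features and keep the parameters scale() stores
train_scale <- scale(train_x)

# Apply the training means and standard deviations to the test features
test_scale <- scale(test_x,
                    center = attr(train_scale, "scaled:center"),
                    scale  = attr(train_scale, "scaled:scale"))

round(colMeans(train_scale), 10)  # zero by construction
colMeans(test_scale)              # generally close to, but not exactly, zero
```

Scaled this way, a test vehicle is measured against the same reference as the training vehicles, which is what a distance-based method such as KNN relies on.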
Categorized Visualization
Next, we will create a basic cluster plot so we can get an idea of what to expect.
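A plot of this kind could be drawn with base R graphics, as in the sketch below. The choice of weight and fwindshield as axes is an assumption, and iris (renamed to the toll column names and class labels) stands in for tolldata purely for illustration.

```r
# Stand-in for tolldata: iris renamed to the toll column names (illustrative only)
tolldata <- setNames(iris, c("fwindshield", "tirewidth", "weight", "clearance", "vehicle"))
levels(tolldata$vehicle) <- c("car", "truck", "largetruck")

# One color per vehicle class (colors are arbitrary)
class_cols <- c("steelblue", "darkorange", "forestgreen")

plot(tolldata$weight, tolldata$fwindshield,
     col = class_cols[as.integer(tolldata$vehicle)],
     pch = 19,
     xlab = "weight", ylab = "fwindshield",
     main = "Vehicles by class")
legend("topleft", legend = levels(tolldata$vehicle),
       col = class_cols, pch = 19)
```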
Model Creation
We can now fit the model using the knn() function from the class package, starting with k = 1.
# Fitting KNN Model to training dataset
classifier_knn <- knn(train = train_scale,
                      test = test_scale,
                      cl = train_cl$vehicle,
                      k = 1)
classifier_knn
## [1] car car car car car car
## [7] car car car car car car
## [13] car car car car car car
## [19] car car truck truck truck truck
## [25] truck truck truck truck truck truck
## [31] truck truck truck truck truck truck
## [37] truck truck truck truck largetruck largetruck
## [43] truck largetruck largetruck largetruck largetruck truck
## [49] largetruck largetruck largetruck largetruck largetruck truck
## [55] largetruck largetruck largetruck largetruck largetruck largetruck
## Levels: car largetruck truck
Confusion Matrix
Let’s take a look at a confusion matrix to see how our model performs.
## classifier_knn
## car largetruck truck
## car 20 0 0
## largetruck 0 17 3
## truck 0 0 20
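Every car and truck in the test set is classified correctly, while large trucks are classified correctly 85% of the time (17 of 20). This means we are not overcharging anyone, but the three large trucks labeled as trucks would pay $9 instead of $15. That is $18 lost out of $300, or 6% of our large truck tolls. Overall this is a well-performing model, but let’s see if we can improve the accuracy.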
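The per-class figures and the revenue effect can be checked directly from the confusion matrix. The sketch below re-enters the counts shown above, assuming rows are actual classes and columns are predicted classes, and uses the tolls from the introduction. The variable names are illustrative.

```r
classes <- c("car", "largetruck", "truck")

# Counts copied from the confusion matrix (rows = actual, columns = predicted)
cm <- matrix(c(20,  0,  0,
                0, 17,  3,
                0,  0, 20),
             nrow = 3, byrow = TRUE,
             dimnames = list(actual = classes, predicted = classes))

# Overall accuracy: correct predictions over all predictions
accuracy <- sum(diag(cm)) / sum(cm)

# Per-class recall: share of each actual class that was predicted correctly
recall <- diag(cm) / rowSums(cm)

# Tolls in dollars, from the introduction
toll <- c(car = 7, largetruck = 15, truck = 9)

# Revenue owed (by actual class) versus revenue billed (by predicted class)
owed      <- sum(rowSums(cm) * toll[rownames(cm)])
billed    <- sum(colSums(cm) * toll[colnames(cm)])
shortfall <- owed - billed

accuracy
recall
shortfall
```

With these counts, overall accuracy is 0.95, large truck recall is 0.85, and the shortfall is $18 across the 60 test vehicles.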
Changing the K
The tuning parameter in this model is k, the number of nearest neighbors that vote on each prediction. We can experiment with different values of k.
## [1] "Accuracy = 0.95"
## [1] "Accuracy = 0.933333333333333"
## [1] "Accuracy = 0.933333333333333"
## [1] "Accuracy = 0.95"
## [1] "Accuracy = 0.916666666666667"
## [1] "Accuracy = 0.9"
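A loop of the following shape would produce output in this format. The k values behind the six accuracies above are not stated, so the vector k_values below is hypothetical. The built-in iris data stands in for tolldata, with a hypothetical seeded 70/30 row split, so the numbers it prints will not match the ones above.

```r
library(class)  # provides knn()

set.seed(42)  # hypothetical seed, only for reproducibility of this sketch
data(iris)

# 70/30 split by row (iris stands in for tolldata)
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_x <- scale(iris[train_idx, 1:4])
test_x  <- scale(iris[-train_idx, 1:4],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# Hypothetical set of k values to compare
k_values <- c(1, 3, 5, 7, 15, 19)

# Fit KNN for each k and record the share of correct test predictions
accuracies <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
})

for (i in seq_along(k_values)) {
  print(paste("Accuracy =", accuracies[i]))
}
```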
Visualizing Performance at Different K Values
Let’s visualize how accuracy changes across values of k.
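One way to draw such a chart is a simple line plot of test accuracy against k. This is a sketch under assumptions, not the original figure: iris stands in for tolldata, the seed and the range 1 to 25 are hypothetical, and base R graphics are used in place of whichever plotting package the report relied on.

```r
library(class)  # provides knn()

set.seed(42)  # hypothetical seed
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_x <- scale(iris[train_idx, 1:4])
test_x  <- scale(iris[-train_idx, 1:4],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# Test accuracy for each k in a hypothetical range
k_values <- 1:25
acc <- sapply(k_values, function(k) {
  mean(knn(train = train_x, test = test_x, cl = train_y, k = k) == test_y)
})

# Line plot of accuracy against k
plot(k_values, acc, type = "b", pch = 19,
     xlab = "k (number of neighbors)",
     ylab = "Test accuracy",
     main = "KNN accuracy by k")
```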
In conclusion, we can achieve over 98% accuracy with a k value between 5 and 15. This automates the manual classification of vehicles and should save the company a meaningful amount of money.