Analytics

Posts

Showing posts from 2016

Cross-Validation: Concept and Example in R

- October 24, 2016

Cross-validation, sometimes called rotation estimation, is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. In machine learning, cross-validation is a resampling method for model evaluation that avoids testing a model on the same dataset on which it was trained. Testing on the training data is a common mistake, especially since a separate test dataset is not always available, and it usually yields inaccurate performance measures: the model gets an almost perfect score because it is evaluated on data it has already seen. Cross-validation is usually preferred to avoid this kind of mistake. The concept is simple: instead of using the whole dataset to train and then testing on the same data, we could randomly divide
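As a rough illustration of the idea, here is a minimal k-fold sketch in base R. The built-in mtcars data, the linear model mpg ~ wt + hp, k = 5, and RMSE as the error measure are all assumptions chosen for illustration; they are not taken from the post.

```r
# Minimal k-fold cross-validation sketch in base R.
# Assumptions (not from the post): built-in mtcars data, a linear model
# mpg ~ wt + hp, k = 5 folds, and RMSE as the error measure.
set.seed(42)                       # make the random fold assignment reproducible
k <- 5
n <- nrow(mtcars)

# Assign every row a fold label 1..k, then shuffle the labels
folds <- sample(rep(seq_len(k), length.out = n))

rmse <- numeric(k)
for (i in seq_len(k)) {
  train <- mtcars[folds != i, ]    # fit on the other k - 1 folds
  test  <- mtcars[folds == i, ]    # evaluate on the held-out fold only
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$mpg - pred)^2))
}

# Average the per-fold errors to get the cross-validated estimate
cv_rmse <- mean(rmse)
print(cv_rmse)
```

Each row is used for testing exactly once and is never part of the training set for the model that scores it, which is what keeps the estimate from being overly optimistic.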

Investigating the makes and models of automobiles

- October 06, 2016

With the first set of questions asked and answered about this dataset, let's move on to additional analyses.

Getting ready

If you completed the previous recipe, you should have everything you need to continue.

How to do it...

This recipe will investigate the makes and models of automobiles and how they have changed over time:

Let's look at how the makes and models of cars inform fuel efficiency over time. First, let's look at the frequency of the makes and models of cars available in the US over this time and concentrate on four-cylinder cars:

    # gasCars4 comes from the previous recipe (not shown in this excerpt)
    library(plyr)     # provides ddply
    library(ggplot2)  # provides ggplot

    # Count the distinct makes available in each year
    carsMake <- ddply(gasCars4, ~year, summarise,
                      numberOfMakes = length(unique(make)))

    # Plot the number of makes against the year
    ggplot(carsMake, aes(year, numberOfMakes)) +
      geom_point() +
      labs(x = "Year", y = "Number of available makes") +
      ggtitle("Four cylinder cars")

We see in the following graph that there has been a decline in the number

Analysing automobile fuel efficiency over time

- October 05, 2016

We have now successfully imported the data and looked at some important high-level statistics that gave us a basic understanding of what values are in the dataset and how frequently some features appear. With this recipe, we continue the exploration by looking at some of the fuel efficiency metrics over time and in relation to other data points.

Getting ready

If you completed the previous recipe, you should have everything you need to continue.

How to do it...

The following steps will use both plyr and the graphing library, ggplot2, to explore the dataset:

Let's start by looking at whether there is an overall trend in how MPG changes over time on average. To do this, we use the ddply function from the plyr package to take the vehicles data frame, aggregate rows by year, and then, for each group, compute the mean highway, city, and combined fuel efficiency.
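The aggregation step described above might look roughly like the following sketch. The excerpt does not show the column names, so year, hwy, cty and comb are assumptions, as is the result name mpgByYr. The small vehicles data frame is a made-up stand-in so that the snippet runs on its own; its values are not real fuel economy figures.

```r
library(plyr)  # provides ddply and summarise

# Hypothetical toy stand-in for the vehicles data frame.
# Column names (year, hwy, cty, comb) and all values are assumptions.
vehicles <- data.frame(
  year = c(1984, 1984, 1985, 1985),
  hwy  = c(20, 30, 24, 28),
  cty  = c(15, 21, 18, 20),
  comb = c(17, 25, 20, 23)
)

# Group rows by year and compute the mean of each efficiency measure per group
mpgByYr <- ddply(vehicles, ~year, summarise,
                 avgMPG = mean(comb),
                 avgHwy = mean(hwy),
                 avgCty = mean(cty))

print(mpgByYr)
```

The result has one row per year, which is a convenient shape to pass on to ggplot2 for plotting a trend.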