Driving Visual Analysis with Automobile Data (R)

Acquiring automobile fuel efficiency data

Every data science project starts with data, and this chapter is no different. For this recipe, we will dive into a dataset that contains fuel efficiency performance metrics over time, measured in miles per gallon (MPG), for most makes and models of automobiles available in the U.S. since 1984. This data is courtesy of the U.S. Department of Energy and the U.S. Environmental Protection Agency. In addition to fuel efficiency data, the dataset contains several features and attributes of each listed automobile, which lets us summarize and group the data to determine which groups have historically tended to have better fuel efficiency and how this has changed over the years. The latest version of the dataset is available at http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip, and information about its variables can be found at http://www.fueleconomy.gov/feg/ws/index.shtml#vehicle. The data was last updated on December 4, 2013, and was downloaded on December 8, 2013.

Tip

We recommend that you use the copy of the dataset provided with the code for this book to ensure that your results match those described in this chapter.
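If you do want to fetch the archive yourself, the download can be scripted in R. The following is a minimal sketch, not the book's own code: `download_vehicles()` and `vehicles_url` are hypothetical names, and because the file on fueleconomy.gov is updated periodically, a fresh download may not reproduce the chapter's results.

```r
# Hypothetical helper: download vehicles.csv.zip unless a local copy exists.
# The live file changes over time, so prefer the book's copy for reproducibility.
vehicles_url <- "http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip"

download_vehicles <- function(url = vehicles_url, destfile = "vehicles.csv.zip") {
  # Skip the network request if the archive is already on disk
  if (!file.exists(destfile)) {
    # mode = "wb" writes in binary mode so the zip is not corrupted on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  # Return the local path invisibly so the call can be chained
  invisible(destfile)
}

# Usage (requires Internet access on first run):
# zip_path <- download_vehicles()
```

Checking for an existing file first keeps repeated runs of a script from re-downloading, and possibly silently replacing, the copy you are analyzing.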

Getting ready

To complete this recipe, you will need a computer with access to the Internet and a text editor of your choice.

How to do it...

Perform the following simple steps to acquire the data needed for the rest of the chapter:
  1. Unzip vehicles.csv.zip with the decompression tool of your choice and move the extracted vehicles.csv file to your working code directory.
  2. Take a moment to open the unzipped vehicles.csv file with Microsoft Excel, Google Spreadsheet, or a simple text editor. Comma-separated values (CSV) files are very convenient to work with, as they can be viewed and edited with very basic, freely available tools. With the file open, scroll through some of the data to get a sense of what you will be working with.
  3. On the variable information page linked above, select and copy all the text below the vehicle heading under Data Description, stopping before the emissions heading, and paste it into a text file. Save this file in your working directory as varlabels.txt. The first five lines of the file are as follows:
    atvtype - type of alternative fuel or advanced technology vehicle
    barrels08 - annual petroleum consumption in barrels for fuelType1 (1)
    barrelsA08 - annual petroleum consumption in barrels for fuelType2 (1)
    charge120 - time to charge an electric vehicle in hours at 120 V
    charge240 - time to charge an electric vehicle in hours at 240 V
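Steps 1 and 2 can also be done from within R. The sketch below assumes that the archive contains a single member named vehicles.csv at its top level; `read_vehicles()` is a hypothetical helper, not a function from the book or from any package.

```r
# Hypothetical helper: load the vehicles data from either the zip archive
# or the already-extracted CSV file.
read_vehicles <- function(path = "vehicles.csv.zip", exdir = ".") {
  if (grepl("\\.zip$", path, ignore.case = TRUE)) {
    # Extract only vehicles.csv (assumed member name) into exdir;
    # unzip() returns the path(s) of the extracted file(s)
    path <- utils::unzip(path, files = "vehicles.csv", exdir = exdir)
  }
  # Keep text columns as character vectors rather than factors
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Usage, from your working code directory:
# vehicles <- read_vehicles("vehicles.csv")
# dim(vehicles)    # number of rows and columns
# str(vehicles)    # column types, similar to scrolling through the file
```

Reading with `stringsAsFactors = FALSE` keeps string columns as plain text, which leaves the choice of which variables to treat as factors for later.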

Tip

Note that this file is provided for your convenience in the repository containing the chapter's code.
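Because each line of varlabels.txt follows a "name - description" pattern, the file can be turned into a small lookup table in R. This is a minimal sketch under that assumption; `read_varlabels()` is a hypothetical name, and any line without " - " (for example, a wrapped continuation line) is simply dropped.

```r
# Hypothetical helper: parse varlabels.txt into a variable/description table.
read_varlabels <- function(path = "varlabels.txt") {
  lines <- trimws(readLines(path, warn = FALSE))
  # Drop blank lines and any line that lacks the " - " separator
  lines <- lines[nzchar(lines) & grepl(" - ", lines, fixed = TRUE)]
  data.frame(
    # Everything before the first " - " is the variable name
    variable = sub(" - .*$", "", lines),
    # Everything after the first " - " is the description (lazy match,
    # so hyphens inside the description are preserved)
    description = sub("^.*? - ", "", lines, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

# Usage:
# labels <- read_varlabels("varlabels.txt")
# labels[labels$variable == "barrels08", "description"]
```

Splitting on the first separator only matters because some descriptions may themselves contain " - ".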

How it works…

There isn't much to explain in this first simple recipe, but note that we are starting off relatively easily here. In many data science projects, you will not be able to access and view the data so easily.