Driving Visual Analysis with Automobile Data (R)
Acquiring automobile fuel efficiency data
Every data science project starts with data, and this chapter is no different. For this recipe, we will dive into a dataset that contains fuel efficiency performance metrics, measured in miles per gallon (MPG) over time, for most makes and models of automobiles available in the U.S. since 1984. This data is courtesy of the U.S. Department of Energy and the U.S. Environmental Protection Agency. In addition to fuel efficiency data, the dataset also contains several features and attributes of the automobiles listed, which makes it possible to summarize and group the data to determine which groups have historically had better fuel efficiency and how this has changed over the years. The latest version of the dataset is available at http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip, and information about the variables in the dataset can be found at http://www.fueleconomy.gov/feg/ws/index.shtml#vehicle. The data was last updated on December 4, 2013 and was downloaded on December 8, 2013.
Tip
We recommend that you use the copy of the dataset provided with the code for this book to ensure that your results match those described in this chapter.
To complete this recipe, you will need a computer with access to the Internet and a text editor of your choice.
Perform the following simple steps to acquire the data needed for the rest of the chapter:
- Download the dataset from http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip.
- Unzip vehicles.csv with the decompression tool of your choice and move it to your working code directory.
- Take a moment and open the unzipped vehicles.csv file with Microsoft Excel, Google Spreadsheet, or a simple text editor. Comma-separated value (CSV) files are very convenient to work with as they can be edited and viewed with very basic, freely available tools. With the file open, scroll through some of the data and get a sense of what you will be working with.
- Navigate to http://www.fueleconomy.gov/feg/ws/index.shtml#vehicle. Select and copy all the text below the vehicle heading under Data Description, and paste it into a text file. Do not include the emissions heading. Save this file in your working directory as varlabels.txt. The first five lines of the file are as follows:
  atvtype - type of alternative fuel or advanced technology vehicle
  barrels08 - annual petroleum consumption in barrels for fuelType1 (1)
  barrelsA08 - annual petroleum consumption in barrels for fuelType2 (1)
  charge120 - time to charge an electric vehicle in hours at 120 V
  charge240 - time to charge an electric vehicle in hours at 240 V
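If you would rather script the download, unzip, and first look at the file than do them by hand, the steps above can be sketched in R roughly as follows. This is a minimal sketch, not part of the original recipe: the helper names acquire_vehicles and peek_vehicles are hypothetical, the URL is the one given in the recipe and may have moved since it was written, and the download is skipped whenever a vehicles.csv is already present so that the copy provided with the book's code is not overwritten.

```r
# Hypothetical helper (not from the recipe): download and unzip vehicles.csv
# into `dir` unless a copy is already there. Returns the path to the CSV file.
acquire_vehicles <- function(dir = ".",
                             url = "http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip") {
  csv_path <- file.path(dir, "vehicles.csv")
  if (!file.exists(csv_path)) {
    zip_path <- file.path(dir, "vehicles.csv.zip")
    # mode = "wb" keeps the zip archive intact on Windows
    download.file(url, destfile = zip_path, mode = "wb")
    unzip(zip_path, exdir = dir)
  }
  csv_path
}

# Hypothetical helper: a quick look at the file, similar to scrolling
# through it in a spreadsheet. Returns the data frame invisibly.
peek_vehicles <- function(csv_path, n = 10) {
  vehicles <- read.csv(csv_path, stringsAsFactors = FALSE)
  cat(nrow(vehicles), "rows,", ncol(vehicles), "columns\n")
  print(head(names(vehicles), n))  # first few column names
  invisible(vehicles)
}

# Only run the network-dependent part in an interactive session.
if (interactive()) {
  csv_path <- acquire_vehicles()
  vehicles <- peek_vehicles(csv_path)

  # Show the first five variable descriptions, if varlabels.txt exists.
  if (file.exists("varlabels.txt")) {
    print(readLines("varlabels.txt", n = 5))
  }
}
```

Opening the file in a spreadsheet or text editor, as the recipe suggests, remains the quickest way to get a feel for the data. The script is mainly useful if you want the acquisition step to be repeatable.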
There isn't much to explain in this first simple recipe, but note that we are starting off relatively easily here. In many data science projects, you will not be able to access and view the data so easily.