Modeling Stock Market Data (R)

This article will walk you through a financial analysis project where you will analyze stock market data, determine whether stocks are over- or under-valued, use this information to identify a list of target stocks that may make good investments, and visually analyze the price histories of the target stocks.

We must caution that the goal of this article is not to make you an expert in stock market analysis or to make you rich. Quants on Wall Street study engineering models that perform significantly more sophisticated operations than those we will touch upon here. Entire books have been written on stock market models and financial engineering, but we only have a single article to dedicate to this topic. So, given the time and format constraints, the goals of this article are limited to the steps covered in its sections: acquiring stock market data, cleaning and exploring the data, generating relative valuations, and screening stocks and analyzing their historical prices.

The data we will use for this article consists of current data for stocks tracked by the website finviz.com and daily histories of stock prices obtained from Yahoo! Finance.

As in previous articles, the tool we will rely on most heavily for this project will be the R statistical programming language. As you've probably noticed by now, R has strong packages available that can assist us in the needed analytical tasks; we will be leveraging some of these packages in this article. Additionally, the recipes in this article will roughly follow the data science pipeline, which we will adapt to the type of data we are working with and the types of analysis we would like to conduct on the data.

Requirements

For this article, you will need a computer with access to the Internet. You will also need to have R installed and the following packages installed and loaded:

install.packages("XML")

install.packages("ggplot2")

install.packages("plyr")

install.packages("reshape2")

install.packages("zoo")

library(XML)

library(ggplot2)

library(plyr)

library(reshape2)

library(zoo)

The XML package will assist us with acquiring data from the Internet, ggplot2 will let us create beautiful graphs and visualizations from our data, plyr will help us with summarizing our data, reshape2 will let us reshape data between wide and long formats, and the zoo package will allow us to calculate moving averages.

You will also want to set a working directory where some of the charts that we generate will be saved:

setwd("path/where/you/want/to save/charts")

Acquiring stock market data

If you look on the Internet for stock market data, you will quickly find yourself inundated with sources providing stock quotes and financial data. An important but often overlooked factor when acquiring data is the efficiency of getting the data. All else being equal, you don't want to spend hours piecing together a dataset that you could have acquired in far less time. Taking this into consideration, we will try to obtain the largest amount of data from the fewest sources. This not only helps to keep the data as consistent as possible, but it also improves the repeatability of the analysis and the reproducibility of the results.

How to do it...

The first piece of data we want to obtain is a snapshot of the stocks we want to analyze. One of the best ways to do this is to download data from one of the many stock screener applications that exist. Our favorite screener to download stock data from belongs to http://finviz.com.

Let's acquire the stock market data we will use for this article with the help of the following steps:

1. First, let's pull up FINVIZ.com's stock screener available at http://finviz.com/screener.ashx:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_01.jpg

As you can see, the site has multiple fields that can be filtered. If you click on the All tab, you can see all of the fields that can be displayed.

2. For this project, we want to export all the fields for all the companies in the screener. You can either customize the screener by checking 69 checkboxes, as of the time of writing, or you can use the following URL to make all the fields show up automatically:

http://finviz.com/screener.ashx?v=152&c=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68

You should now see the screener with all the available fields.

3. If you scroll all the way to the bottom right of the screen, there should be an export link. Click on this link and save the CSV file as finviz.csv.

4. Finally, we will launch RStudio, read the finviz.csv file from the path where we saved it, and assign it to a data frame, as follows:

finviz <- read.csv("path/finviz.csv")

Note

In data analysis, it is always better for each step that is performed to be in code instead of as a series of point-and-click actions that require human intervention. This way, it is much easier and faster to reproduce your results.

5. After going through steps 1 to 4 for the first time (and some clever reading of URLs from our browser), we can replace the previous lines of code with the following two commands:

url_to_open <- 'http://finviz.com/export.ashx?v=152&c=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68'

finviz <- read.csv(url(url_to_open))

Tip

Note the structure of the URL in step 2; it contains a comma-separated list of the checkboxes we wish to select. You can programmatically generate this URL to easily select whichever combination of fields you want to download.

If you want to avoid typing the numbers 0 through 68, you can use a combination of the sprintf and paste commands to accomplish the same thing:

url_to_open <- sprintf("http://finviz.com/export.ashx?v=152&c=%s",  paste(0:68, collapse = ","))
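The same idea extends to any subset of fields. The following is a minimal sketch, not part of the original recipe: finviz_export_url is a hypothetical helper name, and the column indexes passed to it are arbitrary examples rather than references to particular FINVIZ fields. It only builds the URL string and does not contact the site.

```r
# Hypothetical helper: build the export URL for any set of column indexes.
# The base URL and the v=152 view are taken from the URL used in step 5.
finviz_export_url <- function(cols = 0:68, view = 152) {
  sprintf("http://finviz.com/export.ashx?v=%d&c=%s",
          view, paste(cols, collapse = ","))
}

# All fields, identical to the URL generated with sprintf and paste above
url_all <- finviz_export_url()

# An arbitrary subset of column indexes (illustrative only)
url_subset <- finviz_export_url(c(0, 1, 2))
print(url_subset)
```

Passing the result to read.csv(url(...)) would then download only the selected columns, assuming the export endpoint still accepts this query format.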

How it works...

Now that we've taken an initial glance at the data, it's important to take some time out to identify the fields that will be most important to us, and understand what these fields mean.

The first few fields contain identifying information about the company.

The ticker (sometimes also called the symbol) is the identifier for the stock of a company. No two companies will have the exact same ticker symbol. So AA is always Alcoa, AAPL is always Apple, and so on.

Next, we have the company name, sector, industry, and home country of the company. The sector and industry details serve as ways to classify stocks to inform us of each company's primary line of business; sector is more general (higher level), and industry is more specific (lower level). For example, Apple Inc. (AAPL) is in the Consumer Goods sector and primarily produces consumer goods in the Electronic Equipment industry.

There's more...

Once we get past these fields, most of the other fields in our dataset are numeric. Let's define some of the most important ones:

· Price: This indicates the ongoing dollar value to purchase one share of a company's stock.

· Volume: This indicates the most recent number of shares of the stock transacted in a day.

· Shares Outstanding: This is the total number of stock shares the company has issued.

· P/E: The Price to Earnings ratio is the price of the company's stock divided by the company's earnings per share outstanding.

· PEG: The P/E Growth ratio is the company's P/E ratio divided by its annual growth rate, and it gives you a sense of the valuation of the company's earnings relative to its growth.

· EPS growth next year: This is the expected rate at which the company's earnings per share will grow in the next year.

· Total Debt/Equity: The total debt to equity is used as a measure of financial health calculated by dividing the dollar value of the company's total debt by the equity in the company. This gives you a sense of how the company has been financing its growth and operations. Debt is riskier than equity, so a high ratio will be cause for concern.

· Beta: This is a measure of the stock's volatility (swings in its price) relative to the overall stock market. A beta of 1 means the stock is as volatile as the market. A beta more than 1 means it's more volatile, while a beta less than 1 means it's less volatile.

· RSI: The Relative Strength Index is a metric based on stock price activity, which compares the average size of a stock's recent gains in closing price with the average size of its recent losses, typically over the last 14 trading days, to determine a score between 0 and 100. A higher index value indicates that the stock might be overbought (overvalued), and therefore, the price might drop soon; a lower value indicates that the stock might be oversold (undervalued), so the price might rise soon.
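To make the ratio definitions above concrete, here is a small worked example. All figures are made up for a single hypothetical company and do not come from FINVIZ; the variable names are ours.

```r
# Hypothetical figures for one made-up company (not FINVIZ data)
price      <- 50     # dollars per share
eps        <- 2.5    # earnings per share over the last year, in dollars
eps_growth <- 10     # expected annual earnings growth, in percent
total_debt <- 400    # total debt, in millions of dollars
equity     <- 1000   # shareholders' equity, in millions of dollars

pe             <- price / eps          # P/E: price divided by earnings per share
peg            <- pe / eps_growth      # PEG: P/E divided by the growth rate (in percent)
debt_to_equity <- total_debt / equity  # Total Debt/Equity

print(c(PE = pe, PEG = peg, DebtToEquity = debt_to_equity))
```

With these numbers the P/E is 20, the PEG is 2, and the debt to equity ratio is 0.4, which is below the threshold of 1 that we use as a screening criterion later on.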

If you want to know the definitions of some of the other fields, http://investopedia.com is a great place to find definitions of financial and investment terms.

Cleaning and exploring the data

Now that we've acquired the data and learned a little about what the fields mean, the next step is to clean up the data and conduct some exploratory analysis.

Getting ready

Make sure you have the packages mentioned at the beginning of the article installed and you have successfully imported the FINVIZ data into R using the steps in the previous sections.

How to do it...

To clean and explore the data, closely follow the ensuing instructions:

1. Imported numeric data often contains special characters such as percentage signs, dollar signs, commas, and so on. This causes R to think that the field is a character field instead of a numeric field. For example, our FINVIZ dataset contains numerous values with percentage signs that must be removed. To do this, we will create a clean_numeric function that will strip away any unwanted characters using the gsub command. We will create this function once and then use it multiple times throughout the article:

clean_numeric <- function(s){

  s <- gsub("%|\\$|,|\\)|\\(", "", s)

  s <- as.numeric(s)

}

2. Next, we will apply this function to the numeric fields in our finviz data frame:

finviz <- cbind(finviz[,1:6],apply(finviz[,7:68], 2, clean_numeric))

3. If you look at the data again, all the pesky percentage signs will be gone, and the fields will all be numeric.

Tip

In this command, and throughout the rest of this article, there will be many instances where we reference columns by their column number. If the number of columns changes for some reason, the numbers referenced will need to be adjusted accordingly.
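Before moving on, it can help to see what the cleaning function does to individual values. The sketch below repeats the function so that it runs on its own and applies it to a few made-up strings. One behavior worth knowing: because parentheses are stripped, a value written in accounting style such as (3.2) comes out as a positive 3.2, so this approach assumes negative numbers are written with a minus sign.

```r
# Same cleaning logic as in step 1, repeated so this example is self-contained
clean_numeric <- function(s) {
  s <- gsub("%|\\$|,|\\)|\\(", "", s)  # drop %, $, commas, and parentheses
  as.numeric(s)
}

# Made-up sample values, not taken from the FINVIZ export
raw     <- c("12.5%", "$1,234.50", "(3.2)", "-0.75%")
cleaned <- clean_numeric(raw)
print(cleaned)  # values: 12.5, 1234.5, 3.2, -0.75
```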

4. Now we are ready to really start exploring our data! The first thing to do is take a look at how the prices are distributed in order to get a visual sense of what is a high stock price, what is a low stock price, and where the prices of most stocks fall:

hist(finviz$Price, breaks=100, main="Price Distribution", xlab="Price")

You will get the following graph as output:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_02.jpg

Here, we encounter our first problem. Outlier stocks with very high prices cause R to scale the x axis of the histogram in such a way as to make the graph useless. We simply cannot see what the distribution for the more normally priced stocks looks like. This is a very common issue when first histogramming data.

5. Let's put a cap on the x axis of $150 and see what that produces for us:

hist(finviz$Price[finviz$Price<150], breaks=100, main="Price Distribution", xlab="Price")

You will get the following graph as output:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_03.jpg

This is much better! It shows that the majority of stocks in our dataset are priced under $50. So, in absolute terms, a stock that was priced at $100 would be considered expensive.

6. But of course, things aren't so simple. Perhaps different sectors and industries have different price levels. So, theoretically, a $100 stock might be cheap if all the other stocks in its industry are priced in the $120 to $150 range. Let's get the average prices by sector and see how they compare. Note that we are not excluding any stocks:

sector_avg_prices <- aggregate(Price~Sector,data=finviz,FUN="mean")

colnames(sector_avg_prices)[2] <- "Sector_Avg_Price"

ggplot(sector_avg_prices, aes(x=Sector, y=Sector_Avg_Price, fill=Sector)) +

  geom_bar(stat="identity") + ggtitle("Sector Avg Prices") +

  theme(axis.text.x = element_text(angle = 90, hjust = 1))

You will get the following graph as output:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_04.jpg

This is interesting. Stocks in the financial sector seem to have a significantly higher average price than stocks in other sectors. I'm willing to bet that this is due to some of the outliers that messed up our distribution earlier.
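A quick way to test a hunch like this is to compare the mean with the median, since the median is barely affected by a single extreme value. The following sketch uses made-up prices, with one very large value standing in for an outlier; it is an illustration of the idea rather than a step in the recipe.

```r
# Toy data: made-up prices, one extreme value in the Financial sector
toy <- data.frame(
  Sector = c("Financial", "Financial", "Financial",
             "Technology", "Technology", "Technology"),
  Price  = c(20, 30, 172000, 25, 35, 45)
)

sector_mean   <- aggregate(Price ~ Sector, data = toy, FUN = mean)
sector_median <- aggregate(Price ~ Sector, data = toy, FUN = median)

# Put both summaries side by side: columns Price_mean and Price_median
comparison <- merge(sector_mean, sector_median, by = "Sector",
                    suffixes = c("_mean", "_median"))
print(comparison)
```

In the toy data, the Financial mean is 57350 while its median is only 30. A gap of that size between the two is a strong hint that a handful of values dominate the average.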

7. Let's get to the bottom of this! Let's find out which industries and companies are responsible for making the average price of the financial sector so much higher than all the others.

First, we create a summary of the average prices by industry:

industry_avg_prices <- aggregate(Price~Sector+Industry,data=finviz,FUN="mean")

industry_avg_prices <- industry_avg_prices[order(industry_avg_prices$Sector,industry_avg_prices$Industry),]

colnames(industry_avg_prices)[3] <- "Industry_Avg_Price"

Then, we isolate the industries in the financial sector:

industry_chart <- subset(industry_avg_prices,Sector=="Financial")

Finally, we create a chart showing the average price of each industry in the financial sector:

ggplot(industry_chart, aes(x=Industry, y=Industry_Avg_Price, fill=Industry)) +

  geom_bar(stat="identity") + theme(legend.position="none") + ggtitle("Industry Avg Prices") +

  theme(axis.text.x = element_text(angle = 90, hjust = 1))

You will get the following graph as output:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_05.jpg

From this graph, it looks like the Property & Casualty Insurance industry is the main culprit that is driving the average prices up.

8. Next, we will drill down further into the Property & Casualty Insurance industry to identify which companies are the outliers:

company_chart <- subset(finviz,Industry=="Property & Casualty Insurance")

ggplot(company_chart, aes(x=Company, y=Price, fill=Company)) +

  geom_bar(stat="identity") + theme(legend.position="none") +

  ggtitle("Company Avg Prices") +

  theme(axis.text.x = element_text(angle = 90, hjust = 1))

You will get the following graph as output:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_06.jpg

It's hard to see because there are so many companies, but if you zoom in on the graph, it is clear that the outlier company is Berkshire Hathaway, where the stock price is currently over $172,000 per share.

9. Since their stock price is so extreme, let's remove them from our dataset and then re-average the sectors so that we have a more realistic average price for the financial sector:

finviz <- subset(finviz, Ticker!="BRK-A")

sector_avg_prices <- aggregate(Price~Sector,data=finviz,FUN="mean")

colnames(sector_avg_prices)[2] <- "Sector_Avg_Price"

ggplot(sector_avg_prices, aes(x=Sector, y=Sector_Avg_Price, fill=Sector)) +

  geom_bar(stat="identity") + ggtitle("Sector Avg Prices") +

  theme(axis.text.x = element_text(angle = 90, hjust = 1))

You will get the following graph as output:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_07.jpg

Now, our averages look much better and we have a decent basis to compare stock prices to their industry and sector averages.

How it works...

In this section, we used the aggregate command to summarize our data. Here's a reminder of the code we used:

sector_avg_prices <- aggregate(Price~Sector,data=finviz,FUN="mean")

An alternative way to do this is with the ddply command that is part of the plyr package:

sector_avg_prices <- ddply(finviz, "Sector", summarise,Price=mean(Price, na.rm=TRUE))

Wherever you see the aggregate command used in this article, feel free to challenge yourself by also trying to summarize the data, using ddply.
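One detail in the two commands above is easy to miss: the formula interface of aggregate drops rows with missing values by default, which is why only the ddply version needs na.rm=TRUE. The sketch below illustrates this on made-up data. To keep it free of extra packages, base R's tapply stands in for ddply here; that substitution is ours and not part of the recipe.

```r
# Toy data with one missing price (made up for illustration)
toy <- data.frame(Sector = c("A", "A", "B", "B"),
                  Price  = c(10, NA, 20, 40))

# aggregate with a formula omits the NA row before averaging
via_aggregate <- aggregate(Price ~ Sector, data = toy, FUN = mean)

# A plain grouped mean returns NA for sector A unless na.rm = TRUE is passed
grouped_no_rm <- tapply(toy$Price, toy$Sector, mean)
grouped_rm    <- tapply(toy$Price, toy$Sector, mean, na.rm = TRUE)

print(via_aggregate)
print(grouped_no_rm)
print(grouped_rm)
```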

Generating relative valuations

One of the most interesting things that you can do with stock market data is come up with a valuation model. The ultimate goal is to arrive at a decision about whether the stock might be overvalued or undervalued. There are two main ways to do this. Intrinsic valuation is generally more time consuming because it involves digging into the financial statements of a company to arrive at a valuation decision. The alternative method is relative valuation, which will quickly provide a sense of how the stock is valued but does not take into account a comprehensive set of factors. The basic idea is that it compares a stock's price and valuation ratios to similar stocks to arrive at a conclusion. In this section, we will value stocks using the simpler relative valuation method.

Getting ready

This recipe requires the data downloaded and cleaned in the previous recipes.

How to do it...

We will essentially do three major things in this section. First, we calculate sector averages for fields that we can use in our relative valuation efforts. Then, we do the same at the industry level. Finally, we compare the stocks' statistics to the averages to arrive at an index value for each stock that indicates whether it might be undervalued. The following steps will guide you:

1. In order to calculate averages in multiple columns in R, we first need to melt the data. This will make every column after Sector a row and then display its value, essentially making the data long instead of wide. Take a look at the following screenshots for the different steps in this recipe to better understand how the data changes shape. It goes from being wide to long, and then back to wide again, but in summary form.

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_08.jpg

We will use the following command to perform this action:

sector_avg <- melt(finviz, id="Sector")

2. Next, we need to filter so that the data frame contains only the fields we want to average:

sector_avg <- subset(sector_avg,variable%in%c("Price","P.E","PEG","P.S","P.B"))

Now your sector_avg data frame should look like this:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_09.jpg

Each column heading (variable) is now listed vertically alongside its value. This allows us to do some grouping later to get the averages for each variable.

3. Not all stocks in our original dataset have all of these values; where the values are null, we want to remove the records. We also want to make sure all of our values are numeric:

sector_avg <- (na.omit(sector_avg))

sector_avg$value <- as.numeric(sector_avg$value)

4. The next step is to cast the data to make it wide again. This will produce a column for each of the fields we filtered, and will now contain the average by sector. We will also rename the columns so that we know they are sector averages:

sector_avg <- dcast(sector_avg, Sector~variable, mean)

colnames(sector_avg)[2:6] <- c("SAvgPE","SAvgPEG","SAvgPS","SAvgPB","SAvgPrice")

Your sector_avg data frame should now look like this:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_10.jpg
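If the change of shape in steps 1 to 4 is hard to picture, the following sketch imitates it on a tiny made-up data frame. It deliberately uses only base R, so it is an imitation of what melt and dcast do rather than a call to reshape2, and the data and variable names are ours.

```r
# Wide toy data: one row per stock, one column per metric (made-up values)
toy <- data.frame(Sector = c("Financial", "Financial", "Technology"),
                  Price  = c(20, 40, 30),
                  P.E    = c(10, NA, 25))

# Wide to long: one row per (Sector, variable, value), as melt would produce
long <- data.frame(Sector   = rep(toy$Sector, times = 2),
                   variable = rep(c("Price", "P.E"), each = nrow(toy)),
                   value    = c(toy$Price, toy$P.E))

# Drop records with missing values, as in step 3
long <- na.omit(long)

# Long back to wide, in summary form: mean of value per Sector and variable
wide <- tapply(long$value, list(long$Sector, long$variable), mean)
print(wide)
```

The result has one row per sector and one column per metric, each cell holding the sector average. The missing P/E for one Financial stock is simply left out of that sector's average instead of turning it into NA.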

5. We will now do the exact same thing, but at the industry level:

industry_avg <- melt(finviz, id=c("Sector","Industry"))

industry_avg <- subset(industry_avg,variable %in% c("Price","P.E","PEG","P.S","P.B"))

industry_avg <- (na.omit(industry_avg))

industry_avg$value <- as.numeric(industry_avg$value)

industry_avg <- dcast(industry_avg, Sector+Industry~variable, mean)

industry_avg <- (na.omit(industry_avg))

colnames(industry_avg)[3:7] <- c("IAvgPE","IAvgPEG","IAvgPS","IAvgPB","IAvgPrice")

6. We will now add the sector and industry average columns to our original finviz dataset:

finviz <- merge(finviz, sector_avg, by.x="Sector", by.y="Sector")

finviz <- merge(finviz, industry_avg, by.x=c("Sector","Industry"), by.y=c("Sector","Industry"))

You might have noticed that the number of records in the finviz data frame decreased when we executed the last line of code. It removed all stocks that didn't have an industry average from the dataset. This is fine since the overall goal is to narrow down the list of stocks, and we wouldn't have had sufficient information to generate a valuation for these stocks anyway.

7. Now, it's time to put these new fields to use. First, we will add 10 placeholder fields that contain all 0s. These will be used to track whether a stock is undervalued, based on being lower than the sector or industry average:

finviz$SPEUnder <- 0

finviz$SPEGUnder <- 0

finviz$SPSUnder <- 0

finviz$SPBUnder <- 0

finviz$SPriceUnder <- 0

finviz$IPEUnder <- 0

finviz$IPEGUnder <- 0

finviz$IPSUnder <- 0

finviz$IPBUnder <- 0

finviz$IPriceUnder <- 0

8. Next, we will replace the 0s with 1s wherever the respective value for the stock is less than the average to indicate that these stocks might be undervalued based on that metric:

finviz$SPEUnder[finviz$P.E<finviz$SAvgPE] <- 1

finviz$SPEGUnder[finviz$PEG<finviz$SAvgPEG] <- 1

finviz$SPSUnder[finviz$P.S<finviz$SAvgPS] <- 1

finviz$SPBUnder[finviz$P.B<finviz$SAvgPB] <- 1

finviz$SPriceUnder[finviz$Price<finviz$SAvgPrice] <- 1

finviz$IPEUnder[finviz$P.E<finviz$IAvgPE] <- 1

finviz$IPEGUnder[finviz$PEG<finviz$IAvgPEG] <- 1

finviz$IPSUnder[finviz$P.S<finviz$IAvgPS] <- 1

finviz$IPBUnder[finviz$P.B<finviz$IAvgPB] <- 1

finviz$IPriceUnder[finviz$Price<finviz$IAvgPrice] <- 1

9. Finally, we will sum these 10 columns to create a new column with the index value telling you, on a scale of 0 to 10, how undervalued the stock is based on the different dimensions that were considered:

finviz$RelValIndex <- apply(finviz[79:88],1,sum)

How it works...

Relative valuation involves comparing a stock's statistics with those of similar stocks in order to determine whether the stock is overvalued or undervalued. In an overly simplified example, a stock with a P/E ratio lower than the average P/E ratio for its industry (all else being equal) can be considered undervalued and might make a decent investment if the company has good financial health. Once we have this, we can filter for the stocks that look most promising, such as ones that have a RelValIndex of 8 or higher:

potentially_undervalued <- subset(finviz,RelValIndex>=8)

The potentially_undervalued data frame we just created should look like this:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_11.jpg

We admit that this is an overly simplistic approach. However, it provides a framework to expand into more complex calculations. For example, once comfortable with this process, you can:

· Add in customized criteria to assign a 1 to indicate that the stock is undervalued

· Weigh the values differently

· Add or remove criteria

· Create more precise index values than just 1s and 0s, and so on

The sky is the limit here, but the process is the same.
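As one illustration of weighing the values differently, the sketch below builds the 0/1 flags for two metrics on made-up data and then compares the plain sum with a weighted sum. The tickers, values, and weights are all hypothetical; the weights in particular are arbitrary and only show the mechanics.

```r
# Made-up stocks with their industry averages already attached
toy <- data.frame(Ticker = c("AAA", "BBB", "CCC"),
                  P.E    = c(8, 15, NA),    IAvgPE = c(12, 12, 12),
                  P.B    = c(1.0, 3.0, 1.5), IAvgPB = c(2, 2, 2))

# 1 where the stock is below its industry average, 0 otherwise;
# a missing value counts as 0, matching the placeholder approach in step 7
under_pe <- as.integer(!is.na(toy$P.E) & toy$P.E < toy$IAvgPE)
under_pb <- as.integer(!is.na(toy$P.B) & toy$P.B < toy$IAvgPB)
flags    <- cbind(PE = under_pe, PB = under_pb)

# Unweighted index: every criterion counts the same
unweighted <- rowSums(flags)

# Weighted index: hypothetical weights that count P/E twice as much as P/B
weights  <- c(PE = 2, PB = 1)
weighted <- as.vector(flags %*% weights)

print(data.frame(Ticker = toy$Ticker, unweighted, weighted))
```

With a weighted index, the cutoff used for filtering has to be rescaled as well, since the maximum score is now the sum of the weights rather than the number of criteria.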

Screening stocks and analyzing historical prices

When we are looking for stocks to invest in, we need to have a way to narrow the list down. In other words, we need to eliminate stocks that we don't think will be good investments. The definition of a good investment varies from person to person, but in this section, we will use some basic criteria to reduce our master list of stocks to just a few that we think might make good prospects. Once comfortable with the process, we encourage you to modify the criteria based on your own opinion of what defines a stock worth investing in. Once we have our prospects, we will analyze their historical prices and see what conclusions we can draw from them.

Getting ready

We will start with the finviz dataset as it was at the end of the previous section, along with the sector and industry averages columns, the binary undervalued columns, and the index values that summed up the values in the binary columns.

In addition to the packages we have used so far in this article, we will also need the zoo package for this section. This will help us calculate moving averages for the historical stock prices that we will pull.

How to do it...

The steps that you are about to embark upon will allow you to screen stocks:

1. First, choose some stock screening criteria, that is, a way to select the stocks within the finviz dataset that we feel have the potential to be good investments. Here are some sample criteria to start with:

· Only US companies

· Price per share between $20 and $100

· Volume greater than 10,000

· Positive earnings per share currently and projected for the future

· Total debt to equity ratio less than 1

· Beta less than 1.5

· Institutional ownership less than 30 percent

· Relative valuation index value greater than 8

2. As mentioned, these are just examples. Feel free to remove criteria, add criteria, or make changes based on what you think will give you the best output. The goal is to narrow the list down to fewer than 10 stocks.

3. Next, we apply our criteria to subset the finviz data frame into a new data frame called target_stocks:

target_stocks <- subset(finviz, Price>20 & Price<100 & Volume>10000 &

                          Country=="USA" &

                          EPS..ttm.>0 &

                          EPS.growth.next.year>0 &

                          EPS.growth.next.5.years>0 &

                          Total.Debt.Equity<1 & Beta<1.5 &

                          Institutional.Ownership<30 &

                          RelValIndex>8)

At the time of writing this article, this produces a target list of six stocks, as shown in the following screenshot. You might get a different number or different stocks altogether if you pull updated data from the Web.

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_12.jpg

4. Now, let's go out and get historical prices for our target list of stocks so that we can see how their prices have looked over time. We will use a for loop to iterate through the list of symbols and pull prices for each one, but we will break up the loop across several steps and explain what each chunk is doing:

counter <- 0

for (symbol in target_stocks$Ticker){

The preceding command initializes a counter to keep track of where we are in our list of target stocks. Immediately after, we begin the for loop, which runs the following commands for every symbol in our target list:

  url <- paste0("http://ichart.finance.yahoo.com/table.csv?s=",symbol,"&a=08&b=7&c=1984&d=01&e=23&f=2014&g=d&ignore=.csv")

  stock <- read.csv(url)

  stock <- na.omit(stock)

  colnames(stock)[7] <- "AdjClose"

  stock[,1] <- as.Date(stock[,1])

  stock <- cbind(Symbol=symbol,stock)

This code assigns a URL to the url variable that has the current stock symbol embedded into it. Then, we read the data located at this URL and assign it to a data frame called stock. We then do some clean up and formatting by removing all null values from the data frame, renaming the last column, making sure the Date column is formatted as a date that R can recognize, and adding the stock's symbol as the first column of the data frame.
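The query string in the URL above encodes the date range: a, b, and c are the start month, day, and year, d, e, and f are the end month, day, and year, and g=d requests daily data. Months are counted from zero, so a=08 means September and d=01 means February. The sketch below builds the same URL from ordinary dates. The function name yahoo_history_url is hypothetical, the code makes no network request, and the endpoint itself may no longer respond.

```r
# Hypothetical helper: rebuild the history URL from a symbol and two dates.
# Only the string is constructed; nothing is downloaded.
yahoo_history_url <- function(symbol, from, to) {
  from <- as.Date(from)
  to   <- as.Date(to)
  month0 <- function(d) as.integer(format(d, "%m")) - 1L  # zero-based month
  day    <- function(d) as.integer(format(d, "%d"))
  sprintf(paste0("http://ichart.finance.yahoo.com/table.csv?s=%s",
                 "&a=%02d&b=%d&c=%s&d=%02d&e=%d&f=%s&g=d&ignore=.csv"),
          symbol,
          month0(from), day(from), format(from, "%Y"),
          month0(to),   day(to),   format(to, "%Y"))
}

example_url <- yahoo_history_url("AAPL", "1984-09-07", "2014-02-23")
print(example_url)
```

For the dates shown, the result matches the pattern used in the loop, with AAPL in place of the symbol variable.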

5. The next few lines of our for loop will calculate some moving averages so that we can compare them with the daily stock prices. For this step, make sure you have the zoo package mentioned at the beginning of this section installed and loaded.

The first part will calculate both a 50-day moving average and a 200-day moving average:

  maxrow <- nrow(stock)-49

  ma50 <- cbind(stock[1:maxrow,1:2],rollmean(stock$AdjClose,50,align="right"))

  maxrow <- nrow(stock)-199

  ma200 <- cbind(stock[1:maxrow,1:2],rollmean(stock$AdjClose,200,align="right"))

The second part will combine the moving average data frames with the data frame containing the historical stock prices so that everything is part of the same dataset:

  stock <- merge(stock,ma50,by.x=c("Symbol","Date"),by.y=c("Symbol","Date"),all.x=TRUE)

  colnames(stock)[9] <- "MovAvg50"

  stock <- merge(stock,ma200,by.x=c("Symbol","Date"),by.y=c("Symbol","Date"),all.x=TRUE)

  colnames(stock)[10] <- "MovAvg200"
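The moving average step relies on the downloaded prices being ordered from newest to oldest. Pairing the first rows of the data frame with the rolling means then gives each date the average of that day and the days before it. The sketch below shows this alignment on made-up prices with a 3-day window. It uses base R's embed and rowMeans as a stand-in for rollmean, which should return the same values for a plain numeric vector.

```r
# Made-up adjusted closing prices, newest first (same ordering as the CSV)
adj_close <- c(15, 14, 13, 12, 11, 10)
k <- 3  # window length; the recipe uses 50 and 200

# Each row of embed(adj_close, k) holds k consecutive values,
# so rowMeans gives the mean of every window of length k
roll <- rowMeans(embed(adj_close, k))

# Only the first n - k + 1 rows have a full window of earlier prices
maxrow  <- length(adj_close) - (k - 1)
aligned <- data.frame(row = 1:maxrow,
                      AdjClose = adj_close[1:maxrow],
                      MovAvg   = roll)
print(aligned)
```

The newest price, 15, is paired with 14, the mean of 15, 14, and 13. The oldest k - 1 rows get no moving average, which is why the merge in the recipe uses all.x=TRUE and leaves those entries as NA.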

6. Next, we will plot a historical chart for each stock that our for loop iterates through, and then save that plot:

  price_chart <- melt(stock[,c(1,2,8,9,10)],id=c("Symbol","Date"))

  qplot(Date, value, data=price_chart, geom="line", color=variable,

        main=paste(symbol,"Daily Stock Prices"),ylab="Price")

  ggsave(filename=paste0("stock_price_",counter,".png"))

The charts that get generated and saved should look like the following two charts:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_13.jpg

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_14.jpg

The next part of our loop summarizes the opening, high, low, and closing prices of the current stock:

  # merge() sorts by Symbol and Date, so pick the first and last day by date

  price_summary <- ddply(stock, "Symbol", summarise, open=Open[which.min(Date)],

                         high=max(High),low=min(Low),close=AdjClose[which.max(Date)])

Then, it accumulates all the daily historical prices for the stocks in a data frame called stocks so that the different stocks can be compared later. Also, it separately accumulates the summarized opening, high, low, and closing prices in a data frame called price_summaries so that they can be compared as well:

  if(counter==0){

    stocks <- rbind(stock)

    price_summaries <- rbind(price_summary)

  }else{

    stocks <- rbind(stocks, stock)

    price_summaries <- rbind(price_summaries, price_summary)

  }

At the end of the loop, we increment our counter by one, and then close our for loop with a curly bracket:

  counter <- counter+1

}

Tip

We broke our loop into pieces in order to explain what each part of the loop does. If you want to see what the entire for loop should look like, check the accompanying code file for this article.

7. Once we have iterated through all the stock symbols, we are left with a data frame named stocks that contains the historical prices for all the symbols in our target list and a data frame named price_summaries that holds the summaries for all our stocks. Let's graph them and see what they look like.

First, we will graph the historical prices for all our stocks:

qplot(Date, AdjClose, data=stocks, geom="line", color=Symbol,

      main="Daily Stock Prices")

ggsave(filename=("stock_price_combined.png"))

The preceding commands will produce the following graph:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_15.jpg

8. Then, let's graph the price summaries:

summary <- melt(price_summaries,id="Symbol")

ggplot(summary, aes(x=variable, y=value, fill=Symbol)) +

  geom_bar(stat="identity") + facet_wrap(~Symbol)

ggsave(filename=("stock_price_summaries.png"))

The resulting graph should look similar to this:

https://www.packtpub.com/graphics/9781783980246/graphics/0246OS_04_16.jpg

How it works...

Daily stock price charts are very "spiky" or volatile, and this sometimes makes them difficult to read. Moving averages smooth out the price fluctuations of a stock so that you can get a better sense of whether the stock is moving up or down over time.

Moving averages are also used to time investment in stocks. In other words, they are used as a guide to determine whether to invest in a stock now or to wait. There are varying opinions about what signals the best time, but one example is when the stock's 50-day moving average is below its 200-day moving average but is trending up. For more on moving averages, please see http://www.investopedia.com/university/movingaverage/.

The combined historical price chart we generated in this section shows us the degree to which our target stocks' prices move in tandem. If you are looking to invest in multiple stocks, it can be good to invest in ones where the prices are not too highly correlated. You can also visualize how volatile one stock has been when compared to another. In our graph, you can see that the symbols WPZ and NRCIB have been fairly volatile, while the other symbols have been somewhat less volatile.
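Correlation can also be measured directly instead of being judged by eye. The sketch below is an illustration on made-up prices for three hypothetical symbols, not output from the stocks data frame. It computes correlations on daily log returns rather than on raw prices, because prices that merely trend in the same direction can look correlated even when their day-to-day moves are unrelated.

```r
# Made-up daily prices for three hypothetical symbols, oldest first
prices <- data.frame(A = c(10, 11, 12, 11, 13),
                     B = c(20, 22, 24, 22, 26),      # always twice A
                     C = c(20, 19, 18.5, 19.5, 18))  # tends to move against A

# Daily log returns for each symbol
returns <- as.data.frame(lapply(prices, function(p) diff(log(p))))

# Pairwise correlation of returns
cor_mat <- cor(returns)
print(round(cor_mat, 2))
```

Here A and B move in lockstep, so holding both adds little diversification, while C tends to move in the opposite direction to A.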

Another way to look at the price comparisons is by examining the price summaries' bar chart we created. This chart shows the opening, high, low, and closing prices for the period analyzed. The opening price is the very first price the stock traded at, the closing price is the very last price the stock has traded at thus far, the high price is the highest price the stock has been at during the period, and the low price is the lowest price the stock has been at during the period. The volatility mentioned previously can be viewed in a different way on this graph, as you can clearly see the difference between the highs and the lows of our two most volatile stocks. This chart also lets you see where the stock's closing price is relative to its all-time high and all-time low, which might help to give you a clue of the fairness of its current valuation.

This post is an extract from the https://www.packtpub.com book Practical Data Science Cookbook.
