Visually Exploring Stock Market Data (R)
This article will walk you through a
financial analysis project where you will analyze stock market data, determine
whether stocks are over- or under-valued, use this information to identify a
list of target stocks that may make good investments, and visually analyze the
price histories of the target stocks.
We must caution that the goal of this article
is not to make you an expert in stock market analysis or to make you rich.
Quants on Wall Street study engineering models that perform significantly more
sophisticated operations than those we will touch upon here. Entire articles
have been written on stock market models and financial engineering, but we only
have a single article to dedicate to this topic. So, given the time and format
constraints, we will keep the goals of this article modest: to acquire, clean,
and visually explore stock market data, and to build a simple relative
valuation model.
The data we will use for this article consists
of current data for stocks tracked by the website finviz.com and daily histories of stock
prices obtained from Yahoo! Finance.
As in previous articles, the tool we will
rely on most heavily for this project will be the R statistical programming
language. As you've probably noticed by now, R has strong packages available
that can assist us in the needed analytical tasks; we will be leveraging some
of these packages in this article. Additionally, the recipes in this article
will roughly follow the data science pipeline, which we will adapt to the type
of data we are working with and the types of analysis we would like to conduct
on the data.
Requirements
For this article, you will need
a computer with access to the Internet. You will also need to have R installed
and the following packages installed and loaded:
install.packages("XML")
install.packages("ggplot2")
install.packages("plyr")
install.packages("reshape2")
install.packages("zoo")
library(XML)
library(ggplot2)
library(plyr)
library(reshape2)
library(zoo)
The XML package will assist us with
acquiring data from the Internet, ggplot2
will let us create beautiful graphs and
visualizations from our data, plyr
will help us with summarizing our data, reshape2
will let us reshape data between wide and long formats, and the zoo
package will allow us to
calculate moving averages.
You will also want to set a working
directory where some of the charts that we generate will be saved:
setwd("path/where/you/want/to/save/charts")
Acquiring stock market data
If you look on the Internet for stock
market data, you will quickly find yourself inundated with sources
providing stock quotes and financial data. An important but often overlooked
factor when acquiring data is the efficiency of getting the data. All else
being equal, you don't want to spend hours piecing together a dataset that you
could have acquired in far less time. Taking this into consideration, we will
try to obtain the largest amount of data from the smallest number of sources.
This not only helps to keep the data as consistent as possible, but it
also improves the repeatability of the analysis and the reproducibility of the
results.
How to do it...
The first piece of data we want to
obtain is a snapshot of the stocks we want to analyze. One of the best ways to
do this is to download data from one of the many stock screener applications that
exist. Our favorite screener to download stock data from belongs to http://finviz.com.
Let's acquire the stock market data
we will use for this article with the help of the following steps:
1.
First, let's pull up FINVIZ.com's stock screener
available at http://finviz.com/screener.ashx:
As you can see, the site has multiple
fields that can be filtered. If you click on the All tab,
you can see all of the fields that can be displayed.
2.
For this project, we want to export all the fields for
all the companies in the screener. You can either customize the screener by
checking all 69 checkboxes (as of the time of writing), or you can use the
export URL shown in step 5 to make all the fields show up automatically. You
should then see the screener with all the available fields.
3.
If you scroll all the way to the bottom right
of the screen, there should be an export link.
Click on this link and save the CSV file as finviz.csv.
4.
Finally, we will launch RStudio, read the
finviz.csv
file from the path where we
saved it, and assign it to a data frame, as follows:
finviz <- read.csv("path/finviz.csv")
Note
In data analysis, it is always better
for each step that is performed to be in code instead of as a series of
point-and-click actions that require human intervention. This way, it is much
easier and faster to reproduce your results.
5.
After going through steps 1 to 4 for the first time (and
some clever reading of URLs from our browser), we can replace the previous
lines of code with the following two commands:
url_to_open <- 'http://finviz.com/export.ashx?v=152&c=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68'
finviz <- read.csv(url(url_to_open))
Tip
Note the structure of the URL in step
2; it contains a comma-separated list of the checkboxes we wish to select. You
can programmatically generate this URL to easily select whichever combination
of companies' data you want to download.
If you want to avoid typing the
numbers 0 through 68, you can use a combination of the
sprintf
and paste
commands to accomplish the same
thing:
url_to_open <- sprintf("http://finviz.com/export.ashx?v=152&c=%s", paste(0:68, collapse = ","))
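You can sanity-check that the programmatically built string really matches the hand-typed list of checkbox numbers by inspecting it directly:

```r
# Build the comma-separated list of column IDs 0 through 68
ids <- paste(0:68, collapse = ",")

# Substitute it into the export URL template
url_to_open <- sprintf("http://finviz.com/export.ashx?v=152&c=%s", ids)

# The string starts at 0 and ends at 68
substr(ids, 1, 9)   # "0,1,2,3,4"
```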
How it works...
Now that we've taken an
initial glance at the data, it's important to take some time out to identify
the fields that will be most important to us, and understand what these fields
mean.
The first few fields contain
identifying information about the company.
The ticker (sometimes also called the
symbol) is the identifier for the stock of a company. No two companies will
have the exact same ticker symbol. So AA is always Alcoa, AAPL is always Apple,
and so on.
Next, we have the company name,
sector, industry, and home country of the company. The sector and industry
details serve as ways to classify stocks to inform us of each company's primary
line of business; sector is more general (higher level), and industry is more
specific (lower level). For example, Apple Inc. (AAPL) is in the Consumer Goods
sector and primarily produces consumer goods in the Electronic Equipment
industry.
There's more...
Once we get past these fields, most
of the other fields in our dataset are numeric. Let's define some of the most
important ones:
·
Price: This indicates the ongoing dollar value to
purchase one share of a company's stock.
·
Volume: This indicates the most recent number of shares of
the stock transacted in a day.
·
Shares Outstanding: This is the total number of stock shares the
company has issued.
·
P/E: The Price-to-Earnings ratio is the price of the
company's stock divided by the company's earnings per share.
·
PEG: The P/E Growth ratio is the company's P/E ratio divided
by its annual growth rate, and it gives you a sense of the valuation of
the company's earnings relative to its growth.
·
EPS growth next year: This is the expected rate at
which the company's earnings per share will grow in the next year.
·
Total Debt/Equity: The total debt-to-equity ratio is a measure of
financial health, calculated by dividing the dollar value of the company's
total debt by its equity. This gives you a sense of how the
company has been financing its growth and operations. Debt is riskier than
equity, so a high ratio is cause for concern.
·
Beta: This is a measure of the stock's volatility (swings in
its price) relative to the overall stock market. A beta of 1 means the stock
is as volatile as the market. A beta greater than 1 means it's more volatile,
while a beta less than 1 means it's less volatile.
·
RSI: The Relative Strength Index is a metric based on
stock price activity, which uses the number of days a stock has closed higher
than its opening price and the number of days a stock has closed lower than its
opening price within the last two weeks to determine a score between 0 and 100.
A higher index value indicates that the stock might be overvalued, and
therefore, the price might drop soon; a lower value indicates that the stock
might be undervalued, so the price might rise soon.
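The RSI just described can be sketched in a few lines of base R. This is a hedged, simplified version (simple averages of gains and losses rather than Wilder's smoothing) meant only to show the mechanics; the FINVIZ export supplies the real value.

```r
# Simplified 14-day RSI: ratio of average gains to average losses,
# mapped onto a 0-100 scale
rsi <- function(close, n = 14) {
  delta  <- diff(close)             # day-over-day price changes
  gains  <- pmax(delta, 0)          # upward moves (0 otherwise)
  losses <- pmax(-delta, 0)         # downward moves, as positive numbers
  avg_gain <- mean(tail(gains, n))  # simple averages over the last n changes
  avg_loss <- mean(tail(losses, n))
  if (avg_loss == 0) return(100)    # nothing but gains -> maximal RSI
  100 - 100 / (1 + avg_gain / avg_loss)
}

rsi(1:20)   # 100: a stock that only rises looks overbought
rsi(20:1)   # 0:   a stock that only falls looks oversold
```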
If you want to know the definitions
of some of the other fields, http://investopedia.com is
a great place to find definitions of financial and investment terms.
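Similarly, the beta figure described above has a simple underlying formula: the covariance of a stock's returns with the market's returns, divided by the variance of the market's returns. The return series below are made-up illustrative numbers, not real data.

```r
# Toy daily return series (illustrative numbers only)
stock_ret  <- c(0.02, -0.01, 0.03, 0.00)
market_ret <- c(0.01, -0.01, 0.02, 0.00)

# Beta: how much the stock moves per unit move of the market
beta <- cov(stock_ret, market_ret) / var(market_ret)
beta  # approximately 1.4 -- this stock is more volatile than the market
```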
Cleaning and exploring the data
Now that we've acquired the data
and learned a little about what the fields mean, the next step is to clean
up the data and conduct some exploratory analysis.
Getting ready
Make sure you have the packages
mentioned at the beginning of the article installed and you have successfully
imported the FINVIZ data into R using the steps in the previous sections.
How to do it...
To clean and explore the data,
closely follow the ensuing instructions:
1.
Imported numeric data often contains special characters
such as percentage signs, dollar signs, commas, and so on. This causes R to
think that the field is a character field instead of a numeric field.
For example, our FINVIZ dataset contains numerous values with percentage signs
that must be removed. To do this, we will create a
clean_numeric
function that will strip away
any unwanted characters using the gsub
command. We will create this function once and then
use it multiple times throughout the article:
clean_numeric <- function(s){
  s <- gsub("%|\\$|,|\\)|\\(", "", s)
  s <- as.numeric(s)
}
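A quick sanity check of clean_numeric on a few representative strings (the function is repeated here so the example is self-contained):

```r
# The cleaning function from step 1
clean_numeric <- function(s){
  s <- gsub("%|\\$|,|\\)|\\(", "", s)
  s <- as.numeric(s)
}

clean_numeric("12.5%")      # returns 12.5
clean_numeric("$1,234.00")  # returns 1234
clean_numeric("(45)")       # returns 45 -- note: accounting-style negatives lose their sign
```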
2.
Next, we will apply this function to the numeric
fields in our
finviz
data frame:
finviz <- cbind(finviz[,1:6],apply(finviz[,7:68], 2, clean_numeric))
3.
If you look at the data again, all the pesky
percentage signs will be gone, and the fields will all be numeric.
Tip
In this command, and throughout the
rest of this article, there will be many instances where we reference columns
by their column number. If the number of columns changes for some reason, the
numbers referenced will need to be adjusted accordingly.
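One way to reduce the fragility this tip warns about is to reference columns by name rather than position wherever possible. Here is a small sketch on a toy frame; the column names are illustrative, not the exact FINVIZ ones.

```r
# Toy frame mimicking an import where numbers arrived as text
df <- data.frame(Ticker = c("AAA", "BBB"),
                 Price  = c("$10.50", "$99.00"),
                 Change = c("1.2%", "-0.4%"),
                 stringsAsFactors = FALSE)

# Clean every column except the identifier columns, selected by name
id_cols  <- "Ticker"
num_cols <- setdiff(colnames(df), id_cols)
df[num_cols] <- lapply(df[num_cols],
                       function(s) as.numeric(gsub("%|\\$|,", "", s)))

df$Price   # 10.5 99.0
```

If FINVIZ adds or reorders columns, this version keeps working, while hard-coded indices like finviz[,7:68] would need adjusting.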
4.
Now we are ready to really start exploring our data! The
first thing to do is take a look at how the prices are distributed in order to
get a visual sense of what is a high stock price, what is a low stock price,
and where the prices of most stocks fall:
hist(finviz$Price, breaks=100, main="Price Distribution", xlab="Price")
You will get the following graph as
output:
Here, we encounter our first problem.
Outlier stocks with very high prices cause R to scale the x axis
of the histogram in such a way as to make the graph useless. We simply cannot
see what the distribution looks like for the more normally priced stocks. This
is a very common issue when plotting a histogram of a dataset for the first time.
5.
Let's put a cap on the x axis of
$150 and see what that produces for us:
hist(finviz$Price[finviz$Price<150], breaks=100, main="Price Distribution", xlab="Price")
You will get the following graph as
output:
This is much better! It shows that
the majority of stocks in our dataset are priced under $50. So, in absolute
terms, a stock that was priced at $100 would be considered expensive.
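Rather than hard-coding a $150 cap, you could also trim at a percentile of the data. Here is a hedged sketch using synthetic prices; the 95th percentile is an arbitrary illustrative choice, not a rule.

```r
set.seed(42)
# Synthetic prices: mostly modest values plus two extreme outliers
prices <- c(rexp(98, rate = 1/30), 5000, 172000)

cap     <- quantile(prices, 0.95)    # data-driven cutoff instead of $150
trimmed <- prices[prices < cap]

# hist(trimmed, breaks = 100)        # now the bulk of the distribution is visible
```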
6.
But of course, things aren't so simple.
Perhaps different sectors and industries have different price levels. So,
theoretically, a $100 stock might be cheap if all the other stocks in its
industry are priced in the $120 to $150 range. Let's get the average prices by
sector and see how they compare. Note that we are not excluding any stocks:
sector_avg_prices <- aggregate(Price~Sector,data=finviz,FUN="mean")
colnames(sector_avg_prices)[2] <- "Sector_Avg_Price"
ggplot(sector_avg_prices, aes(x=Sector, y=Sector_Avg_Price, fill=Sector)) +
geom_bar(stat="identity") + ggtitle("Sector Avg Prices") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
You will get the following graph as
output:
This is interesting. Stocks in the
financial sector seem to have a significantly higher average price than stocks
in other sectors. I'm willing to bet that this is due to some of the outliers
that messed up our distribution earlier.
7.
Let's get to the bottom of this! Let's find
out which industries and companies are responsible for making the average
price of the financial sector so much higher than all the others.
First, we create a summary of the
average prices by industry:
industry_avg_prices <- aggregate(Price~Sector+Industry,data=finviz,FUN="mean")
industry_avg_prices <- industry_avg_prices[order(industry_avg_prices$Sector,industry_avg_prices$Industry),]
colnames(industry_avg_prices)[3] <- "Industry_Avg_Price"
Then, we isolate the industries in
the financial sector:
industry_chart <- subset(industry_avg_prices,Sector=="Financial")
Finally, we create a chart showing
the average price of each industry in the financial sector:
ggplot(industry_chart, aes(x=Industry, y=Industry_Avg_Price, fill=Industry)) +
geom_bar(stat="identity") + theme(legend.position="none") + ggtitle("Industry Avg Prices") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
You will get the following graph as
output:
From this graph, it looks like the Property
& Casualty Insurance industry is the main
culprit that is driving the average prices up.
8.
Next, we will drill down further into the Property
& Casualty Insurance industry to identify
which companies are the outliers:
company_chart <- subset(finviz,Industry=="Property & Casualty Insurance")
ggplot(company_chart, aes(x=Company, y=Price, fill=Company)) +
geom_bar(stat="identity") + theme(legend.position="none") +
ggtitle("Company Avg Prices") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
You will get the following graph as
output:
It's hard to see because
there are so many companies, but if you zoom in on the graph, it is clear that
the outlier company is Berkshire Hathaway, where the stock price is currently
over $172,000 per share.
9.
Since its stock price is so extreme, let's remove it
from our dataset and then re-average the sectors so that we have a
more realistic average price for the financial sector:
finviz <- subset(finviz, Ticker!="BRK-A")
sector_avg_prices <- aggregate(Price~Sector,data=finviz,FUN="mean")
colnames(sector_avg_prices)[2] <- "Sector_Avg_Price"
ggplot(sector_avg_prices, aes(x=Sector, y=Sector_Avg_Price, fill=Sector)) +
geom_bar(stat="identity") + ggtitle("Sector Avg Prices") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
You will get the following
graph as output:
Now, our averages look
much better and we have a decent basis to compare stock prices to their
industry and sector averages.
How it works...
In this section, we used the
aggregate
command to summarize our data.
Here's a reminder of the code we used:
sector_avg_prices <- aggregate(Price~Sector,data=finviz,FUN="mean")
An alternative way to do this is with the ddply command that is part of the plyr package:
sector_avg_prices <- ddply(finviz, "Sector", summarise, Price=mean(Price, na.rm=TRUE))
Wherever you see the aggregate command used in this article,
feel free to challenge yourself by also trying to summarize the data
using ddply.
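To see the equivalence on a small example (assuming plyr is loaded, as at the start of the article):

```r
library(plyr)

toy <- data.frame(Sector = c("Tech", "Tech", "Energy"),
                  Price  = c(10, 20, 30))

by_aggregate <- aggregate(Price ~ Sector, data = toy, FUN = "mean")
by_ddply     <- ddply(toy, "Sector", summarise,
                      Price = mean(Price, na.rm = TRUE))

# Both produce one average price per sector: Energy 30, Tech 15
```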
Generating relative valuations
One of the most interesting
things that you can do with stock market data is come up with a valuation
model. The ultimate goal is to arrive at a decision about whether the stock
might be overvalued or undervalued. There are two main ways to do this.
Intrinsic valuation is generally more time consuming because it involves
digging into the financial statements of a company to arrive at a valuation
decision. The alternative method is relative valuation, which will quickly
provide a sense of how the stock is valued but does not take into account a
comprehensive set of factors. The basic idea is that it compares a stock's
price and valuation ratios to similar stocks to arrive at a conclusion. In this
section, we will value stocks using the simpler relative valuation method.
Getting ready
This recipe requires the data
downloaded and cleaned in the previous recipes.
How to do it...
We will essentially do three major
things in this section. First, we calculate sector averages for fields that we
can use in our relative valuation efforts. Then, we do the same at the industry
level. Finally, we compare the stocks' statistics to the averages to arrive at
an index value for each stock that indicates whether it might be undervalued.
The following steps will guide you:
1.
In order to calculate averages in multiple columns
in R, we first need to melt the data. This will make every column after Sector a
row and then display its value, essentially making the data long instead of
wide. Take a look at the following screenshots for the different steps in this
recipe to better understand how the data changes shape. It goes from being wide
to long, and then back to wide again, but in summary form.
We will use the following command to
perform this action:
sector_avg <- melt(finviz, id="Sector")
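To see the wide-to-long transformation on a tiny example (reshape2 is assumed loaded, as at the start of the article):

```r
library(reshape2)

wide <- data.frame(Sector = c("Tech", "Energy"),
                   P.E    = c(15, 9),
                   PEG    = c(1.2, 0.8))

# One row per (Sector, variable) pair instead of one column per metric
long <- melt(wide, id = "Sector")
long
#   Sector variable value
# 1   Tech      P.E 15.0
# 2 Energy      P.E  9.0
# 3   Tech      PEG  1.2
# 4 Energy      PEG  0.8
```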
2.
Next, we need to filter so that the data frame
contains only the fields we want to average:
sector_avg <- subset(sector_avg,variable%in%c("Price","P.E","PEG","P.S","P.B"))
Now your
sector_avg
data frame should look like
this:
Each column heading (variable) is now
listed vertically alongside its value. This allows us to do some grouping later
to get the averages for each variable.
3.
Not all stocks in our original dataset have all of
these values; where the values are null, we want to remove the records. We
also want to make sure all of our values are numeric:
sector_avg <- (na.omit(sector_avg))
sector_avg$value <- as.numeric(sector_avg$value)
4.
The next step is to cast the data to make it wide again.
This will produce a column for each of the fields we filtered, and will now
contain the average by sector. We will also rename the columns so that we know
they are sector averages:
sector_avg <- dcast(sector_avg, Sector~variable, mean)
colnames(sector_avg)[2:6] <- c("SAvgPE","SAvgPEG","SAvgPS","SAvgPB","SAvgPrice")
You will get the following plot as
output:
5.
We will now do the exact same thing, but at the
industry level:
industry_avg <- melt(finviz, id=c("Sector","Industry"))
industry_avg <- subset(industry_avg,variable %in% c("Price","P.E","PEG","P.S","P.B"))
industry_avg <- (na.omit(industry_avg))
industry_avg$value <- as.numeric(industry_avg$value)
industry_avg <- dcast(industry_avg, Sector+Industry~variable, mean)
industry_avg <- (na.omit(industry_avg))
colnames(industry_avg)[3:7] <- c("IAvgPE","IAvgPEG","IAvgPS","IAvgPB","IAvgPrice")
6.
We will now add the sector and industry average columns
to our original
finviz
dataset:
finviz <- merge(finviz, sector_avg, by.x="Sector", by.y="Sector")
finviz <- merge(finviz, industry_avg, by.x=c("Sector","Industry"), by.y=c("Sector","Industry"))
You might have noticed that the
number of records in the
finviz
data frame decreased when we executed the last line
of code. It removed all stocks that didn't have an industry average from the
dataset. This is fine since the overall goal is to narrow down the list of
stocks, and we wouldn't have had sufficient information to generate a valuation
for these stocks anyway.
7.
Now, it's time to put these new fields to use.
First, we will add 10 placeholder fields that contain all 0s. These will be
used to track whether a stock is undervalued, based on being lower than the
sector or industry average:
finviz$SPEUnder <- 0
finviz$SPEGUnder <- 0
finviz$SPSUnder <- 0
finviz$SPBUnder <- 0
finviz$SPriceUnder <- 0
finviz$IPEUnder <- 0
finviz$IPEGUnder <- 0
finviz$IPSUnder <- 0
finviz$IPBUnder <- 0
finviz$IPriceUnder <- 0
8.
Next, we will replace the 0s with 1s wherever the
respective value for the stock is less than the average to indicate that these
stocks might be undervalued based on that metric:
finviz$SPEUnder[finviz$P.E<finviz$SAvgPE] <- 1
finviz$SPEGUnder[finviz$PEG<finviz$SAvgPEG] <- 1
finviz$SPSUnder[finviz$P.S<finviz$SAvgPS] <- 1
finviz$SPBUnder[finviz$P.B<finviz$SAvgPB] <- 1
finviz$SPriceUnder[finviz$Price<finviz$SAvgPrice] <- 1
finviz$IPEUnder[finviz$P.E<finviz$IAvgPE] <- 1
finviz$IPEGUnder[finviz$PEG<finviz$IAvgPEG] <- 1
finviz$IPSUnder[finviz$P.S<finviz$IAvgPS] <- 1
finviz$IPBUnder[finviz$P.B<finviz$IAvgPB] <- 1
finviz$IPriceUnder[finviz$Price<finviz$IAvgPrice] <- 1
9.
Finally, we will sum these 10 columns to create a new
column with the index value telling you, on a scale of 1 to 10, how undervalued
the stock is based on the different dimensions that were considered:
finviz$RelValIndex <- apply(finviz[79:88],1,sum)
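The flag-and-sum pattern of steps 7 through 9 can be seen end to end on a toy frame (two metrics instead of ten; rowSums plays the role of apply(..., 1, sum)):

```r
# Two stocks, two metrics, with precomputed sector averages
toy <- data.frame(P.E     = c(10, 30),   # stock 1 is below the average P/E, stock 2 above
                  SAvgPE  = c(20, 20),
                  PEG     = c(0.8, 2.0),
                  SAvgPEG = c(1.0, 1.0))

# Placeholder flags, then set to 1 where the stock is under the average
toy$SPEUnder  <- 0
toy$SPEGUnder <- 0
toy$SPEUnder[toy$P.E < toy$SAvgPE]   <- 1
toy$SPEGUnder[toy$PEG < toy$SAvgPEG] <- 1

# The index is just the sum of the flags
toy$RelValIndex <- rowSums(toy[c("SPEUnder", "SPEGUnder")])
toy$RelValIndex  # 2 0 -- stock 1 looks undervalued on both metrics
```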
How it works...
Relative valuation involves
comparing a stock's statistics with those of similar stocks in order to
determine whether the stock is overvalued or undervalued. In an overly
simplified example, a stock with a P/E ratio lower than its industry's average
P/E ratio (all else being equal) can be considered undervalued and might make a
decent investment if the company is in good financial health. Once we have
this, we can filter for the stocks that look most promising, such as ones that
have a RelValIndex of 8 or higher:
potentially_undervalued <- subset(finviz,RelValIndex>=8)
The
potentially_undervalued
data frame we just created
should look like this:
We admit that this is an overly
simplistic approach. However, it provides a framework to expand into more
complex calculations. For example, once comfortable with this process, you can:
·
Add in customized criteria to assign a
1
to indicate that the stock is
undervalued
·
Weigh the values differently
·
Add or remove criteria
·
Create more precise index values than just 1s and 0s, and
so on
The sky is the limit here,
but the process is the same.
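As one example of the extensions listed above, here is a hedged sketch of weighting the binary columns instead of summing them equally. The weights are illustrative assumptions, not a recommendation.

```r
# Flags for two hypothetical stocks across three criteria
flags <- data.frame(SPEUnder  = c(1, 0),
                    SPEGUnder = c(1, 1),
                    SPBUnder  = c(0, 1))

# Emphasize P/E most, P/B least (arbitrary illustrative weights)
weights <- c(SPEUnder = 3, SPEGUnder = 2, SPBUnder = 1)

# Matrix multiply: each stock's weighted score
weighted_index <- as.numeric(as.matrix(flags) %*% weights[colnames(flags)])
weighted_index  # 5 3
```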
Screening stocks and analyzing historical
prices
When we are looking for stocks to
invest in, we need to have a way to narrow the list down. In other words,
we need to eliminate stocks that we don't think will be good
investments. The definition of a good investment varies from person to person,
but in this section, we will use some basic criteria to reduce our master list
of stocks to just a few that we think might make good prospects. Once
comfortable with the process, we encourage you to modify the criteria based on
your own opinion of what defines a stock worth investing in. Once we have our
prospects, we will analyze their historical prices and see what conclusions we
can draw from them.
Getting ready
We will start with the
finviz
dataset as it was at the end of
the previous section, along with the sector and industry averages columns, the
binary undervalued columns, and the index values that summed up the values in
the binary columns.
In addition to the packages we have
used so far in this article, we will also need the
zoo
package for this section. This
will help us calculate moving averages for the historical stock prices that we
will pull.
How to do it...
The steps that you are about to
embark upon will allow you to screen stocks:
1.
First, choose some stock screening criteria, that is, a
way to select the stocks within the
finviz
dataset that we feel have the potential to be good
investments. Here are some sample criteria to start with:
·
Only US companies
·
Price per share between $20 and $100
·
Volume greater than 10,000
·
Positive earnings per share currently and projected for
the future
·
Total debt to equity ratio less than 1
·
Beta less than 1.5
·
Institutional ownership less than 30 percent
·
Relative valuation index value greater than 8
2.
As mentioned, these are just examples. Feel free to
remove criteria, add criteria, or make changes based on what you think will
give you the best output. The goal is to narrow the list down to fewer than
10 stocks.
3.
Next, we apply our criteria to subset the
finviz
data frame into a new data
frame called target_stocks:
target_stocks <- subset(finviz, Price>20 & Price<100 & Volume>10000 &
Country=="USA" &
EPS..ttm.>0 &
EPS.growth.next.year>0 &
EPS.growth.next.5.years>0 &
Total.Debt.Equity<1 & Beta<1.5 &
Institutional.Ownership<30 &
RelValIndex>8)
At the time of writing this article,
this produces a target list of six stocks, as shown in the following
screenshot. You might get a different number or different stocks altogether if
you pull updated data from the Web.
4.
Now, let's go out and get historical prices for our
target list of stocks so that we can see how their prices have looked over
time. We will use a
for
loop to iterate through the
list of symbols and pull prices for each one, but we will break up the loop
across several steps and explain what each chunk is doing:
counter <- 0
for (symbol in target_stocks$Ticker){
The preceding command initializes a counter
to keep track of where we are in our list of target stocks. Immediately after,
we begin the for loop, which does the following for every symbol in our target
list:
url <- paste0("http://ichart.finance.yahoo.com/table.csv?s=",symbol,"&a=08&b=7&c=1984&d=01&e=23&f=2014&g=d&ignore=.csv")
stock <- read.csv(url)
stock <- na.omit(stock)
colnames(stock)[7] <- "AdjClose"
stock[,1] <- as.Date(stock[,1])
stock <- cbind(Symbol=symbol,stock)
This code assigns a URL to the
url
variable that has the current stock
symbol embedded into it. Then, we read the data located at this URL and assign
it to a data frame called stock
. We then do some cleanup and formatting by removing all
null values from the data frame, renaming the last column, making sure the Date
column is formatted as a date
that R can recognize, and adding the stock's symbol as the first column of the
data frame.
5.
The next few lines of our
for
loop will calculate some moving
averages so that we can compare them with the daily stock prices. For this
step, make sure you have the zoo
package mentioned at the beginning of this section
installed and loaded.
The first part will calculate both a
50-day moving average and a 200-day moving average:
maxrow <- nrow(stock)-49
ma50 <- cbind(stock[1:maxrow,1:2],rollmean(stock$AdjClose,50,align="right"))
maxrow <- nrow(stock)-199
ma200 <- cbind(stock[1:maxrow,1:2],rollmean(stock$AdjClose,200,align="right"))
The second part will combine the
moving average data frames with the data frame containing the historical stock
prices so that everything is part of the same dataset:
stock <- merge(stock,ma50,by.x=c("Symbol","Date"),by.y=c("Symbol","Date"),all.x=TRUE)
colnames(stock)[9] <- "MovAvg50"
stock <- merge(stock,ma200,by.x=c("Symbol","Date"),by.y=c("Symbol","Date"),all.x=TRUE)
colnames(stock)[10] <- "MovAvg200"
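To see concretely what rollmean(x, k, align = "right") computes, here is a base-R equivalent of the right-aligned window mean (zoo's version is what the recipe actually uses):

```r
# Right-aligned k-day moving average using base R's stats::filter
ma <- function(x, k) {
  out <- stats::filter(x, rep(1 / k, k), sides = 1)  # mean of the current and previous k-1 values
  as.numeric(out[!is.na(out)])                       # drop the first k-1 undefined entries
}

ma(1:5, 3)  # 2 3 4: the means of (1,2,3), (2,3,4), (3,4,5)
```

This is also why the recipe trims maxrow to nrow(stock)-49 and nrow(stock)-199: the first 49 (or 199) rows have no complete window, so the moving-average series is shorter than the price series.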
6.
Next, we will plot a historical chart for each stock
that our
for
loop iterates through,
and then save that plot:
price_chart <- melt(stock[,c(1,2,8,9,10)],id=c("Symbol","Date"))
qplot(Date, value, data=price_chart, geom="line", color=variable,
main=paste(symbol,"Daily Stock Prices"),ylab="Price")
ggsave(filename=paste0("stock_price_",counter,".png"))
The charts that get generated and
saved should look like the following two charts:
The next part of our loop summarizes
the opening, high, low, and closing prices of the current stock:
price_summary <- ddply(stock, "Symbol", summarise, open=Open[nrow(stock)],
high=max(High),low=min(Low),close=AdjClose[1])
Then, it accumulates the summarized
opening, high, low, and closing prices in a data frame called price_summaries
so that the different stocks can be compared later. Also, it separately
accumulates all the daily historical prices for the stocks in a data frame
called stocks so that they can be compared as well:
if(counter==0){
stocks <- rbind(stock)
price_summaries <- rbind(price_summary)
}else{
stocks <- rbind(stocks, stock)
price_summaries <- rbind(price_summaries, price_summary)
}
At the end of the loop, we increment
our counter by one, and then close our
for
loop with a curly bracket:
counter <- counter+1
}
Tip
We broke our loop into pieces in
order to explain what each part of the loop does. If you want to see what
the entire
for
loop should look like, check
the accompanying code file for this article.
7.
Once we have iterated through all the stock
symbols, we are left with a data frame named
stocks
that contains the historical
prices for all the symbols in our target list and a data frame named price_summaries
that holds the summaries for
all our stocks. Let's graph them and see what they look like.
First, we will graph the historical
prices for all our stocks:
qplot(Date, AdjClose, data=stocks, geom="line", color=Symbol,
main="Daily Stock Prices")
ggsave(filename=("stock_price_combined.png"))
The preceding commands will produce
the following graph:
8.
Then, let's graph the price summaries:
summary <- melt(price_summaries,id="Symbol")
ggplot(summary, aes(x=variable, y=value, fill=Symbol)) +
geom_bar(stat="identity") + facet_wrap(~Symbol)
ggsave(filename=("stock_price_summaries.png"))
The resulting graph should look
similar to this:
How it works...
Daily stock price charts are
very "spiky" or volatile, and this sometimes makes them
difficult to read. Moving averages smooth out the price fluctuations
of a stock so that you can get a better sense of whether the stock is moving up
or down over time.
Moving averages are also used to time
investment in stocks. In other words, they are used as a guide to determine
whether to invest in a stock now or to wait. There are varying opinions about
what signals the best time, but one example is when the stock's 50-day moving
average is below its 200-day moving average but is trending up. For more on
moving averages, please see http://www.investopedia.com/university/movingaverage/.
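The timing signal just described (the 50-day moving average below the 200-day moving average but trending up) can be sketched as a small function. The toy series below are made up for illustration; in the recipe's data the inputs would be the MovAvg50 and MovAvg200 columns.

```r
# Flag days where the short MA is below the long MA but rising
signal_days <- function(ma50, ma200) {
  rising <- c(FALSE, diff(ma50) > 0)   # TRUE where the 50-day MA rose vs the prior day
  which(ma50 < ma200 & rising)
}

ma50  <- c(10, 9, 8, 9, 10)    # dips, then recovers
ma200 <- c(11, 11, 11, 11, 11)
signal_days(ma50, ma200)       # days 4 and 5
```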
The combined historical price chart
we generated in this section shows us the degree to which our target stocks'
prices move in tandem. If you are looking to invest in multiple stocks, it can
be good to invest in ones where the prices are not too highly correlated. You
can also visualize how volatile one stock has been when compared to another. In
our graph, you can see that the symbols WPZ and NRCIB have
been fairly volatile, while the other symbols have been somewhat less volatile.
Another way to look at the price
comparisons is by examining the price summaries' bar chart we created. This
chart shows the opening, high, low, and closing prices for the period analyzed.
The opening price is the very first price the stock traded at, the closing
price is the very last price the stock has traded at thus far, the high
price is the highest price the stock has been at during the period,
and the low price is the lowest price the stock has been at during the
period. The volatility mentioned previously can be viewed in a different way on
this graph, as you can clearly see the difference between the highs and the
lows of our two most volatile stocks. This chart also lets you see where the
stock's closing price is relative to its all-time high and all-time low, which
might help to give you a clue of the fairness of its current valuation.
This post is an extract from the Packt Publishing (https://www.packtpub.com) book Practical Data Science Cookbook.