Find Undervalued Stocks with R:

💰A Step-by-Step Guide

Want to find hidden gems in the stock market without needing an advanced finance degree? While Wall Street quants use complex engineering models, we can build a powerful, functional stock screener using **R** to do 80% of the heavy lifting.

Disclaimer: The goal of this article is to teach data analysis with R. This is not financial advice. Always do your own research before investing.

In this post, we will:

  • Scrape fundamental data for thousands of stocks.
  • Clean the data and handle outliers.
  • Build a "Relative Valuation" model to score stocks.
  • Screen for top prospects.
  • Visualize the historical price action of our targets.

🛠️ Prerequisites

You need R installed. We will rely heavily on **data.table** for fast data manipulation, **ggplot2** for visualization, and a few others for financial metrics.

Run this once to install needed packages:

R Code: Install Packages
install.packages(c("data.table", "ggplot2", "zoo", "quantmod"))

Load them at the start of your session:

R Code: Load Libraries
library(data.table)
library(ggplot2)
library(zoo)
library(quantmod) # For fetching historical price data

# Set your working directory for saving charts later
# setwd("~/path/to/your/project")

1: Acquiring the Data Snapshot

We need a snapshot of fundamental data (P/E ratios, debt levels, current price) for a broad range of stocks. The website **finviz.com** is excellent for this.

Rather than manually clicking through their screener, we can use a specially crafted URL that asks Finviz to export all available metrics into a CSV.

Note: This URL retrieves a large dataset and may take a moment to load.

R Code: Scrape Finviz Data
# A URL configured to export all metrics from Finviz
url_to_open <- 'http://finviz.com/export.ashx?v=152&c=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68'

# Read the CSV directly into a data.table
finviz <- setDT(read.csv(url(url_to_open), stringsAsFactors = FALSE))

# Let's look at the size of our data
dim(finviz)
# Output should show thousands of rows and 69 columns

2: Cleaning the Data

Financial data often comes in messy strings (e.g., `"15.5%"`, `"$120B"`). We need these to be raw numbers.

We will define the columns containing percentages and currency, strip the special characters, and convert them to numeric using **data.table's efficient in-place update operator (:=)**.

R Code: Clean and Convert Data
# Identify columns that need cleaning (those ending in '%' or containing specific names)
pct_cols <- grep("Margin|Payout|Float|Inst|ROE|ROA|ROI|Perf|Dividend|Sales|EPS|Insider", names(finviz), value = TRUE)
currency_cols <- c("Market.Cap", "Income", "Sales", "Book.sh", "Cash.sh", "Dividend")

# Define a quick helper function to strip characters
clean_numeric <- function(s) {
    as.numeric(gsub("%|\\$|,", "", s))
}

# Apply cleaning across specified columns in-place
finviz[, (pct_cols) := lapply(.SD, clean_numeric), .SDcols = pct_cols]
finviz[, (currency_cols) := lapply(.SD, clean_numeric), .SDcols = currency_cols]

# Ensure Price is numeric
finviz[, Price := as.numeric(Price)]

3: Handling Outliers

If you try to plot a histogram of stock prices right now, it will look useless because stocks like **Berkshire Hathaway A shares (BRK-A)** trade for hundreds of thousands of dollars, squashing the rest of the graph.

Let's find the most expensive stocks by Industry to identify the culprit:

R Code: Identify Outliers
# Calculate average price by Industry using data.table aggregation
industry_prices <- finviz[, .(AvgPrice = mean(Price, na.rm=TRUE)), by = .(Sector, Industry)]

# Order by price descending to find the top outliers
head(industry_prices[order(-AvgPrice)], 5)
# You will likely see "Property & Casualty Insurance" at the top due to BRK-A.

To prevent these outliers from skewing our sector averages later, let's remove Berkshire Hathaway's A shares:

R Code: Remove BRK-A
finviz <- finviz[Ticker != "BRK-A"]

4: Building the Relative Valuation Model

This is the core of our analysis. **"Relative Valuation"** means comparing a stock's metrics to its peers. If a company has a Price-to-Earnings (P/E) ratio of 15, and its industry average is 25, it might be **"relatively" undervalued.**

We will select five key metrics and calculate the industry average for each.

R Code: Calculate Industry Averages
# The metrics we want to compare
valuation_metrics <- c("P.E", "PEG", "P.S", "P.B", "Price")

# Calculate Industry Averages and add them as new columns to our main dataset
# We use .SD (Subset of Data) and lapply to calculate means for all metrics at once, grouped by Industry
finviz[, paste0("IndAvg_", valuation_metrics) := lapply(.SD, mean, na.rm=TRUE),
        by = .(Sector, Industry),
        .SDcols = valuation_metrics]

Now we create a simple scoring system. For each of the five metrics, if the stock is **lower** than its industry average, we give it **1 point**. A perfect "undervalued" score is 5.

R Code: Apply Scoring Logic
# Initialize score at 0
finviz[, ValueScore := 0]

# Add 1 point for each metric that is below the industry average
finviz[P.E < IndAvg_P.E, ValueScore := ValueScore + 1]
finviz[PEG < IndAvg_PEG, ValueScore := ValueScore + 1]
finviz[P.S < IndAvg_P.S, ValueScore := ValueScore + 1]
finviz[P.B < IndAvg_P.B, ValueScore := ValueScore + 1]
finviz[Price < IndAvg_Price, ValueScore := ValueScore + 1]

# Let's see how many stocks got a perfect score
nrow(finviz[ValueScore == 5])

5: The Final Screen

We now have a master list with valuation scores. But a cheap stock isn't always a good stock. We need to filter out risky companies. Let's define criteria for a "healthy" prospect:

  • USA based (for simplicity)
  • Liquid: Decent trading volume (>50k avg daily)
  • Not a penny stock: Price > $10
  • Profitable: Positive earnings (EPS > 0) and projected growth.
  • Manageable Debt: Debt/Equity ratio < 1.
  • Undervalued: A ValueScore of at least 4 out of 5.
R Code: Apply Screening Criteria
target_stocks <- finviz[Country == "USA" &
                        Average.Volume > 50000 &
                        Price > 10 &
                        EPS..ttm. > 0 &
                        EPS.growth.next.5.years > 0 &
                        Total.Debt.Equity < 1 &
                        ValueScore >= 4]

# Look at the results
print(target_stocks[, .(Ticker, Company, Sector, Price, ValueScore)])

Depending on market conditions, this list might be long or very short. Adjust your criteria as needed!


6: Visualizing Historical Prices

We have our targets. Now let's look at their price history to see how they have performed recently. We will use the **quantmod** package to fetch daily price data and calculate 50-day and 200-day moving averages to visualize trends.

R Code: Fetch and Process Historical Data
# Extract the tickers from our target list
tickers <- target_stocks$Ticker

# Fetch historical data for all tickers at once from Yahoo Finance
stock_data_list <- lapply(tickers, function(sym) {
    tryCatch({
        # Get data from 2020 to present
        df <- as.data.table(getSymbols(sym, src = "yahoo", from = "2020-01-01", auto.assign = FALSE))
        setnames(df, "index", "Date") # Rename date column
        df[, Ticker := sym] # Add ticker symbol column

        # Keep only Adjusted Close price and Date
        df <- df[, .(Date, Ticker, AdjClose = get(paste0(sym, ".Adjusted")))]

        # Calculate moving averages using zoo::rollmean
        df[, MA50 := rollmean(AdjClose, 50, fill = NA, align = "right")]
        df[, MA200 := rollmean(AdjClose, 200, fill = NA, align = "right")]
        return(df)
    }, error = function(e) return(NULL)) # Skip if a ticker fails
})

# Combine the list into one large data.table using rbindlist (very efficient)
history_dt <- rbindlist(stock_data_list, fill = TRUE)

# Remove rows before moving averages could be calculated
history_dt <- na.omit(history_dt)

Finally, let's use **ggplot2** to create a faceted chart showing the price action versus the moving averages for our prospects.

R Code: Plot & Save Chart
ggplot(history_dt, aes(x = Date)) +
    geom_line(aes(y = AdjClose, color = "Price"), size = 0.8) +
    geom_line(aes(y = MA50, color = "50-Day MA"), linetype = "dashed") +
    geom_line(aes(y = MA200, color = "200-Day MA"), linetype = "dotted") +
    facet_wrap(~Ticker, scales = "free_y") +
    theme_minimal() +
    labs(title = "Target Stocks: Price vs Moving Averages",
         y = "Adjusted Price", color = "Metric") +
    theme(legend.position = "bottom")

# Save the chart
ggsave("target_stocks_analysis.png", width = 12, height = 8)

Analyzing the output: You now have a visual dashboard of your potentially undervalued stocks. Traders often look for stocks where the price is above the 200-day moving average (indicating a long-term uptrend) or where the 50-day MA crosses above the 200-day MA (a "golden cross").

You have successfully built an **end-to-end stock analysis pipeline in R!**

Comments

Popular posts from this blog

Driving Visual Analysis with Automobile Data (R)

Unlocking the power of relational data visualization with ggraph

Beyond the Buzz: Understanding Classification Model Performance