Part 6 of the *Inefficient Markets* sequence.

[I am not your financial advisor! For educational purposes only. Read part 1 first.]

The technical analyst reads the signs and portends and divines the future prices. Three black crows are a bad omen, don'tcha know. An abandoned baby is a sign of changes coming. The morning star doji means the Fates are smiling.

Candlestick patterns! Trendlines! MACD's! Bollinger Bands! Stochastics! There are dozens and dozens of these so-called "technical indicators". Your stockbroker's trading software is probably full of them. They all sound very arcane and financial. Many of them have tunable parameters.

Do you know what the right parameters are? Do you know what the signs and portends mean?

Neither do they.

Maybe some of this stuff used to work. Some of it may be self-fulfilling prophecy. (Ichimoku charts only work in Japan for some reason.)

Maybe if you stare at enough charts, memorize enough patterns, some day it will all click.

Some traders seem to think that way. They may even think they're making progress. I'd call it confirmation bias.

There is something to some of this, but you're not quantifying it. What parameters? Why? How big is your edge?

Stop thinking like a diviner, and start thinking like a data scientist.

## How to Chart Like a Quant

I'm not going to say that you never look at a price chart (although I have heard that said), it does tell you some things at a glance.

But as trading styles go,
collecting directional alpha in a single asset is *hard mode*.
And people think they can do it using a gut feel for *signs and portends*?

*Really?*

We kind of have to start with a single asset, because it's hard to explain how to chart a pair without explaining how to chart a single asset first, but we do want to be able to compare the relative performance of assets later on.

Share price is not the same as market cap. It's as much about how many shares were issued, and how often it split as about how well the company performed.

Comparing two assets on share price is apples-to-oranges. It's just useless for our purposes.

Even comparing the recent prices to past prices is not that helpful. Is Apple the same company it was in the '80s? Different market, different CEO, different employees (mostly). And different share price.

If you want to make money,
you don't care so much about *where it was*,
but *how it moves*.

Think velocity and acceleration,
not position.
We don't care about where it *is*,

We don't have to do differential calculus, but that's kind of the idea. Because price history is not continuous (there have been a finite number of trades in the history of an asset), discrete methods will do.

### From Adjusted Close to Daily Returns

All right. Hands-on learning time. We're going to transform and visualize some market data, but to do that, we first need some data.

There are many sources of market data.
We want one with a daily *adjusted close*.
That's the close price in terms of the current number of shares,
accounting for any splits and dividends.

Honestly, once you know how, it's a easier to do data science in a Jupyter Notebook with Python (or in R, or J, among others), especially when using very large data sets. I would be using Python for my own analysis, but a spreadsheet is a little more accessible, so I'll start there to explain how it's done.

First, get the data.
For example, go to https://finance.yahoo.com (or wherever you can get adjusted close data)
and look up a big tech company, like `AAPL`

(Apple Inc.)

Find the "Historical Data",
set the time period to get at least ten years worth,
with "daily" resolution.
Find the "download" link to get the data as a `.csv`

file.

Do the same for a big index ETF, like `SPY`

(SPDR S&P 500 ETF Trust).

I'll be using the free and open-source Gnumetric, but any good spreadsheet application (Google Sheets, Microsoft Excel, etc.) can do this stuff.

You should see a table that looks something like this, with a lot more rows.

Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|

2010-08-20 | 35.627144 | 36.274284 | 35.57143 | 35.662857 | 30.817675 | 96057500 |

2010-08-23 | 35.970001 | 36 | 35.035713 | 35.114285 | 30.343628 | 103510400 |

Select the "Date" and "Adj Close" columns, and find the button to plot a chart. You should get something like this:

This is basically what your broker would give you. This is not what we want. We want something like the first derivative of this (think velocity, not position), but it's discrete data, so we'll use finite differences.

In the next open column, add a "differences" header.
Then enter the formula to get the difference in adjusted close
from the previous day `=F3-F2`

in cell `H3`

,
in this example.
Double click the fill handle to fill the formula into the rest of the column.
This is the profit or loss for holding one (adjusted) share for one day.
We'll be adding several columns.
It's probably a good idea to save your work after each column.
But the `.csv`

format doesn't store the formulas,
so save using your spreadsheet's native format (`.gnumetric`

in this example).

Select the "differences" column and plot a chart. You should get something like this:

Notice that the differences chart looks like more of a stationary process than the price chart did. This should make it easier to compare market behavior over different time frames. But notice that we still have a problem. The earlier times have a much lower variance than the later times. This isn't because the volatility increased; it's because the share price increased. But, except for granularity, the number of dollars you can put in doesn't depend on the share price.

In the next open column, add a "returns" header.
Then scale the difference by the previous day's price to get the percentage return.
In this example, you would enter the formula `=H3/F2`

in the cell `I3`

, and fill down.
Then select the returns column and change its display style to percentage.
This is the percentage profit or loss you would get for holding a portfolio of this asset for one day, regardless of how many shares.

Now plot the chart for "returns". You should get something like this:

Notice that the process looks very stationary now. You can more easily compare the behavior of the asset at different times, and also between assets by charting them this way.

### From Daily Returns to Distribution Histograms

For a typical stock, this chart will look very noisy.

We can get a better handle on "random" data by using the usual statistical methods. Try computing some summary statistics. Note the skewness and variance. Your spreadsheet should have functions for these.

The geometric mean of returns is a more useful statistic than the arithmetic mean.
But this is not the right form to compute it.
In the next open column, add a "factors" header.
Add one to the returns.
In this example, you would use the formula `=1+I3`

in the cell `J3`

, and fill down.
Multiplying that factor by your portfolio value yields the portfolio value for the next day.
If the geometric mean of these factors is over 1,
then you have a positive expectancy for buy-and-hold.

We can also plot the distribution of returns as a histogram. In Gnumetric, you can find this feature in the statistics menu. One-tenth the number of rows is probably a reasonable bin count (Gnumetric calls them "cutoffs"), but the bars can get too thin to see well if you use too many.

You should get something like this:

Notice that it resembles a normal distribution, but it's not perfect. There are extreme outliers, or fat tails. Most days it doesn't move very much. There's a bit of negative skew.

### Autocorrelation Effects

One form of exploitable non-randomness that's easy to look for is autocorrelation. This means that if you shift the data by a certain time, you may observe a dependence between the shifted data and the original. In other words, there's a tendency to trend or to revert to the mean. Pure noise doesn't show this property, although, due to chance, the correlation would not be zero.

In the next open column, add a "shifted 1" header. We want to ask the question, was the previous day's return at all predictive of today's?

In this example, you would use the formula `=I3`

in cell `K2`

, and fill down.
Now "shifted 1" should have the same data as "returns" but shifted up by one row.

Select the "returns" and "shifted 1" columns and find "correlation" in the statistics menu.
You can also plot the resulting table as columns.
We see that there's about a -0.057 correlation for `AAPL`

in this time frame,
in other words, a small mean-reversion tendency.

It's small. Is this statistically significant, or what we'd expect from noise? If it's just noise, there's really no reason to expect this to hold in the future. Is it a big enough effect to exploit after paying spread and commissions? These are questions to keep in mind.

Let's try this with the index.
In a new Gnumetric window, open the index data (`SPY`

in this example).

Copy the columns you added
and paste them in the same positions in the new window
to apply our formulas to the `SPY`

data.

Compute the correlation table again. Notice anything different?

The correlation is almost -0.15. That's a much stronger effect. Is this significant or due to noise?

Collecting directional alpha in a single asset is *hard mode*.
By using an index ETF, we're looking at a basket of multiple assets instead,
and got a much stronger mean-reversion effect.

An edge doesn't have to be all that big to make money over time,
if you have the discipline to trade it *systematically*.

The difference was already that noticeable
just by using an ETF that tracks a basket instead of a single stock.
Does `SPY`

have the strongest effect?
Try others and see!
Let me know what you find in the comments below.

But there's no reason to think any of the available ETFs are optimal.
You can get even stronger mean-reversion effects with a basket of assets you specifically select and balance to behave that way.
Actually *finding* those is a bit of a trick.
And trading them can get a bit more complex.
But that's a topic for a later post.

You can also try shifting by different amounts. You can add a "shifted 2" and "shifted 3" column, and so on, and plot the correlations all at once.

### Conditional Return Distributions

So we've found what appears to be some non-randomness in market behavior. Is it exploitable?

We can plot a histogram to show the returns distribution for a *strategy*,
instead of a single asset.

Let's use the index ETF, because it had a stronger effect. The basic strategy will be to hold only if the previous day's close was negative. Otherwise, our return will be zero, because we're holding cash instead.

In the next open column,
add the header "strategy returns".
In this example, that's `=IF(K3<0, I3, 0)`

in cell `L3`

,
which means "if 'shifted 1' is negative, then the return is as before, otherwise zero."

Let's plot the histograms together. Select the "returns" and "strategy returns" columns, and find "histogram" in the statistics menu.

You should get something like this:

I've adjusted the colors and zoomed in on the Y-axis a bit. That big spike in magenta goes much higher. The green histogram is the original returns. The magenta is the strategy. The spike is from the zeros, which is the most frequent bin for our strategy. Notice that for the strategy, the right side of the zero line tends to be higher. The green histogram shows the opposite effect. That's the strategy exploiting the mean-reversion effect. Notice also that we escaped some of the most negative outliers.

Plotting *conditional* histograms like this is a good way to see what effect
a condition has on return distributions.

There are many other simple conditions you might try. Does it work better if you use some cutoff more negative than zero? Maybe the effect is bigger, but that would also increase the number of zero returns. Maybe it's worth it, maybe it isn't.

Those signs and portends I mentioned before. How do you know how to use them? How do you know if they're effective? You can make a conditional histogram based on whatever signal, and see what effect it has on returns.

A word of caution. You can stack a few rules to get better distributions, but the more ad-hoc you make your strategy, the more likely you're overfitting to the noise. Plot simple rules independently before you plot the combined strategy. Keep your strategies to just a handful of simple rules.

The idea isn't to *optimize* your strategy's parameters,
as that will risk an overfit,
instead try the *obvious* rule for the effect you want to exploit,
and see if perturbations to the parameters make the effect disappear.
If so, it was probably noise.
We're not interested in trading any effect so fragile.
We'd rather trade what's likely to hold up in the future.
Any monkey can optimize a strategy on *past* data.
But we can't trade in the past.

Also try the rule on different time frames. If it works on old data, but not recent data, then maybe the pattern no longer holds. If it inverts frequently, maybe it's just noise.

Let's try some summary statistics for the strategy vs the original. Also make a "strategy factors" column and compute the geometric mean. What do you find? How does it compare? Also look at the variance, and consider leverage. Even though the absolute returns may be lower, is the Sharpe ratio better? If so, we can leverage up to compensate without adding much risk.

Experiment with these and compare notes with each other in the comments below. If any of you are statistics people, I'd like to hear your thoughts about statistical significance, and how we can apply Bayes' rule to this kind of analysis.

If you find a nice effect, check the quant/finance literature on Google Scholar. If someone else has found it too, you can be more confident that you've found a legitimate edge, and you're not just chasing ghosts in the noise.