Crypto quant trading: Intro

Alexei

I’m going to write a few posts on quant trading, specifically trading crypto, since that’s what I know best. Here are a few reasons why I’m doing this:

I think I can benefit a lot from writing about my approach and methodology. Hopefully this will make the ideas and assumptions more clear.
I’d love to get input from other people in the community on their approaches to model building, data analysis, time series analysis, and trading.
There’s been a lot of great content on this website, and I’d love to contribute. This is the topic I currently know best, so I might as well write about it.
My company (Temple Capital) is also looking to hire quants and we believe the rationalist way of thinking is very conducive to successful quant trading.

My goal here isn’t to make you think that “Oh gosh, I can become a millionaire by trading crypto!” or “Here’s the strategy that nobody else has found!” Instead, I want to give you a taste of what quant trading looks like, and what thinking like a quant feels like. EAs have been talking about earning to give for a while, and it’s well known that quant trading is a very lucrative career. I’ve known about it for a while, and several of my friends have done quant (e.g. at Jane Street) or worked at a hedge fund. But, I never thought that it was something I could do or would find enjoyable. Turns out that I can! And it is!

I’m going to be sharing the code and sometimes the step by step thinking process. If you’re interested in learning this on a deeper level, definitely download the code and play with the data yourself. I’ve been doing this for just over a year, so in many ways I’m a novice myself. But the general approach I’ll be sharing has yielded good results, and it’s consistent with what other traders / hedge funds are doing.

Setup

Note: I haven’t actually gone through these install steps on a clean machine. I think they’re mostly sufficient. If you run into any issues, please post in the comments.

Make sure you have Python 3.6+ and pip
`pip install pandas numpy scipy matplotlib ipython jupyter`
`git clone https://github.com/STOpandthink/temple-capital.git`
`cd temple-capital`
`jupyter notebook`
Open `blog1_simple_prediction_daily.ipynb`

If you’re not familiar with the tools we’re using here, then the next section is for you.

Python, Pandas, Matplotlib, and Jupyter

We’re going to be writing Python code. Python has a lot of really good libraries for doing numerical computation and statistics. If you don’t know Python, but you know other programming languages, you can still probably follow along.

Pandas is an amazing, wonderful library for manipulating tabular data and time series. (It can do a lot more, but that’s primarily what we’re using it for.) We’re going to be using this library a lot, so if you’re interested in following along, I’d recommend spending at least 10 minutes learning the basics.

Matplotlib is a Python library for plotting and graphing. Sometimes it’s much easier to understand what’s going on with a strategy when you can see it visually.

Jupyter notebooks are useful for organizing and running snippets of code. It’s well integrated with Matplotlib, allowing us to show the graphs right next to the code. And it’s good at displaying Pandas dataframes too. Overall, it’s perfect for quick prototyping.

There are a few things you should be aware of with Jupyter notebooks:

Just like running Python in an interactive shell mode, the state persists across all cells. So if you set the variable `x` in one cell, after you run it, it’ll be accessible in all other cells.
If you change any of the code outside of the notebook (like in `notebook_utils.py`), you have to restart the kernel and recompute all the cells. A neat trick to avoid doing this is:
`import importlib`
`importlib.reload(notebook_utils)`

Our first notebook

We’re not going to do anything fancy in the first notebook. I simply want to go over the data, how we’re simulating a trading strategy, and how we analyze its performance. This is a simplified version of the framework you might use to quickly backtest a strategy.

Cell 1

The first cell loads daily Bitcoin data from Bitmex. Each row is a “daily bar.” Each bar has the `open_date` (beginning of the day) and `close_date` (end of the day). The dataframe index is the same as the `open_date`. We have the `high`, `low`, and `close` prices. These are, respectively, the highest price traded in that bar, the lowest, and the last. In stock market data you usually have the open price as well, but since the crypto market is active 24/7, the open price is basically just the close price of the previous bar. `volume_usd` shows how much USD has been transacted. `num_trades_in_bar` is how many trades happened. This is the raw data we have to work with.

From that raw data we compute a few useful variables that we’ll need for basically any strategy: `pct_change` and `price_change`. `pct_change` is the fractional change in price between the previous bar and this bar (e.g. 0.05 for +5%). `price_change` is the multiplicative factor (i.e. `1 + pct_change`), such that `new_price = old_price * price_change`. Additionally, if we had a long position, our portfolio would change the same way: `new_portfolio_usd = old_portfolio_usd * price_change`.
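As a rough sketch of how these two columns could be derived from `close` (the notebook’s actual implementation may differ, and the prices below are made up for illustration):

```python
import pandas as pd

# Hypothetical close prices for four daily bars (not real Bitmex data).
df = pd.DataFrame(
    {"close": [100.0, 105.0, 102.9, 102.9]},
    index=pd.date_range("2019-01-01", periods=4, freq="D"),
)

# Fractional change from the previous bar's close (0.05 means +5%).
# The first bar has no previous bar, so its value is NaN.
df["pct_change"] = df["close"].pct_change()

# Multiplicative factor: new_price = old_price * price_change.
df["price_change"] = 1 + df["pct_change"]

print(df)
```

Keeping both columns is a convenience: `pct_change` is easier to read, while `price_change` can be multiplied across bars to get a cumulative return.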

A few terms you might not be familiar with:

We take a long position when we want to profit from the price of an asset going up. So, generally, if the asset price goes up 5%, we make 5% on the money we invested.
We take a short position when we want to profit from the price of an asset going down. So, generally, if the asset price goes down 5%, we make 5% on the money we invested.
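In code, the two definitions above collapse into a single multiplication. This is a simplified sketch that ignores fees, leverage, and funding; `position_return` is a hypothetical helper, not a function from the repo:

```python
def position_return(pct_change: float, signal: int) -> float:
    """Return on invested capital for one bar.

    signal: +1 for long, -1 for short, 0 for out of the market.
    Assumes a simple linear payoff with no fees or leverage.
    """
    return signal * pct_change


long_gain = position_return(0.05, +1)    # price up 5% while long
short_gain = position_return(-0.05, -1)  # price down 5% while short
short_loss = position_return(0.05, -1)   # price up 5% while short
print(long_gain, short_gain, short_loss)
```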

Cell 2

Cell 3

Here we see that indeed BTC recently crossed its 200 day SMA (Simple Moving Average). One neat thing about that that I didn’t realize myself is that it looks like the SMA has done a decent job of acting as support/resistance historically.
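For reference, a 200-day SMA and an upward cross can be computed with standard Pandas calls. This is a minimal sketch on synthetic random-walk prices (not the real BTC series), and the variable names are my own:

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices standing in for real BTC closes.
rng = np.random.default_rng(0)
close = pd.Series(
    1000 * np.exp(np.cumsum(rng.normal(0, 0.03, size=500))),
    index=pd.date_range("2017-01-01", periods=500, freq="D"),
    name="close",
)

# 200-day simple moving average. The first 199 values are NaN
# because a full 200-bar window is not available yet.
sma_200 = close.rolling(window=200).mean()

# True on days where the close moved from at-or-below the SMA to above it.
# Comparisons against NaN are False, so the warm-up period is never flagged.
crossed_up = (close > sma_200) & (close.shift(1) <= sma_200.shift(1))

print(crossed_up[crossed_up].index.tolist())
```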

Cell 4

Cell 5

Here we simulate a perfect strategy: it knows the future!

One thing to note is that the returns are not as smooth / linear as one might expect. It makes sense, since each day bar has a different `pct_change`. Some days the price doesn’t move very much, so even if we guess it perfectly, we won’t make that much money. But it’s also interesting to note that there are whole periods where the bars are smaller / bigger than average. For example, even with perfect guessing, we don’t make that much money in October of 2018.
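One way such a “perfect” strategy could be simulated is sketched below, on synthetic data. It deliberately uses `shift(-1)` to peek at the next bar, which is exactly what a live strategy can’t do. The `strat_returns` column name is my own; the notebook may organize this differently.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns standing in for the real data.
rng = np.random.default_rng(1)
df = pd.DataFrame(
    {"pct_change": rng.normal(0, 0.03, size=365)},
    index=pd.date_range("2018-01-01", periods=365, freq="D"),
)

# "Perfect" signal: +1 if the NEXT bar goes up, -1 if it goes down.
# shift(-1) intentionally looks into the future.
df["strat_signal"] = np.sign(df["pct_change"].shift(-1)).fillna(0)

# The signal computed on row N is applied to bar N+1.
df["strat_pct_change"] = df["strat_signal"].shift(1) * df["pct_change"]

# Each traded bar gains |pct_change|, so quiet periods add little.
df["strat_returns"] = (1 + df["strat_pct_change"].fillna(0)).cumprod()

print(df["strat_returns"].iloc[-1])
```

Since every bar contributes the absolute value of its move, the equity curve steepens in volatile stretches and flattens in quiet ones.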

Cell 6

Here we simulate what would have happened if we had bought and held from the beginning of 2017 (first graph) vs. shorted and held (second graph).

Quick explanation of the computed statistics:

Returns: multiplicative factor on our returns (e.g. 5.2 means 420% gain or turning $1 into $5.20)
Returns after fees: multiplicative factor on our returns, after accounting for the fees that we would have paid for each transaction. (On Bitmex each time you enter/leave a position, you pay 0.075% fees, assuming you’re placing a market order.)
SR: the Sharpe ratio, a very common metric used to measure the performance of a strategy. “Usually, any Sharpe ratio greater than 1 is considered acceptable to good by investors. A ratio higher than 2 is rated as very good, and a ratio of 3 or higher is considered excellent.” (Source)
% bars right: what percent of days did we guess correctly.
% bars in the market: what percent of days were we trading (rather than being out of the market). (The label is a bit misleading, because the value is shown as a fraction: 1.0 = 100%.)
Bars count: number of days simulated
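To make these definitions concrete, here is a rough sketch of how they might be computed. It is not the notebook’s code, and it bakes in several assumptions of mine: the Sharpe ratio is annualized with 365 daily bars and a zero risk-free rate, “% bars right” is counted only over bars spent in the market, and the fee is charged on every unit of position change. `summarize` is a hypothetical name.

```python
import numpy as np
import pandas as pd

FEE = 0.00075        # 0.075% per entry/exit, as quoted above
BARS_PER_YEAR = 365  # assumption: daily bars, market open every day


def summarize(df: pd.DataFrame) -> dict:
    """Summary statistics from 'pct_change' and 'strat_signal' columns."""
    # The position held during each bar is the previous bar's signal.
    position = df["strat_signal"].shift(1).fillna(0)
    strat_pct = position * df["pct_change"].fillna(0)

    # Units of position change per bar; flipping +1 to -1 counts as 2
    # (one exit plus one entry). The final exit is not charged here.
    fee_units = position.diff().abs().fillna(position.abs())

    growth = 1 + strat_pct
    growth_after_fees = growth * (1 - FEE) ** fee_units

    in_market = position != 0
    right = (strat_pct > 0) & in_market

    return {
        "returns": growth.prod(),
        "returns_after_fees": growth_after_fees.prod(),
        "sharpe_ratio": np.sqrt(BARS_PER_YEAR) * strat_pct.mean() / strat_pct.std(),
        "pct_bars_right": right.sum() / max(in_market.sum(), 1),
        "pct_bars_in_market": in_market.mean(),
        "bars_count": len(df),
    }


# Toy buy-and-hold example with made-up returns.
df = pd.DataFrame({"pct_change": [0.0, 0.10, -0.05, 0.02]})
df["strat_signal"] = 1  # always long
stats = summarize(df)
print(stats)
```

In the toy example the first bar is out of the market because no signal exists before it, which is why `pct_bars_in_market` comes out below 1.0.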

Cell 7

There are more graphs in the notebook, but you get the idea.

I’m not going to discuss this particular strategy here. I just wanted to show something more interesting than constantly holding the same position.

Future information

One of the insidious bugs you can run into while working with time series is using future information. This happens when you make a trading decision using information you wouldn’t have access to if you were trading live. One of the easiest ways to avoid it is to do all the computation in a loop, where each iteration you’re given the data you have up until that point in time, and you have to compute the trading signal from that data. That way you simply don’t have access to future data. Unfortunately this method is pretty slow when you start working with more data or if there’s a lot of computation that needs to be done for each bar.
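A minimal sketch of that loop-based approach is below. The trading rule in `compute_signal` is a toy I made up purely for illustration, and `backtest_in_loop` is a hypothetical name:

```python
import pandas as pd


def compute_signal(history: pd.DataFrame) -> int:
    """Toy rule: long if the latest close is above the mean of the
    last 3 closes, otherwise short."""
    recent = history["close"].iloc[-3:]
    return 1 if recent.iloc[-1] > recent.mean() else -1


def backtest_in_loop(df: pd.DataFrame) -> pd.Series:
    signals = []
    for i in range(len(df)):
        # Only rows 0..i are visible here, so future data cannot leak in.
        history = df.iloc[: i + 1]
        signals.append(compute_signal(history))
    return pd.Series(signals, index=df.index, name="strat_signal")


# Made-up prices for illustration.
df = pd.DataFrame({"close": [100.0, 101.0, 99.0, 102.0, 104.0, 102.0]})
df["strat_signal"] = backtest_in_loop(df)
print(df)
```

The slicing on every iteration is what makes this safe, and also what makes it slow on larger datasets.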

For this reason, we’ve structured our code so that to compute the signal for row N, you can use any information up to and including row N. The computed `strat_signal` will be used to trade the next day’s bar (N+1). (You can see the logic for this in `add_performance_columns()`: `df['strat_pct_change'] = df['strat_signal'].shift(1) * df['pct_change']`.) This way, as long as you’re using standard Pandas functions and not using `shift(-number)`, you’ll likely be fine.
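Here is a small sketch of that vectorized convention, using a made-up rolling-mean rule and made-up prices. `add_signal` is a hypothetical helper; only the `shift(1)` line mirrors the code quoted above. The truncation check at the end is one cheap way to catch accidental use of future data: signals for the first rows shouldn’t change when later rows are removed.

```python
import pandas as pd


def add_signal(df: pd.DataFrame) -> pd.DataFrame:
    """Toy rule: long if close is above its 3-bar rolling mean, else short.
    rolling() only looks backwards, so row N uses rows <= N."""
    out = df.copy()
    rolling_mean = out["close"].rolling(window=3, min_periods=1).mean()
    out["strat_signal"] = (out["close"] > rolling_mean).map({True: 1, False: -1})
    return out


df = pd.DataFrame({"close": [100.0, 101.0, 99.0, 102.0, 104.0, 102.0]})
df["pct_change"] = df["close"].pct_change()

full = add_signal(df)

# Row N's signal is traded on bar N+1, hence shift(1).
full["strat_pct_change"] = full["strat_signal"].shift(1) * full["pct_change"]

# Lookahead check: recompute on truncated data and compare.
truncated = add_signal(df.iloc[:4])
no_lookahead = full["strat_signal"].iloc[:4].equals(truncated["strat_signal"])
print(full)
print("no lookahead detected:", no_lookahead)
```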

That’s it for now!

Potential future topics:

What overfitting is and how it impacts strategy research
Filters (market regimes, entry/exit conditions)
Common strategies (e.g. moving average crossover)
Common indicators
Using simple ML (e.g. Naive Bayes)
Support / resistance
Autocorrelation
Multi-coin analysis

Questions for the community:

Do you feel like you understand what’s going on so far, or should I move slower / zoom in on one of the prerequisites?
What topics would you like me to explore?
What strategies are you interested to try?

Just letting you know that I am still looking forward to the next post in this sequence.

I’ll try to post it this weekend. :)

I've been a trader for 2 years, it's just a hobby for me. I've mostly done Sentiment and Fundamental but wanted to try Technical and others.

If you were a hobbyist like me, what would you say is the best way to host a bot? Should I get my own server? Do I rent one? Or do I use an exchange?

Thanks, great post and I love your logo.

Hmm, that's a good question. I haven't looked into it too much, but I'd definitely try to leverage a 3rd party platform for hosting or at least use a 3rd party library to do the trading. https://enigma.co/catalyst/ was kind of decent when I used it for a while a year ago. By now, if they're still improving it, it's probably pretty good. One advantage there is that they have the data too. https://github.com/ccxt/ccxt is a wonderful API that generalizes how you interact with exchanges.

Hosting wise, AWS (what we use) or Google Cloud would both work. I personally like the latter a lot more.

Depending on how complex your strategies are, maybe something like this could work: https://www.youtube.com/watch?v=P_YmXyUsb9A

This is quite interesting. I don't expect to ever get into quant-trading myself, but still expect to find a bunch of this valuable. I like the hands-on approach and the pace seems roughly good to me so far, though I already have some amount of data-science experience.

The anti-inductive nature of the whole thing makes me really confused about what kinds of strategies I would want to explore. Like, we could use any of the standard AI methods, from bayesian modeling, to a hidden markov model to deep neural nets, but I don't expect any of them to work.

This comment was written in response to you feeling confused about what strategies to explore. I might write a fuller post about it, but for now here're the thoughts off the top of my head:

Calling markets anti-inductive is correct, but it's not helpful when trying to find strategies (as you've just noticed). I'd break the strategy research process down into these steps:

1) Can you find a strategy (algorithm + data) that historically has performed well?

2) Can you find this strategy in such a way so as not to find a ton of other strategies that worked by random chance?

3) What % of the market has figured out this strategy?

From Eliezer's post:

Let's say you see me flipping a coin. It is not necessarily a fair coin. It's a biased coin, and you don't know the bias. I flip the coin nine times, and the coin comes up "heads" each time. I flip the coin a tenth time. What is the probability that it comes up heads?

If you answered "ten-elevenths, by Laplace's Rule of Succession", you are a fine scientist in ordinary environments, but you will lose money in finance.

In finance the correct reply is, "Well... if everyone else also saw the coin coming up heads... then by now the odds are probably back to fifty-fifty."

Right. But if it's slightly more complicated than just looking at the coin, then suddenly we can have an edge:

1) Maybe not everyone can write the code to compute which way the coin is facing (algorithm). Maybe not everyone can see the coin (data).

2) Maybe other people are looking at the weather and the weather has been sunny nine days in a row.

3) Maybe not everyone can run their algorithm fast enough to make the trading decision in time. Maybe others figured out this strategy, but they're not confident enough in it to deploy a lot of money.

So once you find your strategy, you might be in a pretty small group of people who have discovered it. So you'll be fine in proportion to how much money is allocated to this strategy vs how much capacity it has.

And the lesson is: aim for a strategy complexity that's simple enough to pass 2), but complicated enough that most people haven't found it. And the bar for that is actually not that high (at least in crypto).

I have 1 foot in finance, and still learning a lot. As it appears you self-taught this stuff, what were good resources for you?

Well, I was learning together with the rest of our team. So there were a lot of conversations and an inflow of information from all sorts of places: books, blogs, Quora, advisors, internet research, etc... Honestly, it's all a little helpful to get you thinking along the right lines. So I can't recommend any specific resource, but I can recommend learning broadly.

Okay, one particular resource I'll recommend is: https://www.youtube.com/channel/UC-dLWl8etTtPSGdbbcYffGw

If you have things that you found helpful, please share them here too.

Are you working with Satvik? I know he was into this stuff.

Yeah, Satvik works on the same team as Alexei (I think)

Just commenting to say thanks for sharing this! I'm a statistician currently dipping more into python/ML (mostly an R background) with a casual interest in trading and this looks like a great set of material to work through while learning.

This was a nice introductory post. My question: as a firm, and under the assumption that you have large capital, what is your experience with market-making strategies? From a few tests I did last year, the arbitrage opportunities are good, but I could not afford bigger size, so my profits were relatively small. My interest in market making is due to the limited need for "predictions," which I am highly skeptical of; it focuses on spread inefficiency instead. Edit: I also think that having a good strategy is merely 25% of your trade. Especially if you trade bigger orders, you need to focus extensively on order execution and take fees and commissions into account depending on your side. For example, when I tried market making, I found myself bleeding on fees that I hadn't really accounted for. Fun fact: I competed on Numerai when it had just launched and got better results (score) using linear models instead of more complex ones. If I have time, I may jump in again to try using NEAT methods to evolve linear models and see how they fare.

Can't comment on market making; that's not something we do.

I agree that linear models usually perform much better.

Minor comments:

price_change is the multiplicative factor, such that: new_price = old_price * price_change; additionally, if we had long position, our portfolio would change: new_portfolio_usd = old_portfolio_usd * price_change.

These should be "(price_change + 1)", right?

One neat thing about that that I didn’t realize myself is that it looks like the SMA has done a decent job of acting as support/resistance historically.

I'm not familiar with support/resistance, can you clarify?

price_change=price/old_price, so no. pct_change is what you're thinking.

Basically support is some line (horizontal or angled) which predicts when the price going down will stop and revert to going up. It's acting as a kind of "support". Resistance is the opposite. I'll probably end up writing a post on this sometime.