I was using Neuronpedia's steering feature and was curious: how much does it cost to run? How do you handle all the networking and expose the endpoints to the internet with a fancy domain?
The plan:
I built a steering interface where users can specify an emotional steering vector, steering strength, target layers, etc. and do emotional brain surgery on an AI. It runs on Qwen 2.5 7B Instruct and requires about 20-30 GB of VRAM (accounting for vLLM's KV cache and long sequences). Here's the repo. Here's the frontend (at least until my Runpod credits run out).
I ran the frontend (a Next.js app) and the main server (main_fastapi.py) in two tmux sessions, at localhost:3000 and localhost:5000.
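With the repo cloned, the two processes can be started roughly like this (the session names, the `frontend` directory, and the `npm run dev` command are assumptions about the repo layout, not verbatim from it):

```shell
# Session 1: the Next.js frontend on localhost:3000
tmux new-session -d -s frontend 'cd frontend && npm install && npm run dev'

# Session 2: the FastAPI steering server on localhost:5000
tmux new-session -d -s backend 'python main_fastapi.py'
```

Running them detached (`-d`) means they survive your SSH session disconnecting; `tmux attach -t frontend` brings either back up.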
I looked at Lambda, Runpod, and Vast AI. Here's what their prices look like as of 30/11/2025 (normalized to a single GPU). On average it costs about 2 cents/hour/GB of VRAM.
Vast tends to have the most options, since it's a marketplace where anyone can rent out their GPUs (you trade off reliability and security). Lambda has the fewest choices but it's bare metal, meaning you can run Docker and VMs.
You'll notice the MI300X pod from Runpod is a significant outlier. This is because no one wants to deal with AMD and its ROCm stack instead of CUDA. Similarly, Lambda has a GH200 instance, but it has an ARM64 architecture, meaning nothing works on it without significant massaging.
I decided to go with Runpod because it had generally cheap prices and I had a bunch of credits. On Runpod, the cheapest option I could find that wasn't too slow was an A40 pod for 40 cents/hour.
You can also opt for an interruptible instance for 20 cents/hour.
Time to release your AI into the wild!
Believe it or not, most compute providers don't let random internet traffic into your instance. You have to manually specify which ports to expose. On Runpod, you can choose up to 10.
At this point you can already access the app through Runpod's default exposed ports, but the domain is quite ugly.
Because most providers run your workload in Docker, port mapping and getting an SSL certificate are quite a pain. For example, on Vast AI external ports are randomly assigned. Certificate helpers like certbot typically require exposed ports 80 (HTTP) and 443 (HTTPS) when requesting a certificate.
As a workaround, you can use a Cloudflare tunnel.
First, download cloudflared (the Cloudflare tunnel daemon) on your GPU instance.
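On a typical x86-64 Linux instance this is a single download (adjust the release filename for other architectures):

```shell
# Grab the latest cloudflared release and put it on the PATH
curl -L -o cloudflared \
  https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
chmod +x cloudflared
mv cloudflared /usr/local/bin/
```

Runpod containers usually run as root, so no `sudo` is needed.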
Then, log in and choose a domain you own.
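Authentication happens through the browser: cloudflared prints a URL, and you pick which Cloudflare zone (domain) the tunnel is allowed to manage.

```shell
# Prints a browser link; authorizing it downloads a cert to ~/.cloudflared/
cloudflared tunnel login
```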
Next, create a tunnel.
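The tunnel name is arbitrary (`steering-demo` below is a placeholder, not from the original post):

```shell
# Prints the tunnel's UUID and writes credentials to ~/.cloudflared/<UUID>.json
cloudflared tunnel create steering-demo
```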
Add this config to /etc/cloudflared/config.yaml. The hostname is your domain, and the service is the internal port your app is running on (in my case localhost:3000).
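A minimal config might look like this (the tunnel UUID and domain are placeholders; the trailing `http_status:404` rule is the catch-all that cloudflared requires at the end of the ingress list):

```yaml
tunnel: <TUNNEL-UUID>
credentials-file: /root/.cloudflared/<TUNNEL-UUID>.json

ingress:
  - hostname: yourdomain.com
    service: http://localhost:3000
  - service: http_status:404  # required catch-all rule
```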
And add a CNAME record routing your domain to this tunnel.
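Instead of clicking through the Cloudflare dashboard, cloudflared can create the DNS record itself (again assuming the placeholder tunnel name and domain from above):

```shell
# Adds a CNAME for yourdomain.com pointing at <UUID>.cfargotunnel.com
cloudflared tunnel route dns steering-demo yourdomain.com
```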
Open a tmux session and run the tunnel.
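Same pattern as the app processes, so the tunnel keeps running after you disconnect (tunnel name is the placeholder from above):

```shell
tmux new-session -d -s tunnel \
  'cloudflared tunnel --config /etc/cloudflared/config.yaml run steering-demo'
```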
Now open your browser, go to your domain, and the app magically works!
This is much easier than messing around with certbot and nginx, although the downside is that you need to run a separate proxy daemon on your instance.
There is no cybersecurity here whatsoever.
My interruptible Runpod instance is a DIY chaos monkey.
No tests were written, no backups were made, and there are no guarantees this thing will continue working for long.
Research code is cool but it's often hard to share it widely (especially to a nontechnical audience). Do you have a research project more people should play with? Launch your AI today!