[Image: drekken watching the neural network]

Pinch of Machine Learning For Your Bot

Apr 16th 2025

Why hello Reader,

During last year’s Roundtable, Timo mentioned that the key to making Reinforcement Learning work in SC2 was breaking the problem into smaller chunks. That idea stuck with me. And now that we’ve kicked off a new Terran bot for the MicroLadder this week, it feels like the right time to finally give RL a shot. The MicroLadder is all about small, focused challenges—exactly the kind of environment where this kind of experimentation might actually stand a chance.

RL bots have shown up on the melee ladder before—with mixed results—and honestly, most of that work was abandoned years ago. It’s tricky, slow, and rarely worth the effort… at least on the surface. But that’s exactly why I want to explore it. I’m not doing this because I know what I’m doing—I don’t. I’m doing it because no one else is, and the MicroLadder gives just enough structure to make the unknown feel manageable. So I reached out to one of the few who’s actually done it: the creator of DoogieHowitzer.

In Case You Missed It

Follow Along as I Build a Terran Micro Bot

Quick Context:

  • MindMe is known for his bot at the top of the ladder, NegativeZero, but not many know he also built an RL bot called DoogieHowitzer.
  • Its core goal was to produce cannon rushes, and it was trained on 12k battles.
  • It had 10 basic actions: 8 for probe movement, plus one each to build a cannon or a probe (the cannon action only worked if a pylon was already present).
  • Decommissioned in 2020.
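To make that action space concrete, here is a minimal sketch of what a masked discrete action set could look like. The names and structure are my own assumptions for illustration, not MindMe's actual code; the one constraint it encodes is the pylon requirement described above.

```python
import random

# 8 probe-movement directions plus the two build actions.
ACTIONS = [
    "move_n", "move_ne", "move_e", "move_se",
    "move_s", "move_sw", "move_w", "move_nw",
    "build_probe", "build_cannon",
]

def legal_actions(pylon_present: bool) -> list[str]:
    """Return the actions valid in the current state: building a
    cannon requires a pylon, mirroring the constraint above."""
    if pylon_present:
        return list(ACTIONS)
    return [a for a in ACTIONS if a != "build_cannon"]

def pick_action(pylon_present: bool) -> str:
    """Stand-in for a trained policy: sample uniformly from the legal set."""
    return random.choice(legal_actions(pylon_present))
```

Masking illegal actions like this, rather than letting the agent attempt them and fail, is a common way to keep a small RL problem from wasting training time on no-op moves.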

I remember you saying you did it on a pretty low-end machine. Did it take a long time to train?

Yes, months. And the PC didn’t have enough memory for a neural network big enough to do what it needed to. Would not recommend.

It had like 9 actions it could take, and it basically only ever learned how to spam pylons and cannons. The other actions rarely got used.

I’m thinking of using a cloud GPU to train it, which should cut the training time down significantly (obviously with some cost). And you were doing RL, right? Not unsupervised?

Yeah, on that one. It was not a good attempt—I spent over a year on it.

Really?! Was the training the longest part?

Waiting for results from training. Train for a month, oh it sucks, try something else. Train for a month, oh it sucks, try something else. 😄

Good luck. It’s very hard.

My Takeaway

After talking with him, here’s what stood out to me:

  • Unsurprisingly, hardware is going to be a big deal. The current plan is to use DigitalOcean’s GPU droplet. (affiliate link, you get $200 credit)
  • It’s been a while since this was explored, so I’m looking into different training methods. Training from replays, as outlined in AlphaStar Unplugged, comes to mind.
  • Training only for micro narrows the scope, which should make better results achievable.
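On the training-from-replays idea: the simplest version is behavior cloning, where replays are treated as (state, action) pairs and the policy learns to imitate the recorded action. The toy below uses a lookup table over hypothetical coarse state tags just to show the shape of it; a real pipeline (as in AlphaStar Unplugged) would train a neural network on raw observations instead.

```python
from collections import Counter, defaultdict

def fit_policy(replay_pairs):
    """Behavior cloning in miniature: for each state, count which
    action the replays took, then pick the majority action."""
    counts = defaultdict(Counter)
    for state, action in replay_pairs:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical replay fragment; state tags and action names are made up.
replays = [
    ("enemy_near", "kite_back"),
    ("enemy_near", "kite_back"),
    ("enemy_near", "attack"),
    ("enemy_far", "advance"),
]
policy = fit_policy(replays)
```

The appeal is that this turns the slow trial-and-error loop MindMe described into ordinary supervised learning, at the cost of only ever being as good as the replays you train on.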

Happy Coding!

Drekken
Founder, VersusAI

📧 Drekken@versusai.net | 💬 Discord: drekken1

May the Bugs Be Ever in Your Favour 🪲

