I've been known to have some lightly critical takes about AI coding partners. Even so, I'm always keen to improve my toolkit. Let me tell you, I've discovered my perfect workflow for taking advantage of today's deeply powerful AI tooling. Not only has this flow helped me arrive at a deeper understanding of the problems I'm working on, but it's also hyper-efficient on token usage, with minimal API calls to the expensive cutting-edge models.
I think it's really important to share when we find a workflow that helps, so I've published this one as a standalone chatbot. It draws on a key practice, one that's been fundamental to software development since long before AI came along.
Meet LLMallard, the Limited Language Mallard and perfect pairing partner.

You can treat it like any other LLM coding assistant: write in your question, and while the model is parsing, you might just find that you come to a solution on your own.
Seriously, rubber ducking is fantastic. It's so good that when I recently wrote in to my model of choice, I came to an answer before I had even fully built out the question. Zero tokens spent, zero colleagues interrupted by a Slack ping, environmentally friendly, and very zen.
How we think about things, how we talk about things, how we see our ideas written—all of this changes our understanding.
I'm not going to pretend otherwise: I get a lot out of coding assistants and use them more days than not. But I still haven't found the right balance of when to reach for that tool over the others in my tool belt.
Recently, I was pairing with another agent on a kinda-weird data transformation method. We're adhering to strict Sorbet typing in the project, and we couldn't remember the syntax for the signature we needed. The perfect place for a copilot, in my experience so far. The in-line suggestion in my editor wasn't quite right, so I took a moment to write out a question. When the model earnestly suggested T.untyped, I had a good chuckle while pulling up the docs.
On the other hand, I had a ton of fun vibe-coding my way to that silly LLMallard app. The code is far from what I'd want to put into a production app, but it gave me a great excuse to road-test a theory (and make a questionably funny rubber ducking joke).
I've been noodling with running models locally via Ollama. My JavaScript is a bit rusty, and I wanted to experience the joy others have said vibe coding brings. I found a decent flow where I'd pair with the local model, giving it very tight constraints for incrementally adding features or styling changes. When the local model started losing fidelity (token-window thresholds, complexity churn), I'd take a single larger pass via a cloud-based model to refactor or fix the things the local model struggled with. That cleaned-up version then became the new foundation to drive forward with the local model.
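To make that loop concrete, here's a minimal sketch of the kind of tightly constrained ask I'd hand the local model. It assumes Ollama is serving on its default local endpoint and uses its /api/generate route; the askLocal helper, the prompt scaffolding, and the task are illustrative, not my exact setup.

```typescript
// A minimal sketch, assuming Ollama is running on its default port.
// askLocal is a hypothetical helper; the prompt wording is illustrative.
type GenerateResponse = { response: string };

async function askLocal(model: string, task: string): Promise<string> {
  // Tight constraints: one feature or styling change per pass.
  const prompt = [
    "You are editing a small vanilla-JS chatbot app.",
    "Make ONLY the change described below. Do not refactor anything else.",
    `Task: ${task}`,
  ].join("\n");

  // Ollama's /api/generate route; stream: false returns a single JSON body.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = (await res.json()) as GenerateResponse;
  return data.response;
}
```

The escalation step is the same shape of call pointed at a cloud model instead: one larger refactor pass, and the cleaned-up result becomes the new baseline for the local loop.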
I built LLMallard in about 45 minutes of 'actual' coding with the models. I know a lot of devs who would crank that out way faster, but it sped me up a lot while shaking off my rusty JavaScript. I made exactly one call to a cloud model, which cost about $0.45. That's a substantial discount from the other 'vibe coding' experiments I've tried.
Through all of this, I think I'm getting a much better sense of when to reach for each tool. Here's my current breakdown, with the models I've been using via Ollama noted in parentheses (a rough sketch of these asks follows the list):
- I just needed a coding duck (no model)
- I want to know some fundamentals of a well-known tool (documentation, maybe combined with small local models)
- I need a one-liner (gemma3:4b / gemma3:12b locally)
- I want to spike a small feature (gemma3:12b / gemma3:27b locally)
- I'm refactoring a well-isolated chunk of code (somewhere between the powerful local models and a cloud tool)
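As a rough usage example, reusing the hypothetical askLocal helper sketched above: between the middle tiers, the difference is mostly the model tag and the size of the ask. The tasks here are made up for illustration.

```typescript
// One-liner tier: small local model, tiny scoped ask.
const oneLiner = await askLocal(
  "gemma3:12b",
  "Write a JS one-liner that debounces a function by 250ms."
);

// Feature-spike tier: bigger local model, still a single scoped change.
const spike = await askLocal(
  "gemma3:27b",
  "Add a 'Quack' button below the chat input that clears the conversation."
);
```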
I'm still struggling to build an intuition for these tools when I'm tackling a wide-spanning problem. Add in a large legacy system and/or a bunch of custom in-house tooling, and that's a personal recipe for burning time and tokens to little effect. That said, I've seen some unreal work done with Claude Code. It's wonderful, and it's expensive.
Is this nuance or division necessary? Not today. Every service under the sun wants us to shove everything into their most powerful models, and they're operating at a loss to entice us while continuing to tweak and improve. I'm deeply skeptical that business model will last, and I'm equally in awe of the breakneck speed of evolution happening around me. So I want to keep experimenting with leveraging low-fidelity local models before tapping into the massive power wielded by the giants.
I think a narrowly focused model acting as a tightly constrained agent is an important building block to learn to make. If the hype internet is right, it may be a key dependency in future systems, so it's worth learning how to leverage. At worst, I'll have some fun making a silly app.