Developers | Software tooling & tips

LLMallard: the low-key AI chat bot you secretly need

The perfect dev workflow for taking advantage of deeply powerful AI tooling: hyper-efficient on token usage, minimal API calls, and the perfect pair programming partner.
Daniel Huss | May 12, 2025

I've been known to have some lightly critical takes about AI coding partners. Even so, I'm always keen to improve my toolkit. Let me tell you, I've discovered my perfect workflow for taking advantage of deeply powerful AI tooling. Not only has this flow helped me arrive at a deeper understanding of the problems I'm working on, but it's also hyper-efficient on token usage, with minimal API calls to the expensive cutting-edge models.

I think it's really important to share when we find a workflow that helps, so I've published this as a standalone chatbot. It draws on a key practice that has been fundamental to software development since long before AI.

Meet LLMallard, the Limited Language Mallard and perfect pairing partner.

You can treat it like any other LLM coding assistant: write in your question, and while the model is parsing, you might just find that you come to a solution on your own.
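
To be clear, the duck never phones home. Here's a minimal sketch of the idea in TypeScript. It is not the actual LLMallard source; the canned replies and structure are made up purely to illustrate the point that no model is ever called.

```typescript
// Hypothetical zero-token rubber duck loop (not the real LLMallard code).
// It reads your question, pretends to think, and quacks. No API calls.
import * as readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";

const replies = ["Quack.", "Quack? Go on...", "Quack quack. And then what happens?"];

async function main(): Promise<void> {
  const rl = readline.createInterface({ input, output });
  console.log("LLMallard is listening. Type your question (Ctrl+C to quit).");
  let turn = 0;
  while (true) {
    await rl.question("> ");                       // your prompt goes nowhere
    await new Promise((r) => setTimeout(r, 800));  // pretend to "parse"
    console.log(replies[turn++ % replies.length]); // zero tokens spent
  }
}

main();
```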

Seriously, rubber ducking is fantastic. It's so good that when I recently wrote into my model of choice, I came to an answer before I had even fully built out the question. 0 tokens spent, 0 colleagues interrupted by a Slack ping, environmentally friendly, and very zen.

How we think about things, how we talk about things, how we see our ideas written—all of this changes our understanding.

I'm not going to pretend I'm a purist, though: I get a lot out of coding assistants and use them more days than not. I still don't have the right balance of when to reach for that tool over others in my tool belt.

Recently, I was pairing with another Agent on a kinda-weird data transformation method. We were adhering to strict Sorbet typing in the project, and we couldn't remember the syntax for the signature we needed. The perfect place for a copilot, in my experience so far. The in-line suggestion in my editor wasn't quite right, so I took a moment to write out a question. When the model earnestly suggested T.untyped, I had a good chuckle while pulling up the docs.

On the other hand, I had a ton of fun vibe-coding my way to that silly LLMallard app. The code is far from what I'd want to put into a production app, but it gave me a great excuse to road-test a theory (and make a questionably funny rubber ducking joke).

I've been noodling with running models locally via Ollama. My JavaScript is a bit rusty, and I wanted to experience the joy others have said vibe coding brings. I found a decent flow where I'd pair with the local model, giving it very tight constraints for incrementally adding features or styling changes. When the local model started losing fidelity (token window thresholds, complexity churn), I'd take a single larger pass via a cloud-based model to refactor or fix things the local model struggled with. That refactored code then became the new foundation to drive forward with the local model.
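
If you want to try that loop, here's roughly what the local half can look like. This is a hedged sketch under a few assumptions: Ollama is serving on its default port, a gemma3 model has already been pulled, and the prompt is only an example of the kind of tight constraint I mean, not code from my actual project.

```typescript
// Sketch of the local-first step: a tightly scoped ask sent to a local
// Ollama model. Assumes `ollama serve` is running on the default port and
// the model has been pulled (e.g. `ollama pull gemma3:12b`).
const OLLAMA_URL = "http://localhost:11434/api/generate";

async function askLocalModel(prompt: string, model = "gemma3:12b"): Promise<string> {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}

// Keep the ask narrow: one small change, explicit boundaries.
askLocalModel(
  "Add a reset button to the chat form in index.html. Touch only the markup and CSS; leave the message-handling code alone."
).then(console.log);
```

When the local model starts losing the thread, the same request shape can be pointed at a cloud model for that single bigger refactoring pass.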

I built LLMallard in about 45 minutes of 'actual' coding with the models. I know a lot of devs who would crank that out way faster, but it sped me up a lot while shaking off my rusty JavaScript. I made exactly one call to a cloud model, and it cost about $0.45. That's a substantial discount from the other 'vibe coding' experiments I've tried.

Through all of this, I think I'm getting a much better vision of when to reach for different tools. I've noted which models I've been using via Ollama:

  • I just needed a coding duck (no model)
  • I want to know some fundamentals of a well-known tool (documentation, maybe combined with small local models)
  • I need a one-liner (gemma3:7b / gemma3:12b locally)
  • I want to spike a small feature (gemma3:12b / gemma3:27b locally)
  • I'm refactoring a well-isolated chunk of code (somewhere between the powerful local models and a cloud tool)

My intuition is still struggling to effectively use these tools when I’m trying to tackle a wide-spanning problem. Add in a large legacy system and/or a bunch of custom in-house tooling, and that’s a personal recipe for burning time and tokens to little effect. That said, I’ve seen some unreal work done with Claude Code. It’s wonderful—and expensive.

Is this nuance or division necessary? Not today. Every service under the sun wants us to shove everything into its most powerful models and is operating at a loss to entice us while continuing to tweak and improve. I'm deeply skeptical that business model will last, and am equally in awe of the breakneck speed of evolution happening around me. So I want to keep experimenting with leveraging low-fidelity local models before tapping into the massive power wielded by the giants.

I think a narrowly focused model acting as a tightly constrained agent is an important building block to learn to make. If the hype internet is right, these may be key dependencies in future systems, and it's worth learning how to leverage them now. At worst, I'll have some fun making a silly app.
