A lot of my recent thoughts about coding with agents (like this, and this, and this) have been circling a specific topic… design.
Coding agents are now good enough to replace all of my manual coding tasks; I haven’t written code myself for months (and it’s not because I don’t know how). But we’re still figuring out how to make them better at the more “senior” software development tasks.
At Microsoft, I’ve been part of a software project for the past couple of years that has been bootstrapping LLM runtimes that run agentic patterns: making the frameworks that make AI assistants, essentially. This is definitely a case of “building the plane while you fly it,” as we use the agents we build to help us build the agents.

We started with primitives like chat loops, context management, and memory (the types of things that showed up in early versions of Semantic Kernel). When tools became available in early versions of OpenAI’s models, we jumped to the command line by creating a bash tool (several months before Claude Code did the same). Being on the command line gave us full access to the machine and let us escape the app sandbox that agents in our Semantic Workbench were locked into. Because of this, I mostly skipped MCP (we didn’t need access to tools through a web interface) and jumped into configuring agents with subagents, tools, and context on the filesystem. To support automation, we added the idea of “recipes.” We worked a bunch on skills (more like behaviors, or procedural memory, than Anthropic’s version of skills, which are content bundles). We eventually wrapped this all together in our Amplifier library and tool, which works just like Claude Code, but lets us configure every aspect, including the main orchestrator loop and multiple providers (for setting policy on which models you want to use for what). It’s open source, so check it out! Disclaimer: it might kill your LLM token budget quickly.
I recount this history because you can really see that we were building the plane as we flew it. With each new development, we gain the ability to develop more things. We’re automating ourselves out of software development and, from a broader scope, all knowledge work. I had an existential crisis about this last year that was somewhat resolved by accepting that problems aren’t going away; we’re just getting better at solving them, meaning we can solve more and bigger problems (yes, we can create more and bigger problems, too, but let’s stay on topic).
So, a lot of what we’re working through right now is how we get to those more and bigger things. How can we run more agents, for longer, more productively? What processes should they follow? How should they coordinate? What context do they need? How do they stay on track? They can build many kinds of working software in minutes, hours, or days. I’m talking about the kind of software that startups I’ve been a part of took years to build with a whole team (yes, I’m talking about you, NavigatingCancer.com, pretty much everything I made at the Nordstrom Innovation Lab, even the software platforms we created at Atlas Informatics, and Xinova). The types of things that unlocked this potential were adding context about our code preferences, pointing agents to similar code, giving them access to developer tooling, and, the big one, reverse-engineering how we make software and encoding those processes into Amplifier itself. Claude Code’s ToDo tool was one of the first process-improving leaps that really opened everyone’s eyes. It’s an extremely simple process of “Hey, this seems like a large ask, let’s break it into a few steps and work through them one by one.” Even simple processes sometimes have astounding results with LLM agents. Some of my colleagues who have never coded in their lives can make software better than anything I created in the first 10-20 years of my career.
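The ToDo process can be sketched as a tiny loop: decompose the large ask into steps, then work through them sequentially, feeding prior results back in as context. This is a hypothetical illustration, not Claude Code’s actual implementation; `ask_llm` is a deterministic stub standing in for a real model call.

```python
# Hypothetical sketch of a ToDo-style agent loop (not Claude Code's
# actual implementation). `ask_llm` is stubbed so the sketch runs;
# in practice it would call a real chat-completion API.

def ask_llm(prompt: str) -> str:
    # Deterministic stub standing in for a model call.
    if prompt.startswith("Break"):
        return "- step one\n- step two"
    return f"done: {prompt.splitlines()[-1]}"

def run_with_todos(task: str) -> list[str]:
    # 1. Ask the model to decompose the large ask into ordered steps.
    plan = ask_llm(f"Break this task into short, ordered steps:\n{task}")
    todos = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

    # 2. Work through the steps one by one, carrying results forward
    #    as context for the next step.
    results: list[str] = []
    for step in todos:
        context = "\n".join(results)
        results.append(ask_llm(f"Completed so far:\n{context}\n\nNow do: {step}"))
    return results
```

The whole trick is in step 1: forcing an explicit, visible plan before any work starts.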
So now we have processes that can build large things, but when I work with one of these large projects that someone else built with Amplifier, I start to notice some things:
- It’s hard to understand why some pieces of the code exist. I can see what they do, but the intention behind why they’re done that way is unclear. It looks like the code probably started life as one thing and then became another at some point, without explanation.
- It’s challenging to get a sense of the importance of different parts. Something with a few thousand lines of code behind it might really just be an experimental side feature, while the really important part might be just a few functions.
- When a colleague asks me to look at one of their projects, it can be challenging to figure out what part they’re really trying to show me… especially if they don’t have strong coding skills themselves. They struggle to find the language that would map their outside view to my inside view. I have to do a lot of (sometimes annoying to them) work to close the gap.
- Sometimes their things are completely divorced from anything else we’ve been working on: instead of using our project’s libraries or extension patterns, they’ll entirely rewrite what is needed, only lightly referencing our system.
- Sometimes what a thing does is not actually what the person who agentically made it thinks it does. The output might look like what they want for a specific scenario they’ve tried, but their program could (and often does) just “fake” it internally, because that scenario is the only thing they asked it to handle.
- Often, the project I’m looking at could have been done in an entirely different way with a fraction of the code: usually because it reinvented standard tooling, or implemented a convoluted architectural pattern (I presume because the human driver told it to).
- I could go on…
The code I produce with Amplifier has far fewer of these issues, and that’s precisely because I know architecture, systems design, and language tooling well enough to call BS when the assistant proposes something that doesn’t look right. I know the language and the concepts well enough to interact with the assistant with these kinds of messages:
- Isn’t there a simpler way to do this? (What if we just… X?)
- Let’s take a step back and really ask what we’re trying to accomplish here. Are there simpler approaches?
- Doesn’t this diverge from standard Python (packaging, uv usage, pydantic settings, etc.)?
- Who does this belong to? Does it really belong here?
- Are we mixing two concepts here? Aren’t these actually separate concerns?
- Are we changing the responsibility of X?
- Haven’t we conflated X with Y?
- Are these different things or the same thing? What are the unique qualities that make them different? What are they actually then?
- Is this proposal consistent with the previous design decisions we have made? If not, how would we reconcile them?
- What are the downsides of this proposal?
- How would this proposal affect DX (developer experience)?
- If we make this change, how will it affect the codebase? Do an exhaustive search of everywhere it is used and validate that the new design can replace the old seamlessly.
Note that these are all questions about the system design itself, not the product. That’s why you can have a good product with a bad system design, or a bad product with a good system design.
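Questions like the ones above can themselves be encoded as a process, the same way the ToDo pattern was. As a purely hypothetical sketch (the checklist and function here are mine, not part of Amplifier or any real tool), a design-review pass could force an agent to answer each question explicitly before implementing:

```python
# Hypothetical sketch: encoding design-review questions as a reusable
# checklist an agent must answer before implementing a proposal.
DESIGN_REVIEW_QUESTIONS = [
    "Is there a simpler way to do this?",
    "Does this diverge from the language's standard tooling or conventions?",
    "Are we mixing two separate concerns?",
    "Is this consistent with previous design decisions? If not, how do we reconcile them?",
    "What are the downsides of this proposal?",
    "How does this affect developer experience?",
]

def review_prompt(proposal: str) -> str:
    # Build one prompt that requires an explicit answer to every
    # question before any code is written.
    questions = "\n".join(f"- {q}" for q in DESIGN_REVIEW_QUESTIONS)
    return (
        "Before implementing, answer each question about this proposal:\n"
        f"{proposal}\n\n{questions}"
    )
```

The checklist itself is the valuable artifact: it turns one person’s design intuition into a process any agent (or colleague) can run.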
A valuable product is a viable solution to an important problem. A system that satisfies project constraints is a viable solution. System design is the language of those constraints and how the system meets them. System design exists in system-space, not product/problem/solution-space. In other words, you can spend all the time in the world defining the problem and producing solutions and never actually build a viable solution, because all your solutions are too costly to run, too slow, too fragile, too hard to use, or impossible to maintain. Systems designers work to satisfy hundreds of constraints like these by reasoning through a combinatorial explosion of trade-offs while continuously searching through an inventory of system components they can choose from, wire up, and configure. This is systems thinking and reasoning.
The obvious next question is: who cares? In my last post I suggested that many software engineering best practices are unnecessary for teams who use fleets of coding agents, as those practices were created to overcome problems that teams of human developers have. Isn’t software design just one of these things? No. Let me explain.
When people prompt coding assistants today, many are essentially saying “Hey, I have this problem, make me a solution.” Assistants are good enough to do exactly that, but then the person needs to go back and say “well… not like that… I need it to also…”. They’ve realized they have a constraint. They might push forward with the assistant, asking it to change the solution to satisfy that constraint. It will comply. But then they realize there are a few more constraints and continue the process. At some point a later constraint will conflict with an earlier one, but naive assistants won’t have kept a record of constraints, and the part of the solution built intentionally to satisfy a previous constraint will be overwritten, devolving the entire product development experience into a game of whack-a-mole.
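One way out of the whack-a-mole is to keep an explicit ledger of constraints and re-check every one of them whenever the solution changes. Here is a minimal sketch of that idea; the names and structure are my own illustration, not how any particular assistant implements it:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ConstraintLedger:
    # Every accepted constraint is recorded alongside a check; none
    # are silently dropped when the solution is revised.
    checks: list[tuple[str, Callable]] = field(default_factory=list)

    def add(self, description: str, check: Callable) -> None:
        self.checks.append((description, check))

    def violations(self, solution) -> list[str]:
        # Re-validate *every* recorded constraint, not just the newest,
        # so a later change can't quietly break an earlier decision.
        return [desc for desc, check in self.checks if not check(solution)]
```

Each time the assistant revises the solution, running `violations()` and feeding any regressions back into the next prompt keeps earlier design decisions from being overwritten.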
At some point, it becomes better to just update the initial ask with all of the constraints and start over from the beginning. Of course, that’s a very expensive way to go about things; it would have been better to figure out the constraints from the beginning. Oftentimes, for fuzzy problems or solution spaces, you can’t figure out what is needed, or even what you want exactly, without building something, and it’s best to just get started. Sometimes the system constraints can’t really be known unless you try. But there are other times when, if you had put a few thought cycles into it, or worked on a model of the solution (like pseudo-code), you could have known. You were just too impatient to get a solution to your problem NOW. In times like that, it’s better not to have gone through the whole expensive implementation just to define something you actually did, or could have, known from the beginning. And in today’s agentic-fleet environments, this can be a crazy-expensive mistake, especially when your teammates are depending on you.
That’s a problem when you don’t have enough skill of your own to think in constraints, or enough experience with the problem, solution, or system space. Then it’s better to find an expert, or spend some time with your assistant becoming an expert, than to “learn the hard way” by vibe-coding implementations and seeing how they fail.
New workflows like Superpowers do an exceptional job of moving the ball forward with all of this. They force the agent to walk the user through a “brainstorming” phase, which is really a design stage of trying to pin the user down on what the problem, desired solution, and constraints are. It works exceptionally well if you understand what it’s asking you, and you can use it to educate yourself as you go.
We have Superpowers-like design assistance built into our foundation experience with Amplifier, but it’s only the first step towards solving the design needs of teams using agentic fleets.
We’re also going to need:
- Ways to drive even more early design questioning: identifying important topics and helping devs work through the ones they might not have familiarity with
- Better recognition of when more design is needed
- Appropriate design reviews (challenging the user, even potentially refusing to proceed when needed)
- Ways to keep design specs up to date as the code evolves
- Better contextually-relevant, but lightweight design processes to stitch this all together.
And finally, since we’re all building this plane while we’re flying it, we’ll need some automated agentic guidance when it’s really best that we just land the plane we’re on and take off in the new one.