Build Your Own Legs Before the Crutches Fail

March 9, 2026 · 14 min read

Professional, ai-assisted-dev, engineering, agents, developer-workflows

TL;DR

AI is excellent borrowed competence, but borrowed competence has a short half-life.
Use it to get unstuck, learn faster, and explore options. Do not use it to skip understanding.
The safe loop is borrow -> inspect -> rebuild -> own.
The boring parts of software development are where borrowed work becomes durable: hooks, tests, CI, review, performance budgets, release gates, and runbooks.
If you can prompt for code but cannot explain why the gates are failing, you are getting weaker while feeling faster.

AI made me faster right up until it did not.

I had it sketch a refactor that looked clean on first pass: fewer conditionals, nicer names, one less ugly branch around an edge case I was already tired of thinking about. It compiled. The tests were mostly happy. For about twenty minutes, I felt like I had cheated physics.

Then I hit the case the model had helpfully "simplified" away.

The bug was not exotic. It was the sort of thing that shows up when a real system has history: one input shape that only exists because an older integration still exists, one weird ordering dependency, one path where "optional" really means "present but malformed." The generated code had the confidence of a senior engineer and the operational memory of a demo.

That experience did not make me anti-AI. It clarified the relationship.

AI-assisted development is real leverage. I use it constantly. It helps me get moving when I am cold on a codebase, tired, or stuck on the blank-page problem. But I think a lot of people are treating that leverage like permanent strength.

If you use AI as a crutch, the goal is not to become emotionally attached to the crutch. The goal is to stay upright long enough to build your own legs.

For technical people who do not write production code every day, this is the part that is easiest to miss: the visible code is not the whole job. A serious software team is surrounded by machinery that catches mistakes, forces explanations, measures risk, and makes deployment repeatable. AI can help with that machinery too, but it cannot replace the judgment those gates are designed to create.

Borrowed Competence Expires Fast

Autocomplete is borrowed memory. A linter is borrowed discipline. A debugger is borrowed visibility. AI is borrowed competence at a higher level: it can suggest structure, remind you of APIs, draft tests, explain a pattern you have not touched in a year, or give you a credible first pass at code you were going to write manually anyway.

That is useful. I do not buy the fake purity test around it.

The problem is not that the competence is borrowed. The problem is that a lot of it is non-retained. If you never force the transition from "the model produced something plausible" to "I understand why this works, where it breaks, and how I would recreate it," the capability evaporates the second the output gets weird.

The model feels like strength because it keeps you moving. But motion and stability are not the same thing. Crutches let you move before you can bear full weight. They are not proof that the leg underneath is healed.

In development terms, that means AI can increase your output while your underlying engineering instincts stay flat, or even get worse.

The Part Non-Developers Usually Do Not See

When people outside software think about development, they often picture the code editor: a person typing, a model suggesting, a feature appearing.

That is only one slice. The healthier mental model is a delivery line with gates:

Gate	Plain-English job	What it catches
Pre-commit hooks	Quick checks before code even leaves your machine.	Formatting drift, obvious lint errors, accidental secrets, generated files that were not refreshed.
Type checks	"Do these pieces still fit together?"	Wrong function shapes, missing fields, impossible states.
Unit tests	Small proof that one behavior still works.	Regression in a parser, formatter, validator, or business rule.
Integration tests	Proof that multiple pieces still cooperate.	API contract drift, broken data flow, auth assumptions.
End-to-end tests	Browser or workflow checks from the user's point of view.	A page that renders but cannot be used, a link that moved, a form that no longer submits.
CI pipelines	The shared, repeatable version of the checks.	"Works on my machine" problems.
Performance gates	Budget checks for speed, bundle size, memory, or latency.	Slow pages, oversized JavaScript, expensive hot paths.
Code review	A second brain reading for risk and intent.	Clever but fragile code, missing tests, unclear ownership, weak rollout plans.
Release gates	The final controls before production changes.	Failed builds, failed deploys, missing approvals, unsafe migrations.

None of this is glamorous. That is the point. Good engineering has boring rails.

AI can write code that looks plausible. These gates ask different questions: Is it consistent? Is it tested? Is it understandable? Is it fast enough? Is it deployable? Can someone else operate it next month?

That is where borrowed competence either becomes real capability or turns into a pile of confident guesses.

A Concrete Example: The AI Draft That "Works"

Imagine a model drafts a change to a healthcare data import flow. The new behavior is simple on paper: accept a partner's custom patient identifier, normalize it, and route the record to the right downstream workflow.

The first draft might look fine. It parses a sample message. It maps the new field. It passes the happy-path test.

Then the real delivery machinery starts asking uncomfortable questions:

Pre-commit hook: Did you update the generated schema and docs index, or only the hand-written code?
Linter: Did you add a branch that silently swallows bad input?
Type checker: Does the new field exist on every shape that now claims to carry it?
Unit test: What happens when the identifier is missing, duplicated, padded, lowercase, or malformed?
Integration test: Does the normalized identifier survive the parser, mapper, router, and event emitter?
End-to-end test: Can an operator use the playground or admin UI to inspect the transformed record?
Performance check: Did the normalization add a slow lookup on every record in a large batch?
Review: Is this partner-specific behavior isolated in a profile, or did it leak into the generic pipeline?
Release gate: Can we deploy this without breaking existing partner feeds?
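
To make the unit-test question above concrete, here is a minimal sketch of what the normalization step could look like once those cases are taken seriously. Everything in it is an assumption for illustration: the function names, the result shape, and especially the identifier format (6 to 12 letters or digits) are hypothetical, not the partner's real rules.

```typescript
// Hypothetical result type: a normalized identifier or a named rejection reason.
type NormalizeResult =
  | { ok: true; id: string }
  | { ok: false; reason: "missing" | "malformed" };

// Assumed format, for illustration only: 6 to 12 uppercase letters or digits.
const PARTNER_ID_PATTERN = /^[A-Z0-9]{6,12}$/;

function normalizePartnerPatientId(raw: string | null | undefined): NormalizeResult {
  if (raw === null || raw === undefined) {
    return { ok: false, reason: "missing" };
  }
  // Padded and lowercase input is accepted and cleaned up.
  const candidate = raw.trim().toUpperCase();
  if (candidate === "") {
    return { ok: false, reason: "missing" };
  }
  // Anything else is rejected with a reason instead of being silently swallowed.
  if (!PARTNER_ID_PATTERN.test(candidate)) {
    return { ok: false, reason: "malformed" };
  }
  return { ok: true, id: candidate };
}

// Duplicates are detected after normalization, so "ab1234" and " AB1234 " collide.
function findDuplicateIds(rawIds: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const raw of rawIds) {
    const result = normalizePartnerPatientId(raw);
    if (!result.ok) continue; // invalid inputs are reported elsewhere, not counted here
    if (seen.has(result.id)) duplicates.add(result.id);
    seen.add(result.id);
  }
  return Array.from(duplicates);
}
```

The design choice worth noticing is the explicit failure branch. A draft that returns an empty string or a best guess on bad input will pass the happy-path test and fail the linter and reviewer questions in the list above.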

That is software development. The code was the opening move.

For a technical non-developer, the important takeaway is not that every project needs a giant enterprise process. It is that production code has more failure modes than "does the sample run?" The gates are how teams make those failure modes visible before customers do.

Build Legs While You Are Still Moving

The trick for me is to treat AI output as scaffolding that should disappear as soon as the structure can stand on its own.

The loop I keep coming back to is:

borrow -> inspect -> rebuild -> own

Borrow: let the model do the low-friction first pass. That might be a draft implementation, a test outline, a summary of a file, or three candidate approaches with tradeoffs. The point is to compress setup time, not outsource the thinking permanently.

Inspect: read the output like it is guilty. Trace the control flow. Check the types. Look for branches that vanished because the model preferred elegance over reality. Ask the annoying questions: what happens on bad input, partial input, old input, duplicate input, slow input?

Rebuild: take the hot path, the tricky branch, or the most important abstraction and rewrite it until it feels like mine. Sometimes that means simplifying generated cleverness into something more boring. Sometimes it means redoing the test cases by hand. Sometimes it means deleting 40 percent of what the model wrote because it solved a prettier problem than the one I actually had.

Own: turn the result into something durable. Add the regression test. Write down the invariant. Turn the one-off fix into a checklist. When you notice yourself asking for the same sort of help again and again, learn that subsystem. The final step is not "ship the generated code." It is "convert a temporary assist into repeatable capability."

That is the point where legs start to exist.

Pre-Commit Hooks: The First Guardrail

A pre-commit hook is a small automated check that runs before a change is committed to version control.

That sounds minor. It is not.

It is the difference between "I will remember to run the formatter" and "the formatter runs every time." It is the difference between "try not to commit secrets" and "a scanner blocks the commit if it sees something that looks like a key." It is a tiny local gate that removes entire categories of sloppy failure.
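
The "scanner blocks the commit" idea is small enough to sketch. The version below is a toy under stated assumptions: the rule names and the three patterns are illustrative picks of mine, and real secret scanners ship far larger, tuned rule sets. It only shows the shape of the check a hook would run against staged text.

```typescript
// One suspected secret found in the scanned text.
interface SecretFinding {
  line: number; // 1-based line number within the scanned text
  rule: string; // name of the pattern that matched
}

// Illustrative patterns only, not a complete or production rule set.
const SECRET_RULES: { name: string; pattern: RegExp }[] = [
  { name: "aws-access-key-id", pattern: /AKIA[0-9A-Z]{16}/ },
  { name: "private-key-header", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  {
    name: "hardcoded-credential",
    pattern: /(api[_-]?key|secret|token)\s*[:=]\s*["'][^"']{16,}["']/i,
  },
];

// Checks every line against every rule and records each match.
function scanForSecrets(stagedText: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  stagedText.split("\n").forEach((text, index) => {
    for (const rule of SECRET_RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ line: index + 1, rule: rule.name });
      }
    }
  });
  return findings;
}

// A hook script would exit non-zero when this returns false, which blocks the commit.
function commitAllowed(stagedText: string): boolean {
  return scanForSecrets(stagedText).length === 0;
}
```

The function is pure on purpose: it takes text and returns findings, so the same check can run in a local hook and again in CI without caring how the text was collected.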

Typical hooks might run:

pnpm lint
pnpm test -- --findRelatedTests <staged files>
pnpm run typecheck
git diff --check

The exact commands do not matter as much as the habit. The point is to make the cheap checks automatic and early.

AI changes the economics here. If a model can generate a lot of code quickly, then the first guardrail needs to be close to the keyboard. You do not want to discover five generated lint errors, a broken import, and a missing generated artifact after the work has already been pushed into a shared branch.

The hook is not there because developers cannot be trusted. It is there because humans are inconsistent and automation is cheap.

CI: The Shared Reality Check

Continuous integration, usually shortened to CI, is the shared pipeline that runs when code is pushed. It installs dependencies, builds the project, runs tests, validates artifacts, and reports whether the change is safe enough to merge.

For non-developers, I would summarize CI this way:

CI is the team asking a clean machine to prove the change works.

That clean machine matters. Your laptop might have old environment variables, cached files, a running service, or a dependency version that hides a problem. CI starts from a more controlled baseline.

In a serious pipeline, a change might need to pass:

lint and formatting,
type checking,
unit tests,
API route tests,
browser tests,
build output checks,
bundle-size limits,
container builds,
vulnerability scans,
deployment dry-runs.

When AI writes the first draft, CI is where the draft meets shared reality. It does not care that the answer sounded confident. It cares whether the repository still builds.
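
One way to picture the pipeline is as an ordered list of gates where the first failure stops the run. The sketch below assumes exactly that: sequential, fail-fast execution. Real CI systems often run stages in parallel, and the `Gate` and `PipelineReport` shapes are hypothetical names I made up for the illustration.

```typescript
// Result of one gate: passed, or failed with a human-readable reason.
type GateResult = { passed: true } | { passed: false; reason: string };

interface Gate {
  name: string;
  run: () => GateResult;
}

interface PipelineReport {
  mergeable: boolean;
  ran: string[]; // gates that actually executed, in order
  failedGate?: string;
  reason?: string;
}

// Runs gates in order and stops at the first failure, so cheap checks
// (lint, types) report before expensive ones (browser tests, container builds).
function runPipeline(gates: Gate[]): PipelineReport {
  const ran: string[] = [];
  for (const gate of gates) {
    ran.push(gate.name);
    const result = gate.run();
    if (!result.passed) {
      return { mergeable: false, ran, failedGate: gate.name, reason: result.reason };
    }
  }
  return { mergeable: true, ran };
}
```

The report names the gate that failed and why. That is the part to read before asking a model for a fix, because it tells you which question the change could not answer.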

Quality Gates Are Product Decisions

Quality gates are often framed as engineering bureaucracy. I think that misses the point.

A gate is a product decision encoded as automation.

If you require accessibility checks, you are saying keyboard and screen-reader users count. If you require Lighthouse performance budgets, you are saying page speed is not a nice-to-have. If you block a deploy when tests fail, you are saying "we do not knowingly ship broken behavior to users."

These gates can be annoying. They should also be adjustable. A gate that blocks useful work for no real risk is just theater. But the right response is not to delete the gate the first time it complains. The right response is to understand what it protects.

For example, a performance gate might fail because a product page now ships too much JavaScript. That failure is not "the computer being picky." It is a signal that real users may pay the cost on slower devices or networks. The fix might be lazy-loading a chart, moving work to the server, trimming a dependency, or raising a threshold with a written reason. The important part is that someone has to make the tradeoff consciously.
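
"Raising a threshold with a written reason" can be enforced rather than hoped for. Here is a minimal sketch, assuming a per-route JavaScript budget measured in kilobytes. The `BundleBudget` shape and the numbers in the usage note are hypothetical, not taken from any real tool's config format.

```typescript
interface BundleBudget {
  route: string;
  limitKb: number;
  // A raised threshold must carry a written reason.
  override?: { limitKb: number; reason: string };
}

type BudgetVerdict =
  | { status: "pass"; effectiveLimitKb: number }
  | { status: "fail"; effectiveLimitKb: number; overByKb: number };

function checkBundleBudget(measuredKb: number, budget: BundleBudget): BudgetVerdict {
  const override = budget.override;
  // An override without a reason is treated as a configuration error, not a pass.
  if (override !== undefined && override.reason.trim() === "") {
    throw new Error(`Budget override for ${budget.route} needs a written reason`);
  }
  const effectiveLimitKb = override !== undefined ? override.limitKb : budget.limitKb;
  if (measuredKb <= effectiveLimitKb) {
    return { status: "pass", effectiveLimitKb };
  }
  return { status: "fail", effectiveLimitKb, overByKb: measuredKb - effectiveLimitKb };
}
```

With a 170 kB limit, a 180 kB page fails by 10 kB. Adding `override: { limitKb: 200, reason: "chart must render above the fold" }` makes it pass, and the reason sits in version control where a reviewer can disagree with it.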

That is how judgment gets built.

Code Review Is Not Just Nitpicking

Good code review is not a senior engineer proving they are smarter than the author.

Review is where intent, risk, and maintainability get forced into the open.

A useful review asks:

What behavior changed?
What are the edge cases?
What tests prove the important part?
What got simpler?
What got more complex?
What happens if this deploy fails?
Is the abstraction buying us something, or just making the code feel important?

AI-generated code makes review more important, not less. Generated code can be syntactically neat while hiding weak assumptions. It can use an API correctly in isolation while missing the local architecture. It can add tests that confirm the implementation instead of challenging it.
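
The difference between a test that confirms and a test that challenges is easier to see side by side. This is a contrived sketch: the discount rule (10 percent off at 100.00 or more, in cents) and both implementations are invented for the illustration.

```typescript
type Pricing = (subtotalCents: number) => number;

// Hypothetical rule: orders of 100.00 or more get 10% off, rounded to whole cents.
const correctPricing: Pricing = (cents) =>
  cents >= 10000 ? Math.round(cents * 0.9) : cents;

// Same rule with a boundary bug: an order of exactly 100.00 gets no discount.
const offByOnePricing: Pricing = (cents) =>
  cents > 10000 ? Math.round(cents * 0.9) : cents;

// Confirming test: recomputes the expectation with the implementation's own
// formula on one comfortable input, so it cannot notice the boundary bug.
function confirmingTest(pricing: Pricing): boolean {
  const subtotal = 20000;
  return pricing(subtotal) === Math.round(subtotal * 0.9);
}

// Challenging test: pins literal expectations, including the boundary
// and the value just below it.
function challengingTest(pricing: Pricing): boolean {
  const cases: [number, number][] = [
    [0, 0],
    [9999, 9999],   // just below the threshold: no discount
    [10000, 9000],  // exactly at the threshold: discount applies
    [20000, 18000],
  ];
  return cases.every(([input, expected]) => pricing(input) === expected);
}
```

Both implementations pass `confirmingTest`. Only the correct one passes `challengingTest`. A reviewer who asks "which of these cases came from the requirement, and which came from reading the code?" is doing the work the model skipped.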

The reviewer is not only reviewing the code. They are reviewing the reasoning.

That is a muscle. If you stop doing it because the model is fast, you lose one of the few processes that reliably turns individual output into team knowledge.

Performance Work Is Not Vibes

Performance is one of the fastest places for AI-assisted development to get slippery.

Models are good at suggesting optimizations. They are less reliable at knowing whether a path is hot, whether the cost matters, or whether the "optimization" makes the system harder to operate.

The serious version starts with measurement:

What user path is slow?
What metric are we using: load time, first contentful paint, server response time, memory, CPU, query count, queue latency?
What is the baseline?
What changed?
Did the improvement survive a repeat run?

This is why performance gates and dashboards matter. They keep the argument tied to evidence. A pull request that says "optimized the page" is weak. A pull request that says "reduced the initial JavaScript for this route by 42 kB and kept Lighthouse performance above the agreed threshold" gives reviewers something to inspect.
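
The "did the improvement survive a repeat run?" question can be turned into a small comparison. The sketch below assumes you already have several measurements of the same metric before and after the change, and that a fixed noise floor is an acceptable stand-in for proper statistics. The function names and the noise-floor idea are my assumptions, not a standard from any particular tool.

```typescript
// Median of a non-empty list of samples; the input array is not modified.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("median needs at least one sample");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface Comparison {
  baselineMedian: number;
  candidateMedian: number;
  improvement: number; // positive means the candidate measured lower (better)
  meaningful: boolean; // true only if the improvement exceeds the noise floor
}

// Compares repeated runs by median so a single lucky run cannot carry the claim.
function compareRuns(baseline: number[], candidate: number[], noiseFloor: number): Comparison {
  const baselineMedian = median(baseline);
  const candidateMedian = median(candidate);
  const improvement = baselineMedian - candidateMedian;
  return {
    baselineMedian,
    candidateMedian,
    improvement,
    meaningful: improvement > noiseFloor,
  };
}
```

For example, baseline runs of 412, 405, and 420 ms against candidate runs of 370, 365, and 368 ms give medians of 412 and 368, an improvement of 44 ms. With a 10 ms noise floor that counts as meaningful. A 1.5 ms difference would not.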

For technical non-developers, the main idea is simple: performance is not a mood. It is a budget, a measurement, and a tradeoff.

The Better Outcome: Small Pieces That Compound

There is a better outcome than permanent dependence and a better metaphor than "just stop using the help."

A crutch helps when you cannot carry the load yet. An exoskeleton helps you carry more load than you otherwise could, while still forcing your own muscles to do real work. That is the version I want from AI-assisted development.

The difference is structure, and structure usually starts small.

In a recent project, one recurring pain point was contract drift: navigation configs that fell out of sync when docs moved, and schema definitions that diverged between a playground validator and the canonical source. A model can help spot that the shapes look inconsistent. But the durable fix is not "keep asking the model to reconcile it." The durable fix is to move toward a shared generated artifact so the playground, CLI, and docs all inherit the same contract.
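
A tiny version of "one contract, several consumers" looks something like the sketch below. It is not the project's actual artifact: the `NAV_ENTRY_CONTRACT` fields are hypothetical, and in practice the contract would be generated rather than written by hand. The point is only that the validator and the docs both read the same object, so they cannot disagree.

```typescript
// Hypothetical canonical contract; a stand-in for a shared generated artifact.
const NAV_ENTRY_CONTRACT = {
  title: { type: "string", required: true, doc: "Label shown in the sidebar." },
  path: { type: "string", required: true, doc: "Route the entry links to." },
  order: { type: "number", required: false, doc: "Sort position within its group." },
} as const;

type FieldName = keyof typeof NAV_ENTRY_CONTRACT;

// Validator derived from the contract: returns a list of problems, empty when valid.
function validateNavEntry(entry: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(NAV_ENTRY_CONTRACT) as FieldName[]) {
    const spec = NAV_ENTRY_CONTRACT[name];
    const value = entry[name];
    if (value === undefined) {
      if (spec.required) problems.push(`${name}: missing`);
      continue;
    }
    if (typeof value !== spec.type) problems.push(`${name}: expected ${spec.type}`);
  }
  // Fields the contract does not know about are drift too.
  for (const key of Object.keys(entry)) {
    if (!(key in NAV_ENTRY_CONTRACT)) problems.push(`${key}: unknown field`);
  }
  return problems;
}

// Docs derived from the same contract, one line per field.
function renderFieldDocs(): string[] {
  return (Object.keys(NAV_ENTRY_CONTRACT) as FieldName[]).map((name) => {
    const spec = NAV_ENTRY_CONTRACT[name];
    return `${name} (${spec.type}${spec.required ? ", required" : ""}): ${spec.doc}`;
  });
}
```

Adding a field to the contract changes the validator and the docs in the same commit. That is the property a model reconciling two hand-written copies can only approximate.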

Same story on the API side. A hardening pass on a serverless inference layer included an explicit compatibility audit: supported endpoints, request parsing against the spec, format-correct error responses, and tests around those envelopes. There was similar work around explicit cache-key contracts and fallback behavior in a context-aware router. That is what good use of the tool looks like to me: use the model to explore the surface area, compare patterns, summarize a spec, maybe even draft the first pass of the docs, then turn the result into code, tests, contracts, and runbooks that stop the same class of mistake from coming back.

A separate project pushes the same pattern further. Contract convergence across a dashboard, CLI, and bridge ensures the same command and error surfaces do not drift apart. Shared hooks, config sync, worktree-safe setup, and task workflows turn one good practice into a repeatable rail. That is where the support stops feeling like a crutch and starts feeling like an exoskeleton. I can climb a step or two higher because the surrounding structure is carrying coordination and recall cost without pretending to be my brain.

That is also how small assists snowball into genuinely useful tools and platforms. A one-off prompt becomes a reusable script. The script becomes a checked-in workflow. The workflow grows hooks, tests, a schema, and a runbook. Enough of those small pieces start to interlock, and eventually you are not just "using AI to go faster." You are standing on a stack of tools you built, understand, and can extend.

How To Tell If It Is Working

The easiest way to fool yourself here is to confuse activity with growth.

A few warning signs tell me the tool is no longer helping me recover motion and has started replacing muscles I meant to keep:

I can prompt for a fix, but I cannot debug the failure without prompting again.
I am accepting abstractions I would struggle to explain to another engineer.
I am shipping code that passes the obvious checks while my confidence in it is getting lower, not higher.
I keep returning to the model for the same category of problem because I never turned the last answer into understanding.
The model "remembers" more of the system than I do because I stopped building my own map.
I treat CI failures as obstacles instead of feedback.
I cannot explain what the review comments are protecting.

The danger is not just bad code. Bad code is fixable. The deeper risk is engineering atrophy: throughput rises while independent problem-solving capacity drops. Everything feels fine until the crutches slip on something uneven, like a production incident, a legacy edge case, a performance cliff, a half-documented integration, or a failure that does not look like the training examples.

Then you find out whether you built legs or just got very good at leaning.

A Practical Starting Point

If you are technical but not a daily software developer, start by learning the gates around the code, not only the syntax inside it.

Ask a developer to walk you through one recent pull request and explain:

what changed,
which tests ran,
which gates blocked or passed,
what reviewers asked for,
what deploy step moved it to production,
what metric would reveal a problem later.

That one walkthrough will teach more than a generic "how coding works" article. It shows the real system of work: code, automation, review, deployment, monitoring, and rollback.

Then pick one gate and learn it well. Understand what lint catches. Understand why CI failed. Understand what a browser test proves. Understand why a performance budget exists. Each piece is small. Together, they become legs.

Closing

I expect AI to stay in my development loop. It is too useful not to.

But the long-term win is not just that the crutches get smarter. It is that some of that support hardens into something more like an exoskeleton: scaffolds, contracts, tests, notes, hooks, workflows, review habits, deployment gates, and small composable tools that let me attempt harder work without outsourcing the judgment.

It means being able to debug without asking permission from the tool.

It means using models to learn new subjects faster, then turning what I learned into something durable.

It means building enough of my own map that when the crutches wobble, I do not fall over with them, and when the scaffolds hold, I can reach a little higher than I could yesterday.

Use the help. Take the speed. Build the small pieces. Make them compose.

Then build your own legs before the crutches fail.
