When a team has worked together long enough, certain practices become
invisible. The senior engineer who rejects a pull request does not consult a
checklist; she recognizes, almost instantly, that the error handling is
incomplete, that the abstraction is premature, that the naming does not
follow the team’s conventions. The same instincts shape how she prompts an
AI to generate code, how she frames a refactoring request, what she asks the
AI to check before she considers a piece of work complete. Ask her to
explain and she can, but in the moment it is pattern recognition built from
years of reviews, production incidents, and architectural discussions. This
tacit knowledge (what to generate, what to check, what to flag, what to
reject) is the team’s most valuable and most fragile asset. It lives in
people’s heads, transfers slowly through pairing and code review, and walks
out the door when someone leaves.
In earlier work I have described techniques that improve how an
individual developer collaborates with AI: sharing curated project context,
structuring design conversations, externalizing decisions into living
documents. Each helps one person get better results. None of them addresses
a different problem: two developers on the same team, using the same tool,
same codebase, same project context, producing materially different results.
The gap is not in what the AI knows about the project. It is in what the AI
is told to do with that knowledge. The instructions vary by person, and
they vary across every kind of interaction, not just code review.
The standards that shape how an AI generates code, refactors existing
systems, checks for security vulnerabilities, or reviews a pull request
should not be tips shared on Slack or tribal knowledge carried in a senior’s
head. They should be versioned artifacts that encode “how we do things here”
into a form that executes consistently for everyone.
The Consistency Problem
When AI-assisted development depends on who is prompting, seniors
become bottlenecks, not because they write the code, but because they are
the only ones who know what to ask for.
I have observed this pattern repeatedly. A senior engineer, when asking
the AI to generate a new service, instinctively specifies: follow our
functional style, use the existing error-handling middleware, place it in
lib/services/, make types explicit, use our logging utility rather than
console.log. When asking the AI to refactor, she specifies: preserve the
public contract, avoid premature abstraction, keep functions small and
single-purpose. When asking it to check security, she knows to specify:
check for SQL injection, verify authorization on every endpoint, ensure
secrets are not hardcoded.
A less experienced developer, faced with the same tasks, asks the AI to
“create a notification service” or “clean up this code” or “check if this
is secure.” Same codebase. Same AI. Completely different quality gates,
across every interaction, not just review.
This is not the junior developer’s fault; they have not yet developed
the instincts. But the inconsistency is expensive. AI-generated code
drifts from team conventions when one developer prompts and aligns when
another does. Refactoring quality varies by who requests it. Security
checks catch different things depending on who frames the question.
Technical debt accumulates unevenly.
The instinct is to treat this as a skills problem: train the juniors,
write better documentation, do more pairing. These all help, but they are
slow and they do not scale. The knowledge exists on the team; it simply
has no vehicle for consistent distribution. What is needed is a way to
take what the senior applies instinctively and make it available to
everyone, not as advice to remember, but as instructions that execute.
This is a systems problem, not a skills problem. And it requires a
systems solution.
The earlier techniques in this series address what the AI knows. Knowledge
Priming gives the AI the project’s conventions. Design-First collaboration
builds alignment on architecture. Context Anchoring makes that alignment
durable across sessions. This article is about something different: making
the AI apply the team’s judgment, consistently, regardless of who is
prompting, across every meaningful interaction.
Executable Governance
Teams have always tried to codify their standards. The challenge has
always been the gap between documentation and practice. A checklist on a
wiki depends on someone reading it, remembering it, and applying it
consistently under time pressure. In my experience, that gap is where most
codification efforts quietly fail.
AI instructions change this dynamic in an interesting way. A team
standard encoded as an AI instruction does not depend on someone
remembering to apply it. The instruction is the application. When a
developer generates code using an instruction that embeds the team’s
architectural patterns, or runs a security check that encodes the team’s
threat model, the standards are applied as a side effect of the workflow,
not as a separate step that requires discipline. The governance is the
workflow.
I find it useful to think of this as two moves:
From tacit to explicit. The first move is the familiar one: taking
what the senior knows instinctively and writing it down. The difference is
that the target format is not a wiki page or a checklist, but a structured
instruction set that an AI can execute. The act of writing it surfaces
assumptions that were never articulated. What exactly makes a security
issue critical versus merely important? These distinctions, often
intuitive for the senior, must become precise for the AI.
From documentation to execution. The second move is the one that
matters. Linting rules are versioned config files, not personal
preferences. CI/CD pipelines are executable definitions, not wiki pages
describing deployment steps. AI instructions belong in the same category:
configuration that executes, not documentation that informs. When these
instructions live in the repository, reviewed through pull requests,
shared by default, they have the same status as any other piece of team
infrastructure. The developer does not need to carry the team’s full set
of standards in their head. They invoke an instruction. The team’s
judgment is applied consistently — not because the developer memorized it,
but because the infrastructure encodes it.
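The parallel to other executable configuration can be made concrete. As a
purely illustrative sketch (the directory and file names below are
hypothetical, not a standard), a team might keep its AI instructions
alongside the rest of its versioned infrastructure:

```
repo/
├── .eslintrc.json           # linting: versioned, executable style rules
├── .github/
│   └── workflows/           # CI/CD: versioned, executable pipeline definitions
└── ai/
    └── instructions/        # AI standards: versioned, executable team judgment
        ├── generate-service.md
        ├── refactor.md
        ├── security-check.md
        └── review.md
```

The point of the layout is not the specific paths but the status it confers:
the instructions are reviewed through pull requests and shared by default,
exactly like the linting config and the pipeline definitions next to them.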
What This Looks Like
A well-structured executable instruction has a recognizable anatomy:
four elements, each doing distinct work. This anatomy applies regardless
of the instruction’s purpose.
- Role definition. Not because the AI needs a persona, but because
the role sets the expertise level and perspective. “Role: senior engineer
implementing a new service following the team’s architectural patterns”
establishes a different baseline than a generic prompt. The role is the
lens through which every subsequent instruction is applied.
- Context requirements. What the instruction needs before it can
operate: the relevant code, access to the project’s architectural context,
any applicable constraints. This makes dependencies explicit rather than
hoping the developer remembers to provide them.
- Categorized standards. The categories matter more than the
individual items. For a generation instruction, the categories might be:
architectural compliance (must follow), convention adherence (should
follow), and style preferences (nice to have). For a security instruction:
critical vulnerabilities (blockers), important concerns (must address
before merge), and advisories (track and evaluate). For a review
instruction: breaking issues, important findings, and suggestions. This
priority structure encodes the team’s judgment, not just their
knowledge. It tells the AI, and through the AI, the developer, what
matters most.
- Output format. A structured response with a summary, categorized
findings, and clear next steps. The format ensures that output from the
instruction is comparable across runs and across developers, a property
that matters once multiple people are using the same instructions
regularly. For generation instructions, this shapes the completeness and
structure of the code produced: not a findings report, but the
conventions of the output itself.
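To make the anatomy concrete, here is a minimal sketch of a security
instruction showing all four elements. The specific items and severity
assignments are illustrative examples, not a recommended checklist; every
team’s version would encode its own threat model:

```
# Instruction: security-check

## Role
Senior engineer performing a security review against this team's threat model.

## Context required
- The code under review
- The project's architectural context (auth model, data flows)
- Any applicable compliance constraints

## Standards
Critical (blockers):
- SQL injection via unparameterized queries
- Missing authorization checks on endpoints
- Hardcoded secrets

Important (must address before merge):
- Unvalidated user input reaching business logic
- Sensitive data written to logs

Advisory (track and evaluate):
- Dependencies with known but unexploited vulnerabilities

## Output format
1. Summary (one paragraph)
2. Findings grouped by severity, each with location and rationale
3. Clear next steps
```

Note how the severity tiers do the real work: two developers running this
instruction on the same code get findings graded on the same scale, which
is precisely the consistency the unstructured prompt could not deliver.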
The principle applies across the full range of AI interactions. For
example:
- Generation: encodes how the team builds new code (architecture patterns,
naming, error-handling, testing expectations) so output aligns from the
first pass.
- Refactoring: encodes how the team improves existing code (preserve
contracts, avoid premature abstraction, propose incremental change).
- Security: encodes the team’s threat model (what to check and how to
grade severity) so checks are team-specific rather than generic.
- Review: encodes what the team checks in review (architecture alignment,
error handling, type safety, conventions) with a consistent severity
structure.
Keep instructions small and single-purpose. Smaller instructions
maintain focus, are easier to maintain, and compose flexibly.
Surfacing the tacit knowledge.
The most interesting part of
creating these instructions is the extraction process. It amounts to an
interview with the team’s senior engineers, structured around pointed
questions that span the full development workflow: What architectural
decisions should never be left to individual judgment? Which conventions
are corrected most often in generated code? Which security checks are
applied instinctively? What triggers an immediate rejection in review?
What separates a clean refactoring from an over-engineered one?
The answers map directly to instruction structures. Non-negotiable
architectural patterns become generation constraints. Frequent corrections
become convention checks. Security instincts become threat-model items.
Review rejections become critical checks. Recurring mistakes become
anti-patterns to flag. The interviews essentially write the instructions;
the act of creation is the act of organizing tacit knowledge into
explicit, prioritized checks.
I have found that this process has value even beyond the resulting
instructions. On one project, the extraction conversation revealed that
two senior engineers had quietly different thresholds for what counted as
a “critical” security concern versus an “important” one, a
disagreement that had never surfaced because each reviewed different pull
requests. The act of writing a shared instruction forced the distinction
to be made explicit. Once a less experienced developer on the team began
using the resulting instructions, the effect was immediate: their first
review flagged a missing authorization check on a newly added endpoint,
exactly the kind of issue that had previously only been caught when a
senior happened to be the reviewer. The instructions did not make the
developer more experienced. They made their inexperience less costly.
Where Standards Meet the Workflow
These instructions are not a single-purpose tool. They apply at
different points in the development workflow, and the point of application
shapes their value.
At generation-time, when a developer asks the AI to create a new
service, implement a feature, or write tests, a generation instruction
ensures the output follows team conventions from the start. This is where
encoded standards have the most leverage: they prevent misalignment rather
than catching it after the fact.
During development, a refactoring instruction keeps improvements
aligned with team norms, and a security instruction applies the team’s
threat model rather than a generic checklist. The standards are present
throughout the development process, not bolted on at the end.
At review-time, when a developer finishes a piece of work (whether
AI-generated or manually written), a review instruction applies the team’s
quality gate. But review-time is the last opportunity to catch
misalignment — the earlier in the workflow standards are applied, the
fewer issues that reach it.
Optionally in CI, some teams extend these instructions into their
continuous integration pipeline as an automated consistency check.
CI-level instructions must be fast enough not to slow the pipeline,
predictable enough to avoid noisy false positives, and maintained with the
same discipline as any other CI gate.
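As one possible shape for such a gate (a sketch, not a prescription), a
team on GitHub Actions might wrap its AI tooling in a script and fail the
build only on critical findings. The `ai-review` wrapper below is
hypothetical; the structure is what matters:

```
# .github/workflows/standards-check.yml (illustrative)
name: standards-check
on: pull_request
jobs:
  ai-review:
    runs-on: ubuntu-latest
    timeout-minutes: 5          # fast enough not to slow the pipeline
    steps:
      - uses: actions/checkout@v4
      - name: Run team review instruction
        # Hypothetical wrapper around the team's AI tooling: it loads the
        # versioned instruction and exits nonzero only on critical findings,
        # keeping the gate predictable and low-noise.
        run: ./scripts/ai-review --instruction ai/instructions/review.md --fail-on critical
```

Failing only on the critical tier is the design choice that keeps the gate
from producing noisy false positives; important and advisory findings can
be surfaced as comments rather than failures.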
Calibration
This approach is most valuable when a team is large enough that
consistency cannot be maintained through conversation alone. A useful
heuristic: if AI-assisted output visibly varies in quality depending on
who is prompting, or if generation and review work routes through a
handful of people because they are the only ones who know how to prompt
effectively, the inconsistency is the signal. Teams of five may not need
this. Teams of fifteen almost certainly do.
The costs are real. Creating good instructions requires effort — the
extraction interviews, the drafting, the iteration. Overly prescriptive
instructions become brittle; they produce false positives on edge cases or
fight against legitimate variations in approach. There is a maintenance
burden as standards evolve. And there is a risk of over-engineering: not
every interaction with AI needs a dedicated instruction.
The right starting point, in my experience, is one instruction. A
generation or review instruction is usually the highest-value choice; it
addresses the most common workflow, the widest quality gap, and the most
visible inconsistency. Additional instructions should follow adoption, not
precede it.
Conclusion
This is, at its core, a shift from judgment that lives in people’s
heads to judgment that executes as shared infrastructure. The senior
engineer’s instincts (the patterns she checks, the conventions she
enforces, the risks she flags) do not have to remain personal and
unscalable. They can be extracted, encoded into versioned instructions,
and applied consistently across every developer and every interaction with
AI.
The mechanism is not new. Teams already do this with linting rules, CI
pipelines, and infrastructure-as-code. What changes with AI is the scope
of what can be encoded. Linting catches syntax and style. Executable team
standards can encode architectural judgment, security awareness,
refactoring philosophy, and review rigor: the kind of knowledge that
previously transferred only through pairing, mentorship, and years of
shared experience.
The most interesting property of these standards is that they are
team-owned. They live in the repository. They evolve through pull
requests. They improve when practice reveals gaps. Every instruction that
misses something is an instruction waiting to be updated, and the update
is a commit that the whole team reviews. The standards are not just the
output of team knowledge; they are the mechanism through which team
knowledge gets codified, shared, and refined.