Write and checkout the plan
Each AI-aided coding session should start in planning mode. This has become a commonplace recommendation (Boris Cherny, creator of Claude Code, makes this point here). I also recommend checking the resulting plan into the repository.
Summary
- Start most coding sessions in planning mode. Iterate on the plan until it's in a good place.
- Get the plan reviewed and approved. Merge it into the repo.
Why planning makes sense
Separate your process into two PRs:
- The PR with the plan.
- The PR with the implementation.
The main reason is that it mimics the classical research-design-implement loop.
The first part (the plan) is the RFC. Your reviewers know where to focus their attention at this stage: the architecture, the technical choices, and naturally their tradeoffs. "It's easier to use an eraser on the drawing board than a sledgehammer at the construction site."
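As a concrete sketch, the two-PR flow might look like this in git. The repository, ticket number, branch names, and file paths are illustrative, not prescribed by this article; the `git init` only makes the sketch self-contained:

```shell
# Self-contained demo repo (an existing repo would skip these lines).
git init -q two-pr-demo && cd two-pr-demo
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"

# PR 1: the plan. Draft it in planning mode, iterate, then commit it.
git checkout -q -b plan/tr-1234-add-toaster-thermostat
mkdir -p docs/plans
printf '# TR-1234: Add toaster thermostat\nStatus: approved\n' \
  > docs/plans/tr-1234-add-toaster-thermostat.md
git add docs/plans
git commit -q -m "plan: add toaster thermostat (TR-1234)"
# ...open the PR in draft mode, get it reviewed and approved, merge...

# PR 2: the implementation, written against the merged plan.
git checkout -q -b feat/tr-1234-add-toaster-thermostat
```

Reviewers of PR 1 only see a markdown file, so the discussion stays on architecture and tradeoffs; PR 2 can then be reviewed against an already-approved design.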
There are many other reasons why this approach makes sense:
- It's much easier to "checklist" that you covered everything at this stage: monitoring, alerting, analytics, testing, docs, runbook, etc.
- The AI's context can be focused on the technical decisions without being wasted on the minutiae of implementation details.
Why checking the plan into the repo makes sense
By checking the plan into the (git) repository, you achieve several things at once:
- The plan archives the decisions made, their rationale, and their tradeoffs (similarly to ADRs, Architecture Decision Records), which is useful when you need to understand the architecture's history.
- It documents the implementation (note: AI agents are great at keeping docs up to date), which is useful for debugging or runbooks.
- It might serve as a template for future similar projects.
- It might serve as inspiration for other teams looking at similar issues.
- Agents in a planning phase will find and mention it, which provides them with useful context about past decisions and nuanced details about your unique engineering context.
- Code review agents can reference the plan when verifying the implementation.
- What takes time is rarely writing the code; it's usually the testing, the deployment, and the lessons learned from confronting new code with the harsh reality of users, infra, security, etc. Continuing to iterate on the plan helps future product development by documenting those hard-learned lessons.
What's in a plan
The outline should look something like this:
# TICKET-NUMBER: Feature title
Status: [approved/implemented/rejected]
Key learnings since this plan was implemented:
- TODO
## Overview
## Requirements
_Short summary of the functional requirements, or link to a PRD (Product Requirements Document)_
## Architecture
(ideally with a diagram)
- Current/future state
- Key principles
- Key technical decisions (language, framework, database, libraries) and their tradeoffs
- Dependencies
- Invariants we need to maintain
- What's out of scope
## Key implementation details
## Implementation plan
_Split the implementation in different phases._
- Include success criteria (automated and manual) for each phase.
## File structure
```raw
project/
├── src/
│ ├── component1/
│ └── component2/
├── tests/
└── docs/
```
## Rollout
- Rollback plan
- Feature flag usage
- Canary
- Migration (safe rollback)
- Data migration
- Communication plan with users and clients
- Risks (with prioritization and prevention/mitigation strategies)
## Other considerations
- Security (e.g., authorization)
- Performance
- Customer support
- Privacy (e.g., data minimization)
- Testing considerations
## Definition of done checklist
_This section only mentions the aspects relevant to the AI agent._
_P0: must have. P1: can be follow-up items, especially for a prototype._
- [ ] P0 Security: Has appropriate access control
- [ ] P0 Testing: Has appropriate test coverage (incl. unit test)
- [ ] P0 Rollout: Has rollout strategy (e.g., feature flag, canary, communication plan)
- [ ] P0 Observability: Has monitoring
- [ ] P0 Observability: Has alerting (incl. SLO monitoring)
- [ ] P0 Observability: Has user analytics
- [ ] P1 Observability: Has appropriate logging
- [ ] P1 Quality: Has clear and actionable errors
- [ ] P1 Maintainability: Has up-to-date documentation (including runbooks)
- [ ] P1 Quality: Has proper disaster recovery (e.g., graceful degradation)
- [ ] P1 Quality: Has translations (if applicable)
- [ ] P1 Rollout: Has risk prevention and mitigation strategies
## References
_List any other relevant plans, related work, internal docs, packages, tickets._
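One way to put the outline above to work is a tiny scaffolding step that copies a checked-in template into a new plan file. The `docs/plans/` path, the template's existence, and the abridged template contents are assumptions for the sake of the example:

```shell
# Create a checked-in template (in a real repo this file would already exist;
# this is an abridged version of the outline above).
mkdir -p docs/plans
cat > docs/plans/TEMPLATE.md <<'EOF'
# TICKET-NUMBER: Feature title
Status: [approved/implemented/rejected]
## Overview
## Requirements
## Architecture
## Key implementation details
## Implementation plan
## File structure
## Rollout
## Other considerations
## Definition of done checklist
## References
EOF

# Scaffold a new plan from the template (ticket and slug are illustrative).
cp docs/plans/TEMPLATE.md docs/plans/tr-1234-add-toaster-thermostat.md
```

Starting every plan from the same template makes the "definition of done" checklist hard to forget and keeps plans easy to diff against one another.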
Tips
- AI-generated text tends to be very repetitive. I summarize and remove repetition from time to time.
- It's totally fine to update the plan as you learn new things during the implementation phase.
- I often ask different models to review the plan and criticize it.
- Have a naming convention for the plan, e.g., "tr-20209-add-toaster-thermostat.md".
- Have rules that define what your plan outline should look like, along with other best practices. Or provide a plan template (reference/verify your "definition of done" if you have one).
- Use checklists. Checklists are faster to review, templatize, and maintain.
- Make sure the plan includes code examples so that you can get a sense of the final result (especially when introducing new abstractions and patterns).
- Use the PR's draft mode: review the plan and make sure you agree with everything that's written.
- Use subagents to do extra research on finer points.
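The naming-convention tip can be enforced mechanically. Here is a hypothetical CI check; the `docs/plans/` path, the sample file, and the exact regex are assumptions to adapt to your own convention:

```shell
# Hypothetical CI check: every plan filename must match the convention
# <prefix>-<ticket-number>-<kebab-case-slug>.md
# (e.g., tr-20209-add-toaster-thermostat.md).
mkdir -p docs/plans
touch docs/plans/tr-20209-add-toaster-thermostat.md  # sample file so the loop has input
for f in docs/plans/*.md; do
  base=$(basename "$f")
  if ! echo "$base" | grep -Eq '^[a-z]+-[0-9]+(-[a-z0-9]+)+\.md$'; then
    echo "bad plan filename: $base" >&2
    exit 1
  fi
done
echo "all plan filenames follow the convention"
```

A consistent filename scheme is what lets both humans and planning agents reliably discover past plans later.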
Coding agents will increase the momentum towards GitOps
If everything is checked into the repo, coding agents have a much better shot at one-shotting whole projects or features. The more control they have over the end-to-end development workflow, the faster your velocity!
You should consider moving everything into code repositories:
- Infra as code (Terraform/Pulumi)
- Other declarative infra frameworks
- Data frameworks such as dbt or sqlmesh
- Documentation
- Best practices including coding guidelines
- Runbooks
- Dashboard and alert configuration
- CMS (content management system)
- Translations
It goes without saying that this doubles as documentation for your team, which will speed up onboarding new engineers.
Resources
- Architectural Decision Records
- Claude Code creator Boris shares his setup with 13 detailed steps
- Using LLMs at Oxide / RFD / Oxide
- Getting AI to Work in Complex Codebases
- Writing Helpful Error Messages, Google for Developers
- charlax/professional-programming
Thanks
Thanks to my colleague Sean Koop for providing the inspiration for the practices described in this article.