Write and checkout the plan
Each AI-aided coding session should start in planning mode. This has become a commonplace recommendation (Boris Cherny, creator of Claude Code, makes this point here). I also recommend checking the resulting plan into the repository.
Summary
- Start most coding sessions in planning mode. Iterate on the plan until it's in a good place.
- Get the plan reviewed and approved. Merge it into the repo.
Why planning makes sense
Separate your process into two PRs:
- The PR with the plan.
- The PR with the implementation.
The main reason is that it mimics the classical research-design-implement loop.
The first part (the plan) is the RFC. Your reviewers know where to focus their attention at this stage: the architecture, the technical choices, and naturally their tradeoffs. "It's easier to use an eraser on the drawing board than a sledgehammer at the construction site."
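As a concrete sketch, the two-PR flow might look like this in git. The repository, ticket number, branch names, and file paths are illustrative, not prescribed by this article; the `git init` only makes the sketch self-contained:

```shell
# Self-contained demo repo (an existing repo would skip these lines).
git init -q two-pr-demo && cd two-pr-demo
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"

# PR 1: the plan. Draft it in planning mode, iterate, then commit it.
git checkout -q -b plan/tr-1234-add-toaster-thermostat
mkdir -p docs/plans
printf '# TR-1234: Add toaster thermostat\nStatus: approved\n' \
  > docs/plans/tr-1234-add-toaster-thermostat.md
git add docs/plans
git commit -q -m "plan: add toaster thermostat (TR-1234)"
# ...open the PR in draft mode, get it reviewed and approved, merge...

# PR 2: the implementation, written against the merged plan.
git checkout -q -b feat/tr-1234-add-toaster-thermostat
```

Reviewers of PR 1 only see a markdown file, so the discussion stays on architecture and tradeoffs; PR 2 can then be reviewed against an already-approved design.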
There are many other reasons why this approach makes sense:
- It's much easier to "checklist" that you covered everything at this stage: monitoring, alerting, analytics, testing, docs, runbook, etc.
- The AI's context can be focused on the technical decisions without being wasted on the minutiae of implementation details.
Why checking the plan into the repo makes sense
By checking the plan into the (git) repository, you achieve several things at once:
- The plan archives the decisions made, their rationale, and their tradeoffs (similarly to ADRs, Architecture Decision Records), which is useful when you need to understand the architecture's history.
- It documents the implementation (note: AI agents are great at keeping docs up to date), which is useful for debugging or runbooks.
- It might serve as a template for future similar projects.
- It might serve as inspiration for other teams looking at similar issues.
- Agents in a planning phase will find and mention it, which provides them with useful context about past decisions and nuanced details about your unique engineering context.
- Code review agents can reference the plan when verifying the implementation.
- What takes time is rarely writing the code; it's usually the testing, the deployment, and the lessons learned from confronting new code with the harsh reality of users, infra, security, etc. Continuing to iterate on the plan helps future product development by documenting those hard-learned lessons.
What's in a plan
The outline should look something like this:
# TICKET-NUMBER: Feature title
Status: [approved/implemented/rejected]
Key learnings since this plan was implemented:
- TODO
## Overview
## Requirements
_Short summary of the functional requirements, or link to a PRD (Product Requirements Document)_
## Architecture
(ideally with a diagram)
- Current/future state
- Key principles
- Key technical decisions (language, framework, database, libraries) and their tradeoffs
- Dependencies
- Invariants we need to maintain
- What's out of scope
## Key implementation details
## Implementation plan
_Split the implementation in different phases._
- Include success criteria (automated and manual) for each phase.
## File structure
```raw
project/
├── src/
│ ├── component1/
│ └── component2/
├── tests/
└── docs/
```
## Rollout
- Rollback plan
- Feature flag usage
- Canary
- Migration (safe rollback)
- Data migration
- Communication plan with users and clients
- Risks (with prioritization and prevention/mitigation strategies)
## Other considerations
- Security (e.g., authorization)
- Performance
- Customer support
- Privacy (e.g., data minimization)
- Testing considerations
## Definition of done checklist
_This section only mentions the aspects relevant to the AI agent._
_P0: must have. P1: can be follow-up items, especially for a prototype._
- [ ] P0 Security: Has appropriate access control
- [ ] P0 Testing: Has appropriate test coverage (incl. unit test)
- [ ] P0 Rollout: Has rollout strategy (e.g., feature flag, canary, communication plan)
- [ ] P0 Observability: Has monitoring
- [ ] P0 Observability: Has alerting (incl. SLO monitoring)
- [ ] P0 Observability: Has user analytics
- [ ] P1 Observability: Has appropriate logging
- [ ] P1 Quality: Has clear and actionable errors
- [ ] P1 Maintainability: Has up-to-date documentation (including runbooks)
- [ ] P1 Quality: Has proper disaster recovery (e.g., graceful degradation)
- [ ] P1 Quality: Has translations (if applicable)
- [ ] P1 Rollout: Has risk prevention and mitigation strategies
## References
_List any other relevant plans, related work, internal docs, packages, tickets._
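One way to put the outline above to work is a tiny scaffolding step that copies a checked-in template into a new plan file. The `docs/plans/` path, the template's existence, and the abridged template contents are assumptions for the sake of the example:

```shell
# Create a checked-in template (in a real repo this file would already exist;
# this is an abridged version of the outline above).
mkdir -p docs/plans
cat > docs/plans/TEMPLATE.md <<'EOF'
# TICKET-NUMBER: Feature title
Status: [approved/implemented/rejected]
## Overview
## Requirements
## Architecture
## Key implementation details
## Implementation plan
## File structure
## Rollout
## Other considerations
## Definition of done checklist
## References
EOF

# Scaffold a new plan from the template (ticket and slug are illustrative).
cp docs/plans/TEMPLATE.md docs/plans/tr-1234-add-toaster-thermostat.md
```

Starting every plan from the same template makes the "definition of done" checklist hard to forget and keeps plans easy to diff against one another.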
Tips
- AI-generated text tends to be very repetitive. I summarize and remove repetition from time to time.
- It's totally fine to update the plan as you learn new things during the implementation phase.
- I often ask different models to review the plan and criticize it.
- Have a naming convention for the plan, e.g., "tr-20209-add-toaster-thermostat.md".
- Have rules that define what your plan outline should look like, along with other best practices. Or provide a plan template (reference/verify your "definition of done" if you have one).
- Use checklists. Checklists are faster to review, templatize, and maintain.
- Make sure the plan includes code examples so that you can get a sense of the final result (especially when introducing new abstractions and patterns).
- Use the PR's draft mode: review the plan and make sure you agree with everything that's written.
- Use subagents to do extra research on finer points.
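The naming-convention tip can be enforced mechanically. Here is a hypothetical CI check; the `docs/plans/` path, the sample file, and the exact regex are assumptions to adapt to your own convention:

```shell
# Hypothetical CI check: every plan filename must match the convention
# <prefix>-<ticket-number>-<kebab-case-slug>.md
# (e.g., tr-20209-add-toaster-thermostat.md).
mkdir -p docs/plans
touch docs/plans/tr-20209-add-toaster-thermostat.md  # sample file so the loop has input
for f in docs/plans/*.md; do
  base=$(basename "$f")
  if ! echo "$base" | grep -Eq '^[a-z]+-[0-9]+(-[a-z0-9]+)+\.md$'; then
    echo "bad plan filename: $base" >&2
    exit 1
  fi
done
echo "all plan filenames follow the convention"
```

A consistent filename scheme is what lets both humans and planning agents reliably discover past plans later.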
Coding agents will increase the momentum towards GitOps
If everything is checked into the repo, coding agents have a much better shot at one-shotting whole projects or features. The more control they have over the end-to-end development workflow, the faster your velocity!
You should consider moving everything into code repositories:
- Infra as code (Terraform/Pulumi)
- Other declarative infra frameworks
- Data frameworks such as dbt or sqlmesh
- Documentation
- Best practices including coding guidelines
- Runbooks
- Dashboard and alert configuration
- CMS (content management system)
- Translations
It goes without saying that this doubles as documentation for your team, which will speed up onboarding new engineers.
Resources
- Architectural Decision Records
- Claude Code creator Boris shares his setup with 13 detailed steps
- Using LLMs at Oxide / RFD / Oxide
- Getting AI to Work in Complex Codebases
- Writing Helpful Error Messages, Google for Developers
- charlax/professional-programming
Thanks
Thanks to my colleague Sean Koop for providing the inspiration for the practices described in this article.